Evolving the structure of Hidden Markov Models for biological sequence analysis
Hidden Markov Models (HMMs) are widely used for biological sequence analysis because of their ability to incorporate biological information in their structure. An automatic method of optimising the structure of HMMs for biological sequence analysis is highly desirable.
In this thesis, we explore the possibility of using a genetic algorithm (GA) to optimise the HMM structure. The Baum-Welch algorithm is embedded in the GA's evolutionary cycle as a hybrid training step. To prevent overfitting, the performance of the HMMs is compared on a dataset separate from the one used for the Baum-Welch training.
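The hybrid cycle described above can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes discrete-emission HMMs stored as plain lists, treats the number of states as the only evolving structural feature, and refills the population with fresh random models in place of real genetic operators. All names (`evolve`, `fitness`, `baum_welch_step`, the `pseudo` smoothing count, `max_states`) are hypothetical.

```python
import math
import random


def normalise(row):
    """Scale a list of positive numbers so that it sums to 1."""
    total = sum(row)
    return [x / total for x in row]


def forward(pi, A, B, seq):
    """Scaled forward pass: returns (alphas, scales, log-likelihood)."""
    n = len(pi)
    alphas, scales = [], []
    for t, o in enumerate(seq):
        if t == 0:
            a = [pi[i] * B[i][o] for i in range(n)]
        else:
            prev = alphas[-1]
            a = [sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
        c = sum(a)
        alphas.append([x / c for x in a])
        scales.append(c)
    return alphas, scales, sum(math.log(c) for c in scales)


def backward(A, B, seq, scales):
    """Backward pass that reuses the scaling factors from forward()."""
    n, T = len(A), len(seq)
    betas = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        o = seq[t + 1]
        betas[t] = [sum(A[i][j] * B[j][o] * betas[t + 1][j] for j in range(n))
                    / scales[t + 1] for i in range(n)]
    return betas


def baum_welch_step(pi, A, B, seqs, pseudo=1e-3):
    """One re-estimation step; `pseudo` is an assumed smoothing count."""
    n, m = len(pi), len(B[0])
    pi_acc = [pseudo] * n
    A_acc = [[pseudo] * n for _ in range(n)]
    B_acc = [[pseudo] * m for _ in range(n)]
    for seq in seqs:
        alphas, scales, _ = forward(pi, A, B, seq)
        betas = backward(A, B, seq, scales)
        for t, o in enumerate(seq):
            for i in range(n):
                gamma = alphas[t][i] * betas[t][i]  # state posterior
                B_acc[i][o] += gamma
                if t == 0:
                    pi_acc[i] += gamma
                if t + 1 < len(seq):
                    nxt = seq[t + 1]
                    for j in range(n):  # expected transition counts
                        A_acc[i][j] += (alphas[t][i] * A[i][j] * B[j][nxt]
                                        * betas[t + 1][j] / scales[t + 1])
    return (normalise(pi_acc), [normalise(r) for r in A_acc],
            [normalise(r) for r in B_acc])


def random_hmm(n, m, rng):
    """Random HMM with n states and m symbols, all probabilities positive."""
    row = lambda k: normalise([rng.random() + 0.1 for _ in range(k)])
    return row(n), [row(n) for _ in range(n)], [row(m) for _ in range(n)]


def fitness(model, seqs):
    """Total log-likelihood of `seqs` under `model`."""
    pi, A, B = model
    return sum(forward(pi, A, B, s)[2] for s in seqs)


def evolve(train, valid, m, pop_size=6, generations=5, bw_iters=2,
           max_states=4, seed=0):
    """Train on `train`, select on `valid`; returns the last generation's best."""
    rng = random.Random(seed)
    new = lambda: random_hmm(rng.randint(1, max_states), m, rng)
    pop = [new() for _ in range(pop_size)]
    best = None
    for _ in range(generations):
        trained = []
        for model in pop:
            for _ in range(bw_iters):  # hybrid step: Baum-Welch on training data
                model = baum_welch_step(*model, train)
            trained.append(model)
        # Selection uses the separate dataset, not the training data.
        trained.sort(key=lambda mdl: fitness(mdl, valid), reverse=True)
        best = trained[0]
        keep = trained[: pop_size // 2]
        # Stand-in for crossover and mutation: refill with random structures.
        pop = keep + [new() for _ in range(pop_size - len(keep))]
    return best
```

Calling `evolve(train, valid, m=4)` with lists of integer-coded sequences (for example nucleotides mapped to 0..3) returns a `(pi, A, B)` tuple. The point of the design is that `fitness` never sees the data used by `baum_welch_step`, so a structure that merely memorises the training set gains no selective advantage.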
The proposed GA for hidden Markov models (GA-HMM) allows HMMs with different numbers of states to evolve. The GA-HMM was capable of finding an HMM comparable to a previously published hand-coded HMM designed for the same task.
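One way to let the number of states vary is through mutation operators that add or delete a state. The sketch below is an illustration under stated assumptions, not the operators used in the thesis: `add_state` and `delete_state` are hypothetical names, the HMM is represented only by a transition matrix `A` and an emission matrix `B` as plain lists, and the initial-state distribution is left out for brevity.

```python
import random


def normalise(row):
    """Scale a row to sum to 1; fall back to uniform if it sums to 0."""
    total = sum(row)
    if total == 0:
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]


def add_state(A, B, rng):
    """Return copies of A and B with one randomly connected state appended."""
    n, m = len(A), len(B[0])
    # Every existing state gains a transition into the new state.
    new_A = [normalise(list(row) + [rng.random() + 0.1]) for row in A]
    # The new state gets random outgoing transitions and emissions.
    new_A.append(normalise([rng.random() + 0.1 for _ in range(n + 1)]))
    new_B = [list(row) for row in B]
    new_B.append(normalise([rng.random() + 0.1 for _ in range(m)]))
    return new_A, new_B


def delete_state(A, B, k):
    """Return copies of A and B with state k removed and rows renormalised."""
    if len(A) <= 1:
        raise ValueError("cannot delete the only state")
    new_A = [normalise([p for j, p in enumerate(row) if j != k])
             for i, row in enumerate(A) if i != k]
    new_B = [list(row) for i, row in enumerate(B) if i != k]
    return new_A, new_B
```

Both functions return new matrices and leave their inputs untouched, so a parent model can stay in the population alongside its mutated offspring. In a hybrid GA the randomly initialised probabilities would then be refined by Baum-Welch training.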
We also propose Block-HMMs, in which the HMM topology is assembled from biologically meaningful building blocks. New genetic operators are designed to evolve the HMM structure while preserving the blocks.
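The idea of a block-preserving operator can be sketched as a crossover that cuts only at block boundaries. This is an assumption-laden illustration rather than the thesis operators: the `Block` class, its `kind` labels and the function `block_crossover` are hypothetical, and a topology is reduced to an ordered list of blocks.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    kind: str       # hypothetical label, e.g. "linear" or "self-loop"
    n_states: int   # number of HMM states inside the block


def block_crossover(parent_a, parent_b, rng):
    """Swap tails of two block lists, cutting only between blocks."""
    # Cut points lie in 1..len, so each child keeps at least one block
    # and no block is ever split.
    cut_a = rng.randint(1, len(parent_a))
    cut_b = rng.randint(1, len(parent_b))
    child_1 = parent_a[:cut_a] + parent_b[cut_b:]
    child_2 = parent_b[:cut_b] + parent_a[cut_a:]
    return child_1, child_2


def total_states(blocks):
    """Number of HMM states implied by a list of blocks."""
    return sum(b.n_states for b in blocks)
```

Because the cut points fall between list elements, every block in a child is an intact block from one of the parents. The children may differ in length from their parents, so this operator also changes the total number of states.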
We applied the HMM structure evolution methods to modelling the promoter and coding regions of a prokaryote and to predicting the secondary structure of proteins. The Block-HMM method could generate HMM structures and find a conserved promoter region and a triplet codon model without any prior information on the sequences. When the Block-HMM was tested on the protein secondary structure prediction problem, it outperformed other HMM-based prediction methods and was comparable to the best known techniques for this problem.
University of Southampton
Won, Kyoung-Jae
3b5c7d9a-e6bd-4624-9825-338e795b9945
2005
Won, Kyoung-Jae (2005) Evolving the structure of Hidden Markov Models for biological sequence analysis. University of Southampton, Doctoral Thesis.
Record type: Thesis (Doctoral)
Text: 1011970.pdf - Version of Record
More information
Published date: 2005
Identifiers
Local EPrints ID: 465873
URI: http://eprints.soton.ac.uk/id/eprint/465873
PURE UUID: 53599d94-5844-4f9c-b012-e7fe4a18b466
Catalogue record
Date deposited: 05 Jul 2022 03:22
Last modified: 16 Mar 2024 20:25
Contributors
Author: Kyoung-Jae Won