ABSTRACT
Hidden Markov models (HMMs) have been successfully applied to a variety of problems in molecular biology, ranging from alignment problems to gene finding and annotation. Alignment problems can be solved with pair HMMs, while gene finding programs rely on generalized HMMs in order to model exon lengths. In this paper we introduce the generalized pair HMM (GPHMM), which is an extension of both pair and generalized HMMs. We show how GPHMMs, in conjunction with approximate alignments, can be used for cross-species gene finding, and describe applications to DNA-cDNA and DNA-protein alignment. GPHMMs provide a unifying and probabilistically sound theory for modeling these problems.