ACM Home Page
Please provide us with feedback. Feedback
Picking alignments from (steiner) trees
Full text PdfPdf (1.39 MB)
Source Annual Conference on Research in Computational Molecular Biology archive
Proceedings of the sixth annual international conference on Computational biology table of contents
Washington, DC, USA
Pages: 246 - 253  
Year of Publication: 2002
ISBN:1-58113-498-3
Authors
Lior Pachter  U.C. Berkeley, Berkeley, CA
Fumei Lam  M.I.T., Cambridge, MA
Sponsors
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 9,   Downloads (12 Months): 28,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/565196.565228
What is a DOI?

ABSTRACT

The application of Needleman-Wunsch alignment techniques to biological sequences is complicated by two serious problems when the sequences are long: the running time, which scales as the product of the lengths of sequences, and the difficulty in obtaining suitable parameters that produce meaningful alignments. The running time problem is often corrected by reducing the search space, using techniques such as banding, or chaining of high scoring pairs. The parameter problem is more difficult to fix, partly because the probabilistic model, which Needleman-Wunsch is equivalent to, does not capture a key feature of biological sequence alignments, namely the alternation of conserved blocks and seemingly unrelated non-conserved segments. We present a solution to the problem of designing efficient search spaces for pair hidden Markov models that align biological sequences by taking advantage of their associated features. Our approach leads to an optimization problem, for which we obtain a 2-approximation algorithm, and that is based on the construction of Manhattan networks, which are close relatives of Steiner trees. We describe the underlying theory and show how our methods can be applied to alignment of DNA sequences in practice, succesfully reducing the Viterbi algorithm search space of alignment PHMMs by three orders of magnitude.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Ansari-Lari, M. A., Oeltjen, J. C., Schwartz, S., Zhang, Z., Muzny, D. M., Lu, J., Gorrell, J. H., Chinault, A. C., Belmont, J. W., Miller, W., Gibbs, R. A. (1998). Comparative sequence analysis of a gene-rich cluster at human chromosome 12p13 and its syntenic region in mouse chromosome 6. Genome Research, 8, 29--40.
2
 
3
Altschul, S. F., Gish, W., Miller. W., Myers, E. W., Lipman, D. J. (1990). Basic local alignment search tool Journal of Molecular Biology, 215, 403--410.
 
4
Delcher, A. L., Kasif, S., Fleischmann, R. D., Peterson, J., White, O., Salzberg, S. L. (1999). Alignment of Whole Genomes. Nucleic Acids Research, 27(11), 2369--2376.
 
5
Durbin, R., Eddy, S., Krogh, A., Mitchison, G. (1998). Biological sequence analysis. Cambridge University press.
 
6
 
7
Garey, M.R. Johnson, D.S. (1977). The Rectilinear Steiner Tree Problem is NP-Complete. SIAM J. Appl. Math., 32(4), 826--834.
 
8
 
9
 
10
Gusfield, D., Balasubramanian, K., Naor, D. (1994). Parametric optimization of sequence alignment. Algorithmica, 12, 312--326.
 
11
Hanan, M. (1966). On Steiner's Problem with Rectilinear Distance. SIAM J. Appl. Math., 14(2), 255--265.
 
12
Holmes, I. (1998). Studies in Probabilistic Sequence Alignment and Evolution. Ph.D. Thesis, University of Cambridge and Sanger Centre, UK.
 
13
 
14
Kent, W. J., Zahler, A. M., (2000). Conservation, Regulation, Synteny, and Introns in a Large-scale C.briggsae-C.elegans} Genomic Alignment Genome Research, 10, 1115--1125.
 
15
 
16
Myers, E. (1986). An O(nd difference algorithm and its variations. Algorithmica, 1, 251--266.
 
17
 
18
Needleman, S. B., Wunsch, C. D., (1970). A general method applicable to the search for similarities in the amino acid sequences of two proteins. Journal of Molecular Biology, 48, 443--453.
 
19
Pachter, L., Alexandersson, M., Cawley, S. (2001) RECOMB 2001: Proceedings of the Fifth International Conference on Computational Molecular Biology.
 
20
Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77(2), 257--286.
 
21
Schwartz, S., Zhang, Z., Frazer, K. A., Smit, A., Riemer, C., Bouck, J., Gibbs, R., Hardison, R., Miller, W. (2000) PipMaker--A Web Server for Aligning Two Genomic DNA Sequences. Genome Research, 10(4), 577--586.
22