ACM Home Page
Please provide us with feedback. Feedback
Inverting the Viterbi algorithm: an abstract framework for structure design
Full text PdfPdf (462 KB)
Source ICML; Vol. 307 archive
Proceedings of the 25th international conference on Machine learning table of contents
Helsinki, Finland
Pages 904-911  
Year of Publication: 2008
ISBN:978-1-60558-205-4
Authors
Michael Schnall-Levin  Artificial Intelligence Laboratory, MIT, Cambridge, MA
Leonid Chindelevitch  Artificial Intelligence Laboratory, MIT, Cambridge, MA
Bonnie Berger  Artificial Intelligence Laboratory, MIT, Cambridge, MA
Sponsors
: Yahoo!
: Xerox
IBM : IBM
: NSF
Microsoft Research : Microsoft Research
: Machine Learning Journal/Springer
: Pascal
: University of Helsinki
: Federation of Finnish Learned Societies
: Intel Corporation
: Google
: Helsinki Institute for Information Technology
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 8,   Downloads (12 Months): 57,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1390156.1390270
What is a DOI?

ABSTRACT

Probabilistic grammatical formalisms such as hidden Markov models (HMMs) and stochastic context-free grammars (SCFGs) have been extensively studied and widely applied in a number of fields. Here, we introduce a new algorithmic problem on HMMs and SCFGs that arises naturally from protein and RNA design, and which has not been previously studied. The problem can be viewed as an inverse to the one solved by the Viterbi algorithm on HMMs or by the CKY algorithm on SCFGs. We study this problem theoretically and obtain the first algorithmic results. We prove that the problem is NP-complete, even for a 3-letter emission alphabet, via a reduction from 3-SAT, a result that has implications for the hardness of RNA secondary structure design. We then develop a number of approaches for making the problem tractable. In particular, for HMMs we develop a branch-and-bound algorithm, which can be shown to have fixed-parameter tractable worst-case running time, exponential in the number of states of the HMM but linear in the length of the structure. We also show how to cast the problem as a Mixed Integer Linear Program.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Andronescu, M. e. a. (2004). A new algorithm for RNA secondary structure design. Journal of Molecular Biology, 336, 607--624.
 
2
Breaker, R. R. (1996). Are engineered proteins getting competition from RNA? Current Opinion in Biotechnology, 7, 442--448.
 
3
 
4
Butterfoss, G., & Kuhlman, B. (2005). Computer-based design of novel protein structures. Annual Review of Biophysics and Biomolecular Structure, 35, 49--65.
 
5
Dowell, R., & Eddy, S. (2004). Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction. BMC Bioinformatics, 5.
 
6
Durbin, R., Eddy, S., Krogh, A., & Mitchison, G. (1999). Biological sequence analysis: Probablistic models of proteins and nucleic acids. Cambridge University Press.
 
7
Elizalde, S., & Woods, K. (2006). Bounds on the number of inference functions of a graphical model. ArXiv Mathematics e-prints, math/0610233.
 
8
Hofacker, I. L. e. a. (1994). Fast folding and comparison of RNA secondary structures. Monatshefte f. Chemie, 125, 167--188.
 
9
Park, S., Yang, X., & Saven, J. G. (2004). Advances in computational protein design. Current Opinion in Structural Biology, 14, 487--494.
 
10
Pokala, N., & M., H. T. (2001). Review: Protein design- where we were, where we are, where we're going. Journal of Structural Biology, 134, 269--281.
 
11
Viterbi, A. J. (1967). Error bounds for convolution codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 13, 260--269.
 
12
Zuker, M., & Stiegler, P. (1981). Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Research, 9, 133--148.

Collaborative Colleagues:
Michael Schnall-Levin: colleagues
Leonid Chindelevitch: colleagues
Bonnie Berger: colleagues