|
ABSTRACT
This article studies the protein side-chain packing problem using the tree-decomposition of a protein structure. To obtain fast and accurate protein side-chain packing, protein structures are modeled using a geometric neighborhood graph, which can be easily decomposed into smaller blocks. Therefore, the side-chain assignment of the whole protein can be assembled from the assignment of the small blocks. Although we will show that the side-chain packing problem is still NP-hard, we can achieve a tree-decomposition-based globally optimal algorithm with time complexity of O(Nnrottw + 1) and several polynomial-time approximation schemes (PTAS), where N is the number of residues contained in the protein, nrot the average number of rotamers for each residue, and tw = O(N2/3 log N) the treewidth of the protein structure graph. Experimental results indicate that after Goldstein dead-end elimination is conducted, nrot is very small and tw is equal to 3 or 4 most of the time. Based on the globally optimal algorithm, we developed a protein side-chain assignment program TreePack, which runs up to 90 times faster than SCWRL 3.0, a widely-used side-chain packing program, on some large test proteins in the SCWRL benchmark database and an average of five times faster on all the test proteins in this database. There are also some real-world instances that TreePack can solve but that SCWRL 3.0 cannot. The TreePack program is available at http://ttic.uchicago.edu/~jinbo/TreePack.htm.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Akutsu, T. 1997. NP-hardness results for protein side-chain packing. In Genome Informatics 8, S. Miyano and T. Takagi, Eds. 180--186.
|
| |
2
|
Alexandrov, N., Nussinov, R., and Zimmer, R. 1996. Fast protein fold recognition via sequence to structure alignment and contact capacity potentials. In Biocomputing: Proceedings of 1996 Pacific Symposium.
|
| |
3
|
|
| |
4
|
|
 |
5
|
|
| |
6
|
Bach, F., and Jordan, M. 2002. Thin junction trees. In Advances in Neural Information Processing Systems (NIPS), T. Dietterich, S. Becker, and Z. Ghahramani, Eds. Vol. 14. 569--574.
|
| |
7
|
Berry, A., Heggernes, P., and Simonet, G. 2003. The minimum degree heuristic and the minimal triangulation process. In Lecture Notes in Computer Science, vol. 2880. Springer-Verlag, New York, 58--70.
|
| |
8
|
Bower, M., Cohen, F., and Dunbrack, jr., R. L. 1997. Prediction of protein side-chain rotamers from a backbone-dependent rotamer library: A new homology modeling tool. J. Mol. Biol. 267, 1268--1282.
|
| |
9
|
Canutescu, A., Shelenkov, A., and Dunbrack, jr., R. L. 2003. A graph-theory algorithm for rapid protein side-chain prediction. Prot. Sci. 12, 2001--2014.
|
| |
10
|
Chazelle, B., Kingsford, C., and Singh, M. 2004. A semidefinite programming approach to side-chain positioning with new rounding strategies. INFORMS J. Comput. Special Issue in Computational Molecular Biology/Bioinformatics, 86--94.
|
| |
11
|
Desmet, J., Maeyer, M. D., Hazes, B., and Laster, I. 1992. The dead-end elimination theorem and its use in protein side-chain positioning. Nature 356, 539--542.
|
| |
12
|
Desmet, J., Spriet, J., and Laster, I. 2002. Fast and accurate side-chain topology and energy refinement (faster) as a new method for protein structure optimization. Protein: Struct. Funct. Gen. 48, 31--43.
|
| |
13
|
K. C. Dukka Bahadur , Tatsuya Akutsu , Etsuji Tomita , Tomokazu Seki, Protein side-chain packing problem: a maximum edge-weight clique algorithmic approach, Proceedings of the second conference on Asia-Pacific bioinformatics, p.191-200, January 01, 2004, Dunedin, New Zealand
|
| |
14
|
Dunbrack Jr., R. L. 1999. Comparative modeling of CASP3 targets using PSI-BLAST and SCWRL. Protein: Struct. Funct. Gen. 3, 81--87.
|
| |
15
|
Dunbrack Jr., R. L., and Cohen, F. 1997. Bayesian statistical analysis of protein side-chain rotamer preferences. Protein Sci. 6, 1661--1681.
|
| |
16
|
|
| |
17
|
Goldstein, R. 1994. Efficient rotamer elimination applied to protein side-chains and related spin glasses. Biophys. J. 66, 1335--1340.
|
| |
18
|
Holm, L., and Sander, C. 1991. Database algorithm for generating protein backbone and sidechain coordinates from a ca trace: Application to model building and detection of coordinate errors. J. Mol. Biol. 218, 183--194.
|
| |
19
|
|
| |
20
|
Jones, D. 1999. GenTHREADER: An efficient and reliable protein fold recognition method for genomic sequences. J. Mol. Biol. 287, 797--815.
|
| |
21
|
Kelley, L., MacCallum, R., and Sternberg, M. 2000. Enhanced genome annotation using structural profiles in the program 3D-PSSM. J. Mol. Biol. 299, 2, 499--520.
|
| |
22
|
|
| |
23
|
Kohlbacher, O., and Lenhof, H. 2000. BALL---Rapid software prototyping in computational molecular biology. Bioinformatics 16, 9, 815--824.
|
| |
24
|
Leach, A., and Lemon, A. 1998. Exploring the conformational space of protein side chains using dead-end elimination and the A* algorithm. Protein: Struct. Funct. Gen. 33, 227--239.
|
| |
25
|
Lee, C., and Subbiah, S. 1991. Prediction of protein side-chain conformation by packing optimization. J. Mol. Biol. 217, 373--388.
|
| |
26
|
Li, W., Pio, F., Pawlowski, K., and Godzik, A. 2000. Saturated blast: Detecting distant homology using automated multiple intermediate sequence blast search. Bioinformatics 16, 1105--1110.
|
| |
27
|
Liang, S., and Grishin, N. 2002. side-chain modelling with an optimized scoring function. Protein Sci. 11, 322--331.
|
 |
28
|
|
| |
29
|
Moult, J., Fidelis, F., Zemla, A., and Hubbard, T. 2001. Critical assessment of methods on protein structure prediction (CASP)-round IV. Proteins: Struct. Funct. Gen. 45, S5 (Dec.), 2--7.
|
| |
30
|
Moult, J., Fidelis, F., Zemla, A., and Hubbard, T. 2003. Critical assessment of methods on protein structure prediction (CASP)-round V. Proteins: Struct., Funct. Gen. 53, S6 (Oct.), 334--339.
|
| |
31
|
Moult, J., Hubbard, T., Fidelis, F., and Pedersen, J. 1999. Critical assessment of methods on protein structure prediction (CASP)-round III. Proteins: Struct. Funct. Gen. 37, S3 (Dec.), 2--6.
|
| |
32
|
Mount, D., and Arya, S. 1997. ANN: A library for approximate nearest neighbor searching. In Proceedings of the 2nd CGC Workshop on Computational Geometry.
|
| |
33
|
Pierce, N., and Winfree, E. 2002. Protein design is NP-hard. Protein Engi. 15, 10, 779--782.
|
| |
34
|
Robertson, N., and Seymour, P. 1986. Graph minors. II. Algorithmic aspects of tree-width. J. Algorithms 7, 309--322.
|
| |
35
|
Sali, A., and Blundell, T. 1993. Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol., 779--815.
|
| |
36
|
Samudrala, R., and Moult, J. 1998. Determinants of side chain conformational preferences in protein structures. Protein Engi. 11, 991--997.
|
| |
37
|
Shi, J., Tom, L. B., and Kenji, M. 2001. FUGUE: Sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J. Mol. Biol. 310, 243--257.
|
| |
38
|
Summers, N., and Karplus, M. 1989. Construction of side-chains in homology modelling: Application to the c-terminal lobe of rhizopuspepsin. J. Mol. Biol. 210, 785--811.
|
| |
39
|
|
| |
40
|
Xiang, Z., and Honig, B. 2001. Extending the accuracy limits of prediction for side-chain conformations. J. Mol. Biol. 311, 421--430.
|
| |
41
|
Xu, J., Li, M., Kim, D., and Xu, Y. 2003a. RAPTOR: optimal protein threading by linear programming. Journal of Bioinformatics and Computational Biology 1, 1, 95--117.
|
| |
42
|
Xu, J., Li, M., Lin, G., Kim, D., and Xu, Y. 2003b. Protein threading by linear programming. In Biocomputing: Proceedings of the 2003 Pacific Symposium. Hawaii, USA, 264--275.
|
| |
43
|
Xu, Y., Xu, D., and Uberbacher, E. 1998. An efficient computational method for globally optimal threadings. J. Comput. Biol. 5, 3, 597--614.
|
|