ACM Home Page
Please provide us with feedback. Feedback
101 optimal PDB structure alignments: a branch-and-cut algorithm for the maximum contact map overlap problem
Full text PdfPdf (388 KB)
Source Annual Conference on Research in Computational Molecular Biology archive
Proceedings of the fifth annual international conference on Computational biology table of contents
Montreal, Quebec, Canada
Pages: 193 - 202  
Year of Publication: 2001
ISBN:1-58113-353-7
Authors
Giuseppe Lancia  Celera Genomics, Rockville, MD and D.E.I., University of Padova
Robert Carr  Sandia National Labs, Albuquerque, NM
Brian Walenz  Celera Genomics, Rockville, MD
Sorin Istrail  Celera Genomics, Rockville, MD
Sponsor
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 8,   Downloads (12 Months): 28,   Citation Count: 9
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/369133.369199
What is a DOI?

ABSTRACT

Structure comparison is a fundamental problem for structural genomics. A variety of structure comparison methods were proposed and several protein structure classification servers e.g., SCOP, DALI, CATH, were designed based on them, and are extensively used in practice. This area of research continues to be very active, being energized bi-annually by the CASP folding competitions, but despite the extraordinary international research effort devoted to it, progress is slow. A fundamental dimension of this bottleneck is the absence of rigorous algorithmic methods. A recent excellent survey on structure comparison by Taylor et.al. [23] records the state of the art of the area: In structure comparison, we do not even have an algorithm that guarantees an optimal answer for pairs of structures …

In this paper we provide the first rigorous algorithm for structure comparison. Our method is based on developing an effective integer linear programming (IP) formulation of protein structure contact maps overlap (CMO), and a branch-and-cut strategy that employs lower-bounding heuristics at the branch nodes. Our algorithms identified a gallery of optimal and near-optimal structure alignments for pairs of proteins from the Protein Data Bank with up to 80 amino acids and about 150 contacts each — problems of instance size of about 300. Although these sizes also reflect our current limitations, these are the first provable optimal and near-optimal algorithms in the literature for a measure of structure similarity which sees extensive practical use. At the heart of our success in finding optimal alignments is a reduction of the CMO optimization to the maximum independent set (MIS) problem on special graphs. For CMO instances of size 300, the corresponding MIS graph instance contains about 10,000 nodes. While our algorithms are able to solve to optimality MIS problem of these sizes, the known optimal algorithms for the MIS on general graphs can at present only solve instances with up to a few hundred nodes. This is the first effective use of IP methods in protein structure comparison; the biomolecular structure literature contains only one other effective IP method devoted to RNA comparison, due to Lenhof et.al. [18].

The hybrid heuristic approach that worked well for providing lower bounds in the branch and cut algorithm was tried on large proteins in a test set suggested by Jeffrey Skolnick. It involved 33 proteins classified into four families: Flavodoxin-like fold CheY-related, Plastocyanin, TIM Barrel, and Ferratin. Out of the set of all 528 pairwise structure alignments, we have validated the clustering with a 98.7% accuracy (1.3% false negatives and 0% false positives).


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I.N. Shindyalov, P.E. Bourne, The Protein Data Bank, Nucleic Acids Research, 28 pp. 235-242, 2000.
 
3
 
4
P. Crescenzi and V. Kann, A compendium of NP optimization problems, http ://www. nada. kth. se/~viggo, the web.
 
5
M. Grotschel, L. Lov~sz and A. Schrijver, "The Ellipsoid Method and its Consequences in Combinatorial Optimization", Combinatorica 1 (1981), 169-197.
 
6
A. Godzik, The structural alignment between two proteins: Is there a unique answer ?, Protein Science, 5:1325-1338, 1996.
 
7
A. Godzik, J. Sklonick and A. Kolinski, A topology fingerprint approach to inverse protein folding problem, J. Mol. Bio1.,227:227-238, 1992.
 
8
A. Godzik and J. Skolnick, Flexible algorithm for direct multiple alignment of protein structures and sequences, CABIOS, 10, (6) 587-596, 1994.
 
9
 
10
 
11
D. Goldman, PhD. Thesis, Dept. of Computer Science, U C Berkeley, 2000.
 
12
A. Lucas, K. Dill and S. Istrail, Contact maps and the computational statistical mechanics aspects of protein folding (in preparation).
 
13
R. B. Hayward, Wealky Triangulated Graphs, J. of Comb. Theory, Series B, (39)200-209, 1985.
 
14
R.B. Hayward, C. Hoang and F. Maffray, Optimizing Wealky Triangulated Graphs, Graphs and Combinatorics, 1987.
 
15
L. Holm and C. Sander, 3-D lookup: fast protein structure searches at 90% reliability, Proceedings of the ISMB 1995, p. 179-187, AAAI., 1995.
 
16
 
17
Kabash-W., A solution for the best rotation to relate two sets of vectors, Acta Cryst. A32, 922-923, 1978.
 
18
H. P. Lenhof, K. Reinert, M. Vingron, A Polyhedral Approach to RNA Sequence Structure Alignment, J. Comp. Biol., 5(3):517-530, 1998.
 
19
A. Lesk, 11th Lipari International Summer School in Computational Biology, 1999.
 
20
 
21
G. L. Nemhauser and L. E. Trotter, Vertex packings: Structural properties and algorithms, Mathematical Programming, 8:232-248, 1975.
 
22
 
23
I. Eidhammer and I. Jonassen and W. R. Taylor, Structure Comparison and Structure Prediction, to appear J. Comp. Biol., x(x), 2000.
 
24
 
25

CITED BY  9

Collaborative Colleagues:
Giuseppe Lancia: colleagues
Robert Carr: colleagues
Brian Walenz: colleagues
Sorin Istrail: colleagues