ACM Home Page
Please provide us with feedback. Feedback
Digital Library logoTake a look at the new version of this page: [ beta version ]. Tell us what you think.
Mining protein family specific residue packing patterns from protein structure graphs
Full text PdfPdf (186 KB)
Source Annual Conference on Research in Computational Molecular Biology archive
Proceedings of the eighth annual international conference on Resaerch in computational molecular biology table of contents
San Diego, California, USA
Pages: 308 - 315  
Year of Publication: 2004
ISBN:1-58113-755-9
Authors
Jun Huan  University of North Carolina at Chapel Hill
Wei Wang  University of North Carolina at Chapel Hill
Deepak Bandyopadhyay  University of North Carolina at Chapel Hill
Jack Snoeyink  University of North Carolina at Chapel Hill
Jan Prins  University of North Carolina at Chapel Hill
Alexander Tropsha  University of North Carolina at Chapel Hill
Sponsors
ACM: Association for Computing Machinery
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 4,   Downloads (12 Months): 44,   Citation Count: 9
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/974614.974655
What is a DOI?

ABSTRACT

Finding recurring residue packing patterns, or spatial motifs, that characterize protein structural families is an important problem in bioinformatics. We apply a novel frequent subgraph mining algorithm to three graph representations of protein three-dimensional (3D) structure. In each protein graph, a vertex represents an amino acid. Vertex-residues are connected by edges using three approaches: first, based on simple distance threshold between contact residues; second using the Delaunay tessellation from computational geometry, and third using the recently developed almost-Delaunay tessellation approach.Applying a frequent subgraph mining algorithm to a set of graphs representing a protein family from the Structural Classification of Proteins (SCOP) database, we typically identify several hundred common subgraphs equivalent to common packing motifs found in the majority of proteins in the family. We also use the counts of motifs extracted from proteins in two different SCOP families as input variables in a binary classification experiment. The resulting models are capable of predicting the protein family association with the accuracy exceeding 90 percent. Our results indicate that graphs based on both almost-Delaunay and Delaunay tessellations are sparser than the contact distance graphs; yet they are robust and efficient for mining protein spatial motif.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
2
 
3
 
4
 
5
 
6
L. Dehaspe, H. Toivonen, and R. D. King. Finding frequent substructures in chemical compounds. In 4th International Conference on Knowledge Discovery and Data Mining, pages 30--36, 1998.
 
7
B. Delaunay. Sur la sphère vide. A la memoire de Georges Voronoi. Izv. Akad. Nauk SSSR, Otdelenie Matematicheskih i Estestvennyh Nauk, 7:793--800, 1934.
8
 
9
D. Fischer, H. Wolfson, S. L. Lin, and R. Nussinov. Three-dimensional, sequence order-independent structural comparison of a serine protease against the crystallographic database reveals active site similarities: potential implication to evolution and to protein folding. Protein Science, 3:769--778, 1994.
 
10
H. Grindley, P. Artymiuk, D. Rice, and P. Willet. Identification of tertiary structure resemblance in proteins using a maximal common subgraph isomorphism algorithm. J. Mol. biol., 229:707--721, 1993.
 
11
 
12
S. K. Hofmann, P. Bucher, L. Falquet, and A. Bairoch. The prosite database, its status in 1999. Nucleic Acids Res, 27(1):215--219, 1999.
 
13
 
14
J. Huan, W. Wang, and J. Prins. Efficient mining of frequent subgraph in the presence of isomorphism. UNC computer science technical report TR03-021, 2003.
 
15
J. Huan, W. Wang, A. Washington, J. Prins, R. Shah, and A. Tropsha. Accurate classification of protein structural families based on coherent subgraph analysis. In Proc. Pacific Symposium on Biocomputing, 2004.
 
16
 
17
 
18
D. Jacobs, A. Rader, L. Kuhn, and M. Thorpe. Graph theory predictions of protein flexibility. Proteins: Struct. Funct. Genet., 44:150--155, 2001.
 
19
 
20
G. J. Kleywegt. Recognition of spatial motifs in protein structures. Journal of Molecular Biology, 285(4):1887--1897, 1999.
 
21
 
22
J. Liang, H. Edelsbrunner, P. Fu, P. Sudhakar, and S. Subramaniam. Analytical shape computing of macromolecules I: molecular area and volume through alpha shape. Proteins, 33:1--17, 1998.
 
23
A. Murzin, S. Brenner, T. Hubbard, and C. Chothia. Scop: a structural classification of proteins database for the investigation of sequences and structures. Journal of Molecular Biology, 247:536--40, 1995.
 
24
F. M. Richards. The interpretation of protein structures: total volume, group volume distributions, and packing density. J. Molecular Biology, 82:1--14, 1974.
 
25
B. Shapiro. An algorithm for comparing multiple RNA secondary structures. Comp. Appl. Biosci, 4(3):387--393, 1988.
 
26
R. Singh, A. Tropsha, and I. Vaisman. Delaunay tessellation of proteins. J. Comput. Biol., 3:213--222, 1996.
27
 
28
A. Tropsha, C. Carter, S. Cammer, and I. Vaisman. Simplicial neighborhood analysis of protein packing (SNAPP) : a computational geometry approach to studying proteins. Methods Enzymol., 374:509--544, 2003.
 
29
J. Tsai, R. Taylor, C. Chothia, and M. Gerstein. The packing density in proteins: Standard radii and volumes. Journal of Molecular Biology, 290(1):253--266, 1999.
 
30
 
31
S. Vishveshwara, K. V. Brinda, and N. Kannan. Protein structure: Insights from graph theory. J. of Theo. and Comp. Chem., 1(1):187--211, 2002.
 
32
H. Wako and T. Yamato. Novel method to detect a motif of local structures in different protein conformations. Protein Engineering, 11:981--990, 1998.
 
33
 
34
L. Wernisch, M. Hunting, and S. Wodak. Identification of structural domains in proteins by a graph heuristic. Proteins, 35(3):338--352, 1999.
 
35
 
36

CITED BY  9

Collaborative Colleagues:
Jun Huan: colleagues
Wei Wang: colleagues
Deepak Bandyopadhyay: colleagues
Jack Snoeyink: colleagues
Jan Prins: colleagues
Alexander Tropsha: colleagues