|
ABSTRACT
Finding recurring residue packing patterns, or spatial motifs, that characterize protein structural families is an important problem in bioinformatics. We apply a novel frequent subgraph mining algorithm to three graph representations of protein three-dimensional (3D) structure. In each protein graph, a vertex represents an amino acid. Vertex-residues are connected by edges using three approaches: first, based on simple distance threshold between contact residues; second using the Delaunay tessellation from computational geometry, and third using the recently developed almost-Delaunay tessellation approach.Applying a frequent subgraph mining algorithm to a set of graphs representing a protein family from the Structural Classification of Proteins (SCOP) database, we typically identify several hundred common subgraphs equivalent to common packing motifs found in the majority of proteins in the family. We also use the counts of motifs extracted from proteins in two different SCOP families as input variables in a binary classification experiment. The resulting models are capable of predicting the protein family association with the accuracy exceeding 90 percent. Our results indicate that graphs based on both almost-Delaunay and Delaunay tessellations are sparser than the contact distance graphs; yet they are robust and efficient for mining protein spatial motif.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
|
| |
3
|
|
| |
4
|
|
| |
5
|
|
| |
6
|
L. Dehaspe, H. Toivonen, and R. D. King. Finding frequent substructures in chemical compounds. In 4th International Conference on Knowledge Discovery and Data Mining, pages 30--36, 1998.
|
| |
7
|
B. Delaunay. Sur la sphère vide. A la memoire de Georges Voronoi. Izv. Akad. Nauk SSSR, Otdelenie Matematicheskih i Estestvennyh Nauk, 7:793--800, 1934.
|
 |
8
|
P. W. Finn , L. E. Kavraki , J.-C. Latombe , R. Motwani , C. Shelton , S. Venkatasubramanian , A. Yao, RAPID: randomized pharmacophore identification for drug design, Proceedings of the thirteenth annual symposium on Computational geometry, p.324-333, June 04-06, 1997, Nice, France
[doi> 10.1145/262839.262993]
|
| |
9
|
D. Fischer, H. Wolfson, S. L. Lin, and R. Nussinov. Three-dimensional, sequence order-independent structural comparison of a serine protease against the crystallographic database reveals active site similarities: potential implication to evolution and to protein folding. Protein Science, 3:769--778, 1994.
|
| |
10
|
H. Grindley, P. Artymiuk, D. Rice, and P. Willet. Identification of tertiary structure resemblance in proteins using a maximal common subgraph isomorphism algorithm. J. Mol. biol., 229:707--721, 1993.
|
| |
11
|
|
| |
12
|
S. K. Hofmann, P. Bucher, L. Falquet, and A. Bairoch. The prosite database, its status in 1999. Nucleic Acids Res, 27(1):215--219, 1999.
|
| |
13
|
|
| |
14
|
J. Huan, W. Wang, and J. Prins. Efficient mining of frequent subgraph in the presence of isomorphism. UNC computer science technical report TR03-021, 2003.
|
| |
15
|
J. Huan, W. Wang, A. Washington, J. Prins, R. Shah, and A. Tropsha. Accurate classification of protein structural families based on coherent subgraph analysis. In Proc. Pacific Symposium on Biocomputing, 2004.
|
| |
16
|
Piotr Indyk , Rajeev Motwani , Suresh Venkatasubramanian, Geometric matching under noise: combinatorial bounds and algorithms, Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms, p.457-465, January 17-19, 1999, Baltimore, Maryland, United States
|
| |
17
|
|
| |
18
|
D. Jacobs, A. Rader, L. Kuhn, and M. Thorpe. Graph theory predictions of protein flexibility. Proteins: Struct. Funct. Genet., 44:150--155, 2001.
|
| |
19
|
|
| |
20
|
G. J. Kleywegt. Recognition of spatial motifs in protein structures. Journal of Molecular Biology, 285(4):1887--1897, 1999.
|
| |
21
|
|
| |
22
|
J. Liang, H. Edelsbrunner, P. Fu, P. Sudhakar, and S. Subramaniam. Analytical shape computing of macromolecules I: molecular area and volume through alpha shape. Proteins, 33:1--17, 1998.
|
| |
23
|
A. Murzin, S. Brenner, T. Hubbard, and C. Chothia. Scop: a structural classification of proteins database for the investigation of sequences and structures. Journal of Molecular Biology, 247:536--40, 1995.
|
| |
24
|
F. M. Richards. The interpretation of protein structures: total volume, group volume distributions, and packing density. J. Molecular Biology, 82:1--14, 1974.
|
| |
25
|
B. Shapiro. An algorithm for comparing multiple RNA secondary structures. Comp. Appl. Biosci, 4(3):387--393, 1988.
|
| |
26
|
R. Singh, A. Tropsha, and I. Vaisman. Delaunay tessellation of proteins. J. Comput. Biol., 3:213--222, 1996.
|
 |
27
|
|
| |
28
|
A. Tropsha, C. Carter, S. Cammer, and I. Vaisman. Simplicial neighborhood analysis of protein packing (SNAPP) : a computational geometry approach to studying proteins. Methods Enzymol., 374:509--544, 2003.
|
| |
29
|
J. Tsai, R. Taylor, C. Chothia, and M. Gerstein. The packing density in proteins: Standard radii and volumes. Journal of Molecular Biology, 290(1):253--266, 1999.
|
| |
30
|
|
| |
31
|
S. Vishveshwara, K. V. Brinda, and N. Kannan. Protein structure: Insights from graph theory. J. of Theo. and Comp. Chem., 1(1):187--211, 2002.
|
| |
32
|
H. Wako and T. Yamato. Novel method to detect a motif of local structures in different protein conformations. Protein Engineering, 11:981--990, 1998.
|
| |
33
|
|
| |
34
|
L. Wernisch, M. Hunting, and S. Wodak. Identification of structural domains in proteins by a graph heuristic. Proteins, 35(3):338--352, 1999.
|
| |
35
|
|
| |
36
|
|
CITED BY 9
|
|
Jun Huan , Wei Wang , Jan Prins , Jiong Yang, SPIN: mining maximal frequent subgraphs from graph databases, Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, August 22-25, 2004, Seattle, WA, USA
|
|
|
Xifeng Yan , Hong Cheng , Jiawei Han , Dong Xin, Summarizing itemset patterns: a profile-based approach, Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, August 21-24, 2005, Chicago, Illinois, USA
|
|
|
|
|
|
|
|
|
Ruoming Jin , Muad Abu-Ata , Yang Xiang , Ning Ruan, Effective and efficient itemset pattern summarization: regression-based approaches, Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2008, Las Vegas, Nevada, USA
|
|
|
|
|
|
|
|
|
Nikhil S. Ketkar , Lawrence B. Holder , Diane J. Cook, gRegress: extracting features from graph transactions for regression, Proceedings of the 21st international jont conference on Artifical intelligence, p.1089-1094, July 11-17, 2009, Pasadena, California, USA
|
|
|
|
|