Structure-based querying of proteins using wavelets
Source: Proceedings of the 15th ACM International Conference on Information and Knowledge Management (CIKM), Arlington, Virginia, USA
Session: Similarity and matching
Pages: 24-33
Year of Publication: 2006
ISBN: 1-59593-433-2
Authors
Keith Marsolo, The Ohio State University
Srinivasan Parthasarathy, The Ohio State University
Kotagiri Ramamohanarao, University of Melbourne
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1183614.1183622

ABSTRACT

The ability to retrieve molecules based on structural similarity is useful in many applications, from disease diagnosis and treatment to drug discovery and design. In this paper, we present a method for representing protein molecules that allows fast, flexible and efficient retrieval of similar structures based on either global or local attributes. We begin by computing the pairwise distances between amino acids, transforming each 3D structure into a 2D distance matrix. We normalize this matrix to a fixed size and apply a 2D wavelet decomposition to generate a set of approximation coefficients, which serves as our global feature vector. This transformation reduces the overall dimensionality of the data while preserving spatial features and correlations. We test our method by running queries on three protein data sets that have been used previously in the literature, basing our comparisons on labels taken from the SCOP database. We find that our method significantly outperforms existing approaches in terms of retrieval accuracy, memory utilization and execution time. Specifically, using a k-d tree to run a 10-nearest-neighbor search of a dataset of 33,000 proteins against itself, we see an average accuracy of 89% at the SCOP SuperFamily level and a total query time up to 350 times faster than previously published techniques. In addition to processing queries based on global similarity, we propose extensions that effectively match proteins based solely on shared local substructures, allowing for a more flexible query interface.
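The global-feature pipeline described in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes one 3D coordinate per amino acid (for example a C-alpha position), uses linear interpolation for the size normalization and a Haar wavelet for the 2D decomposition, and the parameters `size=64` and `levels=3` as well as all function names are hypothetical. The 10-nearest-neighbor search is done by brute force here rather than with a k-d tree.

```python
import numpy as np

def distance_matrix(coords):
    """Pairwise Euclidean distances between residue coordinates: (n, 3) -> (n, n)."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def resize_linear(mat, size):
    """Resample an (n, n) matrix to (size, size) by separable linear interpolation.
    ASSUMPTION: illustrative normalization scheme, not taken from the paper."""
    src = np.linspace(0.0, 1.0, mat.shape[0])
    dst = np.linspace(0.0, 1.0, size)
    rows = np.array([np.interp(dst, src, r) for r in mat])       # (n, size)
    return np.array([np.interp(dst, src, c) for c in rows.T]).T  # (size, size)

def haar_approx(mat, levels):
    """Approximation coefficients after `levels` steps of an orthonormal 2D Haar transform.
    ASSUMPTION: Haar is chosen for illustration; the paper's wavelet may differ."""
    a = np.asarray(mat, dtype=float)
    if a.shape[0] % (2 ** levels) or a.shape[1] % (2 ** levels):
        raise ValueError("matrix side must be divisible by 2**levels")
    for _ in range(levels):
        # Each 2x2 block [[p, q], [r, s]] becomes (p + q + r + s) / 2.
        a = (a[0::2, 0::2] + a[0::2, 1::2] + a[1::2, 0::2] + a[1::2, 1::2]) / 2.0
    return a

def global_feature(coords, size=64, levels=3):
    """Hypothetical pipeline: coords -> distance matrix -> fixed size -> flattened approximation."""
    return haar_approx(resize_linear(distance_matrix(coords), size), levels).ravel()

def knn(features, query, k=10):
    """Brute-force k nearest neighbors by Euclidean distance (stand-in for a k-d tree)."""
    dists = np.linalg.norm(features - query, axis=1)
    return np.argsort(dists, kind="stable")[:k]

# Usage: random-walk "backbones" of different lengths all map to 64-dimensional vectors.
rng = np.random.default_rng(0)
proteins = [np.cumsum(rng.normal(size=(n, 3)), axis=0) for n in (40, 55, 70, 90)]
feats = np.array([global_feature(p) for p in proteins])
print(knn(feats, feats[2], k=3))  # the first index is 2: the query matches itself
```

Because the distance matrix depends only on inter-residue distances, the resulting feature vector is unchanged by rotating or translating the structure, and resizing to a fixed side lets proteins of different lengths share one feature space. With SciPy available, `scipy.spatial.KDTree(feats).query(feats[2], k=3)` would perform the same lookup using an actual k-d tree.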


