ACM Home Page
Please provide us with feedback. Feedback
Graph indexing: a frequent structure-based approach
Full text PdfPdf (275 KB)
Source International Conference on Management of Data archive
Proceedings of the 2004 ACM SIGMOD international conference on Management of data table of contents
Paris, France
SESSION: Research sessions: indexing and tuning table of contents
Pages: 335 - 346  
Year of Publication: 2004
ISBN:1-58113-859-8
Authors
Xifeng Yan  University of Illinois at Urbana-Champaign
Philip S. Yu  IBM T. J. Watson Research Center
Jiawei Han  University of Illinois at Urbana-Champaign
Sponsor
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 44,   Downloads (12 Months): 224,   Citation Count: 29
Additional Information:

abstract   references   cited by   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1007568.1007607
What is a DOI?

ABSTRACT

Graph has become increasingly important in modelling complicated structures and schemaless data such as proteins, chemical compounds, and XML documents. Given a graph query, it is desirable to retrieve graphs quickly from a large database via graph-based indices. In this paper, we investigate the issues of indexing graphs and propose a novel solution by applying a graph mining technique. Different from the existing path-based methods, our approach, called gIndex, makes use of frequent substructure as the basic indexing feature. Frequent substructures are ideal candidates since they explore the intrinsic characteristics of the data and are relatively stable to database updates. To reduce the size of index structure, two techniques, size-increasing support constraint and discriminative fragments, are introduced. Our performance study shows that gIndex has 10 times smaller index size, but achieves 3--10 times better performance in comparison with a typical path-based method, GraphGrep. The gIndex approach not only provides and elegant solution to the graph indexing problem, but also demonstrates how database indexing and query processing can benefit form data mining, especially frequent pattern mining. Furthermore, the concepts developed here can be applied to indexing sequences, trees, and other complicated structures as well.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
3
4
 
5
 
6
 
7
 
8
C. A. James, D. Weininger, and J. Delany. Daylight theory manual daylight version 4.82. Daylight Chemical Information Systems, Inc, 2003.
 
9
R. Kaushik P. Shenoy, P. Bohannon, and E. Gudes. Exploiting local similarity for efficient indexing of paths in graph structured data. In Proc. 2000 Int. Conf. Data Engineering ICDE'00), San Jose, CA, Feb. 2002.
 
10
 
11
T. Madej, J. F. Gibrat, and S. H. Bryant. Threading a database of protein cores. Proteins, 3-2:289--306, 1995.
 
12
 
13
14
 
15
A. Shokoufandeh, S. J. Dickinson, K. Siddiqi, and S. W. Zucker. Indexing using a spectral encoding of topological structure. In Proc. IEEE Int'l Conf Computer Vision and Pattern Recognition (CVPR'99), Fort Collins, CO, Jun. 1999.
 
16
S. Srinivasa and S. Kumar. A platform based on the multi-dimensional data model for analysis of bio-molecular structures. In Proc. 2003 Int. Conf. Very Large Data Bases (VLDB'03), 2003.
 
17
18
 
19
 
20
21
22

CITED BY  29
Collaborative Colleagues:
Xifeng Yan: colleagues
Philip S. Yu: colleagues
Jiawei Han: colleagues