|
ABSTRACT
Graph has become increasingly important in modelling complicated structures and schemaless data such as proteins, chemical compounds, and XML documents. Given a graph query, it is desirable to retrieve graphs quickly from a large database via graph-based indices. In this paper, we investigate the issues of indexing graphs and propose a novel solution by applying a graph mining technique. Different from the existing path-based methods, our approach, called gIndex, makes use of frequent substructure as the basic indexing feature. Frequent substructures are ideal candidates since they explore the intrinsic characteristics of the data and are relatively stable to database updates. To reduce the size of index structure, two techniques, size-increasing support constraint and discriminative fragments, are introduced. Our performance study shows that gIndex has 10 times smaller index size, but achieves 3--10 times better performance in comparison with a typical path-based method, GraphGrep. The gIndex approach not only provides and elegant solution to the graph indexing problem, but also demonstrates how database indexing and query processing can benefit form data mining, especially frequent pattern mining. Furthermore, the concepts developed here can be applied to indexing sequences, trees, and other complicated structures as well.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
 |
3
|
|
 |
4
|
|
| |
5
|
|
| |
6
|
|
| |
7
|
|
| |
8
|
C. A. James, D. Weininger, and J. Delany. Daylight theory manual daylight version 4.82. Daylight Chemical Information Systems, Inc, 2003.
|
| |
9
|
R. Kaushik P. Shenoy, P. Bohannon, and E. Gudes. Exploiting local similarity for efficient indexing of paths in graph structured data. In Proc. 2000 Int. Conf. Data Engineering ICDE'00), San Jose, CA, Feb. 2002.
|
| |
10
|
|
| |
11
|
T. Madej, J. F. Gibrat, and S. H. Bryant. Threading a database of protein cores. Proteins, 3-2:289--306, 1995.
|
| |
12
|
|
| |
13
|
|
 |
14
|
|
| |
15
|
A. Shokoufandeh, S. J. Dickinson, K. Siddiqi, and S. W. Zucker. Indexing using a spectral encoding of topological structure. In Proc. IEEE Int'l Conf Computer Vision and Pattern Recognition (CVPR'99), Fort Collins, CO, Jun. 1999.
|
| |
16
|
S. Srinivasa and S. Kumar. A platform based on the multi-dimensional data model for analysis of bio-molecular structures. In Proc. 2003 Int. Conf. Very Large Data Bases (VLDB'03), 2003.
|
| |
17
|
|
 |
18
|
|
| |
19
|
|
| |
20
|
|
 |
21
|
|
 |
22
|
|
CITED BY 29
|
|
Chen Wang , Wei Wang , Jian Pei , Yongtai Zhu , Baile Shi, Scalable mining of large disk-based graph databases, Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, August 22-25, 2004, Seattle, WA, USA
|
|
|
Jun Huan , Wei Wang , Jan Prins , Jiong Yang, SPIN: mining maximal frequent subgraphs from graph databases, Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, August 22-25, 2004, Seattle, WA, USA
|
|
|
|
|
|
|
|
|
|
|
|
Xifeng Yan , Hong Cheng , Jiawei Han , Dong Xin, Summarizing itemset patterns: a profile-based approach, Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, August 21-24, 2005, Chicago, Illinois, USA
|
|
|
|
|
|
|
|
|
|
|
|
Luciana S. Buriol , Gereon Frahling , Stefano Leonardi , Alberto Marchetti-Spaccamela , Christian Sohler, Counting triangles in data streams, Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 26-28, 2006, Chicago, IL, USA
|
|
|
|
|
|
|
|
|
Bingjun Sun , Qingzhao Tan , Prasenjit Mitra , C. Lee Giles, Extraction and search of chemical formulae in text documents on the web, Proceedings of the 16th international conference on World Wide Web, May 08-12, 2007, Banff, Alberta, Canada
|
|
|
Chen Chen , Cindy Xide Lin , Xifeng Yan , Jiawei Han, On effective presentation of graph patterns: a structural representative approach, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA
|
|
|
|
|
|
|
|
|
Hanghang Tong , Christos Faloutsos , Brian Gallagher , Tina Eliassi-Rad, Fast best-effort pattern matching in large attributed graphs, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, August 12-15, 2007, San Jose, California, USA
|
|
|
|
|
|
Chen Chen , Xifeng Yan , Philip S. Yu , Jiawei Han , Dong-Qing Zhang , Xiaohui Gu, Towards graph containment search and indexing, Proceedings of the 33rd international conference on Very large data bases, September 23-27, 2007, Vienna, Austria
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|