|
ABSTRACT
Graphs have become increasingly important in modelling complicated structures and schemaless data such as chemical compounds, proteins, and XML documents. Given a graph query, it is desirable to retrieve graphs quickly from a large database via indices. In this article, we investigate the issues of indexing graphs and propose a novel indexing model based on discriminative frequent structures that are identified through a graph mining process. We show that the compact index built under this model can achieve better performance in processing graph queries. Since discriminative frequent structures capture the intrinsic characteristics of the data, they are relatively stable to database updates, thus facilitating sampling-based feature extraction and incremental index maintenance. Our approach not only provides an elegant solution to the graph indexing problem, but also demonstrates how database indexing and query processing can benefit from data mining, especially frequent pattern mining. Furthermore, the concepts developed here can be generalized and applied to indexing sequences, trees, and other complicated structures as well.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
Alon, N. and Spencer, J. 1992. The Probabilistic Method. Wiley, New York, NY.
|
| |
3
|
|
| |
4
|
|
 |
5
|
|
 |
6
|
|
 |
7
|
|
| |
8
|
|
| |
9
|
|
| |
10
|
Holder, L., Cook, D., and Djoko, S. 1994. Substructure discovery in the subdue system. In Proceedings of the AAAI'94 Workshop on Knowledge Discovery in Databases (KDD'94) (Seattle, WA). 169--180.
|
| |
11
|
|
| |
12
|
James, C., Weininger, D., and Delany, J. 2003. Daylight theory manual daylight version 4.82. Daylight Chemical Information Systems, Inc.
|
| |
13
|
Kaushik, R., Shenoy, P., Bohannon, P., and Gudes, E. 2002. Exploiting local similarity for efficient indexing of paths in graph structured data. In Proceedings of the 2000 International Conference on Data Engineering (ICDE'02) (San Jose, CA). 129--140.
|
| |
14
|
|
| |
15
|
Kuramochi, M. and Karypis, G. 2004. Finding frequent patterns in a large sparse graph. In Proceedings of the 2004 SIAM International Conference on Data Mining (SDM'04) (Orlando, FL).
|
| |
16
|
Madej, T., Gibrat, J., and Bryant, S. 1995. Threading a database of protein cores. Proteins 3-2, 289--306.
|
| |
17
|
|
| |
18
|
|
| |
19
|
|
 |
20
|
|
| |
21
|
Shokoufandeh, A., Dickinson, S., Siddiqi, K., and Zucker, S. 1999. Indexing using a spectral encoding of topological structure. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR'99) (Fort Collins, CO). IEEE Computer Society Press, Los Alamitos, CA, 2491--2497.
|
| |
22
|
Srinivasa, S. and Kumar, S. 2003. A platform based on the multidimensional data model for analysis of bio-molecular structures. In Proceedings of the 2003 International Conference on Very Large Data Bases (VLDB'03) (Berlin, Germany). 975--986.
|
| |
23
|
|
| |
24
|
|
 |
25
|
|
| |
26
|
|
| |
27
|
|
 |
28
|
|
 |
29
|
|
CITED BY 6
|
|
Liping Wang , Qing Li , Na Li , Guozhu Dong , Yu Yang, Substructure similarity measurement in chinese recipes, Proceeding of the 17th international conference on World Wide Web, April 21-25, 2008, Beijing, China
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|