| SPIN: mining maximal frequent subgraphs from graph databases |
| Full text |
Pdf
(191 KB)
|
| Source
|
International Conference on Knowledge Discovery and Data Mining
archive
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
table of contents
Seattle, WA, USA
POSTER SESSION: Research track posters
table of contents
Pages: 581 - 586
Year of Publication: 2004
ISBN:1-58113-888-1
|
|
Authors
|
|
Jun Huan
|
University of North Carolina at Chapel Hill, Chapel Hill, NC
|
|
Wei Wang
|
University of North Carolina at Chapel Hill, Chapel Hill, NC
|
|
Jan Prins
|
University of North Carolina at Chapel Hill, Chapel Hill, NC
|
|
Jiong Yang
|
University of Illinois, Urbana-Champaign, IL
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 18, Downloads (12 Months): 97, Citation Count: 13
|
|
|
ABSTRACT
One fundamental challenge for mining recurring subgraphs from semi-structured data sets is the overwhelming abundance of such patterns. In large graph databases, the total number of frequent subgraphs can become too large to allow a full enumeration using reasonable computational resources. In this paper, we propose a new algorithm that mines only maximal frequent subgraphs, i.e. subgraphs that are not a part of any other frequent subgraphs. This may exponentially decrease the size of the output set in the best case; in our experiments on practical data sets, mining maximal frequent subgraphs reduces the total number of mined patterns by two to three orders of magnitude.Our method first mines all frequent trees from a general graph database and then reconstructs all maximal subgraphs from the mined trees. Using two chemical structure benchmarks and a set of synthetic graph data sets, we demonstrate that, in addition to decreasing the output size, our algorithm can achieve a five-fold speed up over the current state-of-the-art subgraph mining algorithms.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
| |
3
|
|
| |
4
|
|
 |
5
|
Alin Deutsch , Mary Fernandez , Dan Suciu, Storing semistructured data with STORED, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.431-442, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
| |
6
|
|
| |
7
|
J. Hu, X. Shen, Y. Shao, C. Bystroff, and M. J. Zaki. Mining protein contact maps. 2nd BIOKDD Workshop on Data Mining in Bioinformatics, 2002.
|
 |
8
|
Jun Huan , Wei Wang , Deepak Bandyopadhyay , Jack Snoeyink , Jan Prins , Alexander Tropsha, Mining protein family specific residue packing patterns from protein structure graphs, Proceedings of the eighth annual international conference on Resaerch in computational molecular biology, p.308-315, March 27-31, 2004, San Diego, California, USA
[doi> 10.1145/974614.974655]
|
| |
9
|
|
| |
10
|
J. Huan, W. Wang, J. Prins, and J. Yang. Spin: Mining maximal frequent subgraphs from graph databases. UNC Technical Report TR04-018, 2004.
|
| |
11
|
J. Huan, W. Wang, A. Washington, J. Prins, and A. Tropsha. Accurately classify protein family based on coherrent subgraph mining. in Pacific Symposium on Biocomputing, 2004.
|
| |
12
|
|
| |
13
|
|
| |
14
|
M. Kuramochi and G. Karypis. Finding frequent patterns in a large sparse graph. SDM, 2004.
|
| |
15
|
|
| |
16
|
S. Raghavan and H. Garcia-Molina. Representing web graphs. In Proceedings of the IEEE Intl. Conference on Data Engineering, 2003.
|
| |
17
|
|
| |
18
|
|
| |
19
|
|
 |
20
|
|
 |
21
|
|
 |
22
|
|
CITED BY 13
|
|
|
|
|
R. Jin , C. Wang , D. Polshakov , S. Parthasarathy , G. Agrawal, Discovering frequent topological structures from graph datasets, Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, August 21-24, 2005, Chicago, Illinois, USA
|
|
|
Jin Chen , Wynne Hsu , Mong Li Lee , See-Kiong Ng, NeMoFinder: dissecting genome-wide protein-protein interactions with meso-scale network motifs, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
|
|
|
|
|
|
|
|
|
|
|
|
Chen Chen , Cindy Xide Lin , Xifeng Yan , Jiawei Han, On effective presentation of graph patterns: a structural representative approach, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|