ACM Home Page
Please provide us with feedback. Feedback
SPIN: mining maximal frequent subgraphs from graph databases
Full text PdfPdf (191 KB)
Source International Conference on Knowledge Discovery and Data Mining archive
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining table of contents
Seattle, WA, USA
POSTER SESSION: Research track posters table of contents
Pages: 581 - 586  
Year of Publication: 2004
ISBN:1-58113-888-1
Authors
Jun Huan  University of North Carolina at Chapel Hill, Chapel Hill, NC
Wei Wang  University of North Carolina at Chapel Hill, Chapel Hill, NC
Jan Prins  University of North Carolina at Chapel Hill, Chapel Hill, NC
Jiong Yang  University of Illinois, Urbana-Champaign, IL
Sponsors
SIGMOD: ACM Special Interest Group on Management of Data
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 18,   Downloads (12 Months): 97,   Citation Count: 13
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1014052.1014123
What is a DOI?

ABSTRACT

One fundamental challenge for mining recurring subgraphs from semi-structured data sets is the overwhelming abundance of such patterns. In large graph databases, the total number of frequent subgraphs can become too large to allow a full enumeration using reasonable computational resources. In this paper, we propose a new algorithm that mines only maximal frequent subgraphs, i.e. subgraphs that are not a part of any other frequent subgraphs. This may exponentially decrease the size of the output set in the best case; in our experiments on practical data sets, mining maximal frequent subgraphs reduces the total number of mined patterns by two to three orders of magnitude.Our method first mines all frequent trees from a general graph database and then reconstructs all maximal subgraphs from the mined trees. Using two chemical structure benchmarks and a set of synthetic graph data sets, we demonstrate that, in addition to decreasing the output size, our algorithm can achieve a five-fold speed up over the current state-of-the-art subgraph mining algorithms.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
 
3
 
4
5
 
6
 
7
J. Hu, X. Shen, Y. Shao, C. Bystroff, and M. J. Zaki. Mining protein contact maps. 2nd BIOKDD Workshop on Data Mining in Bioinformatics, 2002.
8
 
9
 
10
J. Huan, W. Wang, J. Prins, and J. Yang. Spin: Mining maximal frequent subgraphs from graph databases. UNC Technical Report TR04-018, 2004.
 
11
J. Huan, W. Wang, A. Washington, J. Prins, and A. Tropsha. Accurately classify protein family based on coherrent subgraph mining. in Pacific Symposium on Biocomputing, 2004.
 
12
 
13
 
14
M. Kuramochi and G. Karypis. Finding frequent patterns in a large sparse graph. SDM, 2004.
 
15
 
16
S. Raghavan and H. Garcia-Molina. Representing web graphs. In Proceedings of the IEEE Intl. Conference on Data Engineering, 2003.
 
17
 
18
 
19
20
21
22

CITED BY  13

Collaborative Colleagues:
Jun Huan: colleagues
Wei Wang: colleagues
Jan Prins: colleagues
Jiong Yang: colleagues