| Mining significant graph patterns by leap search |
| Full text |
Pdf
(633 KB)
|
Source
|
International Conference on Management of Data
archive
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
table of contents
Vancouver, Canada
SESSION: Research Session 10: Graphs I
table of contents
Pages 433-444
Year of Publication: 2008
ISBN:978-1-60558-102-6
|
|
Authors
|
|
Xifeng Yan
|
IBM T. J. Watson Research Center, Westchester, NY, USA
|
|
Hong Cheng
|
University of Illinois at Urbana-Champaign, Urbana, IL, USA
|
|
Jiawei Han
|
University of Illinois at Urbana-Champaign, Urbana, IL, USA
|
|
Philip S. Yu
|
University of Illinois at Chicago, Chicago, IL, USA
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 40, Downloads (12 Months): 384, Citation Count: 5
|
|
|
ABSTRACT
With ever-increasing amounts of graph data from disparate sources, there has been a strong need for exploiting significant graph patterns with user-specified objective functions. Most objective functions are not antimonotonic, which could fail all of frequency-centric graph mining algorithms. In this paper, we give the first comprehensive study on general mining method aiming to find most significant patterns directly. Our new mining framework, called LEAP (Descending Leap Mine), is developed to exploit the correlation between structural similarity and significance similarity in a way that the most significant pattern could be identified quickly by searching dissimilar graph patterns. Two novel concepts, structural leap search and frequency descending mining, are proposed to support leap search in graph pattern space. Our new mining method revealed that the widely adopted branch-and-bound search in data mining literature is indeed not the best, thus sketching a new picture on scalable graph pattern discovery. Empirical results show that LEAP achieves orders of magnitude speedup in comparison with the state-of-the-art method. Furthermore, graph classifiers built on mined patterns outperform the up-to-date graph kernel method in terms of efficiency and accuracy, demonstrating the high promise of such patterns.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Rakesh Agrawal , Tomasz Imieliński , Arun Swami, Mining association rules between sets of items in large databases, Proceedings of the 1993 ACM SIGMOD international conference on Management of data, p.207-216, May 25-28, 1993, Washington, D.C., United States
|
| |
2
|
|
| |
3
|
B. Bringmann and A. Zimmermann. Tree2 - decision trees for tree structured data. In Proc. of 2005 European Symp. Principle of Data Mining and Knowledge Discovery, pages 46--58, 2005.
|
| |
4
|
C. Chang and C. Lin. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~ cjlin/libsvm.
|
| |
5
|
H. Cheng, X. Yan, J. Han, and C. Hsu. Discriminative frequent pattern analysis for e®ective classification. In Proc. of ICDE, pages 716--725, 2007.
|
 |
6
|
|
| |
7
|
|
 |
8
|
|
| |
9
|
H. Fröhlich, J. Wegner, F. Sieker, and A. Zell. Optimal assignment kernels for attributed molecular graphs. In Proc. of ICDM, pages 225--232, 2005.
|
| |
10
|
M. Hasan, V. Chaoji, S. Salem, J. Besson, and M. Zaki. ORIGAMI: Mining representative orthogonal graph patterns. In Proc. of ICDM, pages 153--162, 2007.
|
| |
11
|
|
| |
12
|
|
| |
13
|
M. Kamber and R. Shinghal. Evaluating the interestingness of characteristic rules. In Proc. of SIGKDD, pages 263--266, 1996.
|
| |
14
|
B. Kelley, R. Sharan, R. Karp, E. Sittler, D. Root, B. Stockwell, and T. Ideker. Conserved pathways within bacteria and yeast as revealed by global protein network alignment. Proc Natl Acad Sci U S A, 100:11394--9, 2003.
|
 |
15
|
|
| |
16
|
|
 |
17
|
|
| |
18
|
F. Pennerath and A. Napoli. Mining frequent most informative subgraphs. In the 5th Int. Workshop on Mining and Learning with Graphs, 2007.
|
| |
19
|
G. Piatetsky-Shapiro. Discovery, analysis and presentation of strong rules. Knowledge Discovery in Databases, MIT press, pages 229--248, 1991.
|
| |
20
|
|
| |
21
|
R. Sokal and F. Rohlf. Biometry: the principles and practice of statistics in biological research. W. H. Freeman, New York, 1994.
|
 |
22
|
|
| |
23
|
|
| |
24
|
G. I. Webb. Opus: An efficient admissible algorithm for unordered search. J. of Artificial Intelligence Research, 3:431--465, 1995.
|
| |
25
|
|
| |
26
|
|
 |
27
|
|
 |
28
|
|
CITED BY 5
|
|
|
|
|
|
|
|
Wei Fan , Kun Zhang , Hong Cheng , Jing Gao , Xifeng Yan , Jiawei Han , Philip Yu , Olivier Verscheure, Direct mining of discriminative and essential frequent patterns via model-based search tree, Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2008, Las Vegas, Nevada, USA
|
|
|
Hong Cheng , David Lo , Yang Zhou , Xiaoyin Wang , Xifeng Yan, Identifying bug signatures using discriminative graph mining, Proceedings of the eighteenth international symposium on Software testing and analysis, July 19-23, 2009, Chicago, IL, USA
|
|
|
David Lo , Hong Cheng , Jiawei Han , Siau-Cheng Khoo , Chengnian Sun, Classification of software behaviors for failure detection: a discriminative pattern mining approach, Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, June 28-July 01, 2009, Paris, France
|
|