|
ABSTRACT
To our best knowledge, all existing graph pattern mining algorithms can only mine either closed, maximal or the complete set of frequent subgraphs instead of graph generators which are preferable to the closed subgraphs according to the Minimum Description Length principle in some applications. In this paper, we study a new problem of frequent subgraph mining, called frequent connected graph generator mining, which poses significant challenges due to the underlying complexity associated with frequent subgraph mining as well as the absence of Apriori property for graph generators. Whereas, we still present an efficient solution FOGGER for this new problem. By exploring some properties of graph generators, two effective pruning techniques, backward edge pruning and forward edge pruning, are proposed to prune the branches of the well-known DFS code enumeration tree that do not contain graph generators. To further improve the efficiency, an effective index structure, ADI++, is also devised to facilitate the subgraph isomorphism checking. We experimentally evaluate various aspects of FOGGER using both real and synthetic datasets. Our results demonstrate that the two pruning techniques are effective in pruning the unpromising parts of search space, and FOGGER is efficient and scalable in terms of the base size of input databases. Meanwhile, the performance study for graph generator-based classification model shows that generator-based model is much simpler and can achieve almost the same accuracy for classifying chemical compounds in comparison with closed subgraph-based model.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
 |
3
|
|
| |
4
|
|
| |
5
|
|
| |
6
|
|
| |
7
|
P. Grünwald. A tutorial introduction to the minimum description length principle. The Computing Research Repository, math.ST/0406077, 2004.
|
| |
8
|
|
| |
9
|
|
 |
10
|
Jun Huan , Wei Wang , Jan Prins , Jiong Yang, SPIN: mining maximal frequent subgraphs from graph databases, Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, August 22-25, 2004, Seattle, WA, USA
[doi> 10.1145/1014052.1014123]
|
| |
11
|
|
| |
12
|
|
| |
13
|
|
| |
14
|
J. Li, H. Li, L. Wong, J. Pei, and G. Dong. Minimum description length principle: Generators are preferable to closed patterns. In AAAI '06, Proceedings of the Twenty-First National Conference on Artificial Intelligence and the Eighteenth Innovative Application of Artificial Intelligence. AAAI Press, 2006.
|
| |
15
|
|
| |
16
|
J. Rissanen. Modelling by the shortest data description. Automatica, 14:465--471, 1978.
|
| |
17
|
|
 |
18
|
Jimeng Sun , Christos Faloutsos , Spiros Papadimitriou , Philip S. Yu, GraphScope: parameter-free mining of large time-evolving graphs, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, August 12-15, 2007, San Jose, California, USA
[doi> 10.1145/1281192.1281266]
|
| |
19
|
|
| |
20
|
R. M. H. Ting and J. Bailey. Mining minimal contrast subgraph patterns. In SDM'06: Proceedings of the Sixth SIAM International Conference on Data Mining, Bethesda, MD, USA, 2006. SIAM.
|
 |
21
|
|
 |
22
|
Chen Wang , Wei Wang , Jian Pei , Yongtai Zhu , Baile Shi, Scalable mining of large disk-based graph databases, Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, August 22-25, 2004, Seattle, WA, USA
[doi> 10.1145/1014052.1014088]
|
| |
23
|
|
| |
24
|
|
| |
25
|
D. Williams, J. Huan, and W. Wang. Graph database indexing using structured graph decomposition. In ICDE '07, IEEE 23rd International Conference on Data Engineering, pages 976--985, Istanbul, Turkey, 2007.
|
 |
26
|
|
| |
27
|
|
 |
28
|
|
 |
29
|
|
 |
30
|
|
 |
31
|
Zhiping Zeng , Jianyong Wang , Lizhu Zhou , George Karypis, Coherent closed quasi-clique discovery from large dense graph databases, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
[doi> 10.1145/1150402.1150506]
|
| |
32
|
S. Zhang, J. Yang, and V. Cheedella. Monkey: Approximate graph mining based on spanning trees. In ICDE '07, IEEE 23rd International Conference on Data Engineering, pages 1247--1249, Istanbul, Turkey, 2007.
|
| |
33
|
|
|