|
ABSTRACT
Mining frequent trees is very useful in domains like bioinformatics, web mining, mining semistructured data, and so on. We formulate the problem of mining (embedded) subtrees in a forest of rooted, labeled, and ordered trees. We present TREEMINER, a novel algorithm to discover all frequent subtrees in a forest, using a new data structure called scope-list. We contrast TREEMINER with a pattern matching tree mining algorithm (PATTERNMATCHER). We conduct detailed experiments to test the performance and scalability of these methods. We find that TREEMINER outperforms the pattern matching approach by a factor of 4 to 20, and has good scaleup properties. We also present an application of tree mining to analyze real web logs for usage patterns.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Serge Abiteboul , Haim Kaplan , Tova Milo, Compact labeling schemes for ancestor queries, Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms, p.547-556, January 07-09, 2001, Washington, D.C., United States
|
 |
2
|
|
| |
3
|
Rakesh Agrawal , Hiekki Mannila , Ramakrishnan Srikant , Hannu Toivonen , A. Inkeri Verkamo, Fast discovery of association rules, Advances in knowledge discovery and data mining, American Association for Artificial Intelligence, Menlo Park, CA, 1996
|
| |
4
|
|
| |
5
|
T. Asai, K. Abe, S. Kawasoe, H. Arimura, H. Satamoto, and S. Arikawa. Efficient substructure discovery from large semi-structured data. In 2nd SIAM Int'l Conference on Data Mining, April 2002.
|
| |
6
|
M.S. Chen, J.S. Park, and P.S. Yu. Data mining for path traversal patterns in a web environment. In International Conference on Distributed Computing Systems, 1996.
|
| |
7
|
Zhiyuan Chen , H. V. Jagadish , Flip Korn , Nick Koudas , S. Muthukrishnan , Raymond T. Ng , Divesh Srivastava, Counting Twig Matches in a Tree, Proceedings of the 17th International Conference on Data Engineering, p.595-604, April 02-06, 2001
|
| |
8
|
Richard Cole , Ramesh Hariharan , Piotr Indyk, Tree pattern matching and subset matching in deterministic O(n log3 n)-time, Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms, p.245-254, January 17-19, 1999, Baltimore, Maryland, United States
|
| |
9
|
D. Cook and L. Holder. Substructure discovery using minimal description length and background knowledge. Journal of Artificial Intelligence Research, 1:231--255, 1994.
|
| |
10
|
L. Dehaspe, H. Toivonen, and R. King. Finding frequent substructures in chemical compounds. In 4th Intl. Conf. Knowledge Discovery and Data Mining, August 1998.
|
| |
11
|
|
| |
12
|
|
| |
13
|
|
| |
14
|
|
| |
15
|
|
| |
16
|
|
| |
17
|
|
| |
18
|
|
| |
19
|
B. Shapiro and K. Zhang. Comparing multiple rna secondary strutures using tree comparisons. Computer Applications in Biosciences, 6(4):309--318, 1990.
|
 |
20
|
|
| |
21
|
|
| |
22
|
M. J. Zaki. Efficiently mining trees in a forest. Tech. Report 01--7, CS Dept., RPI, July 2001.
|
 |
23
|
Chun Zhang , Jeffrey Naughton , David DeWitt , Qiong Luo , Guy Lohman, On supporting containment queries in relational database management systems, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.425-436, May 21-24, 2001, Santa Barbara, California, United States
|
CITED BY 76
|
|
|
|
|
|
|
|
|
|
|
Qiankun Zhao , Sourav S. Bhowmick , Mukesh Mohania , Yahiko Kambayashi, Discovering frequently changing structures from historical structural deltas of unordered XML, Proceedings of the thirteenth ACM international conference on Information and knowledge management, November 08-13, 2004, Washington, D.C., USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Chen Wang , Wei Wang , Jian Pei , Yongtai Zhu , Baile Shi, Scalable mining of large disk-based graph databases, Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, August 22-25, 2004, Seattle, WA, USA
|
|
|
|
|
|
|
|
|
|
|
|
Jun Huan , Wei Wang , Jan Prins , Jiong Yang, SPIN: mining maximal frequent subgraphs from graph databases, Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, August 22-25, 2004, Seattle, WA, USA
|
|
|
|
|
|
|
|
|
Liang Huai Yang , Mong Li Lee , Wynne Hsu , Xinyu Guo, 2PXMiner: an efficient two pass mining of frequent XML query patterns, Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, August 22-25, 2004, Seattle, WA, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Jian Pei , Jiawei Han , Behzad Mortazavi-Asl , Jianyong Wang , Helen Pinto , Qiming Chen , Umeshwar Dayal , Mei-Chun Hsu, Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach, IEEE Transactions on Knowledge and Data Engineering, v.16 n.11, p.1424-1440, November 2004
|
|
|
|
|
|
|
|
|
Satoshi Morinaga , Hiroki Arimura , Takahiro Ikeda , Yosuke Sakao , Susumu Akamine, Key semantics extraction by dependency tree mining, Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, August 21-24, 2005, Chicago, Illinois, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Charu C. Aggarwal , Na Ta , Jianyong Wang , Jianhua Feng , Mohammed Zaki, Xproj: a framework for projected structural clustering of xml documents, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, August 12-15, 2007, San Jose, California, USA
|
|
|
Liang Huai Yang , Mong Li Lee , Wynne Hsu , Decai Huang , Limsoon Wong, Efficient mining of frequent XML query patterns with repeating-siblings, Information and Software Technology, v.50 n.5, p.375-389, April, 2008
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Yijun Bei , Gang Chen , Lidan Shou , Xiaoyan Li , Jinxiang Dong, Bottom-up discovery of frequent rooted unordered subtrees, Information Sciences: an International Journal, v.179 n.1-2, p.70-88, January, 2009
|
|
|
|
|
|
Ruoming Jin , Muad Abu-Ata , Yang Xiang , Ning Ruan, Effective and efficient itemset pattern summarization: regression-based approaches, Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2008, Las Vegas, Nevada, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Dat P. T. Nguyen , Yutaka Matsuo , Mitsuru Ishizuka, Relation extraction from wikipedia using subtree mining, Proceedings of the 22nd national conference on Artificial intelligence, p.1414-1420, July 22-26, 2007, Vancouver, British Columbia, Canada
|
|
|
|
|
|
|
|