Efficiently mining frequent trees in a forest
Source: International Conference on Knowledge Discovery and Data Mining
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Edmonton, Alberta, Canada
SESSION: Graphs and trees
Pages: 71 - 80  
Year of Publication: 2002
ISBN:1-58113-567-X
Author
Mohammed J. Zaki  Rensselaer Polytechnic Institute, Troy NY
Sponsors
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
AAAI
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/775047.775058

ABSTRACT

Mining frequent trees is very useful in domains like bioinformatics, web mining, mining semistructured data, and so on. We formulate the problem of mining (embedded) subtrees in a forest of rooted, labeled, and ordered trees. We present TREEMINER, a novel algorithm to discover all frequent subtrees in a forest, using a new data structure called scope-list. We contrast TREEMINER with a pattern matching tree mining algorithm (PATTERNMATCHER). We conduct detailed experiments to test the performance and scalability of these methods. We find that TREEMINER outperforms the pattern matching approach by a factor of 4 to 20, and has good scaleup properties. We also present an application of tree mining to analyze real web logs for usage patterns.


