ACM Home Page
Please provide us with feedback. Feedback
Mining frequent patterns without candidate generation
Full text PdfPdf (290 KB)
Source International Conference on Management of Data archive
Proceedings of the 2000 ACM SIGMOD international conference on Management of data table of contents
Dallas, Texas, United States
Pages: 1 - 12  
Year of Publication: 2000
ISBN:1-58113-217-4
Also published in ...
Authors
Jiawei Han  School of Computing Science, Simon Fraser University
Jian Pei  School of Computing Science, Simon Fraser University
Yiwen Yin  School of Computing Science, Simon Fraser University
Sponsor
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 58,   Downloads (12 Months): 496,   Citation Count: 390
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/342009.335372
What is a DOI?

ABSTRACT

Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been studied popularly in data mining research. Most of the previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there exist prolific patterns and/or long patterns.

In this study, we propose a novel frequent pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develop an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth. Efficiency of mining is achieved with three techniques: (1) a large database is compressed into a highly condensed, much smaller data structure, which avoids costly, repeated database scans, (2) our FP-tree-based mining adopts a pattern fragment growth method to avoid the costly generation of a large number of candidate sets, and (3) a partitioning-based, divide-and-conquer method is used to decompose the mining task into a set of smaller tasks for mining confined patterns in conditional databases, which dramatically reduces the search space. Our performance study shows that the FP-growth method is efficient and scalable for mining both long and short frequent patterns, and is about an order of magnitude faster than the Apriori algorithm and also faster than some recently reported new frequent pattern mining methods.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
R. Agarwal, C. Aggarwal, and V. V. V. Prasad. Depth-first generation of large itemsets for association rules. IBM Tech. Report RC21538, July 1999.
 
2
 
3
 
4
5
6
7
 
8
G. Grahne, L. Lakshmanan, and X. Wang. Efficient mining of constrained correlated sets. In ICDE'00.
 
9
 
10
J. Han, J. Pei, and Y. Yin. Mining partial periodicity using frequent pattern trees. In GS Tech, Rep, 99-10, Simon Fraser University, July 1999.
 
11
M. Kamber, J. Han, and J. Y. Chiang. Metaruleguided mining of multi-dimensional association rules using data cubes. In KDD'97, pp. 207-210.
12
 
13
 
14
15
16
17
 
18
 
19
 
20
R. Srikant, Q. Vu, and R. Agrawal. Mining association rules with item constraints. In KDD'97, pp. 67-73.

CITED BY  390

Collaborative Colleagues:
Jiawei Han: colleagues
Jian Pei: colleagues
Yiwen Yin: colleagues