| Out-of-core frequent pattern mining on a commodity PC |
| Full text |
Pdf
(832 KB)
|
| Source
|
International Conference on Knowledge Discovery and Data Mining
archive
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
table of contents
Philadelphia, PA, USA
SESSION: Research track papers
table of contents
Pages: 86 - 95
Year of Publication: 2006
ISBN:1-59593-339-5
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 8, Downloads (12 Months): 92, Citation Count: 3
|
|
|
ABSTRACT
In this work we focus on the problem of frequent itemset mining on large, out-of-core data sets. After presenting a characterization of existing out-of-core frequent itemset mining algorithms and their drawbacks, we introduce our efficient, highly scalable solution. Presented in the context of the FPGrowth algorithm, our technique involves several novel I/O-conscious optimizations, such as approximate hash-based sorting and blocking, and leverages recent architectural advancements in commodity computers, such as 64-bit processing. We evaluate the proposed optimizations on truly large data sets,up to 75GB, and show they yield greater than a 400-fold execution time improvement. Finally, we discuss the impact of this research in the context of other pattern mining challenges, such as sequence mining and graph mining.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Rakesh Agrawal , Tomasz Imieliński , Arun Swami, Mining association rules between sets of items in large databases, Proceedings of the 1993 ACM SIGMOD international conference on Management of data, p.207-216, May 25-28, 1993, Washington, D.C., United States
|
| |
2
|
|
 |
3
|
Sergey Brin , Rajeev Motwani , Craig Silverstein, Beyond market baskets: generalizing association rules to correlations, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.265-276, May 11-15, 1997, Tucson, Arizona, United States
|
| |
4
|
|
 |
5
|
|
 |
6
|
|
| |
7
|
Amol Ghoting , Gregory Buehrer , Srinivasan Parthasarathy , Daehyun Kim , Anthony Nguyen , Yen-Kuang Chen , Pradeep Dubey, Cache-conscious frequent pattern mining on a modern processor, Proceedings of the 31st international conference on Very large data bases, August 30-September 02, 2005, Trondheim, Norway
|
| |
8
|
B. Goethals and M. Zaki. Advances in frequent itemset mining implementations. In Proceedings of the ICDM workshop on frequent itemset mining implementations, 2003.
|
| |
9
|
|
| |
10
|
G. Grahne and J. Zhu. Efficiently using prefix-trees in mining frequent itemsets. In Proceedings of the ICDM Workshop on Frequent Itemset Mining Implementations, 2003.
|
| |
11
|
|
 |
12
|
Jiawei Han , Jian Pei , Yiwen Yin, Mining frequent patterns without candidate generation, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.1-12, May 15-18, 2000, Dallas, Texas, United States
|
| |
13
|
G. Liu, H. Lu, J. X. Yu, W. Wei, and X. Xiao. Afopt: An efficient implementation of pattern growth approach. In Proceedings of the ICDM workshop on frequent itemset mining implementations, 2003.
|
| |
14
|
|
 |
15
|
Jong Soo Park , Ming-Syan Chen , Philip S. Yu, An effective hash-based algorithm for mining association rules, Proceedings of the 1995 ACM SIGMOD international conference on Management of data, p.175-186, May 22-25, 1995, San Jose, California, United States
|
| |
16
|
S. Parthasarathy, M. Zaki, M. Ogihara, and W. Li. Memory placement techniques for parallel association mining. International Conference on Knowledge Discovery and Data Mining (SIGKDD), 1998.
|
| |
17
|
|
| |
18
|
|
| |
19
|
|
| |
20
|
M. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. New algorithms for fast discovery discovery of association rules. In Proceedings of the International Conference on Knowledge Discovery and Data Mining (SIGKDD), 1995.
|
| |
21
|
|
CITED BY 3
|
|
Gregory Buehrer , Srinivasan Parthasarathy , Shirish Tatikonda , Tahsin Kurc , Joel Saltz, Toward terabyte pattern mining: an architecture-conscious solution, Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming, March 14-17, 2007, San Jose, California, USA
|
|
|
|
|
|
|
|