|
ABSTRACT
Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been studied popularly in data mining research. Most of the previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there exist prolific patterns and/or long patterns.
In this study, we propose a novel frequent pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develop an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth. Efficiency of mining is achieved with three techniques: (1) a large database is compressed into a highly condensed, much smaller data structure, which avoids costly, repeated database scans, (2) our FP-tree-based mining adopts a pattern fragment growth method to avoid the costly generation of a large number of candidate sets, and (3) a partitioning-based, divide-and-conquer method is used to decompose the mining task into a set of smaller tasks for mining confined patterns in conditional databases, which dramatically reduces the search space. Our performance study shows that the FP-growth method is efficient and scalable for mining both long and short frequent patterns, and is about an order of magnitude faster than the Apriori algorithm and also faster than some recently reported new frequent pattern mining methods.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
R. Agarwal, C. Aggarwal, and V. V. V. Prasad. Depth-first generation of large itemsets for association rules. IBM Tech. Report RC21538, July 1999.
|
| |
2
|
|
| |
3
|
|
| |
4
|
|
 |
5
|
|
 |
6
|
Sergey Brin , Rajeev Motwani , Craig Silverstein, Beyond market baskets: generalizing association rules to correlations, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.265-276, May 11-15, 1997, Tucson, Arizona, United States
|
 |
7
|
|
| |
8
|
G. Grahne, L. Lakshmanan, and X. Wang. Efficient mining of constrained correlated sets. In ICDE'00.
|
| |
9
|
|
| |
10
|
J. Han, J. Pei, and Y. Yin. Mining partial periodicity using frequent pattern trees. In GS Tech, Rep, 99-10, Simon Fraser University, July 1999.
|
| |
11
|
M. Kamber, J. Han, and J. Y. Chiang. Metaruleguided mining of multi-dimensional association rules using data cubes. In KDD'97, pp. 207-210.
|
 |
12
|
Mika Klemettinen , Heikki Mannila , Pirjo Ronkainen , Hannu Toivonen , A. Inkeri Verkamo, Finding interesting rules from large sets of discovered association rules, Proceedings of the third international conference on Information and knowledge management, p.401-407, November 29-December 02, 1994, Gaithersburg, Maryland, United States
[doi> 10.1145/191246.191314]
|
| |
13
|
|
| |
14
|
|
 |
15
|
Raymond T. Ng , Laks V. S. Lakshmanan , Jiawei Han , Alex Pang, Exploratory mining and pruning optimizations of constrained associations rules, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.13-24, June 01-04, 1998, Seattle, Washington, United States
|
 |
16
|
Jong Soo Park , Ming-Syan Chen , Philip S. Yu, An effective hash-based algorithm for mining association rules, Proceedings of the 1995 ACM SIGMOD international conference on Management of data, p.175-186, May 22-25, 1995, San Jose, California, United States
|
 |
17
|
Sunita Sarawagi , Shiby Thomas , Rakesh Agrawal, Integrating association rule mining with relational database systems: alternatives and implications, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.343-354, June 01-04, 1998, Seattle, Washington, United States
|
| |
18
|
|
| |
19
|
|
| |
20
|
R. Srikant, Q. Vu, and R. Agrawal. Mining association rules with item constraints. In KDD'97, pp. 67-73.
|
CITED BY 390
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Ramesh C. Agarwal , Charu C. Aggarwal , V. V. V. Prasad, Depth first generation of long patterns, Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, p.108-118, August 20-23, 2000, Boston, Massachusetts, United States
|
|
|
|
|
|
|
|
|
|
|
|
Dimitris Meretakis , Dimitris Fragoudis , Hongjun Lu , Spiros Likothanassis, Scalable association-based text classification, Proceedings of the ninth international conference on Information and knowledge management, p.5-11, November 06-11, 2000, McLean, Virginia, United States
|
|
|
|
|
|
|
|
|
|
|
|
Jiawei Han , Jian Pei , Behzad Mortazavi-Asl , Qiming Chen , Umeshwar Dayal , Mei-Chun Hsu, FreeSpan: frequent pattern-projected sequential pattern mining, Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, p.355-359, August 20-23, 2000, Boston, Massachusetts, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Dimitrios Gunopulos , Roni Khardon , Heikki Mannila , Sanjeev Saluja , Hannu Toivonen , Ram Sewak Sharma, Discovering all most specific sentences, ACM Transactions on Database Systems (TODS), v.28 n.2, p.140-174, June 2003
|
|
|
Junqiang Liu , Yunhe Pan , Ke Wang , Jiawei Han, Mining frequent item sets by opportunistic projection, Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, July 23-26, 2002, Edmonton, Alberta, Canada
|
|
|
|
|
|
Guimei Liu , Hongjun Lu , Wenwu Lou , Jeffrey Xu Yu, On computing, storing and querying frequent patterns, Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2003, Washington, D.C.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Qiankun Zhao , Sourav S. Bhowmick , Mukesh Mohania , Yahiko Kambayashi, Discovering frequently changing structures from historical structural deltas of unordered XML, Proceedings of the thirteenth ACM international conference on Information and knowledge management, November 08-13, 2004, Washington, D.C., USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Jacky W. W. Wan , Gillian Dobbie, Mining association rules from XML data using XQuery, Proceedings of the second workshop on Australasian information security, Data Mining and Web Intelligence, and Software Internationalisation, p.169-174, January 01, 2004, Dunedin, New Zealand
|
|
|
Masum Serazi , Vasily Malakhov , Dongmei Ren , Amal Perera , Imad Rahal , Weihua Wu , Qiang Ding , Fei Pan , William Perrizo, DataMIME™, Proceedings of the 2004 ACM SIGMOD international conference on Management of data, June 13-18, 2004, Paris, France
|
|
|
Guozhu Dong , Jiawei Han , Joyce M. W. Lam , Jian Pei , Ke Wang , Wei Zou, Mining Constrained Gradients in Large Databases, IEEE Transactions on Knowledge and Data Engineering, v.16 n.8, p.922-938, August 2004
|
|
|
|
|
|
Hui Xiong , Shashi Shekhar , Pang-Ning Tan , Vipin Kumar, Exploiting a support-based upper bound of Pearson's correlation coefficient for efficiently identifying strongly correlated pairs, Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, August 22-25, 2004, Seattle, WA, USA
|
|
|
Y. Dora Cai , David Clutter , Greg Pape , Jiawei Han , Michael Welge , Loretta Auvil, MAIDS: mining alarming incidents from data streams, Proceedings of the 2004 ACM SIGMOD international conference on Management of data, June 13-18, 2004, Paris, France
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Hervé Brönnimann , Bin Chen , Manoranjan Dash , Peter Haas , Peter Scheuermann, Efficient data reduction with EASE, Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2003, Washington, D.C.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Haiquan Li , Jinyan Li , Limsoon Wong , Mengling Feng , Yap-Peng Tan, Relative risk and odds ratio: a data mining perspective, Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 13-15, 2005, Baltimore, Maryland
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Hongxia Xia , Qi Shen , Luo Zhong , Shan Feng , Rui Hao, Application of data mining technology and generic algorithm to intrusion detection system, Proceedings of the 3rd international conference on Information security, November 14-16, 2004, Shanghai, China
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Jian Pei , Jiawei Han , Behzad Mortazavi-Asl , Jianyong Wang , Helen Pinto , Qiming Chen , Umeshwar Dayal , Mei-Chun Hsu, Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach, IEEE Transactions on Knowledge and Data Engineering, v.16 n.11, p.1424-1440, November 2004
|
|
|
|
|
|
|
|
|
Imad Rahal , Dongmei Ren , Amal Perera , Hassan Najadat , William Perrizo , Riad Rahhal , Willy Valdivia, Incremental interactive mining of constrained association rules from biological annotation data with nominal features, Proceedings of the 2005 ACM symposium on Applied computing, March 13-17, 2005, Santa Fe, New Mexico
|
|
|
|
|
|
Xuan-Hieu Phan , Le-Minh Nguyen , Tu-Bao Ho , Susumu Horiguchi, Improving discriminative sequential learning with rare--but--important associations, Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, August 21-24, 2005, Chicago, Illinois, USA
|
|
|
Amol Ghoting , Gregory Buehrer , Srinivasan Parthasarathy , Daehyun Kim , Anthony Nguyen , Yen-Kuang Chen , Pradeep Dubey, Cache-conscious frequent pattern mining on a modern processor, Proceedings of the 31st international conference on Very large data bases, August 30-September 02, 2005, Trondheim, Norway
|
|
|
|
|
|
|
|
|
Shengnan Cong , Jiawei Han , Jay Hoeflinger , David Padua, A sampling-based framework for parallel data mining, Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming, June 15-17, 2005, Chicago, IL, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Cheng-Ru Lin , Chang-Hung Lee , Ming-Syan Chen , Philip S. Yu, Distributed data mining in a chain store database of short transactions, Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, July 23-26, 2002, Edmonton, Alberta, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Anthony J. T. Lee , Chun-Sheng Wang , Wan-Yu Weng , Yi-An Chen , Huei-Wen Wu, An efficient algorithm for mining closed inter-transaction itemsets, Data & Knowledge Engineering, v.66 n.1, p.68-91, July, 2008
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Gregory Buehrer , Yen-Kuang Chen , Srinivasan Parthasarathy , Anthony Nguyen , Amol Ghoting , Daehyun Kim, Efficient pattern mining on shared memory systems: implications for chip multiprocessor architectures, Proceedings of the 2006 workshop on Memory system performance and correctness, October 22-22, 2006, San Jose, California
|
|
|
Gregory Buehrer , Srinivasan Parthasarathy , Shirish Tatikonda , Tahsin Kurc , Joel Saltz, Toward terabyte pattern mining: an architecture-conscious solution, Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming, March 14-17, 2007, San Jose, California, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Manolis Terrovitis , Spyros Passas , Panos Vassiliadis , Timos Sellis, A combination of trie-trees and inverted files for the indexing of set-valued attributes, Proceedings of the 15th ACM international conference on Information and knowledge management, November 06-11, 2006, Arlington, Virginia, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Takeaki Uno , Masashi Kiyomi , Hiroki Arimura, LCM ver.3: collaboration of array, bitmap and prefix tree for frequent itemset mining, Proceedings of the 1st international workshop on open source data mining: frequent pattern mining implementations, p.77-86, August 21-21, 2005, Chicago, Illinois
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Yi-Hong Chu , Jen-Wei Huang , Kun-Ta Chuang , Ming-Syan Chen, On subspace clustering with density consciousness, Proceedings of the 15th ACM international conference on Information and knowledge management, November 06-11, 2006, Arlington, Virginia, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Yaochun Huang , Hui Xiong , Weili Wu , Ping Deng , Zhongnan Zhang, Mining maximal hyperclique pattern: A hybrid search strategy, Information Sciences: an International Journal, v.177 n.3, p.703-721, February, 2007
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Shintaro Urabe , Jiahong Wong , Eiichiro Kodama , Toyoo Takata, A high collusion-resistant approach to distributed privacy-preserving data mining, Proceedings of the 25th conference on Proceedings of the 25th IASTED International Multi-Conference: parallel and distributed computing and networks, p.326-331, February 13-15, 2007, Innsbruck, Austria
|
|
|
|
|
|
|
|
|
Dong Xin , Jiawei Han , Xiaolei Li , Benjamin W. Wah, Star-cubing: computing iceberg cubes by top-down and bottom-up integration, Proceedings of the 29th international conference on Very large data bases, p.476-487, September 09-12, 2003, Berlin, Germany
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Amol Ghoting , Gregory Buehrer , Srinivasan Parthasarathy , Daehyun Kim , Anthony Nguyen , Yen-Kuang Chen , Pradeep Dubey, Cache-conscious frequent pattern mining on modern and emerging processors, The VLDB Journal — The International Journal on Very Large Data Bases, v.16 n.1, p.77-96, January 2007
|
|
|
|
|
|
|
|
|
Amol Ghoting , Gregory Buehrer , Srinivasan Parthasarathy , Daehyun Kim , Anthony Nguyen , Yen-Kuang Chen , Pradeep Dubey, A characterization of data mining algorithms on a modern processor, Proceedings of the 1st international workshop on Data management on new hardware, June 12-12, 2005, Baltimore, Maryland
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Avaré Stewart , Ling Chen , Raluca Paiu , Wolfgang Nejdl, Discovering information diffusion paths from blogosphere for online advertising, Proceedings of the 1st international workshop on Data mining and audience intelligence for advertising, p.46-54, August 12-12, 2007, San Jose, California
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Yubao Liu , Jiarong Cai , Zhilan Huang , Jingwen Yu , Jian Yin, Fast detection of database system abuse behaviors based on data mining approach, Proceedings of the 2nd international conference on Scalable information systems, June 06-08, 2007, Suzhou, China
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Yanfang Ye , Dingding Wang , Tao Li , Dongyi Ye, IMDS: intelligent malware detection system, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, August 12-15, 2007, San Jose, California, USA
|
|
|
|
|
|
Surong Wang , Manoranjan Dash , Liang-Tien Chia , Min Xu, Efficient sampling of training set in large and noisy multimedia data, ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), v.3 n.3, p.14-es, August 2007
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Syed Khairuzzaman Tanbeer , Chowdhury Farhan Ahmed , Byeong-Soo Jeong , Young-Koo Lee, Efficient frequent pattern mining over data streams, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Ranjeet Kumar , Preetham Kumar , V. S. Ananthanarayana, Efficient determination of the sequence of attributes of an N-attributed database for obtaining an optimal tree representation, Proceedings of the 7th WSEAS International Conference on Artificial intelligence, knowledge engineering and data bases, p.454-459, February 20-22, 2008, Cambridge, UK
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Wei Fan , Kun Zhang , Hong Cheng , Jing Gao , Xifeng Yan , Jiawei Han , Philip Yu , Olivier Verscheure, Direct mining of discriminative and essential frequent patterns via model-based search tree, Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2008, Las Vegas, Nevada, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Haoyuan Li , Yi Wang , Dong Zhang , Ming Zhang , Edward Y. Chang, Pfp: parallel fp-growth for query recommendation, Proceedings of the 2008 ACM conference on Recommender systems, October 23-25, 2008, Lausanne, Switzerland
|
|
|
|
|
|
|
|
|
Ian Molloy , Ninghui Li , Tiancheng Li , Ziqing Mao , Qihua Wang , Jorge Lobo, Evaluating role mining algorithms, Proceedings of the 14th ACM symposium on Access control models and technologies, June 03-05, 2009, Stresa, Italy
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Lile Hattori , Gilson dos Santos Jr , Fernando Cardoso , Marcus Sampaio, Mining software repositories for software change impact analysis: a case study, Proceedings of the 23rd Brazilian symposium on Databases, October 13-17, 2008, Campinas, Sao Paulo, Brazil
|
|
|
|
|
|
Xiujuan Xu , Yu Liu , Zhe Wang , Chunguang Zhou , Yanchun Liang, Catalog segmentation with double constraints in business, Pattern Recognition Letters, v.30 n.4, p.440-448, March, 2009
|
|
|
Guangli Nie , Lingling Zhang , Ying Liu , Xiuyu Zheng , Yong Shi, Decision analysis of data mining project based on Bayesian risk, Expert Systems with Applications: An International Journal, v.36 n.3, p.4589-4594, April, 2009
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Ruoming Jin , Scott McCallen , Yuri Breitbart , Dave Fuhry , Dong Wang, Estimating the number of frequent itemsets in a large database, Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, March 24-26, 2009, Saint Petersburg, Russia
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Jinyan Li , Haiquan Li , Limsoon Wong , Jian Pei , Guozhu Dong, Minimum description length principle: generators are preferable to closed patterns, Proceedings of the 21st national conference on Artificial intelligence, p.409-414, July 16-20, 2006, Boston, Massachusetts
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Thomas Bernecker , Hans-Peter Kriegel , Matthias Renz , Florian Verhein , Andreas Zuefle, Probabilistic frequent itemset mining in uncertain databases, Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, June 28-July 01, 2009, Paris, France
|
|
|
|
|
|
|
|
|
Yanfang Ye , Tao Li , Qingshan Jiang , Zhixue Han , Li Wan, Intelligent file scoring system for malware detection from the gray list, Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, June 28-July 01, 2009, Paris, France
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Yi-Jun Wang , Wei-Qing Sun , Jin-Tao She , Sheng-Biao Wei , Cheng-Min Wang, Frequent pattern network mining algorithm based on transaction-item association matrix, Proceedings of the WSEAES 13th international conference on Computers, p.498-501, July 23-25, 2009, Rodos, Greece
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Heon Gyu Lee , Wuon-Shik Kim , Ki Yong Noh , Jin-Ho Shin , Unil Yun , Keun Ho Ryu, Coronary artery disease prediction method using linear and nonlinear feature of heart rate variability in three recumbent postures, Information Systems Frontiers, v.11 n.4, p.419-431, September 2009
|
|
|
|
|
|
|
|