ABSTRACT
Mining frequent closed itemsets provides complete and non-redundant results for frequent pattern analysis. Extensive studies have proposed various strategies for efficient frequent closed itemset mining, such as depth-first search vs. breadth-first search, vertical formats vs. horizontal formats, tree structures vs. other data structures, top-down vs. bottom-up traversal, and pseudo projection vs. physical projection of the conditional database. It is the right time to ask "what are the pros and cons of the strategies?" and "which strategies should we pick, and how can we integrate them, to achieve higher performance in general cases?" In this study, we answer these questions through a systematic study of the search strategies and develop a winning algorithm, CLOSET+. CLOSET+ integrates the advantages of previously proposed effective strategies with some newly developed here. A thorough performance study on synthetic and real data sets shows the advantages of the strategies and the improvement of CLOSET+ over existing mining algorithms, including CLOSET, CHARM and OP, in terms of runtime, memory usage and scalability.
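To make the notion of a "closed" itemset concrete: an itemset is closed if no proper superset has the same support. The following is a minimal brute-force sketch of that definition only. It is not CLOSET+ or any of the algorithms compared in the paper; the function names, the toy transactions, and the `min_support` threshold are all illustrative assumptions.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_closed_itemsets(transactions, min_support):
    """Brute-force enumeration (exponential in the number of items).

    Illustrates the definition only; real miners avoid enumerating
    every candidate.
    """
    items = sorted(set().union(*transactions))

    # Step 1: collect every frequent itemset with its support count.
    frequent = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = support(candidate, transactions)
            if count >= min_support:
                frequent[candidate] = count

    # Step 2: keep an itemset only if no proper superset has the same
    # support. Such a superset would itself be frequent, so checking
    # the frequent itemsets is sufficient.
    closed = {
        itemset: count
        for itemset, count in frequent.items()
        if not any(
            itemset < other and count == frequent[other]
            for other in frequent
        )
    }
    return frequent, closed


# Hypothetical toy database of four transactions.
transactions = [
    frozenset("abc"),
    frozenset("abc"),
    frozenset("ab"),
    frozenset("ad"),
]
frequent, closed = frequent_closed_itemsets(transactions, min_support=2)
```

On this toy input, seven itemsets are frequent but only three are closed ({a}, {a, b} and {a, b, c}), and the support of every other frequent itemset can be recovered from those three. This is the sense in which the closed set is complete yet non-redundant.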
REFERENCES
[1] Rakesh Agrawal, Tomasz Imieliński, Arun Swami, Mining association rules between sets of items in large databases, Proceedings of the 1993 ACM SIGMOD international conference on Management of data, p.207-216, May 25-28, 1993, Washington, D.C., United States
[4] Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman, Shalom Tsur, Dynamic itemset counting and implication rules for market basket data, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.255-264, May 11-15, 1997, Tucson, Arizona, United States
[8] Jiawei Han, Jian Pei, Yiwen Yin, Mining frequent patterns without candidate generation, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.1-12, May 15-18, 2000, Dallas, Texas, United States
[10] Junqiang Liu, Yunhe Pan, Ke Wang, Jiawei Han, Mining frequent item sets by opportunistic projection, Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, July 23-26, 2002, Edmonton, Alberta, Canada, doi: 10.1145/775047.775081
[11] Jong Soo Park, Ming-Syan Chen, Philip S. Yu, An effective hash-based algorithm for mining association rules, Proceedings of the 1995 ACM SIGMOD international conference on Management of data, p.175-186, May 22-25, 1995, San Jose, California, United States
[13] Jian Pei, Jiawei Han, Hongjun Lu, Shojiro Nishio, Shiwei Tang, Dongqing Yang, H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases, Proceedings of the 2001 IEEE International Conference on Data Mining, p.441-448, November 29-December 02, 2001
[14] J. Pei, J. Han, and R. Mao. CLOSET: An efficient algorithm for mining frequent closed itemsets. In DMKD'00, May 2000.
[15] R. Rymon. Search through Systematic Set Enumeration. In Proc. of 3rd Int. Conf. on Principles of Knowledge Representation and Reasoning, 1992.
[18] M. Zaki and C. Hsiao. CHARM: An efficient algorithm for closed itemset mining. In SDM'02, April 2002.
CITED BY 66
Feng Pan, Gao Cong, Anthony K. H. Tung, Jiong Yang, Mohammed J. Zaki, Carpenter: finding closed patterns in long biological datasets, Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2003, Washington, D.C.
Guimei Liu, Hongjun Lu, Wenwu Lou, Jeffrey Xu Yu, On computing, storing and querying frequent patterns, Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2003, Washington, D.C.
Gao Cong, Anthony K. H. Tung, Xin Xu, Feng Pan, Jiong Yang, FARMER: finding interesting rule groups in microarray datasets, Proceedings of the 2004 ACM SIGMOD international conference on Management of data, June 13-18, 2004, Paris, France
Haiquan Li, Jinyan Li, Limsoon Wong, Mengling Feng, Yap-Peng Tan, Relative risk and odds ratio: a data mining perspective, Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 13-15, 2005, Baltimore, Maryland
Anthony J. T. Lee, Chun-Sheng Wang, Wan-Yu Weng, Yi-An Chen, Huei-Wen Wu, An efficient algorithm for mining closed inter-transaction itemsets, Data & Knowledge Engineering, v.66 n.1, p.68-91, July, 2008
Nancy P. Lin, Wei-Hua Hao, Hung-Jen Chen, Hao-En Chueh, Chung-I Chang, Fast mining maximal sequential patterns, Proceedings of the 7th WSEAS International Conference on Simulation, Modelling and Optimization, p.405-408, September 15-17, 2007, Beijing, China
Jian Pei, Haixun Wang, Jian Liu, Ke Wang, Jianyong Wang, Philip S. Yu, Discovering Frequent Closed Partial Orders from Strings, IEEE Transactions on Knowledge and Data Engineering, v.18 n.11, p.1467-1481, November 2006
Jin Longcun, Wan Wanggen, Cui Bin, Yu Xiaoqing, Xu Hongwei, A new multimedia information data mining method, Proceedings of the first ACM/SIGEVO Summit on Genetic and Evolutionary Computation, June 12-14, 2009, Shanghai, China
Jinyan Li, Haiquan Li, Limsoon Wong, Jian Pei, Guozhu Dong, Minimum description length principle: generators are preferable to closed patterns, Proceedings of the 21st national conference on Artificial intelligence, p.409-414, July 16-20, 2006, Boston, Massachusetts