|
ABSTRACT
In this paper, we examine the issue of mining association rules among items in a large database of sales transactions. The mining of association rules can be mapped into the problem of discovering large itemsets where a large itemset is a group of items which appear in a sufficient number of transactions. The problem of discovering large itemsets can be solved by constructing a candidate set of itemsets first and then, identifying, within this candidate set, those itemsets that meet the large itemset requirement. Generally this is done iteratively for each large k-itemset in increasing order of k where a large k-itemset is a large itemset with k items. To determine large itemsets from a huge number of candidate large itemsets in early iterations is usually the dominating factor for the overall data mining performance. To address this issue, we propose an effective hash-based algorithm for the candidate set generation. Explicitly, the number of candidate 2-itemsets generated by the proposed algorithm is, in orders of magnitude, smaller than that by previous methods, thus resolving the performance bottleneck. Note that the generation of smaller candidate sets enables us to effectively trim the transaction database size at a much earlier stage of the iterations, thereby reducing the computational cost for later iterations significantly. Extensive simulation study is conducted to evaluate performance of the proposed algorithm.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
 |
3
|
Rakesh Agrawal , Tomasz Imieliński , Arun Swami, Mining association rules between sets of items in large databases, Proceedings of the 1993 ACM SIGMOD international conference on Management of data, p.207-216, May 25-28, 1993, Washington, D.C., United States
|
| |
4
|
|
| |
5
|
|
| |
6
|
|
| |
7
|
|
| |
8
|
M. Houtsma and A. Swami. Set-Oriented Mining of Association Rules. Technical Report RJ 9567, IBM Almaden Research Laboratory, San Jose, CA, October 1993.
|
 |
9
|
|
| |
10
|
|
| |
11
|
G. Piatetsky-Shapiro. Discovery, Analysis and Presentation of Strong Rules. Knowledge Discovery in Databases, 1991.
|
| |
12
|
|
CITED BY 211
|
|
Suk-Chung Yoon , Lawrence J. Henschen , E. K. Park , Sam Makki, Using domain knowledge in knowledge discovery, Proceedings of the eighth international conference on Information and knowledge management, p.243-250, November 02-06, 1999, Kansas City, Missouri, United States
|
|
|
S. C. Yoon , I. Y. Song , E. K. Park, Intensional query processing using data mining approaches, Proceedings of the sixth international conference on Information and knowledge management, p.201-208, November 10-14, 1997, Las Vegas, Nevada, United States
|
|
|
|
|
|
Badrul Sarwar , George Karypis , Joseph Konstan , John Riedl, Analysis of recommendation algorithms for e-commerce, Proceedings of the 2nd ACM conference on Electronic commerce, p.158-167, October 17-20, 2000, Minneapolis, Minnesota, United States
|
|
|
|
|
|
Charų C. Aggarwal , Zheng Sun , Philip S. Yu, Online algorithms for finding profile association rules, Proceedings of the seventh international conference on Information and knowledge management, p.86-95, November 02-07, 1998, Bethesda, Maryland, United States
|
|
|
|
|
|
Jong Soo Park , Ming-Syan Chen , Philip S. Yu, Efficient parallel data mining for association rules, Proceedings of the fourth international conference on Information and knowledge management, p.31-36, November 29-December 02, 1995, Baltimore, Maryland, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Biswadeep Nag , Prasad M. Deshpande , David J. DeWitt, Using a knowledge cache for interactive discovery of association rules, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, p.244-253, August 15-18, 1999, San Diego, California, United States
|
|
|
|
|
|
|
|
|
Jong Soo Park , Philip S. Yu , Ming-Syan Chen, Mining association rules with adjustable accuracy, Proceedings of the sixth international conference on Information and knowledge management, p.151-160, November 10-14, 1997, Las Vegas, Nevada, United States
|
|
|
Anthony K.H. Tung , Hongjun Lu , Jiawei Han , Ling Feng, Breaking the barrier of transactions: mining inter-transaction association rules, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, p.297-301, August 15-18, 1999, San Diego, California, United States
|
|
|
|
|
|
|
|
|
Bing Liu , Wynne Hsu , Yiming Ma, Mining association rules with multiple minimum supports, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, p.337-341, August 15-18, 1999, San Diego, California, United States
|
|
|
Takeshi Fukuda , Yasukiko Morimoto , Shinichi Morishita , Takeshi Tokuyama, Data mining using two-dimensional optimized association rules: scheme, algorithms, and visualization, ACM SIGMOD Record, v.25 n.2, p.13-23, June 1996
|
|
|
Venkatesh Ganti , Johannes Gehrke , Raghu Ramakrishnan, A framework for measuring changes in data characteristics, Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.126-137, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Necip Fazil Ayan , Abdullah Uz Tansel , Erol Arkun, An efficient algorithm to update large itemsets with early pruning, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, p.287-291, August 15-18, 1999, San Diego, California, United States
|
|
|
Minos N. Garofalakis , Rajeev Rastogi , S. Seshadri , Kyuseok Shim, Data mining and the Web: past, present and future, Proceedings of the 2nd international workshop on Web information and data management, p.43-47, November 02-06, 1999, Kansas City, Missouri, United States
|
|
|
|
|
|
|
|
|
Tom Brijs , Gilbert Swinnen , Koen Vanhoof , Geert Wets, Using association rules for product assortment decisions: a case study, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, p.254-260, August 15-18, 1999, San Diego, California, United States
|
|
|
|
|
|
|
|
|
Takeshi Fukuda , Yasuhido Morimoto , Shinichi Morishita , Takeshi Tokuyama, Mining optimized association rules for numeric attributes, Proceedings of the fifteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.182-191, June 04-06, 1996, Montreal, Quebec, Canada
|
|
|
Sergey Brin , Rajeev Rastogi , Kyuseok Shim, Mining optimized gain rules for numeric attributes, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, p.135-144, August 15-18, 1999, San Diego, California, United States
|
|
|
|
|
|
|
|
|
|
|
|
Lisa Singh , Peter Scheuermann , Bin Chen, Generating association rules from semi-structured documents using an extended concept hierarchy, Proceedings of the sixth international conference on Information and knowledge management, p.193-200, November 10-14, 1997, Las Vegas, Nevada, United States
|
|
|
Lilian Harada , Naoki Akaboshi , Kazutaka Ogihara , Riichiro Take, Dynamic skew handling in parallel mining of association rules, Proceedings of the seventh international conference on Information and knowledge management, p.76-85, November 02-07, 1998, Bethesda, Maryland, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Junqiang Liu , Yunhe Pan , Ke Wang , Jiawei Han, Mining frequent item sets by opportunistic projection, Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, July 23-26, 2002, Edmonton, Alberta, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Guimei Liu , Hongjun Lu , Wenwu Lou , Jeffrey Xu Yu, On computing, storing and querying frequent patterns, Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2003, Washington, D.C.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Mohammed Javeed Zaki , Srinivasan Parthasarathy , Wei Li, A localized algorithm for parallel association mining, Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures, p.321-330, June 23-25, 1997, Newport, Rhode Island, United States
|
|
|
|
|
|
|
|
|
R. Ben-Eliyahu-Zohary , C. Domshlak , E. Gudes , N. Liusternik , A. Meisels , T. Rosen , S. E. Shimony, FlexiMine – A Flexible Platform for KDD Research and Application Development, Annals of Mathematics and Artificial Intelligence, v.39 n.1-2, p.175-204, September 2003
|
|
|
|
|
|
M. J. Zaki , M. Ogihara , S. Parthasarathy , W. Li, Parallel data mining for association rules on shared-memory multi-processors, Proceedings of the 1996 ACM/IEEE conference on Supercomputing (CDROM), p.43-es, January 01-01, 1996, Pittsburgh, Pennsylvania, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Wai-Ho Au , Keith C. C. Chan , Andrew K. C. Wong , Yang Wang, Attribute Clustering for Grouping, Selection, and Classification of Gene Expression Data, IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), v.2 n.2, p.83-101, April 2005
|
|
|
|
|
|
|
|
|
Amol Ghoting , Gregory Buehrer , Srinivasan Parthasarathy , Daehyun Kim , Anthony Nguyen , Yen-Kuang Chen , Pradeep Dubey, Cache-conscious frequent pattern mining on a modern processor, Proceedings of the 31st international conference on Very large data bases, August 30-September 02, 2005, Trondheim, Norway
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Anthony J. T. Lee , Chun-Sheng Wang , Wan-Yu Weng , Yi-An Chen , Huei-Wen Wu, An efficient algorithm for mining closed inter-transaction itemsets, Data & Knowledge Engineering, v.66 n.1, p.68-91, July, 2008
|
|
|
|
|
|
|
|
|
|
|
|
Gregory Buehrer , Yen-Kuang Chen , Srinivasan Parthasarathy , Anthony Nguyen , Amol Ghoting , Daehyun Kim, Efficient pattern mining on shared memory systems: implications for chip multiprocessor architectures, Proceedings of the 2006 workshop on Memory system performance and correctness, October 22-22, 2006, San Jose, California
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Amol Ghoting , Gregory Buehrer , Srinivasan Parthasarathy , Daehyun Kim , Anthony Nguyen , Yen-Kuang Chen , Pradeep Dubey, Cache-conscious frequent pattern mining on modern and emerging processors, The VLDB Journal — The International Journal on Very Large Data Bases, v.16 n.1, p.77-96, January 2007
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|