|
ABSTRACT
This study compares five well-known association rule algorithms using three real-world datasets and an artificial dataset. The experimental results confirm the performance improvements previously claimed by the authors on the artificial data, but some of these gains do not carry over to the real datasets, indicating overfitting of the algorithms to the IBM artificial dataset. More importantly, we found that the choice of algorithm only matters at support levels that generate more rules than would be useful in practice. For support levels that generate less than 1,000,000 rules, which is much more than humans can handle and is sufficient for prediction purposes where data is loaded into RAM, Apriori finishes processing in less than 10 minutes. On our datasets, we observed super-exponential growth in the number of rules. On one of our datasets, a 0.02% change in the support increased the number of rules from less than a million to over a billion, implying that outside a very narrow range of support values, the choice of algorithm is irrelevant.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Rakesh Agrawal , Heikki Mannila , Ramakrishnan Srikant , Hannu Toivonen , A. Inkeri Verkamo, Fast discovery of association rules, Advances in knowledge discovery and data mining, American Association for Artificial Intelligence, Menlo Park, CA, 1996
|
 |
2
|
Jiawei Han , Jian Pei , Yiwen Yin, Mining frequent patterns without candidate generation, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.1-12, May 15-18, 2000, Dallas, Texas, United States
|
| |
3
|
Kohavi, R., Sommerfield, D., and Dougherty, J. Data mining using MLC++: A machine learning library in C++, International Journal on Artificial Intelligence Tools 6(4), 537-566. http://www.sgi.com/Technology/mlc.
|
 |
4
|
|
 |
5
|
Bing Liu , Wynne Hsu , Yiming Ma, Pruning and summarizing the discovered associations, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, p.125-134, August 15-18, 1999, San Diego, California, United States
[doi> 10.1145/312129.312216]
|
| |
6
|
|
| |
7
|
Pei, J., Han, J., and Mao, R. CLOSET: An efficient algorithm for mining frequent closed itemsets. In Proceedings of ACM SIGMOD International Workshop on Data Mining and Knowledge Discovery.
|
| |
8
|
Webb, G.I. OPUS: An efficient admissible algorithm for unordered search. JAIR, 3:431-465.
|
 |
9
|
|
 |
10
|
|
| |
11
|
Zheng, Z., Kohavi, R., and Mason, L. Real word performance of association rule algorithms (long version), 2001. Available at http://robotics.stanford.edu/~ronnyk/ assocBenchmarkLong.pdf.
|
CITED BY 54
|
|
|
|
|
|
|
|
|
|
|
Dimitrios Gunopulos , Roni Khardon , Heikki Mannila , Sanjeev Saluja , Hannu Toivonen , Ram Sewak Sharma, Discovering all most specific sentences, ACM Transactions on Database Systems (TODS), v.28 n.2, p.140-174, June 2003
|
|
|
Junqiang Liu , Yunhe Pan , Ke Wang , Jiawei Han, Mining frequent item sets by opportunistic projection, Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, July 23-26, 2002, Edmonton, Alberta, Canada
|
|
|
Guimei Liu , Hongjun Lu , Wenwu Lou , Jeffrey Xu Yu, On computing, storing and querying frequent patterns, Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2003, Washington, D.C.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Amol Ghoting , Gregory Buehrer , Srinivasan Parthasarathy , Daehyun Kim , Anthony Nguyen , Yen-Kuang Chen , Pradeep Dubey, Cache-conscious frequent pattern mining on a modern processor, Proceedings of the 31st international conference on Very large data bases, August 30-September 02, 2005, Trondheim, Norway
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Amol Ghoting , Gregory Buehrer , Srinivasan Parthasarathy , Daehyun Kim , Anthony Nguyen , Yen-Kuang Chen , Pradeep Dubey, Cache-conscious frequent pattern mining on modern and emerging processors, The VLDB Journal — The International Journal on Very Large Data Bases, v.16 n.1, p.77-96, January 2007
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|