ACM Home Page
Please provide us with feedback. Feedback
Real world performance of association rule algorithms
Full text PdfPdf (536 KB)
Source International Conference on Knowledge Discovery and Data Mining archive
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining table of contents
San Francisco, California
Pages: 401 - 406  
Year of Publication: 2001
ISBN:1-58113-391-X
Authors
Zijian Zheng  Blue Martini Software, San Mateo, CA
Ron Kohavi  Blue Martini Software, San Mateo, CA
Llew Mason  Blue Martini Software, San Mateo, CA
Sponsors
SIGMOD: ACM Special Interest Group on Management of Data
AAAI : American Association for Artificial Intelligence
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 24,   Downloads (12 Months): 195,   Citation Count: 54
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/502512.502572
What is a DOI?

ABSTRACT

This study compares five well-known association rule algorithms using three real-world datasets and an artificial dataset. The experimental results confirm the performance improvements previously claimed by the authors on the artificial data, but some of these gains do not carry over to the real datasets, indicating overfitting of the algorithms to the IBM artificial dataset. More importantly, we found that the choice of algorithm only matters at support levels that generate more rules than would be useful in practice. For support levels that generate less than 1,000,000 rules, which is much more than humans can handle and is sufficient for prediction purposes where data is loaded into RAM, Apriori finishes processing in less than 10 minutes. On our datasets, we observed super-exponential growth in the number of rules. On one of our datasets, a 0.02% change in the support increased the number of rules from less than a million to over a billion, implying that outside a very narrow range of support values, the choice of algorithm is irrelevant.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
2
 
3
Kohavi, R., Sommerfield, D., and Dougherty, J. Data mining using MLC++: A machine learning library in C++, International Journal on Artificial Intelligence Tools 6(4), 537-566. http://www.sgi.com/Technology/mlc.
4
5
 
6
 
7
Pei, J., Han, J., and Mao, R. CLOSET: An efficient algorithm for mining frequent closed itemsets. In Proceedings of ACM SIGMOD International Workshop on Data Mining and Knowledge Discovery.
 
8
Webb, G.I. OPUS: An efficient admissible algorithm for unordered search. JAIR, 3:431-465.
9
10
 
11
Zheng, Z., Kohavi, R., and Mason, L. Real word performance of association rule algorithms (long version), 2001. Available at http://robotics.stanford.edu/~ronnyk/ assocBenchmarkLong.pdf.

CITED BY  54

Collaborative Colleagues:
Zijian Zheng: colleagues
Ron Kohavi: colleagues
Llew Mason: colleagues