An effective hash-based algorithm for mining association rules
Source: International Conference on Management of Data
Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data
San Jose, California, United States
Pages: 175 - 186  
Year of Publication: 1995
ISBN: 0-89791-731-6
Authors
Jong Soo Park  IBM Thomas J. Watson Research Center, Yorktown Heights, New York
Ming-Syan Chen  IBM Thomas J. Watson Research Center, Yorktown Heights, New York
Philip S. Yu  IBM Thomas J. Watson Research Center, Yorktown Heights, New York
Sponsors
SIGART: ACM Special Interest Group on Artificial Intelligence
SIGMOD: ACM Special Interest Group on Management of Data
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/223784.223813

ABSTRACT

In this paper, we examine the problem of mining association rules among items in a large database of sales transactions. Mining association rules can be reduced to the problem of discovering large itemsets, where a large itemset is a group of items that appears in a sufficient number of transactions. Large itemsets can be discovered by first constructing a candidate set of itemsets and then identifying, within this candidate set, those itemsets that meet the large itemset requirement. Generally this is done iteratively for each large k-itemset in increasing order of k, where a large k-itemset is a large itemset with k items. Determining large itemsets from a huge number of candidate itemsets in the early iterations usually dominates overall data mining performance. To address this issue, we propose an effective hash-based algorithm for candidate set generation. Specifically, the number of candidate 2-itemsets generated by the proposed algorithm is orders of magnitude smaller than that generated by previous methods, thus resolving the performance bottleneck. Moreover, generating smaller candidate sets enables us to effectively trim the transaction database at a much earlier stage of the iterations, thereby significantly reducing the computational cost of later iterations. An extensive simulation study is conducted to evaluate the performance of the proposed algorithm.
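
The hash-based pruning of candidate 2-itemsets described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (bucket_of, large_2_itemsets), the CRC32-based hash, the bucket count, and the sample transactions are all assumptions made for illustration, and the transaction-trimming step is omitted. The sketch hashes every item pair into a bucket during the first pass, then keeps a pair as a candidate only if both items are large and its bucket count reaches the minimum support. Because a bucket count is never smaller than the true count of any pair hashed into it, no large 2-itemset is lost.

```python
import zlib
from collections import Counter
from itertools import combinations


def bucket_of(pair, num_buckets):
    """Map a sorted item pair to a bucket (hypothetical hash choice)."""
    return zlib.crc32("|".join(pair).encode()) % num_buckets


def large_2_itemsets(transactions, min_support, num_buckets=8):
    """Return (unpruned pairs, hash-pruned candidates, large 2-itemsets)."""
    item_counts = Counter()
    bucket_counts = [0] * num_buckets

    # Pass 1: count single items and hash every pair into a bucket.
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[bucket_of(pair, num_buckets)] += 1

    large_items = sorted(i for i, c in item_counts.items() if c >= min_support)

    # Without hashing, every pair of large items would be a candidate.
    all_pairs = list(combinations(large_items, 2))

    # With hashing, a pair survives only if its bucket is heavy enough.
    candidates = [
        p for p in all_pairs
        if bucket_counts[bucket_of(p, num_buckets)] >= min_support
    ]

    # Pass 2: count only the surviving candidates exactly.
    candidate_set = set(candidates)
    pair_counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(set(t)), 2):
            if pair in candidate_set:
                pair_counts[pair] += 1

    large_pairs = {p: c for p, c in pair_counts.items() if c >= min_support}
    return all_pairs, candidates, large_pairs


if __name__ == "__main__":
    # Made-up sample data, not taken from the paper.
    db = [["A", "C", "D"], ["B", "C", "E"], ["A", "B", "C", "E"], ["B", "E"]]
    unpruned, pruned, large = large_2_itemsets(db, min_support=2)
    print(len(unpruned), "pairs before pruning,", len(pruned), "after")
    print(large)
```

How many candidates the hash table removes depends on the number of buckets and on collisions, so the sketch reports the candidate counts rather than promising a particular reduction. The final set of large 2-itemsets is the same regardless of the hash function chosen.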


