ACM Home Page
Please provide us with feedback. Feedback
Efficient discovery of error-tolerant frequent itemsets in high dimensions
Full text PdfPdf (1.11 MB)
Source International Conference on Knowledge Discovery and Data Mining archive
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining table of contents
San Francisco, California
Pages: 194 - 203  
Year of Publication: 2001
ISBN:1-58113-391-X
Authors
Cheng Yang  Stanford University, Stanford, CA
Usama Fayyad  digiMine, Inc., Bellevue, WA
Paul S. Bradley  digiMine, Inc., Bellevue, WA
Sponsors
SIGMOD: ACM Special Interest Group on Management of Data
AAAI : American Association for Artificial Intelligence
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 9,   Downloads (12 Months): 62,   Citation Count: 19
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/502512.502539
What is a DOI?

ABSTRACT

We present a generalization of frequent itemsets allowing for the notion of errors in the itemset definition. We motivate the problem and present an efficient algorithm that identifies error-tolerant frequent clusters of items in transactional data (customer-purchase data, web browsing data, text, etc.). The algorithm exploits sparseness of the underlying data to find large groups of items that are correlated over database records (rows). The notion of transaction coverage allows us to extend the algorithm and view it as a fast clustering algorithm for discovering segments of similar transactions in binary sparse data. We evaluate the new algorithm on three real-world applications: clustering high-dimensional data, query selectivity estimation and collaborative filtering. Results show that the algorithm consistently uncovers structure in large sparse databases that other traditional clustering algorithms fail to find.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

AGGR98
AIS93
 
AMSTV96
 
AS94
B98
BFG99
 
BFR98
P. S. Bradley, U. M. Fayyad and C. Reina. Scaling EM (Expectation-Maximization) Clustering to Large Databases. Technical Report MSR-TR-98-35, Microsoft Research, 1998.
 
BHK98
J. Breese, D. Heckerman and C. Kadie. Empirical Analysis of Predictive Algorithms for Collaborative Filtering. Technical Report MSR-TR-98-12, Microsoft Research, 1998.
 
CS96
 
DLR77
A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society B, 39:1-38, 1973.
 
DH73
R. O. Duda and P. E. Hart. Pattern Classification and Scene Analysis. John Wiley & Sons, New York, 1973.
 
FI93
U.M. Fayyad and K.B. Irani. "Multi-interval Discretization of Continuous-valued Attributes for Classification Learning." Proc. of the 13th Intl. Joint Conf. on Artificial Intelligence. HCAI-93: Chambery, France (1993).
GGR99
 
GKP89
 
GKR98
 
GMS97
GRS98
 
GRS99
 
MH98
M. Meila and D. Heckerman. An Experimental Comparison of Several Clustering and Initialization Methods. Technical Report MSR-TR-98-06, Microsoft Research, 1998.
 
MRK97
B. Miller, J. Riedl and J. Konstan. Experiences with GroupLens: Making Usenet Useful Again. In Proc. USENIX 1997 Tech. Conf., pp. 219-231, Anaheim, CA, 1997.
 
NH94
 
RG99
R. Ramakrishnan and J. Gehrke. Principles of Database Management (2 ad Edition). 1999.
R*97
RV97
SFB99
SA96
 
YFB00
C. Yang, U. Fayyad and P. S. Bradley. Efficient Discovery of Error-Tolerant Frequent Itemsets in High Dimensions. Technical Report MSR-TR-2000-20, Microsoft Research, 2000.
 
ZPOL97
M. J. Zaki, S. Parthasarathy, M. Ogihara and W. Li. New Algorithms for Fast Discovery of Association Rules. In Proc. of the Third lnt'l Conf. On Knowledge Discovery in Databases and Data Mining, pp. 283-286, 1997.
ZRL96
 
Z49
G.E. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley Press, Inc, 1949.

CITED BY  19

Collaborative Colleagues:
Cheng Yang: colleagues
Usama Fayyad: colleagues
Paul S. Bradley: colleagues