Dense itemsets
Source International Conference on Knowledge Discovery and Data Mining
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Seattle, WA, USA
POSTER SESSION: Research track posters
Pages: 683 - 688  
Year of Publication: 2004
ISBN:1-58113-888-1
Authors
Jouni K. Seppänen  Helsinki University of Technology, Finland
Heikki Mannila  Helsinki University of Technology, Finland
Sponsors
SIGMOD: ACM Special Interest Group on Management of Data
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1014052.1014140

ABSTRACT

Frequent itemset mining has been the subject of a lot of work in data mining research ever since association rules were introduced. In this paper we address a problem with frequent itemsets: that they only count rows where all their attributes are present, and do not allow for any noise. We show that generalizing the concept of frequency while preserving the performance of mining algorithms is nontrivial, and introduce a generalization of frequent itemsets, dense itemsets. Dense itemsets do not require all attributes to be present at the same time; instead, the itemset needs to define a sufficiently large submatrix that exceeds a given density threshold of attributes present.

We consider the problem of computing all dense itemsets in a database. We give a levelwise algorithm for this problem, and also study the top-k variations, i.e., finding the k densest sets with a given support, or the k best-supported sets with a given density. These algorithms select the other parameter automatically, which simplifies mining dense itemsets in an explorative way. We show that the concept captures natural facets of data sets, and give extensive empirical results on the performance of the algorithms. Combining the concept of dense itemsets with set cover ideas, we also show that dense itemsets can be used to obtain succinct descriptions of large datasets. We also discuss some variations of dense itemsets.
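
The density idea in the abstract can be illustrated with a small sketch. It follows one plausible reading of the definition (an itemset is dense if some submatrix with at least a given number of rows has a large enough fraction of attributes present) and is not the paper's levelwise algorithm. The function names, the absolute row count `min_rows`, and the brute-force enumeration are all assumptions made for this illustration.

```python
from itertools import combinations


def best_density(rows, itemset, min_rows):
    """Highest fraction of ones in any submatrix whose columns are
    `itemset` and which has at least `min_rows` rows.

    rows: list of sets, each holding the items present in one row.
    min_rows: hypothetical support parameter, given as a row count.
    """
    items = set(itemset)
    # For a fixed itemset, the densest submatrix uses the rows that
    # contain the most items of the itemset, so sort by that count.
    hits = sorted((len(items & row) for row in rows), reverse=True)
    return sum(hits[:min_rows]) / (min_rows * len(items))


def is_dense(rows, itemset, min_rows, min_density):
    """True if the itemset meets both the support and density thresholds."""
    return best_density(rows, itemset, min_rows) >= min_density


def top_k_densest(rows, k, min_rows, max_size=3):
    """Brute-force top-k: score every itemset of size 2..max_size.

    Exponential in max_size; meant for toy data only.
    """
    universe = sorted(set().union(*rows))
    scored = [
        (best_density(rows, itemset, min_rows), itemset)
        for size in range(2, max_size + 1)
        for itemset in combinations(universe, size)
    ]
    # Densest first; ties broken by itemset for a deterministic order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]


# Toy data: only one row contains a, b and c together, so {a, b, c}
# has classical support 1, yet four rows contain most of its items.
rows = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"d"}]
print(best_density(rows, ("a", "b", "c"), min_rows=4))
print(is_dense(rows, ("a", "b", "c"), min_rows=4, min_density=0.7))
print(top_k_densest(rows, k=3, min_rows=4))
```

In the toy data the itemset {a, b, c} would fail a frequency threshold of two rows, but it passes a density threshold of 0.7 over four rows, which is the kind of noise tolerance the abstract describes.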

