|
ABSTRACT
Frequent itemset mining has been the subject of a lot of work in data mining research ever since association rules were introduced. In this paper we address a problem with frequent itemsets: that they only count rows where all their attributes are present, and do not allow for any noise. We show that generalizing the concept of frequency while preserving the performance of mining algorithms is nontrivial, and introduce a generalization of frequent itemsets, dense itemsets. Dense itemsets do not require all attributes to be present at the same time; instead, the itemset needs to define a sufficiently large submatrix that exceeds a given density threshold of attributes present.We consider the problem of computing all dense itemsets in a database. We give a levelwise algorithm for this problem, and also study the top-$k$ variations, i.e., finding the k densest sets with a given support, or the k best-supported sets with a given density. These algorithms select the other parameter automatically, which simplifies mining dense itemsets in an explorative way. We show that the concept captures natural facets of data sets, and give extensive empirical results on the performance of the algorithms. Combining the concept of dense itemsets with set cover ideas, we also show that dense itemsets can be used to obtain succinct descriptions of large datasets. We also discuss some variations of dense itemsets.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Rakesh Agrawal , Tomasz Imieliński , Arun Swami, Mining association rules between sets of items in large databases, Proceedings of the 1993 ACM SIGMOD international conference on Management of data, p.207-216, May 25-28, 1993, Washington, D.C., United States
|
| |
2
|
Rakesh Agrawal , Hiekki Mannila , Ramakrishnan Srikant , Hannu Toivonen , A. Inkeri Verkamo, Fast discovery of association rules, Advances in knowledge discovery and data mining, American Association for Artificial Intelligence, Menlo Park, CA, 1996
|
 |
3
|
|
| |
4
|
|
 |
5
|
Tom Brijs , Gilbert Swinnen , Koen Vanhoof , Geert Wets, Using association rules for product assortment decisions: a case study, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, p.254-260, August 15-18, 1999, San Diego, California, United States
[doi> 10.1145/312129.312241]
|
| |
6
|
A. Bykowski, J. K. Seppänen, and J. Hollmén. Model-independent bounding of Boolean formulae in binary data. In M. Klemettinen, R. Meo, F. Giannotti, and L. De Raedt, editors, Knowledge Discovery in Inductive Databases (KDID--02), First International Workshop, pages 20--31, Helsinki, Finland, 2002. University of Helsinki Department of Computer Science Report B--2002--7.
|
| |
7
|
|
| |
8
|
|
| |
9
|
F. Geerts, B. Goethals, and T. Mielikäinen. Mining tiles and tilings. Manuscript in preparation.
|
| |
10
|
B. Goethals and M. J. Zaki, editors. Proc. Workshop on Frequent Itemset Mining Implementations (FIMI--03), volume 90 of CEUR-WS, Melbourne, Florida, 2003. http://CEUR-WS.org/Vol-90/.
|
| |
11
|
|
| |
12
|
|
 |
13
|
|
| |
14
|
D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems, 2000.
|
| |
15
|
T. Mielikäinen and H. Mannila. The pattern ordering problem. In N. Lavrac, D. Gamberger, L. Todorovski, and H. Blockeel, editors, Proc. PKDD--2003, volume 2383 of LNAI, pages 327--338. Springer, 2003.
|
| |
16
|
|
| |
17
|
J. Pei, A. K. Tung, and J. Han. Fault-tolerant frequent pattern mining: Problems and challenges. In Workshop on Research Issues in Data Mining and Knowledge Discovery, 2001.
|
| |
18
|
J. A. Swets. Measuring the accuracy of diagnostic systems. Science, 240(4857):1285--93, June 1988.
|
| |
19
|
|
 |
20
|
|
CITED BY 4
|
|
|
|
|
Rohit Gupta , Gang Fang , Blayne Field , Michael Steinbach , Vipin Kumar, Quantitative evaluation of approximate frequent pattern mining algorithms, Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2008, Las Vegas, Nevada, USA
|
|
|
|
|
|
|
|