ACM Home Page
Please provide us with feedback. Feedback
Approximating a collection of frequent sets
Full text PdfPdf (221 KB)
Source International Conference on Knowledge Discovery and Data Mining archive
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining table of contents
Seattle, WA, USA
SESSION: Research track papers table of contents
Pages: 12 - 19  
Year of Publication: 2004
ISBN:1-58113-888-1
Authors
Foto Afrati  University of Athens, Greece
Aristides Gionis  University of Helsinki, Finland
Heikki Mannila  University of Helsinki, Finland
Sponsors
SIGMOD: ACM Special Interest Group on Management of Data
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 12,   Downloads (12 Months): 103,   Citation Count: 21
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1014052.1014057
What is a DOI?

ABSTRACT

One of the most well-studied problems in data mining is computing the collection of frequent item sets in large transactional databases. One obstacle for the applicability of frequent-set mining is that the size of the output collection can be far too large to be carefully examined and understood by the users. Even restricting the output to the border of the frequent item-set collection does not help much in alleviating the problem.In this paper we address the issue of overwhelmingly large output size by introducing and studying the following problem: What are the k sets that best approximate a collection of frequent item sets? Our measure of approximating a collection of sets by k sets is defined to be the size of the collection covered by the the k sets, i.e., the part of the collection that is included in one of the k sets. We also specify a bound on the number of extra sets that are allowed to be covered. We examine different problem variants for which we demonstrate the hardness of the corresponding problems and we provide simple polynomial-time approximation algorithms. We give empirical evidence showing that the approximation methods work well in practice.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
 
3
 
4
 
5
William Feller. An introduction to probability theory and its applications. John Wiley & Sons, 1968.
 
6
 
7
Bart Goethals. Frequent itemset mining implementations. http://www.cs.helsinki.fi/u/goethals/software/.
 
8
 
9
10
 
11
 
12
 
13
 
14

CITED BY  21

Collaborative Colleagues:
Foto Afrati: colleagues
Aristides Gionis: colleagues
Heikki Mannila: colleagues