| Approximating a collection of frequent sets |
| Full text |
Pdf
(221 KB)
|
| Source
|
International Conference on Knowledge Discovery and Data Mining
archive
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
table of contents
Seattle, WA, USA
SESSION: Research track papers
table of contents
Pages: 12 - 19
Year of Publication: 2004
ISBN:1-58113-888-1
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 12, Downloads (12 Months): 103, Citation Count: 21
|
|
|
ABSTRACT
One of the most well-studied problems in data mining is computing the collection of frequent item sets in large transactional databases. One obstacle for the applicability of frequent-set mining is that the size of the output collection can be far too large to be carefully examined and understood by the users. Even restricting the output to the border of the frequent item-set collection does not help much in alleviating the problem.In this paper we address the issue of overwhelmingly large output size by introducing and studying the following problem: What are the k sets that best approximate a collection of frequent item sets? Our measure of approximating a collection of sets by k sets is defined to be the size of the collection covered by the the k sets, i.e., the part of the collection that is included in one of the k sets. We also specify a bound on the number of extra sets that are allowed to be covered. We examine different problem variants for which we demonstrate the hardness of the corresponding problems and we provide simple polynomial-time approximation algorithms. We give empirical evidence showing that the approximation methods work well in practice.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Rakesh Agrawal , Tomasz Imieliński , Arun Swami, Mining association rules between sets of items in large databases, Proceedings of the 1993 ACM SIGMOD international conference on Management of data, p.207-216, May 25-28, 1993, Washington, D.C., United States
|
| |
2
|
|
| |
3
|
|
| |
4
|
|
| |
5
|
William Feller. An introduction to probability theory and its applications. John Wiley & Sons, 1968.
|
| |
6
|
|
| |
7
|
Bart Goethals. Frequent itemset mining implementations. http://www.cs.helsinki.fi/u/goethals/software/.
|
| |
8
|
|
| |
9
|
|
 |
10
|
|
| |
11
|
|
| |
12
|
|
| |
13
|
|
| |
14
|
|
CITED BY 21
|
|
|
|
|
Xifeng Yan , Hong Cheng , Jiawei Han , Dong Xin, Summarizing itemset patterns: a profile-based approach, Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, August 21-24, 2005, Chicago, Illinois, USA
|
|
|
Qiaozhu Mei , Dong Xin , Hong Cheng , Jiawei Han , ChengXiang Zhai, Generating semantic annotations for frequent patterns with context analysis, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
|
|
|
|
|
|
Dong Xin , Hong Cheng , Xifeng Yan , Jiawei Han, Extracting redundancy-aware top-k patterns, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
|
|
|
|
|
|
|
|
|
Lei Chang , Tengjiao Wang , Dongqing Yang , Hua Luan , Shiwei Tang, Efficient algorithms for incremental maintenance of closed sequential patterns in large databases, Data & Knowledge Engineering, v.68 n.1, p.68-106, January, 2009
|
|
|
|
|
|
|
|
|
|
|
|
Byron J. Gao , Martin Ester , Jin-Yi Cai , Oliver Schulte , Hui Xiong, The minimum consistent subset cover problem and its applications in data mining, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, August 12-15, 2007, San Jose, California, USA
|
|
|
|
|
|
|
|
|
Yang Xiang , Ruoming Jin , David Fuhry , Feodor F. Dragan, Succinct summarization of transactional databases: an overlapped hyperrectangle scheme, Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2008, Las Vegas, Nevada, USA
|
|
|
Ruoming Jin , Muad Abu-Ata , Yang Xiang , Ning Ruan, Effective and efficient itemset pattern summarization: regression-based approaches, Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2008, Las Vegas, Nevada, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|