| Mining compressed frequent-pattern sets |
| Source
|
Very Large Data Bases
Proceedings of the 31st international conference on Very large data bases
Trondheim, Norway
SESSION: Research session: data mining
Pages: 709 - 720
Year of Publication: 2005
ISBN: 1-59593-154-6
|
|
Authors
|
|
Dong Xin
|
University of Illinois at Urbana-Champaign, Urbana, IL
|
|
Jiawei Han
|
University of Illinois at Urbana-Champaign, Urbana, IL
|
|
Xifeng Yan
|
University of Illinois at Urbana-Champaign, Urbana, IL
|
|
Hong Cheng
|
University of Illinois at Urbana-Champaign, Urbana, IL
|
|
ABSTRACT
A major challenge in frequent-pattern mining is the sheer size of its mining results. In many cases, a high min_sup threshold may discover only commonsense patterns, while a low one may generate an explosive number of output patterns, which severely restricts the usefulness of the results. In this paper, we study the problem of compressing frequent-pattern sets. Typically, frequent patterns can be clustered with a tightness measure δ (each cluster is called a δ-cluster), and a representative pattern can be selected for each cluster. Unfortunately, finding a minimum set of representative patterns is NP-hard. We develop two greedy methods, RPglobal and RPlocal. The former has a guaranteed compression bound but higher computational complexity. The latter sacrifices the theoretical bound but is far more efficient. Our performance study shows that the compression quality of RPlocal is very close to that of RPglobal, and both can reduce the number of closed frequent patterns by almost two orders of magnitude. Furthermore, RPlocal mines even faster than FPClose [11], a very fast closed frequent-pattern mining method. We also show that RPglobal and RPlocal can be combined to balance quality and efficiency.
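The idea of picking one representative per δ-cluster can be illustrated with a small greedy set-cover sketch. This is not the authors' RPglobal or RPlocal implementation. It assumes, since the abstract does not spell it out, that a representative covers a pattern when the pattern is a subset of it and the Jaccard distance between their supporting transaction sets is at most δ. All function names and the toy data are hypothetical, and the brute-force enumeration is only suitable for tiny inputs.

```python
from itertools import combinations


def support_set(pattern, transactions):
    """Indices of the transactions that contain every item of `pattern`."""
    return frozenset(i for i, t in enumerate(transactions) if pattern <= t)


def frequent_patterns(transactions, min_sup):
    """Brute-force enumeration of frequent itemsets (toy data only).

    Returns a dict mapping each frequent pattern to its support set.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            pattern = frozenset(combo)
            supp = support_set(pattern, transactions)
            if len(supp) >= min_sup:
                result[pattern] = supp
    return result


def covers(rep, pat, supports, delta):
    """Assumed coverage test: `pat` is a subset of `rep`, and their
    support sets are within Jaccard distance `delta` of each other."""
    if not pat <= rep:
        return False
    a, b = supports[pat], supports[rep]
    return 1 - len(a & b) / len(a | b) <= delta


def greedy_representatives(supports, delta):
    """Greedy set cover: repeatedly pick the pattern that covers the
    most still-uncovered patterns. Ties are broken deterministically."""
    cover_sets = {
        r: {p for p in supports if covers(r, p, supports, delta)}
        for r in supports
    }
    uncovered = set(supports)
    reps = []
    while uncovered:
        best = max(
            cover_sets,
            key=lambda r: (len(cover_sets[r] & uncovered), len(r), sorted(r)),
        )
        reps.append(best)
        uncovered -= cover_sets[best]
    return reps


# Hypothetical toy database: five transactions over items a, b, c, d.
transactions = [frozenset("abc")] * 3 + [frozenset("ab"), frozenset("d")]
supports = frequent_patterns(transactions, min_sup=2)

for delta in (0.1, 0.3):
    reps = greedy_representatives(supports, delta)
    print(delta, [sorted(r) for r in reps])
```

On this toy data there are seven frequent patterns. Under the assumed distance, a looser δ of 0.3 lets the single pattern {a, b, c} represent all of them, while a tighter δ of 0.1 needs {a, b} as a second representative. Every pattern covers itself, so the greedy loop always terminates.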
REFERENCES
 |
1
|
|
 |
2
|
Rakesh Agrawal, Tomasz Imieliński, Arun Swami, Mining association rules between sets of items in large databases, Proceedings of the 1993 ACM SIGMOD international conference on Management of data, p.207-216, May 25-28, 1993, Washington, D.C., United States
|
| |
3
|
|
| |
4
|
|
 |
5
|
|
 |
6
|
Sergey Brin, Rajeev Motwani, Craig Silverstein, Beyond market baskets: generalizing association rules to correlations, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.265-276, May 11-15, 1997, Tucson, Arizona, United States
|
| |
7
|
|
 |
8
|
|
| |
9
|
Frequent Itemset Mining Dataset Repository. http://fimi.cs.helsinki.fi/data/
|
| |
10
|
|
| |
11
|
G. Grahne and J. Zhu. Efficiently Using Prefix-trees in Mining Frequent Itemsets. IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI'03).
|
| |
12
|
J. Han, et al. Efficient mining of partial periodic patterns in time series database. ICDE'99.
|
 |
13
|
Jiawei Han, Jian Pei, Yiwen Yin, Mining frequent patterns without candidate generation, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.1-12, May 15-18, 2000, Dallas, Texas, United States
|
| |
14
|
|
| |
15
|
|
| |
16
|
|
| |
17
|
|
 |
18
|
|
| |
19
|
M. Zaki and C. Hsiao. Charm: An Efficient Algorithm for Closed Itemset Mining. SDM'02.
|
CITED BY 20
|
|
Qiaozhu Mei, Dong Xin, Hong Cheng, Jiawei Han, ChengXiang Zhai, Generating semantic annotations for frequent patterns with context analysis, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
|
|
|
|
|
|
Dong Xin, Hong Cheng, Xifeng Yan, Jiawei Han, Extracting redundancy-aware top-k patterns, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
|
|
|
Dong Xin, Xuehua Shen, Qiaozhu Mei, Jiawei Han, Discovering interesting patterns through user's interactive feedback, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
|
|
|
|
|
|
|
|
|
Chen Chen, Cindy Xide Lin, Xifeng Yan, Jiawei Han, On effective presentation of graph patterns: a structural representative approach, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA
|
|
|
Lei Chang, Tengjiao Wang, Dongqing Yang, Hua Luan, Shiwei Tang, Efficient algorithms for incremental maintenance of closed sequential patterns in large databases, Data & Knowledge Engineering, v.68 n.1, p.68-106, January, 2009
|
|
|
|
|
|
|
|
|
Hanghang Tong, Spiros Papadimitriou, Jimeng Sun, Philip S. Yu, Christos Faloutsos, Colibri: fast mining of large static and dynamic graphs, Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2008, Las Vegas, Nevada, USA
|
|
|
Ruoming Jin, Muad Abu-Ata, Yang Xiang, Ning Ruan, Effective and efficient itemset pattern summarization: regression-based approaches, Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2008, Las Vegas, Nevada, USA
|
|
|
Hanghang Tong, Yasushi Sakurai, Tina Eliassi-Rad, Christos Faloutsos, Fast mining of complex time-stamped events, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA
|
|
|
|
|
|
|
|
|
|
|
|
Kensuke Onuma, Hanghang Tong, Christos Faloutsos, TANGENT: a novel, 'Surprise me', recommendation algorithm, Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, June 28-July 01, 2009, Paris, France
|
|
|
Adam Kirsch, Michael Mitzenmacher, Andrea Pietracaprina, Geppino Pucci, Eli Upfal, Fabio Vandin, An efficient rigorous approach for identifying statistically significant frequent itemsets, Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 29-July 01, 2009, Providence, Rhode Island, USA
|
|
|
|
|
|
|
|