Summarizing itemset patterns: a profile-based approach
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Chicago, Illinois, USA
SESSION: Research track paper
Pages: 314 - 323
Year of Publication: 2005
ISBN: 1-59593-135-X
Authors
Xifeng Yan, University of Illinois at Urbana-Champaign, Urbana, IL
Hong Cheng, University of Illinois at Urbana-Champaign, Urbana, IL
Jiawei Han, University of Illinois at Urbana-Champaign, Urbana, IL
Dong Xin, University of Illinois at Urbana-Champaign, Urbana, IL
ABSTRACT
Frequent-pattern mining has been studied extensively, with a focus on scalable methods for mining various kinds of patterns, including itemsets, sequences, and graphs. However, the bottleneck of frequent-pattern mining is not efficiency but interpretability, due to the huge number of patterns generated by the mining process. In this paper, we examine how to summarize a collection of itemset patterns using only K representatives, a small number of patterns that a user can handle easily. The K representatives should not only cover most of the frequent patterns but also approximate their supports. A generative model is built to extract and profile these representatives, under which the supports of the patterns can be easily recovered without consulting the original dataset. Based on the restoration error, we propose a quality measure function to determine the optimal value of the parameter K. Polynomial-time algorithms are developed together with several optimization heuristics for efficiency improvement. Empirical studies indicate that we can obtain compact summarizations of real datasets.
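The abstract does not give the formulas behind the generative model or the restoration error, so the following Python sketch is only an illustration under stated assumptions. It assumes that a profile stores a per-item probability, a master itemset, and a relative support, that a pattern's support is estimated as the profile support times the product of its item probabilities (items treated as independent), and that the restoration error is the mean relative error over the patterns. The names `Profile`, `estimate_support`, `restore_support`, and `restoration_error`, and all numbers in the example, are hypothetical.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Profile:
    """Hypothetical profile: item probabilities, master itemset, relative support."""
    item_probs: dict       # item -> assumed probability that the item occurs
    master: frozenset      # union of the itemsets summarized by this profile
    support: float         # relative support of the summarized transactions


def estimate_support(pattern, profile):
    """Estimate a pattern's support from one profile (independence assumption)."""
    if not pattern <= profile.master:
        return 0.0  # assumption: a profile says nothing about uncovered patterns
    return profile.support * prod(profile.item_probs[i] for i in pattern)


def restore_support(pattern, profiles):
    """Assumption: take the best estimate over all profiles."""
    return max((estimate_support(pattern, p) for p in profiles), default=0.0)


def restoration_error(true_supports, profiles):
    """Mean relative error between true and restored supports."""
    errors = [
        abs(s - restore_support(pattern, profiles)) / s
        for pattern, s in true_supports.items()
    ]
    return sum(errors) / len(errors)


# Made-up example: one profile summarizing the patterns {a} and {a, b}.
profiles = [Profile({"a": 1.0, "b": 0.5}, frozenset("ab"), 0.4)]
true_supports = {frozenset("a"): 0.4, frozenset("ab"): 0.25}
print(f"{restoration_error(true_supports, profiles):.3f}")  # prints 0.100
```

In this toy setting the supports are restored from the profile alone, without rescanning any transactions, and the error could be compared across different numbers of profiles K to pick a value.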
CITED BY 16
Qiaozhu Mei, Dong Xin, Hong Cheng, Jiawei Han, ChengXiang Zhai, Generating semantic annotations for frequent patterns with context analysis, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
Dong Xin, Hong Cheng, Xifeng Yan, Jiawei Han, Extracting redundancy-aware top-k patterns, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
Chen Chen, Cindy Xide Lin, Xifeng Yan, Jiawei Han, On effective presentation of graph patterns: a structural representative approach, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA
Lei Chang, Tengjiao Wang, Dongqing Yang, Hua Luan, Shiwei Tang, Efficient algorithms for incremental maintenance of closed sequential patterns in large databases, Data & Knowledge Engineering, v.68 n.1, p.68-106, January, 2009
Byron J. Gao, Martin Ester, Jin-Yi Cai, Oliver Schulte, Hui Xiong, The minimum consistent subset cover problem and its applications in data mining, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, August 12-15, 2007, San Jose, California, USA
Ruoming Jin, Muad Abu-Ata, Yang Xiang, Ning Ruan, Effective and efficient itemset pattern summarization: regression-based approaches, Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2008, Las Vegas, Nevada, USA