ACM Home Page
Please provide us with feedback. Feedback
Summarizing itemset patterns: a profile-based approach
Full text PdfPdf (1.08 MB)
Source International Conference on Knowledge Discovery and Data Mining archive
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining table of contents
Chicago, Illinois, USA
SESSION: Research track paper table of contents
Pages: 314 - 323  
Year of Publication: 2005
ISBN:1-59593-135-X
Authors
Xifeng Yan  University of Illinois at Urbana-Champaign, Urbana, IL
Hong Cheng  University of Illinois at Urbana-Champaign, Urbana, IL
Jiawei Han  University of Illinois at Urbana-Champaign, Urbana, IL
Dong Xin  University of Illinois at Urbana-Champaign, Urbana, IL
Sponsors
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 16,   Downloads (12 Months): 127,   Citation Count: 16
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1081870.1081907
What is a DOI?

ABSTRACT

Frequent-pattern mining has been studied extensively on scalable methods for mining various kinds of patterns including itemsets, sequences, and graphs. However, the bottleneck of frequent-pattern mining is not at the efficiency but at the interpretability, due to the huge number of patterns generated by the mining process.In this paper, we examine how to summarize a collection of itemset patterns using only K representatives, a small number of patterns that a user can handle easily. The K representatives should not only cover most of the frequent patterns but also approximate their supports. A generative model is built to extract and profile these representatives, under which the supports of the patterns can be easily recovered without consulting the original dataset. Based on the restoration error, we propose a quality measure function to determine the optimal value of parameter K. Polynomial time algorithms are developed together with several optimization heuristics for efficiency improvement. Empirical studies indicate that we can obtain compact summarization in real datasets.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
 
3
4
5
 
6
 
7
L. Dehaspe, H. Toivonen, and R. King. Finding frequent substructures in chemical compounds. In Proc. of 1998 Int. Conf. on Knowledge Discovery and Data Mining (KDD'98), pages 30--36, 1998.
 
8
 
9
10
 
11
 
12
W. Hoeffding. Probability inequalities for sums of bounded random variables. J. American Statistical Associations, 58:13--30, 1963.
 
13
L. Holder, D. Cook, and S. Djoko. Substructure discovery in the subdue system. In Proc. AAAI94 Workshop on Knowledge Discovery in Databases (KDD94), page 169--180, 1994.
14
 
15
16
 
17
T. Mielikainen and H. Mannila. The pattern ordering problem. In Prof. 7th European Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD'03), pages 327--338, 2003.
 
18
 
19
 
20
 
21
 
22
J. Pei, A. Tung, and J. Han. Fault-tolerant frequent pattern mining: Problems and challenges. In Proc. of 2001 ACM Int. Workshop Data Mining and Knowledge Discovery (DMKD'01), pages 7--12, 2001.
23
24
25
26
27

CITED BY  16

Collaborative Colleagues:
Xifeng Yan: colleagues
Hong Cheng: colleagues
Jiawei Han: colleagues
Dong Xin: colleagues