ACM Home Page
Please provide us with feedback. Feedback
Mining compressed frequent-pattern sets
Full text PdfPdf (347 KB)
Source Very Large Data Bases archive
Proceedings of the 31st international conference on Very large data bases table of contents
Trondheim, Norway
SESSION: Research session: data mining table of contents
Pages: 709 - 720  
Year of Publication: 2005
ISBN:1-59593-154-6
Authors
Dong Xin  University of Illinois at Urbana-Champaign, Urbana, IL
Jiawei Han  University of Illinois at Urbana-Champaign, Urbana, IL
Xifeng Yan  University of Illinois at Urbana-Champaign, Urbana, IL
Hong Cheng  University of Illinois at Urbana-Champaign, Urbana, IL
Publisher
Bibliometrics
Downloads (6 Weeks): 16,   Downloads (12 Months): 68,   Citation Count: 20
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  

ABSTRACT

A major challenge in frequent-pattern mining is the sheer size of its mining results. In many cases, a high min_sup threshold may discover only commonsense patterns but a low one may generate an explosive number of output patterns, which severely restricts its usage.In this paper, we study the problem of compressing frequent-pattern sets. Typically, frequent patterns can be clustered with a tightness measure δ (called δ-cluster), and a representative pattern can be selected for each cluster. Unfortunately, finding a minimum set of representative patterns is NP-Hard. We develop two greedy methods, RPglobal and RPlocal. The former has the guaranteed compression bound but higher computational complexity. The latter sacrifices the theoretical bounds but is far more efficient. Our performance study shows that the compression quality using RPlocal is very close to RPglobal, and both can reduce the number of closed frequent patterns by almost two orders of magnitude. Furthermore, RPlocal mines even faster than FPClose[11], a very fast closed frequent-pattern mining method. We also show that RPglobal and RPlocal can be combined together to balance the quality and efficiency.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
 
3
 
4
5
6
 
7
8
 
9
Frequent Itemset Mining Dataset Repository. http://fimi.cs.helsinki.fi/data/
 
10
 
11
G. Grahne and J. Zhu. Efficiently Using Prefix-trees in Mining Frequent Itemsets. IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI'03).
 
12
J. Han, et al. Efficient mining of partial periodic patterns in time series database. ICDE'99.
13
 
14
 
15
 
16
 
17
18
 
19
M. Zaki and C. Hsiao. Charm: An Efficient Algorithm for Closed Itemset Mining. SDM'02.

CITED BY  20

Collaborative Colleagues:
Dong Xin: colleagues
Jiawei Han: colleagues
Xifeng Yan: colleagues
Hong Cheng: colleagues