| Direct mining of discriminative and essential frequent patterns via model-based search tree |
| Full text |
Pdf
(335 KB)
|
Source
|
International Conference on Knowledge Discovery and Data Mining
archive
Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
table of contents
Las Vegas, Nevada, USA
SESSION: Research papers
table of contents
Pages 230-238
Year of Publication: 2008
ISBN:978-1-60558-193-4
|
|
Authors
|
|
Wei Fan
|
IBM T.J.Watson, Hawthorne, NY, USA
|
|
Kun Zhang
|
Xavier University of Louisiana, New Orleands, LA, USA
|
|
Hong Cheng
|
University of Illinois at Urbana-Champaign, Urbana-Champaign, IL, USA
|
|
Jing Gao
|
University of Illinois at Urbana-Champaign, Urbana-Champaign, IL, USA
|
|
Xifeng Yan
|
IBM T.J.Watson, Hawthorne, NY, USA
|
|
Jiawei Han
|
University of Illinois at Urbana-Champaign, Urbana-Champaign, IL, USA
|
|
Philip Yu
|
University of Illinois at Chi ago, Chicago, IL, USA
|
|
Olivier Verscheure
|
IBM T.J.Watson, Hawthorne, NY, USA
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 22, Downloads (12 Months): 359, Citation Count: 1
|
|
|
ABSTRACT
Frequent patterns provide solutions to datasets that do not have well-structured feature vectors. However, frequent pattern mining is non-trivial since the number of unique patterns is exponential but many are non-discriminative and correlated. Currently, frequent pattern mining is performed in two sequential steps: enumerating a set of frequent patterns, followed by feature selection. Although many methods have been proposed in the past few years on how to perform each separate step efficiently, there is still limited success in eventually finding highly compact and discriminative patterns. The culprit is due to the inherent nature of this widely adopted two-step approach. This paper discusses these problems and proposes a new and different method. It builds a decision tree that partitions the data onto different nodes. Then at each node, it directly discovers a discriminative pattern to further divide its examples into purer subsets. Since the number of examples towards leaf level is relatively small, the new approach is able to examine patterns with extremely low global support that could not be enumerated on the whole dataset by the two-step method. The discovered feature vectors are more accurate on some of the most difficult graph as well as frequent itemset problems than most recently proposed algorithms but the total size is typically 50% or more smaller. Importantly, the minimum support of some discriminative patterns can be extremely low (e.g. 0.03%). In order to enumerate these low support patterns, state-of-the-art frequent pattern algorithm either cannot finish due to huge memory consumption or have to enumerate 101 to 103 times more patterns before they can even be found. Software and datasets are available by contacting the author.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
| |
3
|
|
| |
4
|
|
| |
5
|
H. Cheng, X. Yan, J. Han, and C. Hsu. Discriminative frequent pattern analysis for effective classification. In Proc. of ICDE, 2007.
|
| |
6
|
H. Cheng, X. Yan, J. Han, and P. Yu. Direct discriminative pattern mining for effective classification. In Proc. of ICDE, 2008.
|
 |
7
|
|
 |
8
|
|
| |
9
|
|
 |
10
|
Holger Fröhlich , Jörg K. Wegner , Florian Sieker , Andreas Zell, Optimal assignment kernels for attributed molecular graphs, Proceedings of the 22nd international conference on Machine learning, p.225-232, August 07-11, 2005, Bonn, Germany
[doi> 10.1145/1102351.1102380]
|
| |
11
|
Amol Ghoting , Gregory Buehrer , Srinivasan Parthasarathy , Daehyun Kim , Anthony Nguyen , Yen-Kuang Chen , Pradeep Dubey, Cache-conscious frequent pattern mining on a modern processor, Proceedings of the 31st international conference on Very large data bases, August 30-September 02, 2005, Trondheim, Norway
|
| |
12
|
G. Grahne and J. Zhu. Efficiently using prefix-trees in mining frequent itemsets. In ICDM Workshop on Frequent Itemset Mining Implementations (FIMI'03), 2003.
|
 |
13
|
Jiawei Han , Jian Pei , Yiwen Yin, Mining frequent patterns without candidate generation, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.1-12, May 15-18, 2000, Dallas, Texas, United States
|
 |
14
|
|
 |
15
|
|
| |
16
|
T. Kudo, E. Maeda, and Y. Matsumoto. An application of boosting to graph classification. In Proc. of NIPS, pages 729--736, 2004.
|
| |
17
|
|
| |
18
|
|
| |
19
|
B. Liu, W. Hsu, and Y. Ma. Integrating classification and association rule mining. In Proc. of KDD, 1998.
|
| |
20
|
P. Mahë, N. Ueda, T. Akutsu, J. Perret, and J. Vert. Extensions of marginalized graph kernels. In Proc. of ICML, pages 552--559, 2004.
|
| |
21
|
Jian Pei , Jiawei Han , Behzad Mortazavi-Asl , Helen Pinto , Qiming Chen , Umeshwar Dayal , Meichun Hsu, PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth, Proceedings of the 17th International Conference on Data Engineering, p.215-224, April 02-06, 2001
|
| |
22
|
|
| |
23
|
J. Wang and G. Karypis. HARMONY: Efficiently mining the best rules for classification. In Proc. of SDM, pages 205--216, 2005.
|
 |
24
|
|
 |
25
|
|
| |
26
|
|
| |
27
|
X. Yin and J. Han. Cpar: Classification based on predictive association rules. In Proc. of SDM, 2003.
|
|