ABSTRACT
Advances in the efficient discovery of frequent itemsets have led to the development of a number of schemes that use frequent itemsets to aid in developing accurate and efficient classifiers. These approaches use the frequent itemsets to generate a set of composite features that expand the dimensionality of the underlying dataset. In this paper, we build upon this work and (i) present a variety of schemes for composite feature selection that substantially reduce the number of features without adversely affecting the accuracy gains, and (ii) show, both analytically and experimentally, that the composite features can lead to improved classification models even in the context of support vector machines, in which the dimensionality can be expanded automatically through the use of appropriate kernel functions.
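To make the notion of composite features concrete, the following is a minimal sketch, not the method of the paper. It assumes transactions are Python sets of items, enumerates candidate itemsets by brute force rather than with an efficient miner, and appends one binary feature per frequent itemset. The names `frequent_itemsets`, `add_composite_features`, `min_support`, and `max_size` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force search for itemsets of size 2..max_size whose
    relative support is at least min_support (illustrative only)."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    frequent = []
    for size in range(2, max_size + 1):
        for candidate in combinations(items, size):
            # Count the transactions that contain every item of the candidate.
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count / n >= min_support:
                frequent.append(candidate)
    return frequent


def add_composite_features(transactions, itemsets):
    """Build one binary row per transaction: entry j is 1 when the
    transaction contains all items of itemsets[j], else 0."""
    return [[1 if set(s) <= t else 0 for s in itemsets] for t in transactions]


# Toy data (hypothetical): four transactions over items a, b, c.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
]
itemsets = frequent_itemsets(transactions, min_support=0.5)
features = add_composite_features(transactions, itemsets)
```

Each column of `features` is a conjunction of original items, so the composite representation has as many extra dimensions as there are frequent itemsets. This growth is the reason a feature selection step becomes relevant.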