On support thresholds in associative classification
Source
Symposium on Applied Computing
Proceedings of the 2004 ACM symposium on Applied computing
Nicosia, Cyprus
SESSION: Data mining (DM)
Pages: 553 - 558
Year of Publication: 2004
ISBN: 1-58113-812-1
ABSTRACT
Associative classification is a well-known technique for structured data classification. Most previous work on associative classification uses support-based pruning for rule extraction and usually sets the threshold to 1%. This threshold keeps rule extraction tractable and, on average, yields good accuracy. We believe that this threshold may not be appropriate in some cases, since it does not take the class distribution in the dataset into account. In this paper we investigate the effect of the support threshold on classification accuracy. Lower support thresholds are often infeasible with current extraction algorithms, or may generate a huge rule set. To observe the effect of varying the support threshold, we first propose a compact form to encode a complete rule set. We then develop a new classifier, named L3G, based on the compact form. Taking advantage of the compact form, the classifier can also be built with rules of rather low support. We ran a variety of experiments with different support thresholds on datasets from the UCI machine learning database repository. The experiments showed that the optimal accuracy is obtained at varying threshold values, sometimes lower than 1%.
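To make the concern about class distribution concrete, the following is a minimal sketch of support-based pruning on a toy dataset. It is not the L3G classifier or the compact rule encoding described in the abstract; the function name `mine_rules`, the `max_len` parameter, and the toy data are all assumptions made for illustration. It shows how a fixed 1% threshold can discard a perfectly confident rule for a class that covers less than 1% of the records.

```python
from collections import Counter
from itertools import combinations


def mine_rules(rows, labels, min_support, max_len=2):
    """Brute-force class association rules (hypothetical helper).

    Returns {(body, label): (support, confidence)} for every rule
    body -> label whose support is at least min_support.
    """
    n = len(rows)
    body_counts = Counter()   # records containing the rule body
    rule_counts = Counter()   # records containing the body AND having the label
    for items, label in zip(rows, labels):
        ordered = sorted(items)
        for k in range(1, max_len + 1):
            for body in combinations(ordered, k):
                body_counts[body] += 1
                rule_counts[(body, label)] += 1

    rules = {}
    for (body, label), count in rule_counts.items():
        support = count / n
        if support >= min_support:  # support-based pruning
            rules[(body, label)] = (support, count / body_counts[body])
    return rules


# Toy data (assumed): 995 records of a common class, 5 of a rare class.
rows = [{"a"}] * 995 + [{"b"}] * 5
labels = ["common"] * 995 + ["rare"] * 5

rules_1pct = mine_rules(rows, labels, min_support=0.01)
rules_low = mine_rules(rows, labels, min_support=0.001)

rare_rule = (("b",), "rare")
print(rare_rule in rules_1pct)  # the rare-class rule is pruned at 1%
print(rules_low[rare_rule])     # (support, confidence) at the lower threshold
```

In this toy case the rule `b -> rare` has confidence 1.0 but support 0.5%, so it survives only when the threshold drops below 1%. The brute-force enumeration also hints at why low thresholds are costly: the number of candidate rule bodies grows combinatorially with the number of items per record.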