A novel distance-based classifier built on pattern ranking
Source
Symposium on Applied Computing
Proceedings of the 2009 ACM Symposium on Applied Computing
Honolulu, Hawaii
SESSION: Data mining track
Pages 1427-1432
Year of Publication: 2009
ISBN: 978-1-60558-166-8
Authors
Dipankar Bachar  Università degli Studi di Torino, Italy
Rosa Meo  Università degli Studi di Torino, Italy
Sponsor
SIGAPP: ACM Special Interest Group on Applied Computing
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1529282.1529602

ABSTRACT

Instance-based classifiers that compute similarity between instances suffer from noise in the training set and from over-fitting. In this paper we propose a new type of distance-based classifier that, instead of computing distances between instances, computes the distance between each test instance and the classes. Both test instances and classes are represented by patterns in the space of the frequent itemsets. We rank the itemsets by metrics of itemset significance and then consider only the top portion of the ranking that leads the classifier to its maximum accuracy. We have experimented on a large collection of datasets from the UCI archive with different proximity measures and different itemset ranking metrics.

We show that our method has many benefits: it reduces the number of distance computations; it improves on the classification accuracy of state-of-the-art classifiers such as decision trees, SVM, k-NN, Naive Bayes, rule-based classifiers, and association rule-based ones; and it outperforms its competitors especially on noisy data.
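
The general idea in the abstract (compare a test instance with class-level patterns rather than with individual training instances) can be illustrated with a small sketch. This is not the authors' implementation. The brute-force itemset miner, the ranking metric (spread of per-class support), the Euclidean proximity measure, the cutoff search, and all function names and toy data are assumptions chosen only for illustration; the paper evaluates several ranking metrics and proximity measures that are not specified here.

```python
from itertools import combinations
from math import sqrt

def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force miner: itemsets with relative support >= min_support."""
    items = sorted({i for t in transactions for i in t})
    n = len(transactions)
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            s = frozenset(combo)
            if sum(1 for t in transactions if s <= t) / n >= min_support:
                found.append(s)
    return found

def class_profiles(transactions, labels, itemsets):
    """Represent each class by the support of every itemset inside it."""
    profiles = {}
    for c in sorted(set(labels)):
        rows = [t for t, y in zip(transactions, labels) if y == c]
        profiles[c] = [sum(1 for t in rows if s <= t) / len(rows)
                       for s in itemsets]
    return profiles

def rank_itemsets(profiles, itemsets):
    """Assumed significance metric: spread of support across classes."""
    def spread(j):
        values = [p[j] for p in profiles.values()]
        return max(values) - min(values)
    return sorted(range(len(itemsets)), key=spread, reverse=True)

def classify(instance, profiles, itemsets, top):
    """Assign the class whose profile is nearest on the selected itemsets."""
    x = [1.0 if itemsets[j] <= instance else 0.0 for j in top]
    def dist(c):  # Euclidean distance, an assumed proximity measure
        return sqrt(sum((xi - profiles[c][j]) ** 2 for xi, j in zip(x, top)))
    return min(profiles, key=dist)

def best_cutoff(ranking, profiles, itemsets, val_x, val_y):
    """Smallest top-k prefix of the ranking with maximum accuracy."""
    def accuracy(k):
        hits = sum(classify(t, profiles, itemsets, ranking[:k]) == y
                   for t, y in zip(val_x, val_y))
        return hits / len(val_x)
    return max(range(1, len(ranking) + 1), key=accuracy)

# Toy data (hypothetical): each instance is a set of items.
train_x = [frozenset("ab"), frozenset("abc"), frozenset("cd"), frozenset("cde")]
train_y = ["pos", "pos", "neg", "neg"]

itemsets = frequent_itemsets(train_x, min_support=0.5)
profiles = class_profiles(train_x, train_y, itemsets)
ranking = rank_itemsets(profiles, itemsets)
# For brevity the cutoff is tuned on the training data itself;
# a held-out validation set would be the safer choice.
k = best_cutoff(ranking, profiles, itemsets, train_x, train_y)
top = ranking[:k]

prediction = classify(frozenset("abd"), profiles, itemsets, top)
```

In this sketch each classification costs one distance computation per class, rather than one per training instance as in a nearest-neighbour search, which is the kind of saving the abstract refers to.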


