ACM Home Page
Please provide us with feedback. Feedback
The IOC algorithm: efficient many-class non-parametric classification for high-dimensional data
Full text PdfPdf (129 KB)
Source International Conference on Knowledge Discovery and Data Mining archive
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining table of contents
Seattle, WA, USA
POSTER SESSION: Research track posters table of contents
Pages: 629 - 634  
Year of Publication: 2004
ISBN:1-58113-888-1
Authors
Ting Liu  Carnegie Mellon University, Pittsburgh, PA
Ke Yang  Carnegie Mellon University, Pittsburgh, PA
Andrew W. Moore  Carnegie Mellon University, Pittsburgh, PA
Sponsors
SIGMOD: ACM Special Interest Group on Management of Data
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 6,   Downloads (12 Months): 30,   Citation Count: 2
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1014052.1014131
What is a DOI?

ABSTRACT

This paper is about a variant of k nearest neighbor classification on large many-class high dimensional datasets.K nearest neighbor remains a popular classification technique, especially in areas such as computer vision, drug activity prediction and astrophysics. Furthermore, many more modern classifiers, such as kernel-based Bayes classifiers or the prediction phase of SVMs, require computational regimes similar to k-NN. We believe that tractable k-NN algorithms therefore continue to be important.This paper relies on the insight that even with many classes, the task of finding the majority class among the k nearest neighbors of a query need not require us to explicitly find those k nearest neighbors. This insight was previously used in (Liu et al., 2003) in two algorithms called KNS2 and KNS3 which dealt with fast classification in the case of two classes. In this paper we show how a different approach, IOC (standing for the International Olympic Committee) can apply to the case of n classes where n > 2.IOC assumes a slightly different processing of the datapoints in the neighborhood of the query. This allows it to search a set of metric trees, one for each class. During the searches it is possible to quickly prune away classes that cannot possibly be the majority.We give experimental results on datasets of up to 5.8 x 105 records and 1.5 x 103 attributes, frequently showing an order of magnitude acceleration compared with each of (i) conventional linear scan, (ii) a well-known independent SR-tree implementation of conventional k-NN and (iii) a highly optimized conventional k-NN metric tree search.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
3
 
4
Jock A. Blackard. Forest covertype database. http://kdd.ics.uci.edu/databases/covertype/covertype.data.html.
 
5
 
6
Ron Cole and Mark Fanty. Isolet spoken letter recognition database. ftp://ftp.ics.uci.edu/pub/machine-learning-databases/isolet/.
7
 
8
 
9
J. M. Hammersley. The Distribution of Distances in a Hypersphere. Annals of Mathematical Statistics, 21:447--452, 1950.
 
10
CMU informedia digital video library project. The trec-2001 video trackorganized by nist shot boundary task. 2001.
 
11
IOC. International olympic committee: Candidature acceptance procedure. http://multimedia.olympic.org/pdf/ en report 711.pdf, 1999.
 
12
Norio Katayama and Shin'ichi Satoh. The SR-tree: an index structure for high-dimensional nearest neighbor queries. pages 369--380, 1997.
 
13
Nicholas Kushmerick. Internet advertisements. ftp://ftp.ics.uci.edu/pub/machine-learning-databases/internet ads/.
 
14
Ting Liu, Andrew Moore, and Alexander Gray. Efficient exact k-nn and nonparametric classification in high dimensions. In Proceedings of Neural Information Processing Systems, 2003.
 
15
 
16
 
17
 
18
Yanjun Qi, Alexander G. Hauptmann, and Ting Liu. Supervised classification for video shot segmentation. In proceedings of 2003 IEEE International Conference on Multimedia & Expo, 2003.
 
19
D. B. Skalak. Prototype and Feature Selection by Sampling and Random Mutation Hill Climbing Algorithms. In W. W. Cohen and H. Hirsh, editors, Machine Learning: Proceedings of the Eleventh International Conference. Morgan Kaufmann, 1994.
 
20
David J. Slate. Letter recognition database. ftp://ftp.ics.uci.edu/pub/machine-learning-databases/letter-recognition/.
 
21
J. K. Uhlmann. Satisfying general proximity/similarity queries with metric trees. Information Processing Letters, 40:175--179, 1991.
 
22


Collaborative Colleagues:
Ting Liu: colleagues
Ke Yang: colleagues
Andrew W. Moore: colleagues