ACM Home Page
Please provide us with feedback. Feedback
Contorting high dimensional data for efficient main memory KNN processing
Full text PdfPdf (230 KB)
Source International Conference on Management of Data archive
Proceedings of the 2003 ACM SIGMOD international conference on Management of data table of contents
San Diego, California
SESSION: Spatial and nearest-neighbor queries table of contents
Pages: 479 - 490  
Year of Publication: 2003
ISBN:1-58113-634-X
Authors
Bin Cui  National University of Singapore, Singapore
Beng Chin Ooi  National University of Singapore, Singapore
Jianwen Su  University of California, Santa Barbara, CA
Kian-Lee Tan  National University of Singapore, Singapore
Sponsor
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 8,   Downloads (12 Months): 51,   Citation Count: 10
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues   peer to peer  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/872757.872815
What is a DOI?

ABSTRACT

In this paper, we present a novel index structure, called Δ-tree, to speed up processing of high-dimensional K-nearest neighbor (KNN) queries in main memory environment. The Δ-tree is a multi-level structure where each level represents the data space at different dimensionalities: the number of dimensions increases towards the leaf level which contains the data at their full dimensions. The remaining dimensions are obtained using Principal Component Analysis, which has the desirable property that the first few dimensions capture most of the information in the dataset. Each level of the tree serves to prune the search space more efficiently as the reduced dimensions can better exploit the small cache line size. Moreover, the distance computation on lower dimensionality is less expensive. We also propose an extension, called Δ+-tree, that globally clusters the data space and then further partitions clusters into small regions to reduce the search space. We conducted extensive experiments to evaluate the proposed structures against existing techniques on different kinds of datasets. Our results show that the Δ+-tree is superior in most cases.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Corel Image Features. available from http://kdd.ics.uci.edu.
2
3
4
 
5
6
 
7
Y. S. Chen, Y. P. Hung, and C. S. Fuh. Fast algorithm for nearest neighbor search based on a lower bound tree. In Proc. 8th ICCV Conference, pages 446--453, 2001.
 
8
 
9
R. Enbody. Perfmon: Performance Monitoring Tool. available from http://www.cps.msu.edu/ enbody/perfmon.html, 1999.
 
10
G.H. Golub and C.F. Van Loan. Matrix Computations. The Johns Hopkins University Press, 1989.
 
11
J. Hui, B. C. Ooi, H. Shen, C. Yu, and A. Zhou. An adaptive and efficient dimensionality reduction algorithm for high-dimensional indexing. In Proc. 19th ICDE Conference, 2003.
 
12
I. T. Jolliffie. Principal Component Analysis. Springer-Verlag, 1986.
13
 
14
15
 
16
 
17
C. Yu. High-dimensional indexing. Lecture Notes in Computer Science 2341. Springer-Verlag, 2002.
 
18

CITED BY  10
 
 
 
 
 
 

Collaborative Colleagues:
Bin Cui: colleagues
Beng Chin Ooi: colleagues
Jianwen Su: colleagues
Kian-Lee Tan: colleagues

Peer to Peer - Readers of this Article have also read: