ABSTRACT
In this article, we present an efficient B+-tree based indexing method, called iDistance, for K-nearest neighbor (KNN) search in a high-dimensional metric space. iDistance partitions the data based on a space- or data-partitioning strategy, and selects a reference point for each partition. The data points in each partition are transformed into a single-dimensional value based on their similarity with respect to the reference point. This allows the points to be indexed using a B+-tree structure and KNN search to be performed using one-dimensional range search. The choice of partition and reference points adapts the index structure to the data distribution. We conducted extensive experiments to evaluate the iDistance technique, and report results demonstrating its effectiveness. We also present a cost model for iDistance KNN search, which can be exploited in query optimization.
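The mapping described above can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: each point is assigned to its nearest reference point, the one-dimensional key is taken to be i * c + dist(p, O_i) for partition i with a constant c larger than any point-to-reference distance, a sorted list searched with bisect stands in for the B+-tree, and KNN is answered by doubling the search radius until k points are found. The names build_index, range_search and knn_search are hypothetical.

```python
import bisect
import math


def build_index(points, refs, c):
    """Map every point to a 1-D key and return (keys, points) sorted by key.

    Assumption: key = i * c + dist(p, refs[i]), where i is the index of the
    nearest reference point and c exceeds every point-to-reference distance.
    """
    entries = []
    for p in points:
        i = min(range(len(refs)), key=lambda j: math.dist(p, refs[j]))
        entries.append((i * c + math.dist(p, refs[i]), p))
    entries.sort(key=lambda e: e[0])
    return [k for k, _ in entries], [p for _, p in entries]


def range_search(keys, stored, refs, c, q, r):
    """Return all indexed points within distance r of q.

    For partition i, the triangle inequality confines candidates to keys in
    [i*c + d - r, i*c + d + r], where d = dist(q, refs[i]).
    """
    found = []
    for i, ref in enumerate(refs):
        d = math.dist(q, ref)
        lo = i * c + max(0.0, d - r)
        hi = i * c + d + r
        start = bisect.bisect_left(keys, lo)
        # Never scan past the end of partition i's key interval.
        stop = min(bisect.bisect_right(keys, hi),
                   bisect.bisect_left(keys, (i + 1) * c))
        for p in stored[start:stop]:
            if math.dist(p, q) <= r:  # discard false positives
                found.append(p)
    return found


def knn_search(keys, stored, refs, c, q, k, step):
    """Answer a KNN query by enlarging the range-search radius (assumed policy)."""
    k = min(k, len(stored))
    r = step
    while True:
        hits = range_search(keys, stored, refs, c, q, r)
        if len(hits) >= k:
            # Every point within r was found, so the k closest hits are exact.
            return sorted(hits, key=lambda p: math.dist(p, q))[:k]
        r *= 2


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0), (6.0, 5.0), (5.0, 7.0)]
    refs = [(0.0, 0.0), (5.0, 5.0)]
    keys, stored = build_index(pts, refs, c=100.0)
    print(knn_search(keys, stored, refs, 100.0, q=(0.9, 0.1), k=2, step=0.5))
```

In this sketch the constant c only keeps the key intervals of different partitions from overlapping; replacing the sorted list with a real B+-tree would change the storage layer but not the key arithmetic.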