| Indexing high-dimensional data in dual distance spaces: a symmetrical encoding approach |
| Full text |
Pdf
(497 KB)
|
| Source
|
ACM International Conference Proceeding Series; Vol. 261
archive
Proceedings of the 11th international conference on Extending database technology: Advances in database technology
table of contents
Nantes, France
SESSION: Research sessions: Indexing
table of contents
Pages 241-251
Year of Publication: 2008
ISBN:978-1-59593-926-5
|
|
Authors
|
|
Yi Zhuang
|
Zhejiang University, P.R.China
|
|
Yueting Zhuang
|
Zhejiang University, P.R.China
|
|
Qing Li
|
City University of Hong Kong, HKSAR, P.R.China
|
|
Lei Chen
|
Hong Kong University of Science and Technology, HKSAR, P.R.China
|
|
Yi Yu
|
Nara Women's University, Japan
|
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 11, Downloads (12 Months): 65, Citation Count: 0
|
|
|
ABSTRACT
Due to the well-known dimensionality curse problem, search in a high-dimensional space is considered as a "hard" problem. In this paper, a novel symmetrical encoding-based index structure, which is called EHD-Tree (for symmetrical Encoding-based Hybrid Distance Tree), is proposed to support fast k-Nearest-Neighbor (k-NN) search in high-dimensional spaces. In an EHD-Tree, all data points are first grouped into clusters by a k-Means clustering algorithm. Then the uniform ID number of each data point is obtained by a dual-distance-driven encoding scheme in which each cluster sphere is partitioned twice according to the dual distances of start- and centroid-distance. Finally, the uniform ID number and the centroid-distance of each data point are combined to get a uniform index key, the latter is then indexed through a partition-based B+-tree. Thus, given a query point, its k-NN search in high-dimensional spaces can be transformed into search in a single dimensional space with the aid of the EHD-Tree index. Extensive performance studies are conducted to evaluate the effectiveness and efficiency of our proposed scheme, and the results demonstrate that this method outperforms the state-of-the-art high dimensional search techniques such as the X-Tree, VA-file, iDistance and NB-Tree, especially when the query radius is not very large.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
|
 |
3
|
|
 |
4
|
Norbert Beckmann , Hans-Peter Kriegel , Ralf Schneider , Bernhard Seeger, The R*-tree: an efficient and robust access method for points and rectangles, Proceedings of the 1990 ACM SIGMOD international conference on Management of data, p.322-331, May 23-26, 1990, Atlantic City, New Jersey, United States
|
| |
5
|
|
| |
6
|
|
| |
7
|
|
 |
8
|
|
| |
9
|
|
| |
10
|
|
| |
11
|
|
 |
12
|
|
 |
13
|
|
| |
14
|
|
 |
15
|
Stefan Berchtold , Christian Böhm , Hans-Peter Kriegal, The pyramid-technique: towards breaking the curse of dimensionality, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.142-153, June 01-04, 1998, Seattle, Washington, United States
|
| |
16
|
|
| |
17
|
|
| |
18
|
|
 |
19
|
|
| |
20
|
UCI KDD Archive, http://www.kdd.ics.uci.edu, 2002.
|
|