|
ABSTRACT
One of the common queries in many database applications is finding approximate matches to a given query item from a collection of data items. For example, given an image database, one may want to retrieve all images that are similar to a given query image. Distance-based index structures are proposed for applications where the distance computations between objects of the data domain are expensive (such as high-dimensional data) and the distance function is metric. In this paper we consider using distance-based index structures for similarity queries on large metric spaces. We elaborate on the approach that uses reference points (vantage points) to partition the data space into spherical shell-like regions in a hierarchical manner. We introduce the multivantage point tree structure (mvp-tree) that uses more than one vantage point to partiton the space into spherical cuts at each level. In answering similarity-based queries, the mvp-tree also utilizes the precomputed (at construction time) distances between the data points and the vantage points. We summarize the experiments comparing mvp-trees to vp-trees that have a similar partitioning strategy, but use only one vantage point at each level and do not make use of the precomputed distances. Empirical studies show that the mvp-tree outperforms the vp-tree by 20% to 80% for varying query ranges and different distance distributions. Next, we generalize the idea of using multiple vantage points and discuss the results of experiments we have made to see how varying the number of vantage points in a node affects affects performance and how much is gained in performance by making use of precomputed distances. The results show that, after all, it may be best to use a large number of vantage points in an internal node in order to end up with a single directory node and keep as many of the precomputed distances as possible to provide more efficient filtering during search operations. Finally, we provide some experimental results that compare mvp-trees with M-trees, which is a dynamic distance-based index structure for metric domains.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
|
 |
3
|
Norbert Beckmann , Hans-Peter Kriegel , Ralf Schneider , Bernhard Seeger, The R*-tree: an efficient and robust access method for points and rectangles, Proceedings of the 1990 ACM SIGMOD international conference on Management of data, p.322-331, May 23-26, 1990, Atlantic City, New Jersey, United States
|
 |
4
|
Stefan Berchtold , Christian Böhm , Daniel A. Keim , Hans-Peter Kriegel, A cost model for nearest neighbor search in high-dimensional data space, Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.78-86, May 11-15, 1997, Tucson, Arizona, United States
[doi> 10.1145/263661.263671]
|
| |
5
|
|
 |
6
|
|
| |
7
|
|
| |
8
|
|
| |
9
|
CIACCIA,P.AND PATELLA, M. 1998a. Bulk loading the M-tree. In Proceedings of the Australasian Conference on Databases (ADC, Perth, Australia)
|
| |
10
|
|
| |
11
|
|
 |
12
|
Paolo Ciaccia , Marco Patella , Pavel Zezula, A cost model for similarity queries in metric spaces, Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.59-68, June 01-04, 1998, Seattle, Washington, United States
[doi> 10.1145/275487.275495]
|
| |
13
|
C. Faloutsos , R. Barber , M. Flickner , J. Hafner , W. Niblack , D. Petkovic , W. Equitz, Efficient and effective querying by image content, Journal of Intelligent Information Systems, v.3 n.3-4, p.231-262, July 1994
[doi> 10.1007/BF00962238]
|
 |
14
|
Christos Faloutsos , M. Ranganathan , Yannis Manolopoulos, Fast subsequence matching in time-series databases, Proceedings of the 1994 ACM SIGMOD international conference on Management of data, p.419-429, May 24-27, 1994, Minneapolis, Minnesota, United States
|
 |
15
|
|
| |
16
|
|
| |
17
|
OTTERMAN, M. 1992. Approximate matching with high dimensionality R-trees. Department of Computer Science, University of Maryland, College Park, MD.
|
 |
18
|
Nick Roussopoulos , Stephen Kelley , Frédéric Vincent, Nearest neighbor queries, Proceedings of the 1995 ACM SIGMOD international conference on Management of data, p.71-79, May 22-25, 1995, San Jose, California, United States
|
| |
19
|
|
| |
20
|
|
 |
21
|
|
| |
22
|
UHLMANN, J. K. 1991. Satisfying general proximity/similarity queries with metric trees. Inf. Process. Lett. 40, 4 (Nov.), 175-179.
|
| |
23
|
|
| |
24
|
|
CITED BY 30
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Caetano Traina, Jr. , Roberto F. Filho , Agma J. Traina , Marcos R. Vieira , Christos Faloutsos, The Omni-family of all-purpose access methods: a simple and effective way to make similarity search more efficient, The VLDB Journal — The International Journal on Very Large Data Bases, v.16 n.4, p.483-505, October 2007
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Tsuyoshi Takayama , Yuuta Hashiba , Shigeru Kikuchi , Tetsuo Ikeda , Yoshitoshi Murata, Axes rectifying of impression space in music impression-based retrieval and its evaluation, Proceedings of the 6th Conference on 6th WSEAS Int. Conf. on Artificial Intelligence, Knowledge Engineering and Data Bases, p.86-89, February 16-19, 2007, Corfu Island, Greece
|
|
|
|
|
|
|
|
|
Pedro H. Bugatti , Agma J. M. Traina , Caetano Traina, Jr., Assessing the best integration between distance-function and image-feature to answer similarity queries, Proceedings of the 2008 ACM symposium on Applied computing, March 16-20, 2008, Fortaleza, Ceara, Brazil
|
|
|
|
|
|
Fabrizio Falchi , Claudio Lucchese , Salvatore Orlando , Raffaele Perego , Fausto Rabitti, A metric cache for similarity search, Proceeding of the 2008 ACM workshop on Large-Scale distributed systems for information retrieval, October 30-30, 2008, Napa Valley, California, USA
|
|
|
|
|
|
Fabrizio Falchi , Claudio Lucchese , Salvatore Orlando , Raffaele Perego , Fausto Rabitti, Caching content-based queries for robust and efficient image retrieval, Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, March 24-26, 2009, Saint Petersburg, Russia
|
|
|
|
|
|
Bin Wang , Xiao-Chun Yang , Guo-Ren Wang , Ge Yu , Lei Chen , X. Sean Wang , Xue-Min Lin, Continually answering constraint k-NN queries in unstructured P2P systems, Journal of Computer Science and Technology, v.23 n.4, p.538-556, July 2008
|
|
|
|
|
|
|
REVIEW
"Yannis P. Manolopoulos : Reviewer"
A method generalizing the existing vp-trees for indexing in metric
spaces is presented. The superiority of the new method over vp-trees,
and over another dynamic indexing method—M-trees—is examined
experimentally.
The p
more...
|