|
ABSTRACT
For an increasing number of modern database applications, efficient support of similarity search becomes an important task. Along with the complexity of the objects such as images, molecules and mechanical parts, also the complexity of the similarity models increases more and more. Whereas algorithms that are directly based on indexes work well for simple medium-dimensional similarity distance functions, they do not meet the efficiency requirements of complex high-dimensional and adaptable distance functions. The use of a multi-step query processing strategy is recommended in these cases, and our investigations substantiate that the number of candidates which are produced in the filter step and exactly evaluated in the refinement step is a fundamental efficiency parameter. After revealing the strong performance shortcomings of the state-of-the-art algorithm for k-nearest neighbor search [Korn et al. 1996], we present a novel multi-step algorithm which is guaranteed to produce the minimum number of candidates. Experimental evaluations demonstrate the significant performance gain over the previous solution, and we observed average improvement factors of up to 120 for the number of candidates and up to 48 for the total runtime.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
AFS 93
|
|
| |
AKS 98
|
Ankerst M., Kriegel H.-P., Seidl T.: 'Pixel-based Shape Similarity Search in Large Image Databases', submitted for publication.
|
| |
ALSS 95
|
Rakesh Agrawal , King-Ip Lin , Harpreet S. Sawhney , Kyuseok Shim, Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases, Proceedings of the 21th International Conference on Very Large Data Bases, p.490-501, September 11-15, 1995
|
 |
AMN 95
|
Sunil Arya , David M. Mount , Onuttom Narayan, Accounting for boundary effects in nearest neighbor searching, Proceedings of the eleventh annual symposium on Computational geometry, p.336-344, June 05-07, 1995, Vancouver, British Columbia, Canada
[doi> 10.1145/220279.220315]
|
 |
BBKK 97
|
Stefan Berchtold , Christian Böhm , Daniel A. Keim , Hans-Peter Kriegel, A cost model for nearest neighbor search in high-dimensional data space, Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.78-86, May 11-15, 1997, Tucson, Arizona, United States
[doi> 10.1145/263661.263671]
|
 |
Ber+97
|
Stefan Berchtold , Christian Böhm , Bernhard Braunmüller , Daniel A. Keim , Hans-Peter Kriegel, Fast parallel similarity search in multimedia databases, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.1-12, May 11-15, 1997, Tucson, Arizona, United States
|
| |
Ber+98
|
|
| |
BHKS 93
|
|
| |
BKK 96
|
|
 |
BK 97
|
|
| |
BMH 92
|
|
| |
CPZ 97
|
|
| |
Fal+ 94
|
C. Faloutsos , R. Barber , M. Flickner , J. Hafner , W. Niblack , D. Petkovic , W. Equitz, Efficient and effective querying by image content, Journal of Intelligent Information Systems, v.3 n.3-4, p.231-262, July 1994
[doi> 10.1007/BF00962238]
|
 |
FBF 77
|
|
 |
FL 95
|
Christos Faloutsos , King-Ip Lin, FastMap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets, Proceedings of the 1995 ACM SIGMOD international conference on Management of data, p.163-174, May 22-25, 1995, San Jose, California, United States
|
 |
FRM 94
|
Christos Faloutsos , M. Ranganathan , Yannis Manolopoulos, Fast subsequence matching in time-series databases, Proceedings of the 1994 ACM SIGMOD international conference on Management of data, p.419-429, May 24-27, 1994, Minneapolis, Minnesota, United States
|
 |
GG 97
|
|
| |
GM 93
|
|
| |
Haf+95
|
|
| |
Hen 94
|
Henrich, A.: 'A Distance-Scan Algorithm for Spatial Access Structures', Proc. 2nd ACM Workshop on Advances in Geographic Information Systems, Gaithersburg, Maryland, 1994, pp. 136-143.
|
| |
HS 95
|
|
 |
Jag 91
|
|
| |
Kor+96
|
|
| |
KS 98
|
|
| |
KSS 97
|
|
| |
OM 88
|
|
| |
PM 97
|
|
| |
PS 93
|
|
 |
RKV 95
|
Nick Roussopoulos , Stephen Kelley , Frédéric Vincent, Nearest neighbor queries, Proceedings of the 1995 ACM SIGMOD international conference on Management of data, p.71-79, May 22-25, 1995, San Jose, California, United States
|
| |
RP 92
|
Ramasubramanian V., Paliwal K. K.: 'Fast k-Dimensional Tree Algorithms for Nearest Neighbor Search with Application to Vector Quantization Encoding', IEEE Transactions on Signal Processing, Vol. 40, No. 3, March 1992, pp. 518-531.
|
| |
Sei 97
|
Seidl T.: 'Adaptable Similarity Search in 3-D Spatial Database Systems', PhD thesis, Institute for Computer Science, University of Munich, 1997.
|
| |
SK 97
|
|
| |
Spr 91
|
Sproull R.F.: "Refinements to Nearest Neighbor Searching in k-Dimensional Trees', Algorithmica 1991, pp. 579-589.
|
| |
WJ 96
|
|
CITED BY 94
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Kristin P. Bennett , Usama Fayyad , Dan Geiger, Density-based indexing for approximate nearest-neighbor queries, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, p.233-243, August 15-18, 1999, San Diego, California, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Hans-Peter Kriegel , Stefan Brecheisen , Peer Kröger , Martin Pfeifle , Matthias Schubert, Using sets of feature vectors for similarity search on voxelized CAD objects, Proceedings of the 2003 ACM SIGMOD international conference on Management of data, June 09-12, 2003, San Diego, California
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Caetano Traina, Jr. , Agma J. M. Traina , Marcos R. Vieira , Adriano S. Arantes , Christos Faloutsos, Efficient processing of complex similarity queries in RDBMS through query rewriting, Proceedings of the 15th ACM international conference on Information and knowledge management, November 06-11, 2006, Arlington, Virginia, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Khanh Vu , Kien A. Hua , Hao Cheng , Sheau-Dong Lang, A non-linear dimensionality-reduction technique for fast similarity search in large databases, Proceedings of the 2006 ACM SIGMOD international conference on Management of data, June 27-29, 2006, Chicago, IL, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Hans-Peter Kriegel , Peer Kröger , Peter Kunath , Matthias Renz , Tim Schmidt, Proximity queries in large traffic networks, Proceedings of the 15th annual ACM international symposium on Advances in geographic information systems, November 07-09, 2007, Seattle, Washington
|
|
|
|
|
|
Ke Deng , Xiaofang Zhou , Heng Tao Shen , Qing Liu , Kai Xu , Xuemin Lin, A multi-resolution surface distance model for k-NN query processing, The VLDB Journal — The International Journal on Very Large Data Bases, v.17 n.5, p.1101-1119, August 2008
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Humberto L. Razente , Maria Camila N. Barioni , Agma Juci M. Traina , Christos Faloutsos , Caetano Traina, Jr., A novel optimization approach to efficiently process aggregate similarity queries in metric access methods, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA
|
|
|
|
|
|
Bin Wang , Xiao-Chun Yang , Guo-Ren Wang , Ge Yu , Lei Chen , X. Sean Wang , Xue-Min Lin, Continually answering constraint k-NN queries in unstructured P2P systems, Journal of Computer Science and Technology, v.23 n.4, p.538-556, July 2008
|
|
|
|
|
|
|
|
|
Thomas Seidl , Ira Assent , Philipp Kranen , Ralph Krieger , Jennifer Herrmann, Indexing density models for incremental learning and anytime classification on data streams, Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, March 24-26, 2009, Saint Petersburg, Russia
|
|
|
|
|
|
|
|
|
Kefeng Xuan , Geng Zhao , David Taniar , Bala Srinivasan , Maytham Safar , Marina Gavrilova, Continuous range search based on network Voronoi diagram, International Journal of Grid and Utility Computing, v.1 n.4, p.328-335, August 2009
|
|