ACM Home Page
Please provide us with feedback. Feedback
Searching in metric spaces
Full text PdfPdf (916 KB)
Source ACM Computing Surveys (CSUR) archive
Volume 33 ,  Issue 3  (September 2001) table of contents
Pages: 273 - 321  
Year of Publication: 2001
ISSN:0360-0300
Authors
Edgar Chávez  Escuela de Ciencias Físico-Matemáticas, Universidad Michoacana, Morelia, Mich. México
Gonzalo Navarro  Depto. de Ciencias de la Computación, Universidad de Chile, Santiago, Chile
Ricardo Baeza-Yates  Depto. de Ciencias de la Computación, Universidad de Chile, Santiago, Chile
José Luis Marroquín  Centro de Investigación en Matemáticas (CIMAT), Valenciana, Guanajuato, Gto. Méco
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 72,   Downloads (12 Months): 442,   Citation Count: 97
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/502807.502808
What is a DOI?

ABSTRACT

The problem of searching the elements of a set that are close to a given query element under some similarity criterion has a vast number of applications in many branches of computer science, from pattern recognition to textual and multimedia information retrieval. We are interested in the rather general case where the similarity criterion defines a metric space, instead of the more restricted case of a vector space. Many solutions have been proposed in different areas, in many cases without cross-knowledge. Because of this, the same ideas have been reconceived several times, and very different presentations have been given for the same approaches. We present some basic results that explain the intrinsic difficulty of the search problem. This includes a quantitative definition of the elusive concept of "intrinsic dimensionality." We also present a unified view of all the known proposals to organize metric spaces, so as to be able to understand them under a common framework. Most approaches turn out to be variations on a few different concepts. We organize those works in a taxonomy that allows us to devise new algorithms from combinations of concepts not noticed before because of the lack of communication between different communities. We present experiments validating our results and comparing the existing approaches. We finish with recommendations for practitioners and open questions for future development.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
3
 
4
BAEZA-YATES, R. 1997. Searching: An algorithmic tour. In Encyclopedia of Computer Science and Technology, vol. 37, A. Kent and J. Williams, Eds., Marcel Dekker, New York, 331-359.
 
5
BAEZA-YATES,R.AND NAVARRO, G. 1998. Fast approximate string matching in a dictionary. In Proceedings of the Fifth South American Symposium on String Processing and Information Retrieval (SPIRE'98), IEEE Computer Science Press, Los Alamitos, Calif., 14-22.
 
6
 
7
8
 
9
BENTLEY, J. 1979. Multidimensional binary search trees in database applications. IEEE Trans. Softw. Eng. 5, 4, 333-340.
10
 
11
 
12
 
13
14
15
 
16
 
17
BRITO, M., CH~VEZ, E., QUIROZ, A., AND YUKICH,J. 1996. Connectivity of the mutual k-nearest neighbor graph in clustering and outlier detection. Stat. Probab. Lett. 35, 33-42.
 
18
BUGNION, E., FHEI, S., ROOS, T., WIDMAYER,P., AND WIDMER, F. 1993. A spatial index for approximate multiple string matching. In Proceedings of the First South American Workshop on String Processing (WSP'93), R. Baeza-Yates and N. Ziviani, Eds., 43-53.
19
 
20
 
21
CH~VEZ, E. 1999. Optimal discretization for pivot based algorithms. Manuscript. ftp:// garota.fismat.umich.mx/pub/users/elchavez/ minimax.ps.gz.
 
22
CH~VEZ,E.AND MARROQUYN, J. 1997. Proximity queries in metric spaces. In Proceedings of the Fourth South American Workshop on String Processing ( WSP'97), R. Baeza-Yates, Ed. Carleton University Press, Ottawa, 21-36.
 
23
 
24
CH~VEZ,E.AND NAVARRO, G. 2001a. Towards measuring the searching complexity of metric spaces. In Proceedings of the Mexican Computing Meeting, vol. II, 969-972.
 
25
 
26
 
27
 
28
29
 
30
 
31
32
 
33
 
34
CLARKSON, K. 1999. Nearest neighbor queries in metric spaces. Discrete Comput. Geom. 22,1, 63-93.
 
35
COX,T.AND COX, M. 1994. Multidimensional Scaling. Chapman and Hall, London.
 
36
 
37
 
38
DUDA,R.AND HART, P. 1973. Pattern Classification and Scene Analysis. Wiley, New York.
39
40
 
41
 
42
 
43
44
45
 
46
 
47
 
48
 
49
KALANTARI,I.AND MCDONALD, G. 1983. A data structure and an algorithm for the nearest point problem. IEEE Trans. Softw. Eng. 9,5.
 
50
 
51
 
52
 
53
 
54
NAVARRO,G.AND REYES, N. 2001. Dynamic spatial approximation trees. In Proceedings of the Eleventh Conference of the Chilean Computer Science Society (SCCC'01), IEEE CS Press, 213-222.
 
55
56
 
57
NOLTEMEIER, H. 1989. Voronoi trees and applications. In Proceedings of the International Workshop on Discrete Algorithms and Complex-ity (Fukuoka, Japan), 69-74.
 
58
59
60
 
61
62
 
63
SANKOFF,D.AND KRUSKAL, J., EDS. 1983. Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison. Addison-Wesley, Reading, Mass.
64
65
 
66
 
67
UHLMANN, J. 1991a. Implementing metric trees to satisfy general proximity/similarity queries. Manuscript.
 
68
UHLMANN, J. 1991b. Satisfying general proximity/ similarity queries with metric trees. Inf. Proc. Lett. 40, 175-179.
 
69
VERBARG, K. 1995. The C-Ttree: A dynamically balanced spatial index. Comput. Suppl. 10, 323-340.
 
70
 
71
WATERMAN, M. 1995. Introduction to Computational Biology. Chapman and Hall, London.
 
72
 
73
WHITE,D.AND JAIN, R. 1996. Algorithms and strategies for similarity retrieval. Tech. Rep. VCL-96-101 (July), Visual Computing Laboratory, University of California, La Jolla.
 
74
YAO, A. 1990. Computational Geometry,J.Van Leeuwen, Ed. Elsevier Science, New York, 345-380.
 
75
 
76
YIANILOS, P. 1999. Excluded middle vantage point forests for nearest neighbor search. In DIMACS Implementation Challenge, ALENEX'99 (Baltimore, Md).
 
77
 
78

CITED BY  97

Collaborative Colleagues:
Edgar Chávez: colleagues
Gonzalo Navarro: colleagues
Ricardo Baeza-Yates: colleagues
José Luis Marroquín: colleagues