|
ABSTRACT
Metric space searching is an emerging technique to address the problem of efficient similarity searching in many applications, including multimedia databases and other repositories handling complex objects. Although promising, the metric space approach is still immature in several aspects that are well established in traditional databases. In particular, most indexing schemes are static, that is, few of them tolerate insertion or deletion of elements at reasonable cost over an existing index. The spatial approximation tree (sa--tree) has been experimentally shown to provide a good tradeoff between construction cost, search cost, and space requirement. However, the sa--tree is static, which renders it unsuitable for many database applications. In this paper, we study different methods to handle insertions and deletions on the sa--tree at low cost. In many cases, the dynamic construction (by successive insertions) is even faster than the previous static construction, and both are similar elsewhere. In addition, the dynamic version significantly improves the search performance of sa--trees in virtually all cases. The result is a much more practical data structure that can be useful in a wide range of database applications.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Apers, P., Blanken, H., and Houtsma, M. 1997. Multimedia Databases in Perspective. Springer, New York.
|
| |
2
|
Baeza-Yates, R. and Ribeiro-Neto, B. 1999. Modern Information Retrieval. Addison-Wesley, Reading, MA.
|
| |
3
|
Baeza-Yates, R., Cunto, W., Manber, U., and Wu, S. 1994. Proximity matching using fixed-queries trees. In Proc. 5th Combinatorial Pattern Matching (CPM'94). LNCS 807. 198--212.
|
| |
4
|
Bentley, J. and Saxe, J. 1980. Decomposable searching problems i: Static-to-dynamic transformation. J. Algorithms 1, 4, 301--358.
|
| |
5
|
Böhm, C., Berchtold, S., and Keim, D. 2001. Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases. ACM Computing Surveys 33, 3 (Sept.), 322--373.
|
| |
6
|
Bozkaya, T. and Ozsoyoglu, M. 1997. Distance-based indexing for high-dimensional metric spaces. In Proc. ACM Conference on Management of Data (ACM SIGMOD'97). 357--368. Sigmod Record 26(2).
|
| |
7
|
Brin, S. 1995. Near neighbor search in large metric spaces. In Proc. 21st Conference on Very Large Databases (VLDB'95). 574--584.
|
| |
8
|
Burkhard, W. and Keller, R. 1973. Some approaches to best-match file searching. Comm. of the ACM 16, 4, 230--236.
|
| |
9
|
Bustos, B. and Navarro, G. 2002. Probabilistic proximity searching algorithms based on compact partitions. In Proc. 9th String Processing and Information Retrieval (SPIRE'02). LNCS 2476. 284--297.
|
| |
10
|
Cantone, D., Ferro, A., Pulvirenti, A., Recupero, D. R., and Shasha, D. 2005. Antipole tree indexing to support range search and k-nearest neighbor search in metric spaces. IEEE Transactions on Knowledge and Data Engineering 17, 4, 535--550.
|
| |
11
|
Chávez, E. and Navarro, G. 2001. A probabilistic spell for the curse of dimensionality. In Proc. 3rd Workshop on Algorithm Engineering and Experiments (ALENEX'01). LNCS 2153. 147--160.
|
| |
12
|
Chávez, E. and Navarro, G. 2005. A compact space decomposition for effective metric indexing. Pattern Recognition Letters 26, 9, 1363--1376.
|
| |
13
|
Chávez, E., Marroquín, J., and Baeza-Yates, R. 1999. Spaghettis: An array based algorithm for similarity queries in metric spaces. In Proc. 6th International Symposium on String Processing and Information Retrieval (SPIRE'99). IEEE CS Press, Los Alamitos, CA. 38--46.
|
| |
14
|
Chávez, E., Marroquín, J., and Navarro, G. 2001a. Fixed queries array: A fast and economical data structure for proximity searching. Multimedia Tools and Applications 14, 2, 113--135.
|
| |
15
|
Chávez, E., Navarro, G., Baeza-Yates, R., and Marroquín, J. 2001b. Searching in metric spaces. ACM Computing Surveys 33, 3 (Sept.), 273--321.
|
| |
16
|
Ciaccia, P., Patella, M., and Zezula, P. 1997. M-tree: An efficient access method for similarity search in metric spaces. In Proc. of the 23rd Conference on Very Large Databases (VLDB'97). 426--435.
|
| |
17
|
Clarkson, K. 1999. Nearest neighbor queries in metric spaces. Discrete & Computational Geometry 22, 1, 63--93.
|
| |
18
|
Dehne, F. and Noltemeier, H. 1987. Voronoi trees and clustering problems. Information Systems 12, 2, 171--175.
|
| |
19
|
Dohnal, V. 2004. An access structure for similarity search in metric spaces. In EDBT Workshops. 133--143.
|
| |
20
|
Dohnal, V., Gennaro, C., Savino, P., and Zezula, P. 2003. D-index: Distance searching index for metric data sets. Multimedia Tools and Applications 21, 1, 9--33.
|
| |
21
|
Duda, R. and Hart, P. 1973. Pattern Classification and Scene Analysis. Wiley, New York.
|
| |
22
|
Gaede, V. and Günther, O. 1998. Multidimensional access methods. ACM Computing Surveys 30, 2, 170--231.
|
| |
23
|
Galperin, I. and Rivest, R. 1993. Scapegoat trees. In SODA '93: Proceedings of the fourth Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, Philadelphia, PA. 165--174.
|
| |
24
|
Gusfield, D. 1997. Algorithms on Strings, Trees and Sequences. Cambridge University Press, Cambridge.
|
| |
25
|
Hjaltason, G. and Samet, H. 2000. Incremental similarity search in multimedia databases. Tech. Rep. CS-TR-4199, University of Maryland, Computer Science Department.
|
| |
26
|
Hjaltason, G. and Samet, H. 2003a. Improved search heuristics for the sa-tree. Pattern Recognition Letters 24, 15, 2785--2795.
|
| |
27
|
Hjaltason, G. and Samet, H. 2003b. Index-driven similarity search in metric spaces. ACM Trans. on Database Systems 28, 4, 517--580.
|
| |
28
|
Karger, D. and Ruhl, M. 2002. Finding nearest neighbors in growth-restricted metrics. In STOC'02: Proceedings of the Thiry-Fourth Annual ACM Symposium on Theory of Computing. ACM Press, New York. 741--750.
|
| |
29
|
Krauthgamer, R. and Lee, J. 2004. Navigating nets: simple algorithms for proximity search. In SODA. 798--807.
|
| |
30
|
Kukich, K. 1992. Techniques for automatically correcting words in text. ACM Computing Surveys 24, 4, 377--439.
|
| |
31
|
Micó, L., Oncina, J., and Vidal, E. 1994. A new version of the nearest-neighbor approximating and eliminating search (AESA) with linear preprocessing-time and memory requirements. Pattern Recognition Letters 15, 9--17.
|
| |
32
|
Micó, L., Oncina, J., and Carrasco, R. 1996. A fast branch and bound nearest neighbour classifier in metric spaces. Pattern Recognition Letters 17, 731--739.
|
| |
33
|
Navarro, G. 1999. Searching in metric spaces by spatial approximation. In Proc. String Processing and Information Retrieval (SPIRE'99). IEEE CS Press, Los Alamitos, CA. 141--148.
|
| |
34
|
Navarro, G. 2002. Searching in metric spaces by spatial approximation. The Very Large Databases Journal (VLDBJ) 11, 1, 28--46.
|
| |
35
|
Navarro, G. and Reyes, N. 2001. Dynamic spatial approximation trees. In Proc. XXI Conference of the Chilean Computer Science Society (SCCC'01). IEEE CS Press, Los Alamitos, CA. 213--222.
|
| |
36
|
Navarro, G. and Reyes, N. 2002a. Fully dynamic spatial approximation trees. In Proceedings of the 9th International Symposium on String Processing and Information Retrieval (SPIRE 2002). LNCS 2476. Springer, New York. 254--270.
|
| |
37
|
Navarro, G. and Reyes, N. 2002b. Improved dynamic spatial approximation trees. In Proceedings of the XXVIII Latin American Conference on Informatics (CLEI'02). Montevideo, Uruguay, 74. Abstract in print, complete papers in CD-ROM.
|
| |
38
|
Navarro, G. and Reyes, N. 2003. Improved deletions in dynamic spatial approximation trees. In Proc. XXIII Conference of the Chilean Computer Science Society (SCCC'03). IEEE CS Press, Los Alamitos, CA. 13--22.
|
| |
39
|
Nene, S. and Nayar, S. 1997. A simple algorithm for nearest neighbor search in high dimensions. IEEE Trans. on Pattern Analysis and Machine Intelligence 19, 9, 989--1003.
|
| |
40
|
Noltemeier, H., Verbarg, K., and Zirkelbach, C. 1992. Monotonous Bisector* Trees—a tool for efficient partitioning of complex schenes of geometric objects. In Data Structures and Efficient Algorithms. LNCS 594. Springer-Verlag, New York. 186--203.
|
| |
41
|
Salton, G. and McGill, M. 1983. Introduction to Modern Information Retrieval. McGraw-Hill.
|
| |
42
|
Samet, H. 2005. Foundations of Multidimensional and Metric Data Structures (The Morgan Kaufmann Series in Computer Graphics and Geometric Modeling). Morgan Kaufmann Pub., San Francisco, CA.
|
| |
43
|
Sankoff, D. and Kruskal, J., Eds. 1983. Time Warps, String Edits, and Macromolecules: the Theory and Practice of Sequence Comparison. Addison-Wesley, Reading, MA.
|
| |
44
|
Skopal, T., Pokorný, J., and Snásel, V. 2004. PM-tree: Pivoting metric tree for similarity search in multimedia databases. In ADBIS (Local Proceedings).
|
| |
45
|
Uhlmann, J. 1991a. Implementing metric trees to satisfy general proximity/similarity queries. In Proc. Command and Control Symposium. Washington, DC. Also published as Code 5570 NRL Memo Report (1991) at the Naval Research Laboratory.
|
| |
46
|
Uhlmann, J. 1991b. Satisfying general proximity/similarity queries with metric trees. Information Processing Letters 40, 175--179.
|
| |
47
|
Uribe, R. and Navarro, G. 2003. Una estructura dinámica para búsqueda en espacios métricos. In Actas del XI Encuentro Chileno de Computación, Jornadas Chilenas de Computación. Chillán, Chile. In Spanish. In CD-ROM.
|
| |
48
|
Verbarg, K. 1995. The C-Tree: A dynamically balanced spatial index. Computing Suppl. 10, 323--340.
|
| |
49
|
Vidal, E. 1986. An algorithm for finding nearest neighbors in (approximately) constant average time. Pattern Recognition Letters 4, 145--157.
|
| |
50
|
Waterman, M. 1995. Introduction to Computational Biology: Maps, Sequences and Genomes. Chapman and Hall/CRC, Boca Raton, FL.
|
| |
51
|
Yianilos, P. 1993. Data structures and algorithms for nearest neighbor search in general metric spaces. In Proc. 4th ACM-SIAM Symposium on Discrete Algorithms (SODA'93). 311--321.
|
| |
52
|
Yianilos, P. 2000. Locally lifting the curse of dimensionality for nearest neighbor search. In Proc. 11th ACM-SIAM Symposium on Discrete Algorithms (SODA'00). 361--370.
|
| |
53
|
Yoshitaka, A. and Ichikawa, T. 1999. A survey on content-based retrieval for multimedia databases. IEEE Trans. on Knowledge and Data Engineering 11, 1, 81--93.
|
| |
54
|
Zezula, P., Amato, G., Dohnal, V., and Batko, M. 2006. Similarity Search: The Metric Space Approach. Advances in Database Systems, vol. 32. Springer, New York.
|
|