ACM Home Page
Please provide us with feedback. Feedback
New techniques for best-match retrieval
Full text PdfPdf (1.41 MB)
Source ACM Transactions on Information Systems (TOIS) archive
Volume 8 ,  Issue 2  (April 1990) table of contents
Pages: 140 - 158  
Year of Publication: 1990
ISSN:1046-8188
Authors
Dennis Shasha  New York Univ., New York
Tsong-Li Wang  New York Univ., New York
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 9,   Downloads (12 Months): 49,   Citation Count: 19
Additional Information:

abstract   references   cited by   index terms   review   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/96105.96111
What is a DOI?

ABSTRACT

A scheme to answer best-match queries from a file containing a collection of objects is described. A best-match query is to find the objects in the file that are closest (according to some (dis)similarity measure) to a given target. Previous work [5, 331] suggests that one can reduce the number of comparisons required to achieve the desired results using the triangle inequality, starting with a data structure for the file that reflects some precomputed intrafile distances. We generalize the technique to allow the optimum use of any given set of precomputed intrafile distances. Some empirical results are presented which illustrate the effectiveness of our scheme, and its performance relative to previous algorithms.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
3
4
5
 
6
CL~,US, V., EHRIG, M., AND ROZENBERG, G. Graph-Grammars and Their Application to Computer Science and Biology. Springer, New York, 1979.
 
7
Du, H. C., AND LEE, R. C.W. Symbolic Gray code as a multikey hashing function. IEEE Trans. Pattern Anal. Mach. InteU. 2, 1 ( Jan. 1980), 83-90.
 
8
DUDA, R. O., AND HART, P.E. Pattern Classification and Scene Analysis. Wiley, New York, 1973.
9
 
10
EASTMAN, C. M., AND WEISS, S.F. Tree structures for high dimensionality nearest neighbor searching. Inf. Syst. 7, 2 (1982), 115-122.
 
11
EASTMAN, C. M., AND ZEMANKOVA, M. Partially specified nearest neighbor searches using k - d trees. Inf. Process. Lett. 15, 2 (1982), 53-56.
 
12
FEUSTEL, C. D., AND SHAPIRO, L.G. The nearest neighbor problem in an abstract metric space. Pattern Recognition Lett. 1, 2 (1982), 125-128.
13
14
 
15
FUKUNAGA, K., AND NARENDRA, P.M. A branch and bound algorithm for computing k-nearest neighbors. IEEE Trans. Comput. 24, 7 (July 1975), 750-753.
 
16
HOLLAAR, L.A. The Utah text retrieval project. Inf. Technol. Res. Dev. 2 (1983), 155-167.
17
 
18
LANDER, E., MESIROV, J. P., AND WASHINGTON, T. Protein sequence comparison on a data parallel computer. In Proceedings of the IEEE 1988 International Conference on Parallel Processing (Aug. 1988). IEEE, New York, 1988, 257-263.
 
19
LIPMAN, D. J., AND PEARSON, W.R. Rapid and sensitive protein similarity searches. Science 227 (1985), 1435-1441.
 
20
 
21
 
22
MURTAGH, F. A very fast exact nearest neighbor algorithm for use in information retrieval. Inf. Technol. Res. Dev. I (1982), 275-283.
 
23
MURTAGH, F. Expected-time complexity results for hierarchic clustering algorithms which use cluster centers. Inf. Process. Lett. 16, 5 ( June 1983), 237-241.
 
24
MURTAGH, F. A survey of recent advances in hierarchical clustering algorithms. IEEE Computer 26, 4 (1983), 354-359.
 
25
MURTAGH, F. Multidimensional clustering algorithms. In Lectures in Computational Statistics, J. M. Chambers, J. Gordesch, A. Klas, L. Lebart, and P. P. Sint, Eds., Physica-Verlag, Vienna, 1985.
 
26
PA{GE, R. C., AND Kr~USKAL, C.P. Parallel algorithms for shortest path problems. In Proceedings of the IEEE 1985 International Conference on Parallel Processing (1985). IEEE, New York, 1985, 14-19.
 
27
PERRY, S. A., AND WILLETT, P. A review of the use of inverted files for best match searching in information retrieval systems. J. Inf. Sci. 6 (1983), 59-66.
 
28
POOUE, C. A., AND WILLETT, P. An evaluation of document retrieval from serial files using the ICL Distributed Array Processor. Online Rev. 8 (1984), 569-584.
 
29
ROHLF, F.J. A probabilistic minimum spanning tree algorithm, inf. Process. Lett. 7 {1978), 44-48.
 
30
 
31
SHAMOS, M. I., AND HOE~, D. Closest-point problems. In Proceedings of the 16th IEEE Symposium on Foundations of Computer Science (Oct. 1975). IEEE, New York, 1975, 151-162.
 
32
SHAPIRO, B. A., AND ZHANG, K. Comparing multiple RNA secondary structures using tree comparisons. Manuscript, Division of Cancer Biology and Diagnosis, NIH, Frederick, Md., 1989.
33
 
34
SHASHA, D., AND WANG, T.-L. Optimal best-match retrieval. Tech. Rep. TR 480, Courant Institute of Mathematical Sciences, New York Univ., New York, Dec. 1989.
35
 
36
 
37
TESKEY, F.N. Novel computer architectures for data storage and retrieval. Re/). 5845, British Library Research and Development Dept., London, 1986.
38
 
39
 
40
41
42

CITED BY  19


REVIEW

"Caroline Merriam Eastman : Reviewer"

The best-match problem involves finding the objects in a file that are closest to some specified object. Two major issues are the choice of an appropriate measure of closeness and efficient implementation. This paper addresses the   more...

Collaborative Colleagues:
Dennis Shasha: colleagues
Tsong-Li Wang: colleagues