|
ABSTRACT
A scheme to answer best-match queries from a file containing a collection of objects is described. A best-match query is to find the objects in the file that are closest (according to some (dis)similarity measure) to a given target.
Previous work [5, 331] suggests that one can reduce the number of comparisons required to achieve the desired results using the triangle inequality, starting with a data structure for the file that reflects some precomputed intrafile distances. We generalize the technique to allow the optimum use of any given set of precomputed intrafile distances. Some empirical results are presented which illustrate the effectiveness of our scheme, and its performance relative to previous algorithms.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
Alfred V. Aho , John E. Hopcroft , Jeffrey Ullman , J. D. Ullman , J. E. Hopcroft, Data Structures and Algorithms, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1983
|
 |
3
|
|
 |
4
|
|
 |
5
|
|
| |
6
|
CL~,US, V., EHRIG, M., AND ROZENBERG, G. Graph-Grammars and Their Application to Computer Science and Biology. Springer, New York, 1979.
|
| |
7
|
Du, H. C., AND LEE, R. C.W. Symbolic Gray code as a multikey hashing function. IEEE Trans. Pattern Anal. Mach. InteU. 2, 1 ( Jan. 1980), 83-90.
|
| |
8
|
DUDA, R. O., AND HART, P.E. Pattern Classification and Scene Analysis. Wiley, New York, 1973.
|
 |
9
|
|
| |
10
|
EASTMAN, C. M., AND WEISS, S.F. Tree structures for high dimensionality nearest neighbor searching. Inf. Syst. 7, 2 (1982), 115-122.
|
| |
11
|
EASTMAN, C. M., AND ZEMANKOVA, M. Partially specified nearest neighbor searches using k - d trees. Inf. Process. Lett. 15, 2 (1982), 53-56.
|
| |
12
|
FEUSTEL, C. D., AND SHAPIRO, L.G. The nearest neighbor problem in an abstract metric space. Pattern Recognition Lett. 1, 2 (1982), 125-128.
|
 |
13
|
|
 |
14
|
|
| |
15
|
FUKUNAGA, K., AND NARENDRA, P.M. A branch and bound algorithm for computing k-nearest neighbors. IEEE Trans. Comput. 24, 7 (July 1975), 750-753.
|
| |
16
|
HOLLAAR, L.A. The Utah text retrieval project. Inf. Technol. Res. Dev. 2 (1983), 155-167.
|
 |
17
|
|
| |
18
|
LANDER, E., MESIROV, J. P., AND WASHINGTON, T. Protein sequence comparison on a data parallel computer. In Proceedings of the IEEE 1988 International Conference on Parallel Processing (Aug. 1988). IEEE, New York, 1988, 257-263.
|
| |
19
|
LIPMAN, D. J., AND PEARSON, W.R. Rapid and sensitive protein similarity searches. Science 227 (1985), 1435-1441.
|
| |
20
|
|
| |
21
|
|
| |
22
|
MURTAGH, F. A very fast exact nearest neighbor algorithm for use in information retrieval. Inf. Technol. Res. Dev. I (1982), 275-283.
|
| |
23
|
MURTAGH, F. Expected-time complexity results for hierarchic clustering algorithms which use cluster centers. Inf. Process. Lett. 16, 5 ( June 1983), 237-241.
|
| |
24
|
MURTAGH, F. A survey of recent advances in hierarchical clustering algorithms. IEEE Computer 26, 4 (1983), 354-359.
|
| |
25
|
MURTAGH, F. Multidimensional clustering algorithms. In Lectures in Computational Statistics, J. M. Chambers, J. Gordesch, A. Klas, L. Lebart, and P. P. Sint, Eds., Physica-Verlag, Vienna, 1985.
|
| |
26
|
PA{GE, R. C., AND Kr~USKAL, C.P. Parallel algorithms for shortest path problems. In Proceedings of the IEEE 1985 International Conference on Parallel Processing (1985). IEEE, New York, 1985, 14-19.
|
| |
27
|
PERRY, S. A., AND WILLETT, P. A review of the use of inverted files for best match searching in information retrieval systems. J. Inf. Sci. 6 (1983), 59-66.
|
| |
28
|
POOUE, C. A., AND WILLETT, P. An evaluation of document retrieval from serial files using the ICL Distributed Array Processor. Online Rev. 8 (1984), 569-584.
|
| |
29
|
ROHLF, F.J. A probabilistic minimum spanning tree algorithm, inf. Process. Lett. 7 {1978), 44-48.
|
| |
30
|
|
| |
31
|
SHAMOS, M. I., AND HOE~, D. Closest-point problems. In Proceedings of the 16th IEEE Symposium on Foundations of Computer Science (Oct. 1975). IEEE, New York, 1975, 151-162.
|
| |
32
|
SHAPIRO, B. A., AND ZHANG, K. Comparing multiple RNA secondary structures using tree comparisons. Manuscript, Division of Cancer Biology and Diagnosis, NIH, Frederick, Md., 1989.
|
 |
33
|
|
| |
34
|
SHASHA, D., AND WANG, T.-L. Optimal best-match retrieval. Tech. Rep. TR 480, Courant Institute of Mathematical Sciences, New York Univ., New York, Dec. 1989.
|
 |
35
|
|
| |
36
|
|
| |
37
|
TESKEY, F.N. Novel computer architectures for data storage and retrieval. Re/). 5845, British Library Research and Development Dept., London, 1986.
|
 |
38
|
|
| |
39
|
|
| |
40
|
|
 |
41
|
|
 |
42
|
|
CITED BY 19
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Jason Tsong-Li Wang , Gung-Wei Chirn , Thomas G. Marr , Bruce Shapiro , Dennis Shasha , Kaizhong Zhang, Combinatorial pattern discovery for scientific data: some preliminary results, ACM SIGMOD Record, v.23 n.2, p.115-125, June 1994
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Shian-Hua Lin , Chi-Sheng Shih , Meng Chang Chen , Jan-Ming Ho , Ming-Tat Ko , Yueh-Ming Huang, Extracting classification knowledge of Internet documents with mining term associations: a semantic approach, Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, p.241-249, August 24-28, 1998, Melbourne, Australia
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
REVIEW
"Caroline Merriam Eastman : Reviewer"
The best-match problem involves finding the objects in a file that
are closest to some specified object. Two major issues are the choice of
an appropriate measure of closeness and efficient implementation. This
paper addresses the
more...
|