|
ABSTRACT
Most similarity search techniques map the data objects into some high-dimensional feature space. The similarity search then corresponds to a nearest-neighbor search in the feature space which is computationally very intensive. In this paper, we present a new parallel method for fast nearest-neighbor search in high-dimensional feature spaces. The core problem of designing a parallel nearest-neighbor algorithm is to find an adequate distribution of the data onto the disks. Unfortunately, the known declustering methods to not perform well for high-dimensional nearest-neighbor search. In contrast, our method has been optimized based on the special properties of high-dimensional spaces and therefore provides a near-optimal distribution of the data items among the disks. The basic idea of our data declustering technique is to assign the buckets corresponding to different quadrants of the data space to different disks. We show that our technique - in contrast to other declustering methods - guarantees that all buckets corresponding to neighboring quadrants are assigned to different disks. We evaluate our method using large amounts of real data (up to 40 MBytes) and compare it with the best known data declustering method, the Hilbert curve. Our experiments show that our method provides an almost linear speed-up and a constant scale-up. Additionally, it outperforms the Hilbert approach by a factor of up to 5.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
AGMM 90
|
Altschul S. F., Gish W., Miller W., Myers E. W., Lipman D.J.: 'A Basic Local Alignment Search Tool', Journal of Molecular Biology, Vol. 215, No. 3, 1990, pp. 403-410.
|
| |
Ary 95
|
|
| |
Big 89
|
|
 |
BBKK 97
|
Stefan Berchtold , Christian Böhm , Daniel A. Keim , Hans-Peter Kriegel, A cost model for nearest neighbor search in high-dimensional data space, Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.78-86, May 11-15, 1997, Tucson, Arizona, United States
[doi> 10.1145/263661.263671]
|
| |
BKK 96
|
|
 |
BKSS 90
|
Norbert Beckmann , Hans-Peter Kriegel , Ralf Schneider , Bernhard Seeger, The R*-tree: an efficient and robust access method for points and rectangles, Proceedings of the 1990 ACM SIGMOD international conference on Management of data, p.322-331, May 23-26, 1990, Atlantic City, New Jersey, United States
|
 |
DS 82
|
|
| |
Fal 94
|
C. Faloutsos , R. Barber , M. Flickner , J. Hafner , W. Niblack , D. Petkovic , W. Equitz, Efficient and effective querying by image content, Journal of Intelligent Information Systems, v.3 n.3-4, p.231-262, July 1994
[doi> 10.1007/BF00962238]
|
| |
FB 93
|
Faloutsos C., Bhagwat P.: 'Declustering Using Fractals', PDIS Journal of Parallel and Distributed Information Systems, 1993, pp. 18-25.
|
 |
FBF 77
|
|
| |
HS 95
|
|
 |
Jag 91
|
|
 |
Kuk 92
|
|
 |
KP 88
|
|
| |
LJF 94
|
|
| |
MG 93
|
|
| |
MG 95
|
|
| |
PS 85
|
|
 |
RKV 95
|
Nick Roussopoulos , Stephen Kelley , Frédéric Vincent, Nearest neighbor queries, Proceedings of the 1995 ACM SIGMOD international conference on Management of data, p.71-79, May 22-25, 1995, San Jose, California, United States
|
| |
RP 92
|
Ramasubramanian V., Paliwal K. K.: 'Fast k- Dimensional Tree Algorithms for Nearest Neighbor Search with Application to Vector Quantization Encoding', IEEE Transactions on Signal Processing, Vol. 40, No. 3, March 1992, pp. 518-531.
|
| |
SBK 92
|
|
| |
SH 94
|
Shawney H., Hafner J.: "Efficient Color Histogram hzdexing', Proc. Int. Conf. on Image Processing, 1994, pp. 66-70.
|
| |
Wel 71
|
|
| |
WW 80
|
Wallace T., Wintz P.: 'An Efficient Three- Dimensional Aircraft Recognition Algorithm Using Normalized Fourier Descriptors ', Computer Graphics and Image Processing, Vol. ! 3, pp. 99-126, 1980
|
CITED BY 51
|
|
|
|
|
|
|
|
Hakan Ferhatosmanoglu , Divyakant Agrawal , Amr El Abbadi, Clustering declustered data for efficient retrieval, Proceedings of the eighth international conference on Information and knowledge management, p.343-350, November 02-06, 1999, Kansas City, Missouri, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Alexander Thomasian , Vittorio Castelli , Chung-Sheng Li, Clustering and singular value decomposition for approximate indexing in high dimensional spaces, Proceedings of the seventh international conference on Information and knowledge management, p.201-207, November 02-07, 1998, Bethesda, Maryland, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Thomas Schwarz , Markus Iofcea , Matthias Grossmann , Nicola Hönle , Daniela Nicklas , Bernhard Mitschang, On efficiently processing nearest neighbor queries in a loosely coupled set of data sources, Proceedings of the 12th annual ACM international workshop on Geographic information systems, November 12-13, 2004, Washington DC, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Caetano Traina, Jr. , Agma J. M. Traina , Marcos R. Vieira , Adriano S. Arantes , Christos Faloutsos, Efficient processing of complex similarity queries in RDBMS through query rewriting, Proceedings of the 15th ACM international conference on Information and knowledge management, November 06-11, 2006, Arlington, Virginia, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|