|
ABSTRACT
A very promising idea for fast searching in traditional and multimedia databases is to map objects into points in k-d space, using k feature-extraction functions, provided by a domain expert [25]. Thus, we can subsequently use highly fine-tuned spatial access methods (SAMs), to answer several types of queries, including the 'Query By Example' type (which translates to a range query); the 'all pairs' query (which translates to a spatial join [8]); the nearest-neighbor or best-match query, etc.However, designing feature extraction functions can be hard. It is relatively easier for a domain expert to assess the similarity/distance of two objects. Given only the distance information though, it is not obvious how to map objects into points.This is exactly the topic of this paper. We describe a fast algorithm to map objects into points in some k-dimensional space (k is user-defined), such that the dis-similarities are preserved. There are two benefits from this mapping: (a) efficient retrieval, in conjunction with a SAM, as discussed before and (b) visualization and data-mining: the objects can now be plotted as points in 2-d or 3-d space, revealing potential clusters, correlations among attributes and other regularities that data-mining is looking for.We introduce an older method from pattern recognition, namely, Multi-Dimensional Scaling (MDS) [51]; although unsuitable for indexing, we use it as yardstick for our method. Then, we propose a much faster algorithm to solve the problem in hand, while in addition it allows for indexing. Experiments on real and synthetic data indeed show that the proposed algorithm is significantly faster than MDS, (being linear, as opposed to quadratic, on the database size N), while it manages to preserve distances and the overall structure of the data-set.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
Rakesh Agrawal , Tomasz Imieliński , Arun Swami, Mining association rules between sets of items in large databases, Proceedings of the 1993 ACM SIGMOD international conference on Management of data, p.207-216, May 25-28, 1993, Washington, D.C., United States
|
| |
3
|
|
| |
4
|
S.F. Altschul, W. Gish, W. Miller, E.W. Myers, and D.J. Lipman. A basic local alignment search tool. Journal of Molecular Biology, 215(3):403-410, 1990.
|
| |
5
|
Manish Arya, William Cody, Christos FaIoutsos, Joel Richardson, and Arthur Toga. Qbism: a prototype 3-d medical image database system. IEEE Data Engineering Bulletin, 16(1):38-42, March 1993.
|
| |
6
|
|
 |
7
|
Norbert Beckmann , Hans-Peter Kriegel , Ralf Schneider , Bernhard Seeger, The R*-tree: an efficient and robust access method for points and rectangles, Proceedings of the 1990 ACM SIGMOD international conference on Management of data, p.322-331, May 23-26, 1990, Atlantic City, New Jersey, United States
|
 |
8
|
Thomas Brinkhoff , Hans-Peter Kriegel , Ralf Schneider , Bernhard Seeger, Multi-step processing of spatial joins, Proceedings of the 1994 ACM SIGMOD international conference on Management of data, p.197-208, May 24-27, 1994, Minneapolis, Minnesota, United States
|
 |
9
|
Thomas Brinkhoff , Hans-Peter Kriegel , Bernhard Seeger, Efficient processing of spatial joins using R-trees, Proceedings of the 1993 ACM SIGMOD international conference on Management of data, p.237-246, May 25-28, 1993, Washington, D.C., United States
|
 |
10
|
|
| |
11
|
Mathematical Committee on Physical and NSF Engineering Sciences. Grand Challenges" High Performance Computing and Communications. National Science Foundation, 1992. The FY 1992 U.S. Research and Development Program.
|
| |
12
|
R.O. Duda and P.E. Hart. Pattern Classification and Scene Analysis. Wiley, New York, 1973.
|
| |
13
|
Susan T. Dumais. Latent semantic indexing (LSI) and TREC-2. In D. K. Harman, editor, The Second Text Retrieval Conference (TREC-2), pages 105-115, Gaithersburg, MD, March 1994. NIST. Special Publication 500- 215.
|
 |
14
|
|
| |
15
|
|
 |
16
|
|
| |
17
|
|
 |
18
|
|
| |
19
|
G. H. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, second edition, 1989.
|
 |
20
|
|
| |
21
|
|
| |
22
|
K. Hinrichs and J. Nievergelt. The grid file: a data structure to support proximity queries on spatial objects. Proc. of the WG'83 (Intern. Workshop on Graph Theoretic Concepts in Computer Science), pages 100-113, 1983.
|
 |
23
|
|
| |
24
|
|
 |
25
|
|
| |
26
|
Mark A. Jones, Guy A. Story, and Bruce W. Ballard. Integrating multiple knowledge sources in a bayesian ocr post-processor. In First International Conference on Document Analysis and Recognition, Saint-Malo, France, September 1991. to appear.
|
| |
27
|
|
| |
28
|
Joseph B. Kruskal. Nonmetric multidimensional scaling. Psychometrika, 29"1-27, 1964.
|
| |
29
|
Joseph B. Kruskal and Myron Wish. Multidimensional scaling. SAGE publications, Beverly Hills, 1978.
|
 |
30
|
|
 |
31
|
|
| |
32
|
F. Murtagh. A survey of recent advances in hierarchical clustering Mgorithms. The Computer Journal, 26(4):354-359, 1983.
|
| |
33
|
|
| |
34
|
|
| |
35
|
Wayne Niblack, Ron Barber, Will Equitz, Myron Flickner, Eduardo Glasman, Dragutin Petkovic, Peter Yanker, Christos Faloutsos, and Gabriel Taubin. The QBIC project: Querying images by content using color, texture and shape. SPIE 1993 Intl. Symposium on Electronic Imaging: Science and Technology, Conf. 1908, Storage and Retrieval for Image and Video Databases, February 1993. Also available as IBM Research Report RJ 9203 (81511), Feb. 1, 1993, Computer Science.
|
 |
36
|
|
 |
37
|
|
 |
38
|
|
| |
39
|
William H. Press, Brian P. Flannery, Saul A. Teukolsky, and William T. Vetterling. Numerical Recipes in C. Cambridge University Press, 1988.
|
| |
40
|
A. Ravishankar Rao and Jerry Lohse. Identifying high level features of texture perception. In SPIE Conference, San Jose, February 1992.
|
| |
41
|
A. Kimball Romney, Roger N. Shepa~d, and Sara Beth Nel:love. Multidimensional scaling: Theory and applications in the behavioral sciences : vol II- Applications. Seminar Press, New York, 1972.
|
 |
42
|
Nick Roussopoulos , Stephen Kelley , Frédéric Vincent, Nearest neighbor queries, Proceedings of the 1995 ACM SIGMOD international conference on Management of data, p.71-79, May 22-25, 1995, San Jose, California, United States
|
| |
43
|
|
| |
44
|
David Sankoff and Joseph B. Kruskal. Time Warps, String Edits and Macromolecules: the Theory and Practice o} Sequence Comparisons. Addison-Wesley Publishing Company, Inc., Reading, MA, 1983.
|
| |
45
|
|
 |
46
|
|
 |
47
|
|
| |
48
|
R. N. Shepard. The analysis of proximities: Multidimensional scaling with an unknown distance i and ii. Psychometrika, 27:125-140, 219-246, 1962.
|
| |
49
|
Gilbert Strang. Linear Algebra and its Applications. Academic Press, 1980. 2nd edition.
|
| |
50
|
A.W. Toga, P.K. Banerjee, and E.M. Santori. Warping 3d models for interbrain comparisons. Neurosc. A bs., 16:247, 1990.
|
| |
51
|
W. S. Torgerson. Multidimensional scaling: I. theory and method. Psychometrika, 17:401-419, 1952.
|
| |
52
|
|
| |
53
|
Dimitris Vassiliadis. The input-state space approach to the prediction of auroral geomagnetic activity from solar wind variables. Int. Workshop on Applications o/ Artificial Intelligence in Solar Terrestrial Physics, September 1993.
|
| |
54
|
Stephen Wolfram. Mathematica. Addison Wesley, 1991. Second Edition.
|
| |
55
|
Forrest W. Young. Multidimensional scaling : Hzstory, Theory and Applications. Lawrence Erlbaum associates, Hillsdale, New Jersey, 1987.
|
CITED BY 138
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Chung-Sheng Li , Philip S. Yu , Vittorio Castelli, MALM: a framework for mining sequence database at multiple abstraction levels, Proceedings of the seventh international conference on Information and knowledge management, p.267-272, November 02-07, 1998, Bethesda, Maryland, United States
|
|
|
|
|
|
|
|
|
Vikrant Kobla , David Doermann , Christos Faloutsos, VideoTrails: representing and visualizing structure in video sequences, Proceedings of the fifth ACM international conference on Multimedia, p.335-346, November 09-13, 1997, Seattle, Washington, United States
|
|
|
Tzi-cker Chiueh , Allen Ballman , Kevin Kreeger, Multi-resolution indexing for shape images, Proceedings of the seventh international conference on Information and knowledge management, p.297-305, November 02-07, 1998, Bethesda, Maryland, United States
|
|
|
|
|
|
Jason Tsong-Li Wang , Xiong Wang , King-Ip Lin , Dennis Shasha , Bruce A. Shapiro , Kaizhong Zhang, Evaluating a class of distance-mapping algorithms for data mining and clustering, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, p.307-311, August 15-18, 1999, San Diego, California, United States
|
|
|
|
|
|
Kristin P. Bennett , Usama Fayyad , Dan Geiger, Density-based indexing for approximate nearest-neighbor queries, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, p.233-243, August 15-18, 1999, San Diego, California, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Michail Vlachos , Carlotta Domeniconi , Dimitrios Gunopulos , George Kollios , Nick Koudas, Non-linear dimensionality reduction techniques for classification and visualization, Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, July 23-26, 2002, Edmonton, Alberta, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Christian Böhm , Bernhard Braunmüller , Markus Breunig , Hans-Peter Kriegel, High performance clustering based on the similarity join, Proceedings of the ninth international conference on Information and knowledge management, p.298-305, November 06-11, 2000, McLean, Virginia, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
V. Anupam , S. Dar , T. Leibfried , E. Petajan, Research report: DataSpace: 3-D visualizations of large databases, Proceedings of the 1995 IEEE Symposium on Information Visualization, p.82, October 30-31, 1995, Atlanta, Georgia
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Banu Özden , Rajeev Rastogi , Avi Silberschatz, Multimedia support for databases, Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.1-11, May 11-15, 1997, Tucson, Arizona, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
O. D. Sahin , A. Gulbeden , F. Emekci , D. Agrawal , A. El Abbadi, PRISM: indexing multi-dimensional data in P2P networks using reference vectors, Proceedings of the 13th annual ACM international conference on Multimedia, November 06-11, 2005, Hilton, Singapore
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Gautam Das , Vagelis Hristidis , Nishant Kapoor , S. Sudarshan, Ordering the attributes of query results, Proceedings of the 2006 ACM SIGMOD international conference on Management of data, June 27-29, 2006, Chicago, IL, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Gang Qian , Qiang Zhu , Qiang Xue , Sakti Pramanik, The ND-tree: a dynamic indexing technique for multidimensional non-ordered discrete data spaces, Proceedings of the 29th international conference on Very large data bases, p.620-631, September 09-12, 2003, Berlin, Germany
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Eamonn Keogh , Stefano Lonardi , Chotirat Ann Ratanamahatana , Li Wei , Sang-Hee Lee , John Handley, Compression-based data mining of sequential data, Data Mining and Knowledge Discovery, v.14 n.1, p.99-129, February 2007
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Vassilis Athitsos , Panagiotis Papapetrou , Michalis Potamias , George Kollios , Dimitrios Gunopulos, Approximate embedding-based subsequence matching of time series, Proceedings of the 2008 ACM SIGMOD international conference on Management of data, June 09-12, 2008, Vancouver, Canada
|
|
|
|
|
|
|
|
|
Reinier H. van Leuken , M. Fatih Demirci , Victoria J. Hodge , Jim Austin , Remco C. Veltkamp, Layout indexing of trademark images, Proceedings of the 6th ACM international conference on Image and video retrieval, p.525-532, July 09-11, 2007, Amsterdam, The Netherlands
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|