ACM Home Page
Please provide us with feedback. Feedback
FastMap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets
Full text PdfPdf (1.21 MB)
Source International Conference on Management of Data archive
Proceedings of the 1995 ACM SIGMOD international conference on Management of data table of contents
San Jose, California, United States
Pages: 163 - 174  
Year of Publication: 1995
ISBN:0-89791-731-6
Also published in ...
Authors
Christos Faloutsos  AT&T Bell Laboratories, Murray Hill, NJ
King-Ip Lin  Dept. of Computer Science, Univ. of Maryland, College Park
Sponsors
SIGART: ACM Special Interest Group on Artificial Intelligence
SIGMOD: ACM Special Interest Group on Management of Data
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 90,   Downloads (12 Months): 460,   Citation Count: 138
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/223784.223812
What is a DOI?

ABSTRACT

A very promising idea for fast searching in traditional and multimedia databases is to map objects into points in k-d space, using k feature-extraction functions, provided by a domain expert [25]. Thus, we can subsequently use highly fine-tuned spatial access methods (SAMs), to answer several types of queries, including the 'Query By Example' type (which translates to a range query); the 'all pairs' query (which translates to a spatial join [8]); the nearest-neighbor or best-match query, etc.However, designing feature extraction functions can be hard. It is relatively easier for a domain expert to assess the similarity/distance of two objects. Given only the distance information though, it is not obvious how to map objects into points.This is exactly the topic of this paper. We describe a fast algorithm to map objects into points in some k-dimensional space (k is user-defined), such that the dis-similarities are preserved. There are two benefits from this mapping: (a) efficient retrieval, in conjunction with a SAM, as discussed before and (b) visualization and data-mining: the objects can now be plotted as points in 2-d or 3-d space, revealing potential clusters, correlations among attributes and other regularities that data-mining is looking for.We introduce an older method from pattern recognition, namely, Multi-Dimensional Scaling (MDS) [51]; although unsuitable for indexing, we use it as yardstick for our method. Then, we propose a much faster algorithm to solve the problem in hand, while in addition it allows for indexing. Experiments on real and synthetic data indeed show that the proposed algorithm is significantly faster than MDS, (being linear, as opposed to quadratic, on the database size N), while it manages to preserve distances and the overall structure of the data-set.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
2
 
3
 
4
S.F. Altschul, W. Gish, W. Miller, E.W. Myers, and D.J. Lipman. A basic local alignment search tool. Journal of Molecular Biology, 215(3):403-410, 1990.
 
5
Manish Arya, William Cody, Christos FaIoutsos, Joel Richardson, and Arthur Toga. Qbism: a prototype 3-d medical image database system. IEEE Data Engineering Bulletin, 16(1):38-42, March 1993.
 
6
7
8
9
10
 
11
Mathematical Committee on Physical and NSF Engineering Sciences. Grand Challenges" High Performance Computing and Communications. National Science Foundation, 1992. The FY 1992 U.S. Research and Development Program.
 
12
R.O. Duda and P.E. Hart. Pattern Classification and Scene Analysis. Wiley, New York, 1973.
 
13
Susan T. Dumais. Latent semantic indexing (LSI) and TREC-2. In D. K. Harman, editor, The Second Text Retrieval Conference (TREC-2), pages 105-115, Gaithersburg, MD, March 1994. NIST. Special Publication 500- 215.
14
 
15
16
 
17
18
 
19
G. H. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, second edition, 1989.
20
 
21
 
22
K. Hinrichs and J. Nievergelt. The grid file: a data structure to support proximity queries on spatial objects. Proc. of the WG'83 (Intern. Workshop on Graph Theoretic Concepts in Computer Science), pages 100-113, 1983.
23
 
24
25
 
26
Mark A. Jones, Guy A. Story, and Bruce W. Ballard. Integrating multiple knowledge sources in a bayesian ocr post-processor. In First International Conference on Document Analysis and Recognition, Saint-Malo, France, September 1991. to appear.
 
27
 
28
Joseph B. Kruskal. Nonmetric multidimensional scaling. Psychometrika, 29"1-27, 1964.
 
29
Joseph B. Kruskal and Myron Wish. Multidimensional scaling. SAGE publications, Beverly Hills, 1978.
30
31
 
32
F. Murtagh. A survey of recent advances in hierarchical clustering Mgorithms. The Computer Journal, 26(4):354-359, 1983.
 
33
 
34
 
35
Wayne Niblack, Ron Barber, Will Equitz, Myron Flickner, Eduardo Glasman, Dragutin Petkovic, Peter Yanker, Christos Faloutsos, and Gabriel Taubin. The QBIC project: Querying images by content using color, texture and shape. SPIE 1993 Intl. Symposium on Electronic Imaging: Science and Technology, Conf. 1908, Storage and Retrieval for Image and Video Databases, February 1993. Also available as IBM Research Report RJ 9203 (81511), Feb. 1, 1993, Computer Science.
36
37
38
 
39
William H. Press, Brian P. Flannery, Saul A. Teukolsky, and William T. Vetterling. Numerical Recipes in C. Cambridge University Press, 1988.
 
40
A. Ravishankar Rao and Jerry Lohse. Identifying high level features of texture perception. In SPIE Conference, San Jose, February 1992.
 
41
A. Kimball Romney, Roger N. Shepa~d, and Sara Beth Nel:love. Multidimensional scaling: Theory and applications in the behavioral sciences : vol II- Applications. Seminar Press, New York, 1972.
42
 
43
 
44
David Sankoff and Joseph B. Kruskal. Time Warps, String Edits and Macromolecules: the Theory and Practice o} Sequence Comparisons. Addison-Wesley Publishing Company, Inc., Reading, MA, 1983.
 
45
46
47
 
48
R. N. Shepard. The analysis of proximities: Multidimensional scaling with an unknown distance i and ii. Psychometrika, 27:125-140, 219-246, 1962.
 
49
Gilbert Strang. Linear Algebra and its Applications. Academic Press, 1980. 2nd edition.
 
50
A.W. Toga, P.K. Banerjee, and E.M. Santori. Warping 3d models for interbrain comparisons. Neurosc. A bs., 16:247, 1990.
 
51
W. S. Torgerson. Multidimensional scaling: I. theory and method. Psychometrika, 17:401-419, 1952.
 
52
 
53
Dimitris Vassiliadis. The input-state space approach to the prediction of auroral geomagnetic activity from solar wind variables. Int. Workshop on Applications o/ Artificial Intelligence in Solar Terrestrial Physics, September 1993.
 
54
Stephen Wolfram. Mathematica. Addison Wesley, 1991. Second Edition.
 
55
Forrest W. Young. Multidimensional scaling : Hzstory, Theory and Applications. Lawrence Erlbaum associates, Hillsdale, New Jersey, 1987.

CITED BY  138

Collaborative Colleagues:
Christos Faloutsos: colleagues
King-Ip Lin: colleagues