|
ABSTRACT
High dimensional data sets are encountered in many modern database applications. The usual approach is to construct a summary of the data set through a lossy compression technique, and use this lower dimensional synopsis to provide fast, approximate answers to the queries. In this paper, we develop a novel dimensionality reduction technique based on partitioning the high dimensional vector space into orthogonal subspaces. First, we find a relation between the Euclidian distance of two n-dimensional vectors and the Euclidian distances of their projections on the orthogonal subspaces. Then, based on this relation we develop a method to approximate the Euclidian distance using novel inner product approximation. This process allows us to incorporate the shape information of the vectors to this approximation. While the inner product approximation is symmetric, i.e., captures only the magnitude information of the data, the proposed method takes both the magnitude and shape information of the original vectors into account through partitioning. In the experiments, we demonstrate the effectiveness of our technique by comparing it with commonly used methods.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Anurag Acharya , Mustafa Uysal , Joel Saltz, Active disks: programming model, algorithms and evaluation, Proceedings of the eighth international conference on Architectural support for programming languages and operating systems, p.81-91, October 02-07, 1998, San Jose, California, United States
|
| |
2
|
|
| |
3
|
A. Baruffolo. R-trees for astronomical data indexing. ASP Conf. Ser., Astronomical Data Analysis Software and Systems VIII , 172:375, 1999.
|
 |
4
|
Stefan Berchtold , Christian Böhm , Daniel A. Keim , Hans-Peter Kriegel, A cost model for nearest neighbor search in high-dimensional data space, Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.78-86, May 11-15, 1997, Tucson, Arizona, United States
[doi> 10.1145/263661.263671]
|
| |
5
|
|
 |
6
|
|
 |
7
|
Phil Bernstein , Michael Brodie , Stefano Ceri , David DeWitt , Mike Franklin , Hector Garcia-Molina , Jim Gray , Jerry Held , Joe Hellerstein , H. V. Jagadish , Michael Lesk , Dave Maier , Jeff Naughton , Hamid Pirahesh , Mike Stonebraker , Jeff Ullman, The Asilomar report on database research, ACM SIGMOD Record, v.27 n.4, p.74-80, Dec. 1998
[doi> 10.1145/306101.306137]
|
 |
8
|
|
| |
9
|
|
| |
10
|
|
| |
11
|
S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Launder, and R. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41:391--407, 1990.
|
| |
12
|
|
| |
13
|
S. T. Dumais. Improving the retrieval of information from external sources. Behavior Research Methods, Instruments and Computers, 23:229--236, 1991.
|
| |
14
|
|
 |
15
|
|
| |
16
|
|
| |
17
|
C. Faloutsos , R. Barber , M. Flickner , J. Hafner , W. Niblack , D. Petkovic , W. Equitz, Efficient and effective querying by image content, Journal of Intelligent Information Systems, v.3 n.3-4, p.231-262, July 1994
[doi> 10.1007/BF00962238]
|
 |
18
|
Christos Faloutsos , M. Ranganathan , Yannis Manolopoulos, Fast subsequence matching in time-series databases, Proceedings of the 1994 ACM SIGMOD international conference on Management of data, p.419-429, May 24-27, 1994, Minneapolis, Minnesota, United States
|
| |
19
|
|
| |
20
|
|
 |
21
|
|
| |
22
|
|
| |
23
|
Informix. http://www.ibm.com/software/data/informix/blades/spatial/rtree.html, 2002.
|
| |
24
|
|
| |
25
|
T. Kailath. Modern Signal Processing. Springer Verlag, 1985.
|
 |
26
|
K. V. Ravi Kanth , Divyakant Agrawal , Ambuj Singh, Dimensionality reduction for similarity searching in dynamic databases, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.166-176, June 01-04, 1998, Seattle, Washington, United States
|
 |
27
|
|
| |
28
|
H. Karhunen. Uber lineare methoden in der wahrscheinlich-keitsrechnung. Ann. Acad. Science Fenn , 1947.
|
 |
29
|
|
 |
30
|
Eamonn Keogh , Kaushik Chakrabarti , Michael Pazzani , Sharad Mehrotra, Locally adaptive dimensionality reduction for indexing large time series databases, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.151-162, May 21-24, 2001, Santa Barbara, California, United States
|
| |
31
|
|
| |
32
|
|
| |
33
|
M. Loeve. Fonctions aleatoires de seconde ordre. Processus Stochastiques et Mouvement Brownien , 1948.
|
| |
34
|
W. Niblack, R. Barber, W. Equitz, M. Flickner, E. Glasman, D. Petkovic, and P. Yanker. The QBIC project: Querying images by content using color, texture and shape. In Proc. of the SPIE Conf. 1908 on Storage and Retrieval for Image and Video Databases , volume 1908, pages 173--187, February 1993.
|
| |
35
|
|
| |
36
|
|
| |
37
|
|
 |
38
|
|
| |
39
|
|
 |
40
|
Michail Vlachos , Carlotta Domeniconi , Dimitrios Gunopulos , George Kollios , Nick Koudas, Non-linear dimensionality reduction techniques for classification and visualization, Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, July 23-26, 2002, Edmonton, Alberta, Canada
[doi> 10.1145/775047.775143]
|
| |
41
|
|
| |
42
|
A. J. Wicenec and M. Albrecht. Methods for structuring and searching very large catalogs. ASP Conf. Ser., Astronomical Data Analysis Software and Systems VII, 145:512, 1998.
|
 |
43
|
Daniel Wu , Ambuj Singh , Divyakant Agrawal , Amr El Abbadi , Terence R. Smith, Efficient retrieval for browsing large image databases, Proceedings of the fifth international conference on Information and knowledge management, p.11-18, November 12-16, 1996, Rockville, Maryland, United States
[doi> 10.1145/238355.238365]
|
|