|
ABSTRACT
Random projections have recently emerged as a powerful method for dimensionality reduction. Theoretical results indicate that the method preserves distances quite nicely; however, empirical results are sparse. We present experimental results on using random projection as a dimensionality reduction tool in a number of cases, where the high dimensionality of the data would otherwise lead to burden-some computations. Our application areas are the processing of both noisy and noiseless images, and information retrieval in text documents. We show that projecting the data onto a random lower-dimensional subspace yields results comparable to conventional dimensionality reduction methods such as principal component analysis: the similarity of data vectors is preserved well under random projection. However, using random projections is computationally significantly less expensive than using, e.g., principal component analysis. We also show experimentally that using a sparse random matrix gives additional computational savings in random projection.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
Charu C. Aggarwal , Joel L. Wolf , Philip S. Yu, A new method for similarity indexing of market basket data, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.407-418, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
| |
3
|
|
| |
4
|
|
| |
5
|
M.-W. Berry. Large-scale sparse singular value computations. International Journal of Super-Computer Applications, 6(1):13-49, 1992.
|
| |
6
|
|
| |
7
|
|
| |
8
|
S. Dasgupta and A. Gupta. An elementary proof of the Johnson-Lindenstrauss lemma. Technical Report TR-99-006, International Computer Science Institute, Berkeley, California, USA, 1999.
|
| |
9
|
S. Deerwester, S.T. Dumais, G.W. Furnas, and T.K. Landauer. Indexing by latent semantic analysis. Journal of the Am. Soc. for Information Science, 41(6):391-407, 1990.
|
| |
10
|
|
| |
11
|
G.H. Golub and C.F. van Loan. Matrix Computations. North Oxford Academic, Oxford, UK, 1983.
|
| |
12
|
|
| |
13
|
R. Hecht-Nielsen. Context vectors: general purpose approximate meaning representations self-organized from raw data. In J.M. Zurada, R.J. Marks II, and C.J. Robinson, editors, Computational Intelligence: Imitating Life, pages 43-56. IEEE Press, 1994.
|
 |
14
|
|
| |
15
|
W.B. Johnson and J. Lindenstranss. Extensions of Lipshitz mapping into Hilbert space. In Conference in modern analysis and probability, volume 26 of Contemporary Mathematics, pages 189-206. Amer. Math. Soc., 1984.
|
| |
16
|
S. Kaski. Data exploration using self-organizing maps. In Acta Polytechnica Scandinavica, Mathematics, Computing and Management in Engineering Series, number 82. 1997. Dr.Tech. thesis, Helsinki University of Technology, Finland.
|
| |
17
|
S. Kaski. Dimensionality reduction by random mapping. In Proc. Int. Joint Conf. on Neural Networks, volume 1, pages 413-418, 1998.
|
| |
18
|
|
 |
19
|
|
| |
20
|
M. Kurimo. Indexing audio documents by using latent semantic analysis and SOM. In E. Oja and S. Kaski, editors, Kohonen Maps, pages 363-374. Elsevier, 1999.
|
| |
21
|
|
 |
22
|
Christos H. Papadimitriou , Hisao Tamaki , Prabhakar Raghavan , Santosh Vempala, Latent semantic indexing: a probabilistic analysis, Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.159-168, June 01-04, 1998, Seattle, Washington, United States
[doi> 10.1145/275487.275505]
|
| |
23
|
|
| |
24
|
|
| |
25
|
|
| |
26
|
L. Sirovich and R. Everson. Management and analysis of large scientific datasets. Int. Journal of Supercomputer Applications, 6(1):50-68, spring 1992.
|
| |
27
|
M. Sonka, V. Hlavac, and R. Boyle. Image processing, analysis, and machine vision. PWS Publishing, 1998.
|
| |
28
|
|
CITED BY 36
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Kaushal Sanghai , Ting Su , Jennifer Dy , David Kaeli, A multinomial clustering model for fast simulation of computer architecture designs, Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, August 21-24, 2005, Chicago, Illinois, USA
|
|
|
|
|
|
|
|
|
O. D. Sahin , A. Gulbeden , F. Emekci , D. Agrawal , A. El Abbadi, PRISM: indexing multi-dimensional data in P2P networks using reference vectors, Proceedings of the 13th annual ACM international conference on Multimedia, November 06-11, 2005, Hilton, Singapore
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Pankaj K. Agarwal , Sariel Har-Peled , Hai Yu, Embeddings of surfaces, curves, and moving points in euclidean space, Proceedings of the twenty-third annual symposium on Computational geometry, June 06-08, 2007, Gyeongju, South Korea
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|