ACM Home Page
Please provide us with feedback. Feedback
Random projection in dimensionality reduction: applications to image and text data
Full text PdfPdf (592 KB)
Source International Conference on Knowledge Discovery and Data Mining archive
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining table of contents
San Francisco, California
Pages: 245 - 250  
Year of Publication: 2001
ISBN:1-58113-391-X
Authors
Ella Bingham  Helsinki University of Technology, FIN-02015 HUT, Finland
Heikki Mannila  Helsinki University of Technology, FIN-02015 HUT, Finland
Sponsors
SIGMOD: ACM Special Interest Group on Management of Data
AAAI : American Association for Artificial Intelligence
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 50,   Downloads (12 Months): 320,   Citation Count: 36
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/502512.502546
What is a DOI?

ABSTRACT

Random projections have recently emerged as a powerful method for dimensionality reduction. Theoretical results indicate that the method preserves distances quite nicely; however, empirical results are sparse. We present experimental results on using random projection as a dimensionality reduction tool in a number of cases, where the high dimensionality of the data would otherwise lead to burden-some computations. Our application areas are the processing of both noisy and noiseless images, and information retrieval in text documents. We show that projecting the data onto a random lower-dimensional subspace yields results comparable to conventional dimensionality reduction methods such as principal component analysis: the similarity of data vectors is preserved well under random projection. However, using random projections is computationally significantly less expensive than using, e.g., principal component analysis. We also show experimentally that using a sparse random matrix gives additional computational savings in random projection.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
 
3
 
4
 
5
M.-W. Berry. Large-scale sparse singular value computations. International Journal of Super-Computer Applications, 6(1):13-49, 1992.
 
6
 
7
 
8
S. Dasgupta and A. Gupta. An elementary proof of the Johnson-Lindenstrauss lemma. Technical Report TR-99-006, International Computer Science Institute, Berkeley, California, USA, 1999.
 
9
S. Deerwester, S.T. Dumais, G.W. Furnas, and T.K. Landauer. Indexing by latent semantic analysis. Journal of the Am. Soc. for Information Science, 41(6):391-407, 1990.
 
10
 
11
G.H. Golub and C.F. van Loan. Matrix Computations. North Oxford Academic, Oxford, UK, 1983.
 
12
 
13
R. Hecht-Nielsen. Context vectors: general purpose approximate meaning representations self-organized from raw data. In J.M. Zurada, R.J. Marks II, and C.J. Robinson, editors, Computational Intelligence: Imitating Life, pages 43-56. IEEE Press, 1994.
14
 
15
W.B. Johnson and J. Lindenstranss. Extensions of Lipshitz mapping into Hilbert space. In Conference in modern analysis and probability, volume 26 of Contemporary Mathematics, pages 189-206. Amer. Math. Soc., 1984.
 
16
S. Kaski. Data exploration using self-organizing maps. In Acta Polytechnica Scandinavica, Mathematics, Computing and Management in Engineering Series, number 82. 1997. Dr.Tech. thesis, Helsinki University of Technology, Finland.
 
17
S. Kaski. Dimensionality reduction by random mapping. In Proc. Int. Joint Conf. on Neural Networks, volume 1, pages 413-418, 1998.
 
18
19
 
20
M. Kurimo. Indexing audio documents by using latent semantic analysis and SOM. In E. Oja and S. Kaski, editors, Kohonen Maps, pages 363-374. Elsevier, 1999.
 
21
22
 
23
 
24
 
25
 
26
L. Sirovich and R. Everson. Management and analysis of large scientific datasets. Int. Journal of Supercomputer Applications, 6(1):50-68, spring 1992.
 
27
M. Sonka, V. Hlavac, and R. Boyle. Image processing, analysis, and machine vision. PWS Publishing, 1998.
 
28

CITED BY  36

Collaborative Colleagues:
Ella Bingham: colleagues
Heikki Mannila: colleagues