ABSTRACT
We consider the problem of document indexing and representation. Locality Preserving Indexing (LPI) was recently proposed for learning a compact document subspace. Unlike Latent Semantic Indexing (LSI), which is optimal with respect to the global Euclidean structure of the document space, LPI is optimal with respect to its local manifold structure. However, LPI is extremely sensitive to the number of dimensions. This is problematic because the intrinsic dimensionality is difficult to estimate, and an inaccurately estimated dimensionality drastically degrades LPI's performance. One cause of this problem is that LPI is non-orthogonal, and non-orthogonality distorts the metric structure of the document space. In this paper, we propose a new algorithm called Orthogonal LPI (OLPI), which iteratively computes mutually orthogonal basis functions that respect the local geometrical structure. Moreover, our empirical study shows that OLPI can have more locality-preserving power than LPI. We compare the new algorithm to LSI and LPI, and extensive experimental results show that OLPI outperforms both. More importantly, it is insensitive to the number of dimensions, which makes it an efficient data preprocessing method for text clustering, classification, retrieval, etc.
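The abstract describes OLPI only at a high level. The following is a minimal NumPy sketch, under stated assumptions, of the idea: LPI takes the smallest generalized eigenvectors of X L X^T a = lambda X D X^T a, where X is the term-by-document matrix and L = D - W is the Laplacian of a document graph W; the orthogonal variant instead chooses each basis vector to minimise the same locality-preserving ratio subject to being orthogonal to the vectors already found. The k-nearest-neighbour heat-kernel graph, the parameters `k`, `t` and `ridge`, and the helper names (`knn_heat_kernel_graph`, `generalized_eigh`, `graph_matrices`, `lpi`, `olpi`) are hypothetical choices for illustration. The small ridge term stands in for any preprocessing (for example an SVD projection) that would make X D X^T nonsingular, and restricting the search to the orthogonal complement of the earlier vectors is one direct way to solve the constrained problem, not necessarily the update used in the paper.

```python
import numpy as np


def knn_heat_kernel_graph(X, k=5, t=1.0):
    """Symmetric k-nearest-neighbour graph over the columns (documents) of X.

    Hypothetical choice: heat-kernel weights exp(-||x_i - x_j||^2 / t).
    """
    n = X.shape[1]
    sq = np.sum(X * X, axis=0)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X.T @ X), 0.0)
    W = np.zeros((n, n))
    for i in range(n):
        d = dist2[i].copy()
        d[i] = np.inf                      # never pick a document as its own neighbour
        nbrs = np.argsort(d)[:k]
        W[i, nbrs] = np.exp(-dist2[i, nbrs] / t)
    return np.maximum(W, W.T)              # edge if j is among i's k NN or vice versa


def generalized_eigh(A, B):
    """Solve A y = lam B y (A symmetric, B positive definite); lam ascending."""
    L = np.linalg.cholesky(B)              # B = L L^T
    L_inv = np.linalg.inv(L)
    C = L_inv @ A @ L_inv.T                # equivalent standard symmetric problem
    lam, Z = np.linalg.eigh((C + C.T) / 2.0)
    return lam, np.linalg.solve(L.T, Z)    # map back: y = L^{-T} z


def graph_matrices(X, k=5, t=1.0, ridge=1e-8):
    """Return X L X^T and X D X^T (the latter plus a small hypothetical ridge)."""
    W = knn_heat_kernel_graph(X, k, t)
    D = np.diag(W.sum(axis=1))
    S_L = X @ (D - W) @ X.T
    S_D = X @ D @ X.T + ridge * np.eye(X.shape[0])
    return S_L, S_D


def lpi(X, dim, **kw):
    """Plain LPI: the dim smallest generalized eigenvectors, unit-normalised."""
    S_L, S_D = graph_matrices(X, **kw)
    _, V = generalized_eigh(S_L, S_D)
    V = V[:, :dim]
    return V / np.linalg.norm(V, axis=0)


def olpi(X, dim, **kw):
    """Each a_k minimises the LPI ratio subject to a_k being orthogonal
    to a_1, ..., a_{k-1}."""
    S_L, S_D = graph_matrices(X, **kw)
    m = X.shape[0]
    basis = []
    for _ in range(dim):
        if basis:
            A = np.column_stack(basis)
            Q_full, _ = np.linalg.qr(A, mode="complete")
            Q = Q_full[:, A.shape[1]:]     # orthonormal basis of the complement of span(A)
        else:
            Q = np.eye(m)
        # Writing a = Q y enforces orthogonality to every earlier basis vector.
        _, V = generalized_eigh(Q.T @ S_L @ Q, Q.T @ S_D @ Q)
        a = Q @ V[:, 0]
        basis.append(a / np.linalg.norm(a))
    return np.column_stack(basis)


rng = np.random.default_rng(0)
X = rng.random((20, 60))                   # toy term-by-document matrix: 20 terms, 60 docs
A_olpi = olpi(X, dim=5)
A_lpi = lpi(X, dim=5)
Y = A_olpi.T @ X                           # 5-dimensional representation of each document
print(np.round(A_olpi.T @ A_olpi, 6))      # identity: the OLPI basis is orthonormal
print(np.round(A_lpi.T @ A_lpi, 3))        # LPI: off-diagonal entries generally non-zero
```

The first OLPI vector coincides with the first LPI vector, since no orthogonality constraint applies yet. Because each later vector is optimised over a smaller feasible set, the locality-preserving ratio of successive OLPI vectors is non-decreasing, while the basis itself stays orthonormal, which is the property the abstract contrasts with plain LPI.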