ABSTRACT
Document representation and indexing is a key problem for document analysis and processing tasks such as clustering, classification, and retrieval. Conventionally, Latent Semantic Indexing (LSI) is considered effective in deriving such an indexing. However, LSI essentially detects the most representative features for document representation rather than the most discriminative ones, so it might not be optimal for discriminating documents with different semantics. In this paper, a novel algorithm called Locality Preserving Indexing (LPI) is proposed for document indexing, in which each document is represented by a low-dimensional vector. In contrast to LSI, which discovers the global structure of the document space, LPI discovers the local structure and obtains a compact document representation subspace that best detects the essential semantic structure. We compare the proposed LPI approach with LSI on two standard databases. Experimental results show that LPI provides better representation in the sense of semantic structure.