| Fast Extraction of Semantic Features from a Latent Semantic Indexed Text Corpus |
| Source
|
Neural Processing Letters
Volume 15, Issue 1 (February 2002)
Pages: 31 - 43
Year of Publication: 2002
ISSN: 1370-4621
|
|
Authors
|
|
A. Kabán
|
Laboratory of Computer and Information Science, Helsinki University of Technology, P.O. Box 5400, FIN-02015 HUT, Finland. E-mail: ata@james.hut.fi
|
|
M. A. Girolami
|
Laboratory of Computer and Information Science, Helsinki University of Technology, P.O. Box 5400, FIN-02015 HUT, Finland. E-mail: ata@james.hut.fi
|
|
| Publisher |
Kluwer Academic Publishers
Hingham, MA, USA
|
|
|
ABSTRACT
This paper proposes a projection-based symmetrical factorisation method for extracting semantic features from collections of text documents stored in a Latent Semantic space. Preliminary experimental results demonstrate that this yields a representation comparable to that provided by a novel probabilistic approach, which reconsiders the entire indexing problem of text documents and works directly in the original high-dimensional vector-space representation of text. The projection index employed is derived here from the a priori constraints on the problem. The principal advantage of this approach is computational efficiency, obtained by exploiting Latent Semantic Indexing as a preprocessing stage. Simulation results on subsets of the 20-Newsgroups text corpus in various settings are provided.
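The Latent Semantic Indexing preprocessing stage mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's factorisation method, only the standard truncated-SVD projection that LSI is based on. The function name `lsi_project`, the toy matrix `X`, and the choice `k=2` are all assumptions made for illustration.

```python
import numpy as np


def lsi_project(term_doc, k):
    """Return the rank-k LSI basis, singular values, and document coordinates.

    term_doc: (n_terms, n_docs) array of term weights (raw counts here).
    k: number of latent dimensions to keep (a free choice, not from the paper).
    """
    # Thin SVD: term_doc = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k, s_k = U[:, :k], s[:k]
    # Document coordinates in the latent space, one column per document.
    docs_k = s_k[:, None] * Vt[:k, :]
    return U_k, s_k, docs_k


# Hypothetical toy corpus: 5 terms (rows) x 4 documents (columns).
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
    [1, 0, 0, 1],
], dtype=float)

U_k, s_k, docs_k = lsi_project(X, k=2)
print(docs_k.shape)
```

Any further feature extraction would then operate on the small `docs_k` matrix rather than on the full term-document matrix, which is where a computational saving of the kind the abstract describes would come from.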
REFERENCES
| |
1
|
1. Berry, M. W.: Large-scale sparse singular value computations, The International Journal of Supercomputer Applications 6(1) (1992), 13-49.
|
| |
2
|
2. Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K. and Harshman, R.: Indexing by latent semantic analysis, J. Amer. Soc. Inf. Sci. 41(6) (1990), 391-407.
|
| |
3
|
3. Kolenda, T., Hansen, L. K. and Sigurdsson, S.: Independent components in text, In: M. Girolami (ed.), Advances in Independent Component Analysis (Springer-Verlag, 2000), pp. 241-262.
|
| |
4
|
4. Hofmann, T.: Probabilistic latent semantic analysis, Proceedings of the 15th Annual Conference on Uncertainty in Artificial Intelligence (UAI'99), San Francisco, CA, 1999, pp. 289-296.
|
| |
5
|
|
| |
6
|
|
| |
7
|
7. Kabán, A. and Girolami, M.: Unsupervised topic separation and keyword identification in document collections: A projection approach, Technical Report, 10, University of Paisley.
|
| |
8
|
8. Lee, D. and Seung, S.: Learning the parts of objects by non-negative matrix factorization, Nature 401 (1999), 788-791.
|
 |
9
|
9. Papadimitriou, C. H., Tamaki, H., Raghavan, P. and Vempala, S.: Latent semantic indexing: a probabilistic analysis, Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Seattle, WA, 1998, pp. 159-168. doi: 10.1145/275487.275505
|
| |
10
|
|
|