Fast Extraction of Semantic Features from a Latent Semantic Indexed Text Corpus
Source: Neural Processing Letters
Volume 15, Issue 1 (February 2002)
Pages: 31-43
Year of Publication: 2002
ISSN: 1370-4621
Authors
A. Kabán  Laboratory of Computer and Information Science, Helsinki University of Technology, P.O. Box 5400, FIN-02015 HUT, Finland. E-mail: ata@james.hut.fi
M. A. Girolami  Laboratory of Computer and Information Science, Helsinki University of Technology, P.O. Box 5400, FIN-02015 HUT, Finland. E-mail: ata@james.hut.fi
Publisher
Kluwer Academic Publishers  Hingham, MA, USA
DOI: 10.1023/A:1013801028884

ABSTRACT

This paper proposes a projection-based symmetrical factorisation method for extracting semantic features from collections of text documents stored in a Latent Semantic space. Preliminary experimental results demonstrate that this yields a representation comparable to that provided by a novel probabilistic approach, which reconsiders the entire indexing problem of text documents and works directly in the original high-dimensional vector-space representation of text. The projection index employed is derived here from the a priori constraints on the problem. The principal advantage of this approach is computational efficiency, obtained by exploiting Latent Semantic Indexing as a preprocessing stage. Simulation results on subsets of the 20-Newsgroups text corpus in various settings are provided.
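
The Latent Semantic Indexing preprocessing stage mentioned in the abstract is conventionally computed as a truncated singular value decomposition of the term-by-document matrix. The following is a minimal sketch of that stage only, assuming NumPy; it does not reproduce the paper's projection-based factorisation, which the abstract does not specify. The function name `lsi_project`, the toy matrix `X`, and the choice `k=2` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def lsi_project(term_doc, k):
    """Map documents into a k-dimensional latent semantic space (truncated SVD).

    Hypothetical helper for illustration; not the authors' code.
    """
    # Thin SVD: term_doc = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep only the k leading singular triplets.
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    # Document coordinates in the latent space, one column per document.
    docs_k = s_k[:, None] * Vt_k
    return U_k, s_k, docs_k


# Made-up term-by-document count matrix (rows: terms, columns: documents).
X = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 1.0],
    [0.0, 0.0, 1.0, 3.0],
])

U_k, s_k, docs_k = lsi_project(X, k=2)
print(docs_k.shape)  # (2, 4)
```

Any later feature-extraction step would then operate on the small `k`-row matrix `docs_k` rather than on the full term-by-document matrix, which is where the computational saving described in the abstract would come from.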


