Using LSI for text classification in the presence of background text
Source: Conference on Information and Knowledge Management
Proceedings of the Tenth International Conference on Information and Knowledge Management
Atlanta, Georgia, USA
Session: Text Extraction and Summarization
Pages: 113-118
Year of Publication: 2001
ISBN:1-58113-436-3
Authors
Sarah Zelikovitz  Rutgers University, Piscataway, NJ
Haym Hirsh  Rutgers University, Piscataway, NJ
Sponsors
SIGMIS: ACM Special Interest Group on Management Information Systems
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/502585.502605

ABSTRACT

This paper uses Latent Semantic Indexing (LSI) for text classification. In addition to relying on labeled training data, we improve classification accuracy by also using unlabeled data and other forms of available "background" text in the classification process. Rather than performing LSI's singular value decomposition (SVD) solely on the training data, we use an expanded term-by-document matrix that includes both the labeled data and any available and relevant background text. We report the performance of this approach on data sets both with and without the inclusion of the background text, and compare our work to other efforts that can incorporate unlabeled data and other background text in the classification process.
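
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes raw term counts, whitespace tokenization, a rank-k truncated SVD, and a single nearest-neighbor vote by cosine similarity; the abstract specifies none of these. The function names (build_vocab, term_doc_matrix, lsi_classify) and the toy documents are hypothetical.

```python
import numpy as np

def build_vocab(docs):
    """Map each distinct lowercase token to a row index."""
    vocab = {}
    for doc in docs:
        for tok in doc.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def term_doc_matrix(docs, vocab):
    """Term-by-document matrix of raw counts (terms are rows)."""
    X = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc.lower().split():
            i = vocab.get(tok)
            if i is not None:  # ignore out-of-vocabulary tokens
                X[i, j] += 1.0
    return X

def lsi_classify(train_docs, train_labels, background_docs, query, k=2):
    """Label `query` with the label of its nearest training document
    in an LSI space built from training AND background text."""
    # Expanded matrix: labeled columns first, then background columns.
    all_docs = list(train_docs) + list(background_docs)
    vocab = build_vocab(all_docs)
    X = term_doc_matrix(all_docs, vocab)

    # SVD of the expanded matrix, truncated to at most k nonzero factors.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = max(1, min(k, int(np.sum(s > 1e-10))))
    U_k, s_k = U[:, :k], s[:k]

    # Reduced coordinates of the labeled documents only; the background
    # columns shaped the space but carry no labels.
    docs_k = Vt[:k, :len(train_docs)].T

    # Fold the query into the same space: q_k = q^T U_k S_k^{-1}.
    q = term_doc_matrix([query], vocab)[:, 0]
    q_k = (q @ U_k) / s_k

    # Cosine similarity between the query and each labeled document.
    denom = np.linalg.norm(docs_k, axis=1) * np.linalg.norm(q_k) + 1e-12
    sims = (docs_k @ q_k) / denom
    return train_labels[int(np.argmax(sims))]

# Toy data (hypothetical): the query shares no term with any labeled
# document, so only the background text links it to a class.
train_docs = ["cat dog pet", "stock market bond"]
train_labels = ["animals", "finance"]
background_docs = ["cat pet kitten", "bond market trader",
                   "stock trader market"]
print(lsi_classify(train_docs, train_labels, background_docs, "trader"))
```

In this toy setup the query term occurs only in background documents, which is the situation where building the SVD from the expanded matrix can help: the term still receives a position in the reduced space, so the query can be compared with the labeled documents.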

