Using LSI for text classification in the presence of background text
Source
Conference on Information and Knowledge Management
Proceedings of the tenth international conference on Information and knowledge management
Atlanta, Georgia, USA
Session: Text Extraction and Summarization
Pages: 113 - 118
Year of Publication: 2001
ISBN: 1-58113-436-3
ABSTRACT
This paper presents work that uses Latent Semantic Indexing (LSI) for text classification. In addition to relying on labeled training data, we improve classification accuracy by also using unlabeled data and other forms of available "background" text in the classification process. Rather than performing LSI's singular value decomposition (SVD) solely on the training data, we use an expanded term-by-document matrix that includes both the labeled data and any available, relevant background text. We report the performance of this approach on data sets both with and without the inclusion of the background text, and compare our work to other efforts that can incorporate unlabeled data and other background text in the classification process.
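The central idea, running the SVD on a term-by-document matrix that contains the background text as well as the labeled examples, can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the paper's implementation: the raw-count weighting, the whitespace tokenizer, the unscaled projection onto the retained singular vectors, the nearest-neighbor cosine rule, and every function name (`build_vocab`, `term_doc_matrix`, `lsi_fit`, `lsi_project`, `classify`) are hypothetical choices made here for brevity. The abstract does not specify any of them.

```python
import numpy as np


def build_vocab(docs):
    """Map each distinct lowercase token to a row index (hypothetical helper)."""
    tokens = sorted({w for d in docs for w in d.lower().split()})
    return {w: i for i, w in enumerate(tokens)}


def term_doc_matrix(docs, vocab):
    """Raw-count term-by-document matrix: rows are terms, columns are documents."""
    X = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for w in doc.lower().split():
            if w in vocab:  # words outside the vocabulary are ignored
                X[vocab[w], j] += 1.0
    return X


def lsi_fit(train_docs, background_docs, k):
    """Run the SVD on the expanded matrix: labeled plus background documents."""
    all_docs = list(train_docs) + list(background_docs)
    vocab = build_vocab(all_docs)
    X = term_doc_matrix(all_docs, vocab)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(k, len(s))  # cannot keep more factors than the matrix provides
    return vocab, U[:, :k]


def lsi_project(docs, vocab, U_k):
    """Project documents onto the k retained left singular vectors.

    Assumption: the usual 1/sigma scaling of LSI fold-in is omitted here.
    """
    return term_doc_matrix(docs, vocab).T @ U_k


def classify(test_doc, train_docs, train_labels, background_docs, k=2):
    """Label a test document by its most similar training document.

    Assumption: a nearest-neighbor cosine rule, chosen only for illustration.
    """
    vocab, U_k = lsi_fit(train_docs, background_docs, k)
    T = lsi_project(train_docs, vocab, U_k)
    q = lsi_project([test_doc], vocab, U_k)[0]
    sims = (T @ q) / (np.linalg.norm(T, axis=1) * np.linalg.norm(q) + 1e-12)
    return train_labels[int(np.argmax(sims))]


# Toy data, invented for this sketch.
train = ["cat dog pet", "stock market trade"]
labels = ["animals", "finance"]
background = ["dog puppy pet kitten", "market shares trade broker"]

print(classify("puppy kitten", train, labels, background, k=2))
```

In this toy setup the words of the test document occur only in the background text, so they receive a latent representation solely because the background documents were part of the decomposed matrix. Passing an empty background list to `classify` gives the labeled-data-only variant for comparison.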