Using latent semantic analysis to improve access to textual information
Source Conference on Human Factors in Computing Systems archive
Proceedings of the SIGCHI conference on Human factors in computing systems table of contents
Washington, D.C., United States
Pages: 281 - 285  
Year of Publication: 1988
ISBN:0-201-14237-6
Authors
S. T. Dumais  Bell Communications Research
G. W. Furnas  Bell Communications Research
T. K. Landauer  Bell Communications Research
S. Deerwester  Univ. of Chicago, Chicago, IL
R. Harshman  Univ. of Western Ontario
Sponsor
SIGCHI: ACM Special Interest Group on Computer-Human Interaction
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/57167.57214

ABSTRACT

This paper describes a new approach for dealing with the vocabulary problem in human-computer interaction. Most approaches to retrieving textual materials depend on a lexical match between words in users' requests and those in or assigned to database objects. Because of the tremendous diversity in the words people use to describe the same object, lexical matching methods are necessarily incomplete and imprecise [5]. The latent semantic indexing approach tries to overcome these problems by automatically organizing text objects into a semantic structure more appropriate for matching user requests. This is done by taking advantage of implicit higher-order structure in the association of terms with text objects. The particular technique used is singular-value decomposition, in which a large term-by-text-object matrix is decomposed into a set of about 50 to 150 orthogonal factors from which the original matrix can be approximated by linear combination. Terms and objects are represented by 50- to 150-dimensional vectors and matched against user queries in this “semantic” space. Initial tests find this completely automatic method widely applicable and a promising way to improve users' access to many kinds of textual materials, or to objects and services for which textual descriptions are available.
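
The decomposition and matching steps described in the abstract can be illustrated with a minimal sketch in Python using NumPy. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy 5-term by 4-document count matrix, the function name `lsi_scores`, the choice of k = 2 factors (the abstract reports about 50 to 150 for real collections), the rule for projecting a query into the reduced space, and the use of cosine similarity for matching.

```python
import numpy as np

# Hypothetical toy data: rows are terms, columns are text objects (d0..d3).
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 1],  # petal
], dtype=float)


def lsi_scores(A, query, k):
    """Cosine similarity between a query and every document in a rank-k SVD space.

    Assumed conventions (not stated in the abstract): documents are
    represented by the rows of V_k, and the query term vector q is
    projected as q^T U_k diag(1 / s_k).
    """
    # Thin SVD: A = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k].T

    # Project the query into the k-dimensional factor space.
    q_hat = (query @ U_k) / s_k

    # Cosine similarity between the projected query and each document vector.
    dots = V_k @ q_hat
    norms = np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_hat)
    return dots / norms


# Query consisting of the single word "car".
query = np.array([1.0, 0.0, 0.0, 0.0, 0.0])

lexical = A.T @ query            # plain word-overlap score per document
latent = lsi_scores(A, query, k=2)

print("lexical:", lexical)
print("latent: ", np.round(latent, 3))
```

In this toy example, document d1 contains "automobile" and "engine" but not "car", so its lexical overlap with the query is zero. In the two-factor space it is scored as close to the query as d0 is, while the two "flower" documents score near zero, which is the kind of vocabulary mismatch the approach is meant to address.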


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
 
3
 
4
Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., and Harshman, R.A. Indexing by latent semantic analysis. Journal of the American Society for Information Science, in press.
 
5
Furnas, G.W., Landauer, T.K., Gomez, L.M., and Dumais, S.T. Statistical semantics: Analysis of the potential performance of key-word information systems. Bell System Technical Journal, 1983, 62(6), 1753-1806.
6
 
7
8
9
10
11
 
12
 
13
Sparck Jones, K. Automatic keyword classification for information retrieval. Butterworths, 1971.
 
14
Streeter, L.A. and Lochbaum, K.E. An expert/expert-locating system based on automatic representation of semantic structure. In Proceedings of IEEE Conference on AI Applications. San Diego, CA, March 1988.
15
 
16
Weyer, S. The design of a dynamic book for information search. International Journal of Man-Machine Studies, 1982, 17, 87-107.
