Experiments in automatic statistical thesaurus construction
Source Annual ACM Conference on Research and Development in Information Retrieval archive
Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval table of contents
Copenhagen, Denmark
Pages: 77 - 88  
Year of Publication: 1992
ISBN:0-89791-523-2
Authors
Carolyn J. Crouch  Department of Computer Science, University of Minnesota, Duluth, Duluth, MN
Bokyung Yang  West Publishing Company, Eagan, Minnesota and Department of Computer Science, University of Minnesota, Duluth, Duluth, MN
Sponsors
Royal School of Lib. : Royal School of Lib.
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/133160.133180

ABSTRACT

A well-constructed thesaurus has long been recognized as a valuable tool in the effective operation of an information retrieval system. This paper reports the results of experiments designed to determine the validity of an approach to the automatic construction of global thesauri (described originally by Crouch in [1] and [2]) based on a clustering of the document collection. The authors validate the approach by showing that the use of thesauri generated by this method results in substantial improvements in retrieval effectiveness in four test collections. The term discrimination value theory, used in the thesaurus generation algorithm to determine a term's membership in a particular thesaurus class, is found not to be useful in distinguishing a “good” from an “indifferent” or “poor” thesaurus class. In conclusion, the authors suggest an alternate approach to automatic thesaurus construction which greatly simplifies the work of producing viable thesaurus classes. Experimental results show that the alternate approach described herein in some cases produces, at much lower cost, thesauri which are comparable in retrieval effectiveness to those produced by the first method.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
 
3
Cleverdon, C.W.; Mills, J.; Keen, M. Factors determining the performance of indexing systems. Aslib Cranfield Project, Vol. 1; 1966.
 
4
Salton, G. Scientific reports on information storage and retrieval (ISR 11). Department of Computer Science, Cornell University, 1966.
 
5
Salton, G. Scientific reports on information storage and retrieval (ISR 13). Department of Computer Science, Cornell University, 1968.
 
6
 
7
Sparck Jones, K. Automatic keyword classification for information retrieval. London: Butterworths; 1971.
8
 
9
 
10
 
11
 
12
Chen, H.; Lynch, K. Semantics-based information management and retrieval: A knowledge discovery approach. IEEE Transactions on Systems, Man and Cybernetics, 1992.
 
13
 
14
Salton, G.; Yang, C.S.; Yu, C.T. A theory of term importance in automatic text analysis. Journal of the American Society for Information Science, 26(1):33-44; 1975.
 
15
 
16
 
17
 
18
 
19
 
20
 
21
Fox, E. Characteristics of two new experimental collections in computer and information science containing textual and bibliographic concepts. Tech. Report 83-561, Department of Computer Science, Cornell University.
 
22
