| Experiments in automatic statistical thesaurus construction |
| Source
|
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Copenhagen, Denmark
Pages: 77 - 88
Year of Publication: 1992
ISBN:0-89791-523-2
|
|
Authors
|
|
Carolyn J. Crouch
|
Department of Computer Science, University of Minnesota, Duluth, Duluth, MN
|
|
Bokyung Yang
|
West Publishing Company, Eagan, Minnesota and Department of Computer Science, University of Minnesota, Duluth, Duluth, MN
|
|
ABSTRACT
A well-constructed thesaurus has long been recognized as a valuable tool in the effective operation of an information retrieval system. This paper reports the results of experiments designed to determine the validity of an approach to the automatic construction of global thesauri (described originally by Crouch in [1] and [2]) based on a clustering of the document collection. The authors validate the approach by showing that the use of thesauri generated by this method results in substantial improvements in retrieval effectiveness in four test collections. The term discrimination value theory, used in the thesaurus generation algorithm to determine a term's membership in a particular thesaurus class, is found not to be useful in distinguishing a “good” from an “indifferent” or “poor” thesaurus class. In conclusion, the authors suggest an alternate approach to automatic thesaurus construction which greatly simplifies the work of producing viable thesaurus classes. Experimental results show that the alternate approach described herein in some cases produces thesauri which are comparable in retrieval effectiveness to those produced by the first method at much lower cost.
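The abstract refers to term discrimination values without defining them. As a rough illustration only, the sketch below assumes the centroid-based formulation commonly associated with the term discrimination model: the "density" of a collection is the mean cosine similarity between each document vector and the collection centroid, and a term's discrimination value is the density with that term removed minus the density with it present. The function names and the toy document-term matrix are hypothetical and are not taken from the paper; the paper's actual thesaurus generation algorithm is not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def density(docs):
    """Mean cosine similarity between each document and the collection centroid."""
    n = len(docs)
    centroid = [sum(column) / n for column in zip(*docs)]
    return sum(cosine(doc, centroid) for doc in docs) / n


def discrimination_values(docs):
    """Assumed definition: DV_k = density without term k - density with all terms.

    A positive value means removing the term packs documents closer together,
    so the term helps tell documents apart; a negative value means the opposite.
    """
    base = density(docs)
    n_terms = len(docs[0])
    values = []
    for k in range(n_terms):
        # Drop column k from every document vector.
        reduced = [doc[:k] + doc[k + 1:] for doc in docs]
        values.append(density(reduced) - base)
    return values


if __name__ == "__main__":
    # Hypothetical toy collection: term 0 occurs in every document,
    # terms 1 and 2 each occur in only one document.
    toy_docs = [[1.0, 1.0, 0.0],
                [1.0, 0.0, 1.0]]
    for k, dv in enumerate(discrimination_values(toy_docs)):
        print(f"term {k}: DV = {dv:+.4f}")
```

In this toy collection the term shared by both documents receives a negative value and the two document-specific terms receive positive values, which matches the usual reading of the measure under the assumed definition.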
REFERENCES
 |
1
|
|
| |
2
|
|
| |
3
|
Cleverdon, C.W.; Mills, J.; Keen, M. Factors determining the performance of indexing systems. Aslib Cranfield Project, Vol. 1; 1966.
|
| |
4
|
Salton, G. Scientific reports on information storage and retrieval (ISR 11). Department of Computer Science, Cornell University, 1966.
|
| |
5
|
Salton, G. Scientific reports on information storage and retrieval (ISR 13). Department of Computer Science, Cornell University, 1968.
|
| |
6
|
|
| |
7
|
Sparck Jones, K. Automatic keyword classification for information retrieval. London: Butterworths; 1971.
|
 |
8
|
|
| |
9
|
|
| |
10
|
Edward A. Fox, J. Terry Nutter, Thomas Ahlswede, Martha Evens, Judith Markowitz, Building a large thesaurus for information retrieval, Proceedings of the second conference on Applied natural language processing, February 09-12, 1988, Austin, Texas
[doi> 10.3115/974235.974253]
|
| |
11
|
|
| |
12
|
Chen, H.; Lynch, K. Semantics-based information management and retrieval: A knowledge discovery approach. IEEE Transactions on Systems, Man and Cybernetics, 1992.
|
| |
13
|
|
| |
14
|
Salton, G.; Yang, C.S.; Yu, C.T. A theory of term importance in automatic text analysis. Journal of the American Society for Information Science, 26(1):33-44; 1975.
|
| |
15
|
|
| |
16
|
|
| |
17
|
|
| |
18
|
|
| |
19
|
|
| |
20
|
|
| |
21
|
Fox, E. Characteristics of two new experimental collections in computer and information science containing textual and bibliographic concepts. Tech. Report 83-561, Department of Computer Science, Cornell University.
|
| |
22
|
|
CITED BY 21
Hsinchun Chen, Bruce Schatz, Tobun Ng, Joanne Martinez, Amy Kirchhoff, Chienting Lin, A Parallel Computing Approach to Creating Engineering Concept Spaces for Semantic Retrieval: The Illinois Digital Library Initiative Project, IEEE Transactions on Pattern Analysis and Machine Intelligence, v.18 n.8, p.771-782, August 1996