ABSTRACT
One aspect of world knowledge essential to information retrieval is knowing when two words are related. Knowing word relatedness allows a system, given a user's query terms, to retrieve relevant documents that do not contain those exact terms. Two words can be said to be related if they appear in the same contexts. Document co-occurrence gives a measure of word relatedness that has proved too rough to be useful. The relatively recent appearance of on-line dictionaries and of robust, rapid parsers permits the extraction of finer word contexts from large corpora. In this paper, we describe such an extraction technique, which uses only coarse syntactic analysis and no domain knowledge. This technique produces lists of words related to any word appearing in a corpus. When the closest related terms were used in query expansion on a standard information retrieval testbed, the results were much better than those given by document co-occurrence techniques, and slightly better than those obtained with unexpanded queries, supporting the contention that semantically similar words were indeed extracted by this technique.
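The idea that two words are related when they share contexts, and that the closest related words can be added to a query, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy observations, the `relation:word` context labels, the function names, and the choice of a Jaccard (Tanimoto) coefficient over context sets. It is not the paper's implementation, and a real system would obtain the contexts from a parser run over a large corpus.

```python
from collections import defaultdict

# Hypothetical (word, context) pairs, standing in for the output of a
# coarse syntactic analysis. A context is a relation plus a neighbouring word.
OBSERVATIONS = [
    ("tumor", "adj:malignant"), ("tumor", "obj-of:remove"), ("tumor", "adj:large"),
    ("growth", "adj:malignant"), ("growth", "obj-of:remove"),
    ("patient", "obj-of:treat"), ("patient", "adj:large"),
]

def context_sets(pairs):
    """Group (word, context) observations into one set of contexts per word."""
    contexts = defaultdict(set)
    for word, context in pairs:
        contexts[word].add(context)
    return dict(contexts)

def jaccard(a, b):
    """Shared contexts divided by all contexts seen with either word."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def related_words(word, contexts, top_n=3):
    """Rank the other words by context overlap with `word`, closest first."""
    target = contexts[word]
    scored = [(jaccard(target, ctx), other)
              for other, ctx in contexts.items() if other != word]
    scored = [(score, other) for score, other in scored if score > 0]
    # Sort by descending score, breaking ties alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:top_n]]

def expand_query(terms, contexts, top_n=1):
    """Append the closest related words of each known query term."""
    expanded = list(terms)
    for term in terms:
        if term in contexts:
            for other in related_words(term, contexts, top_n):
                if other not in expanded:
                    expanded.append(other)
    return expanded

if __name__ == "__main__":
    contexts = context_sets(OBSERVATIONS)
    print(related_words("tumor", contexts))
    print(expand_query(["tumor"], contexts))
```

In this toy data, "growth" shares two of three contexts with "tumor" and so ranks above "patient", which shares only one of four. With `top_n=1`, only "growth" is added to the query.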
CITED BY 27
Tomek Strzalkowski, Jose Perez-Carballo, Mihnea Marinescu, Natural language information retrieval in digital libraries, Proceedings of the first ACM international conference on Digital libraries, p.117-125, March 20-23, 1996, Bethesda, Maryland, United States
Rila Mandala, Takenobu Tokunaga, Hozumi Tanaka, Combining multiple evidence from different types of thesaurus for query expansion, Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, p.191-197, August 15-19, 1999, Berkeley, California, United States
Hiroyuki Kaji, Yasutsugu Morimoto, Toshiko Aizono, Noriyuki Yamasaki, Corpus-dependent association thesauri for information retrieval, Proceedings of the 18th conference on Computational linguistics, p.404-410, July 31-August 04, 2000, Saarbrücken, Germany
Xiaohua Zhou, Xiaohua Hu, Xiaodan Zhang, Xia Lin, Il-Yeol Song, Context-sensitive semantic smoothing for the language modeling approach to genomic IR, Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, August 06-11, 2006, Seattle, Washington, USA