ABSTRACT
We have recently reported on two new word-sense disambiguation systems, one trained on bilingual material (the Canadian Hansards) and the other trained on monolingual material (Roget's Thesaurus and Grolier's Encyclopedia). After using both the monolingual and bilingual classifiers for a few months, we have convinced ourselves that the performance is remarkably good. Nevertheless, we would really like to be able to make a stronger statement, and therefore we decided to try to develop some more objective evaluation measures.

Although there has been a fair amount of literature on sense disambiguation, the literature does not offer much guidance on how we might establish the success or failure of a proposed solution such as the two systems mentioned in the previous paragraph. Many papers avoid quantitative evaluations altogether, because it is so difficult to come up with credible estimates of performance.

This paper will attempt to establish upper and lower bounds on the level of performance that can be expected in an evaluation. An estimate of the lower bound of 75% (averaged over ambiguous types) is obtained by measuring the performance produced by a baseline system that ignores context and simply assigns the most likely sense in all cases. An estimate of the upper bound is obtained by assuming that our ability to measure performance is largely limited by our ability to obtain reliable judgments from human informants. Not surprisingly, the upper bound is very dependent on the instructions given to the judges. Jorgensen, for example, suspected that lexicographers tend to depend too much on judgments by a single informant and found considerable variation over judgments (only 68% agreement), as she had suspected. In our own experiments, we have set out to find word-sense disambiguation tasks where the judges can agree often enough so that we could show that they were outperforming the baseline system. Under quite different conditions, we have found 96.8% agreement over judges.
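The two bounds described above can be illustrated with a minimal sketch. It is not the authors' procedure: the function names, the toy sense labels, and the choice of simple pairwise agreement are all assumptions made for illustration. The lower bound is approximated by a most-frequent-sense baseline scored per ambiguous word type and then averaged over types; the upper bound is approximated by the fraction of judge pairs that assign the same label. Note that the baseline here is scored on the same data used to find the most frequent sense, which is an optimistic simplification.

```python
from collections import Counter, defaultdict
from itertools import combinations


def baseline_accuracy_by_type(tagged):
    """Mean accuracy of a most-frequent-sense baseline over ambiguous types.

    `tagged` is an iterable of (word, sense) pairs with hand-assigned senses
    (hypothetical input format). Words seen with only one sense are skipped,
    since they are not ambiguous in this sample.
    """
    counts = defaultdict(Counter)
    for word, sense in tagged:
        counts[word][sense] += 1

    per_type = []
    for senses in counts.values():
        if len(senses) < 2:
            continue  # unambiguous type: excluded from the average
        top_count = senses.most_common(1)[0][1]
        per_type.append(top_count / sum(senses.values()))

    if not per_type:
        raise ValueError("no ambiguous word types in the sample")
    return sum(per_type) / len(per_type)


def pairwise_agreement(judgments):
    """Fraction of judge pairs that agree, pooled over all items.

    `judgments` is a list with one entry per item; each entry is the list of
    labels given to that item by the judges (hypothetical input format).
    """
    agree = 0
    total = 0
    for labels in judgments:
        for a, b in combinations(labels, 2):
            agree += (a == b)
            total += 1
    if total == 0:
        raise ValueError("need at least two judgments on some item")
    return agree / total


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    tagged = [
        ("bank", "money"), ("bank", "money"), ("bank", "money"),
        ("bank", "river"),
        ("duty", "tax"), ("duty", "obligation"),
        ("the", "det"), ("the", "det"),
    ]
    judgments = [["a", "a", "a"], ["a", "b", "a"]]
    print(baseline_accuracy_by_type(tagged))
    print(pairwise_agreement(judgments))
```

Under these assumptions, a disambiguation system is only interesting if its score falls between the two numbers: above what the context-free baseline achieves, and no higher than the level at which the human judges agree with one another.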
REFERENCES
1. Bar-Hillel (1960), "Automatic Translation of Languages," in Advances in Computers, Donald Booth and R. E. Meagher, eds., Academic, NY.
3. Brown, Peter F., Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer (1991), "Word-Sense Disambiguation Using Statistical Methods," Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, pp. 264-270, June 18-21, 1991, Berkeley, California. doi:10.3115/981344.981378
4. Chapman, Robert (1977), Roget's International Thesaurus (Fourth Edition), Harper and Row, NY.
5. Choueka, Yaacov, and Serge Lusignan (1985), "Disambiguation by Short Contexts," Computers and the Humanities, v. 19, pp. 147-158.
7. Clear, Jeremy (1989), "An Experiment in Automatic Word Sense Identification," Internal Document, Oxford University Press, Oxford.
8. Cowie, Anthony et al. (eds.) (1989), Oxford Advanced Learner's Dictionary, Fourth Edition, Oxford University Press.
10. Gale, William, Kenneth Church, and David Yarowsky (to appear), "A Method for Disambiguating Word Senses in a Large Corpus," Computers and the Humanities.
12. Gove, Philip et al. (eds.) (1975), Webster's Seventh New Collegiate Dictionary, G. & C. Merriam Company, Springfield, MA.
13. Grolier's Inc. (1991), New Grolier's Electronic Encyclopedia.
14. Hanks, Patrick (ed.) (1979), Collins English Dictionary, Collins, London and Glasgow.
15. Hearst, Marti (1991), "Noun Homograph Disambiguation Using Local Context in Large Text Corpora," Using Corpora, University of Waterloo, Waterloo, Ontario.
17. Jorgensen, Julia (1990), "The Psychological Reality of Word Senses," Journal of Psycholinguistic Research, v. 19, pp. 167-190.
18. Kaplan, Abraham (1950), "An Experimental Study of Ambiguity in Context," cited in Mechanical Translation, v. 1, nos. 1-3.
19. Kelly, Edward, and Philip Stone (1975), Computer Recognition of English Word Senses, North-Holland, Amsterdam.
21. Masterman, Margaret (1967), "Mechanical Pidgin Translation," in Machine Translation, Donald Booth, ed., Wiley, 1967.
22. Mosteller, Frederick, and David Wallace (1964), Inference and Disputed Authorship: The Federalist, Addison-Wesley, Reading, Massachusetts.
23. Procter, P., R. Ilson, J. Ayto, et al. (1978), Longman Dictionary of Contemporary English, Longman, Harlow and London.
25. Shipstone, E. (1960), "Some Variables Affecting Pattern Conception," Psychological Monographs, General and Applied, v. 74, pp. 1-41.
26. Sinclair, J., P. Hanks, G. Fox, R. Moon, P. Stock, et al. (eds.) (1987), Collins Cobuild English Language Dictionary, Collins, London and Glasgow.
28. Small, S., and C. Rieger (1982), "Parsing and Comprehending with Word Experts (A Theory and its Realization)," in Strategies for Natural Language Processing, W. Lehnert and M. Ringle, eds., Lawrence Erlbaum Associates, Hillsdale, NJ.
31. Walker, Donald (1987), "Knowledge Resource Tools for Accessing Large Text Files," in Machine Translation: Theoretical and Methodological Issues, Sergei Nirenburg, ed., Cambridge University Press, Cambridge, England.
32. Weiss, Stephen (1973), "Learning to Disambiguate," Information Storage and Retrieval, v. 9, pp. 33-41.
34. Yngve, Victor (1955), "Syntax and the Problem of Multiple Meaning," in Machine Translation of Languages, William Locke and Donald Booth, eds., Wiley, NY.
35. Zernik, Uri (1990), "Tagging Word Senses in Corpus: The Needle in the Haystack Revisited," in Text-Based Intelligent Systems: Current Research in Text Analysis, Information Extraction, and Retrieval, P. S. Jacobs, ed., GE Research & Development Center, Schenectady, NY.
36. Zernik, Uri (1991), "Train1 vs. Train2: Tagging Word Senses in Corpus," in Zernik (ed.), Lexical Acquisition: Exploiting On-Line Resources to Build a Lexicon, Lawrence Erlbaum, Hillsdale, NJ.