Estimating upper and lower bounds on the performance of word-sense disambiguation programs
Source: Annual Meeting of the ACL
Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics
Newark, Delaware
Pages: 249 - 256  
Year of Publication: 1992
Authors
William Gale  AT&T Bell Laboratories, Murray Hill, NJ
Kenneth Ward Church  AT&T Bell Laboratories, Murray Hill, NJ
David Yarowsky  AT&T Bell Laboratories, Murray Hill, NJ
Publisher
Association for Computational Linguistics  Morristown, NJ, USA
DOI Bookmark: 10.3115/981967.981999

ABSTRACT

We have recently reported on two new word-sense disambiguation systems, one trained on bilingual material (the Canadian Hansards) and the other trained on monolingual material (Roget's Thesaurus and Grolier's Encyclopedia). After using both the monolingual and bilingual classifiers for a few months, we have convinced ourselves that the performance is remarkably good. Nevertheless, we would really like to be able to make a stronger statement, and therefore we decided to try to develop some more objective evaluation measures.

Although there has been a fair amount of literature on sense disambiguation, the literature does not offer much guidance on how we might establish the success or failure of a proposed solution such as the two systems mentioned in the previous paragraph. Many papers avoid quantitative evaluations altogether, because it is so difficult to come up with credible estimates of performance.

This paper will attempt to establish upper and lower bounds on the level of performance that can be expected in an evaluation. An estimate of the lower bound of 75% (averaged over ambiguous types) is obtained by measuring the performance produced by a baseline system that ignores context and simply assigns the most likely sense in all cases. An estimate of the upper bound is obtained by assuming that our ability to measure performance is largely limited by our ability to obtain reliable judgments from human informants. Not surprisingly, the upper bound is very dependent on the instructions given to the judges. Jorgensen (1990), for example, suspected that lexicographers tend to depend too much on judgments by a single informant, and she did find considerable variation across judgments (only 68% agreement). In our own experiments, we have set out to find word-sense disambiguation tasks where the judges can agree often enough that we could show that they were outperforming the baseline system. Under quite different conditions, we have found 96.8% agreement among judges.
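
The two bounds described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy data, and the choice of simple pairwise agreement as the agreement measure are all assumptions made for illustration, since the abstract does not say how agreement was computed.

```python
from collections import Counter
from itertools import combinations


def most_frequent_sense_accuracy(labels):
    """Accuracy of always guessing the most common sense of one word type."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


def lower_bound(tagged):
    """Average the context-free baseline accuracy over ambiguous word types.

    `tagged` (hypothetical format) maps a word type to the list of
    sense labels assigned to its tokens.
    """
    per_type = [most_frequent_sense_accuracy(s) for s in tagged.values()]
    return sum(per_type) / len(per_type)


def pairwise_agreement(judgments):
    """Fraction of judge pairs, pooled over all items, that chose the same label.

    `judgments` (hypothetical format) holds one row per item, and each
    row holds one label per judge. Pairwise agreement is an assumed
    measure, not necessarily the one used in the paper.
    """
    agree = total = 0
    for row in judgments:
        for a, b in combinations(row, 2):
            agree += a == b
            total += 1
    return agree / total


# Toy data, invented purely for illustration.
tagged = {
    "bank": ["money", "money", "money", "river"],  # baseline accuracy 0.75
    "duty": ["tax", "obligation"],                 # baseline accuracy 0.5
}
judgments = [
    ["money", "money", "money"],  # 3 of 3 judge pairs agree
    ["river", "money", "river"],  # 1 of 3 judge pairs agree
]

print(lower_bound(tagged))            # 0.625
print(pairwise_agreement(judgments))  # 4 agreeing pairs out of 6
```

Averaging per type rather than per token matches the abstract's "averaged over ambiguous types": each ambiguous word counts equally, however often it occurs. A disambiguation system is then interesting only if it scores above `lower_bound`, and an evaluation is informative only if the judges' agreement also exceeds that baseline.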


REFERENCES


 
1
Bar-Hillel (1960), "Automatic Translation of Languages," in Advances in Computers, Donald Booth and R. E. Meagher, eds., Academic, NY.
 
2
 
3
 
4
Chapman, Robert (1977). Roget's International Thesaurus (Fourth Edition), Harper and Row, NY.
 
5
Choueka, Yaacov, and Serge Lusignan (1985), "Disambiguation by Short Contexts," Computers and the Humanities, v. 19, pp. 147--158.
 
6
 
7
Clear, Jeremy (1989). "An Experiment in Automatic Word Sense Identification," Internal Document, Oxford University Press, Oxford.
 
8
Cowie, Anthony et al. (eds.) (1989), "Oxford Advanced Learner's Dictionary," Fourth Edition, Oxford University Press.
 
9
 
10
Gale, William, Kenneth Church, and David Yarowsky (to appear) "A Method for Disambiguating Word Senses in a Large Corpus," Computers and the Humanities.
 
11
 
12
Gove, Philip et al. (eds.) (1975) "Webster's Seventh New Collegiate Dictionary," G. & C. Merriam Company, Springfield, MA.
 
13
Grolier's Inc. (1991) New Grolier's Electronic Encyclopedia.
 
14
Hanks, Patrick (ed.) (1979), Collins English Dictionary, Collins, London and Glasgow.
 
15
Hearst, Marti (1991), "Noun Homograph Disambiguation Using Local Context in Large Text Corpora," Using Corpora, University of Waterloo, Waterloo, Ontario.
 
16
 
17
Jorgensen, Julia (1990) "The Psychological Reality of Word Senses," Journal of Psycholinguistic Research, v. 19, pp. 167--190.
 
18
Kaplan, Abraham (1950), "An Experimental Study of Ambiguity in Context," cited in Mechanical Translation, v. 1, nos. 1--3.
 
19
Kelly, Edward, and Philip Stone (1975), Computer Recognition of English Word Senses, North-Holland, Amsterdam.
20
 
21
Masterman, Margaret (1967), "Mechanical Pidgin Translation," in Machine Translation, Donald Booth, ed., Wiley, 1967.
 
22
Mosteller, Frederick, and David Wallace (1964) Inference and Disputed Authorship: The Federalist, Addison-Wesley, Reading, Massachusetts.
 
23
Procter, P., R. Ilson, J. Ayto, et al. (1978), Longman Dictionary of Contemporary English, Longman, Harlow and London.
 
24
 
25
Shipstone, E. (1960) "Some Variables Affecting Pattern Conception," Psychological Monographs, General and Applied, v. 74, pp. 1--41.
 
26
Sinclair, J., Hanks, P., Fox, G., Moon, R., Stock, P. et al. (eds.) (1987) Collins Cobuild English Language Dictionary, Collins, London and Glasgow.
 
27
 
28
Small, S. and C. Rieger (1982), "Parsing and Comprehending with Word Experts (A Theory and its Realization)," in Strategies for Natural Language Processing, W. Lehnert and M. Ringle, eds., Lawrence Erlbaum Associates, Hillsdale, NJ.
 
29
 
30
 
31
Walker, Donald (1987), "Knowledge Resource Tools for Accessing Large Text Files," in Machine Translation: Theoretical and Methodological Issues, Sergei Nirenburg, ed., Cambridge University Press, Cambridge, England.
 
32
Weiss, Stephen (1973), "Learning to Disambiguate," Information Storage and Retrieval, v. 9, pp. 33--41.
 
33
 
34
Yngve, Victor (1955), "Syntax and the Problem of Multiple Meaning," in Machine Translation of Languages, William Locke and Donald Booth, eds., Wiley, NY.
 
35
Zernik, Uri (1990) "Tagging Word Senses in Corpus: The Needle in the Haystack Revisited," in Text-Based Intelligent Systems: Current Research in Text Analysis, Information Extraction, and Retrieval, P. S. Jacobs, ed., GE Research & Development Center, Schenectady, NY.
 
36
Zernik, Uri (1991) "Train1 vs. Train2: Tagging Word Senses in Corpus," in Zernik (ed.) Lexical Acquisition: Exploiting On-Line Resources to Build a Lexicon, Lawrence Erlbaum, Hillsdale, NJ.
