ABSTRACT
It is well known that there are polysemous words, such as sentence, whose "meaning" or "sense" depends on the context of use. We have recently reported on two new word-sense disambiguation systems, one trained on bilingual material (the Canadian Hansards) and the other trained on monolingual material (Roget's Thesaurus and Grolier's Encyclopedia). As this work was nearing completion, we observed a very strong discourse effect. That is, if a polysemous word such as sentence appears two or more times in a well-written discourse, it is extremely likely that all of its occurrences will share the same sense. This paper describes an experiment that confirmed this hypothesis and found that the tendency to share sense within the same discourse is extremely strong (98%). This result can be used as an additional source of constraint for improving the performance of a word-sense disambiguation algorithm. It could also be used to help evaluate disambiguation algorithms that do not make use of the discourse constraint.
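One way the discourse constraint described above might be applied is as a post-processing step over the output of any per-occurrence disambiguator: within each discourse, every occurrence of a word is relabeled with that word's majority sense. The sketch below is illustrative only and is not the procedure from the paper; the function name `apply_discourse_constraint`, the `(discourse_id, word, sense)` triple format, and the sense labels are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def apply_discourse_constraint(predictions):
    """Relabel each occurrence with the majority sense of its (discourse, word) group.

    `predictions` is a list of (discourse_id, word, sense) triples, one per
    occurrence, as produced by some per-occurrence classifier (hypothetical
    input format, assumed for this sketch).
    """
    # Count how often each sense was predicted for a word within a discourse.
    votes = defaultdict(Counter)
    for discourse_id, word, sense in predictions:
        votes[(discourse_id, word)][sense] += 1

    # Pick the most frequent sense per group; with equal counts, Counter
    # returns the sense that was seen first.
    majority = {key: counts.most_common(1)[0][0] for key, counts in votes.items()}

    # Return the occurrences in their original order with the group's sense.
    return [(d, w, majority[(d, w)]) for d, w, _ in predictions]


if __name__ == "__main__":
    raw = [
        ("doc1", "sentence", "judicial"),
        ("doc1", "sentence", "judicial"),
        ("doc1", "sentence", "grammatical"),  # outlier within doc1
        ("doc2", "sentence", "grammatical"),
    ]
    for row in apply_discourse_constraint(raw):
        print(row)
```

Majority voting is only one possible way to impose the constraint; a system with calibrated scores could instead combine the per-occurrence evidence before choosing a single sense for the discourse.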
REFERENCES
2. Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, Robert L. Mercer, Word-sense disambiguation using statistical methods, Proceedings of the 29th annual meeting on Association for Computational Linguistics, p.264-270, June 18-21, 1991, Berkeley, California. doi:10.3115/981344.981378
3. Chapman, Robert (1977). Roget's International Thesaurus (Fourth Edition), Harper and Row, New York.
5. Gale, Church, and Yarowsky (1992), "Discrimination Decisions for 100,000-Dimensional Spaces," AT&T Statistical Research Report No. 103.
6. Grolier's Inc. (1991), New Grolier's Electronic Encyclopedia.
8. Kelly, Edward, and Phillip Stone (1975), Computer Recognition of English Word Senses, North-Holland, Amsterdam.
9. Mosteller, Frederick, and David Wallace (1964), Inference and Disputed Authorship: The Federalist, Addison-Wesley, Reading, Massachusetts.
CITED BY 63
Rob Koeling, Diana McCarthy, John Carroll, Domain-specific sense distributions and predominant sense acquisition, Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, p.419-426, October 06-08, 2005, Vancouver, British Columbia, Canada
Rada Mihalcea, Paul Tarau, Elizabeth Figa, PageRank on semantic networks, with application to word sense disambiguation, Proceedings of the 20th international conference on Computational Linguistics, p.1126-es, August 23-27, 2004, Geneva, Switzerland
Huifeng Li, Rohini K. Srihari, Cheng Niu, Wei Li, Location normalization for information extraction, Proceedings of the 19th international conference on Computational linguistics, p.1-7, August 24-September 01, 2002, Taipei, Taiwan
Ryo Nagata, Koichiro Morihiro, Atsuo Kawai, Naoki Isu, Reinforcing English countability prediction with one countability per discourse property, Proceedings of the COLING/ACL on Main conference poster sessions, p.595-602, July 17-18, 2006, Sydney, Australia
Cheng Niu, Wei Li, Jihong Ding, Rohini K. Srihari, Bootstrapping for named entity tagging using concept-based seeds, Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume of the Proceedings of HLT-NAACL 2003--short papers, p.73-75, May 27-June 01, 2003, Edmonton, Canada
Cheng Niu, Wei Li, Jihong Ding, Rohini K. Srihari, A bootstrapping approach to named entity classification using successive learners, Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, p.335-342, July 07-12, 2003, Sapporo, Japan
Georgios Petasis, Alessandro Cucchiarelli, Paola Velardi, Georgios Paliouras, Vangelis Karkaletsis, Constantine D. Spyropoulos, Automatic adaptation of proper noun dictionaries through cooperation of machine learning and probabilistic methods, Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, p.128-135, July 24-28, 2000, Athens, Greece
S. Mukherjea, L. V. Subramaniam, G. Chanda, S. Sankararaman, R. Kothari, V. Batra, D. Bhardwaj, B. Srivastava, Enhancing a biomedical information extraction system with dictionary mining and context disambiguation, IBM Journal of Research and Development, v.48 n.5/6, p.693-701, September/November 2004
Kobus Barnard, Quanfu Fan, Ranjini Swaminathan, Anthony Hoogs, Roderic Collins, Pascale Rondot, John Kaufhold, Evaluation of Localized Semantics: Data, Methodology, and Experiments, International Journal of Computer Vision, v.77 n.1-3, p.199-217, May 2008