Document centered approach to text normalization
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Athens, Greece
Pages: 136 - 143
Year of Publication: 2000
ISBN: 1-58113-226-3
Author
Andrei Mikheev
LTG, University of Edinburgh, 2 Buccleuch Place, Edinburgh EH8 9LW, UK
Bibliometrics
Downloads (6 Weeks): 2, Downloads (12 Months): 48, Citation Count: 8
ABSTRACT
In this paper we present an approach to tackle three important problems of text normalization: sentence boundary disambiguation, disambiguation of capitalized words when they are used in positions where capitalization is expected, and identification of abbreviations. The main feature of our approach is that it uses a minimum of pre-built resources, instead dynamically inferring disambiguation clues from the entire document itself. This makes it domain independent, closely targeted to each individual document and portable to other languages. We thoroughly evaluated this approach on several corpora and it showed high accuracy.
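As a rough illustration of what inferring disambiguation clues from the document itself could look like for capitalized words, the sketch below counts how each word is cased in positions where capitalization is not forced (anywhere but sentence-initial) and uses those counts to guess whether a sentence-initial word is a proper name or a common word. This is an assumed, simplified heuristic, not the algorithm from the paper: the function names `collect_case_evidence` and `classify_initial_word`, the majority-vote rule, and the pre-tokenized, pre-split input are all hypothetical choices made for the example.

```python
from collections import Counter


def collect_case_evidence(sentences):
    """Count casing of each word in non-initial (unambiguous) positions.

    `sentences` is a list of token lists. Sentence splitting and
    tokenization are assumed to be done already (a simplification).
    """
    lower_seen = Counter()
    upper_seen = Counter()
    for tokens in sentences:
        # Skip the first token: its capitalization is expected there.
        for tok in tokens[1:]:
            if not tok.isalpha():
                continue
            if tok[0].isupper():
                upper_seen[tok.lower()] += 1
            else:
                lower_seen[tok.lower()] += 1
    return lower_seen, upper_seen


def classify_initial_word(word, lower_seen, upper_seen):
    """Guess the status of a sentence-initial word from document evidence.

    Hypothetical rule: majority vote between capitalized and lowercased
    occurrences seen elsewhere in the same document.
    """
    key = word.lower()
    lo, up = lower_seen[key], upper_seen[key]
    if up > lo:
        return "proper"
    if lo > up:
        return "common"
    return "unknown"  # no evidence, or evidence is tied


document = [
    ["Brown", "met", "the", "board", "on", "Monday"],
    ["Later", "Mr", "Brown", "left"],
    ["Talks", "ended", "after", "the", "talks", "stalled"],
]
lower_seen, upper_seen = collect_case_evidence(document)
for sentence in document:
    first = sentence[0]
    print(first, classify_initial_word(first, lower_seen, upper_seen))
```

Because the evidence is collected from the document being processed, no name list or lexicon is needed in this sketch; the cost is that words with no unambiguous occurrence elsewhere in the document stay unresolved.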
CITED BY 8
Dragomir Radev , Weiguo Fan , Hong Qi , Harris Wu , Amardeep Grewal, Probabilistic question answering on the web, Proceedings of the 11th international conference on World Wide Web, May 07-11, 2002, Honolulu, Hawaii, USA