Document centered approach to text normalization
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Athens, Greece
Pages: 136 - 143  
Year of Publication: 2000
ISBN:1-58113-226-3
Author
Andrei Mikheev  LTG, University of Edinburgh, 2 Buccleuch Place, Edinburgh EH8 9LW, UK
Sponsors
Athens U of Econ & Business : Athens University of Economics and Business
Greek Com Soc : Greek Computer Society
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 2,   Downloads (12 Months): 48,   Citation Count: 8
DOI: http://doi.acm.org/10.1145/345508.345564

ABSTRACT

In this paper we present an approach to tackle three important problems of text normalization: sentence boundary disambiguation, disambiguation of capitalized words when they are used in positions where capitalization is expected, and identification of abbreviations. The main feature of our approach is that it uses a minimum of pre-built resources, instead dynamically inferring disambiguation clues from the entire document itself. This makes it domain independent, closely targeted to each individual document and portable to other languages. We thoroughly evaluated this approach on several corpora and it showed high accuracy.
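
The abstract does not spell out the procedure, so the following is only a minimal sketch of how document-wide evidence could be used for one of the three problems, capitalized words in positions where capitalization is expected. It assumes a naive tokenizer and a simple majority rule; the function names `capitalization_evidence` and `classify_sentence_initial` are hypothetical and are not taken from the paper. Every word following `.`, `!` or `?` is treated as an ambiguous position and skipped when collecting evidence, which sidesteps (rather than solves) the abbreviation and sentence boundary problems.

```python
import re
from collections import Counter

SENTENCE_END = {".", "!", "?"}


def capitalization_evidence(text):
    """Count how often each word appears lowercased vs. capitalized
    in unambiguous (mid-sentence) positions of the document."""
    lower_mid = Counter()
    upper_mid = Counter()
    # Naive tokenization (an assumption of this sketch): words and end marks only.
    tokens = re.findall(r"[A-Za-z]+|[.!?]", text)
    prev = "."  # the start of the document counts as a sentence start
    for tok in tokens:
        if tok in SENTENCE_END:
            prev = tok
            continue
        if prev not in SENTENCE_END:
            # Mid-sentence position: capitalization here is informative.
            if tok[0].isupper():
                upper_mid[tok.lower()] += 1
            else:
                lower_mid[tok.lower()] += 1
        prev = tok
    return lower_mid, upper_mid


def classify_sentence_initial(word, lower_mid, upper_mid):
    """Guess whether a capitalized sentence-initial word is a proper name,
    using only evidence gathered from the same document."""
    key = word.lower()
    if upper_mid[key] > lower_mid[key]:
        return "proper"
    if lower_mid[key] > upper_mid[key]:
        return "common"
    return "unknown"  # no evidence, or a tie


sample = ("Rain fell on Monday. Brown said the rain would stop. "
          "Later Mr Brown left. Rain returned.")
lower_mid, upper_mid = capitalization_evidence(sample)
print(classify_sentence_initial("Rain", lower_mid, upper_mid))   # common
print(classify_sentence_initial("Brown", lower_mid, upper_mid))  # proper
```

Because the counts are collected from the document being processed, the sketch needs no pre-built name list, which is the property the abstract emphasizes; the majority rule itself is an illustrative choice.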


