Successfully detecting and correcting false friends using channel profiles
Source: AND; Vol. 303
Proceedings of the second workshop on Analytics for noisy unstructured text data
Singapore
Pages 17-22  
Year of Publication: 2008
ISBN:978-1-60558-196-5
Authors
Ulrich Reffle  University of Munich (LMU)
Annette Gotscharek  University of Munich (LMU)
Christoph Ringlstetter  University of Alberta
Klaus U. Schulz  University of Munich (LMU)
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1390749.1390754

ABSTRACT

The detection and correction of false friends - also called real-word errors - is a notoriously difficult problem. On realistic data, the break-even point for automatic correction has so far not been reached: the additional infelicitous corrections outnumbered the useful corrections. We present a new approach in which we first compute a profile of the error channel for the given text. During the correction process, the profile helps to restrict attention to a small set of "suspicious" lexical tokens of the input text, namely those for which it is "plausible" to assume that the token represents a false friend. Using conventional word trigram statistics for disambiguation, we obtain a correction method that can be successfully applied to unrestricted text. In experiments on OCR documents, we show significant accuracy gains from fully automatic correction of false friends.
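The pipeline outlined in the abstract (channel profile, restriction to suspicious tokens, trigram disambiguation) can be illustrated with a small sketch. This is not the authors' implementation. It assumes the profile is already available as a dictionary from (intended, observed) character sequences to an estimated weight, and it scores a candidate by summing the counts of the trigrams that cover it, which is a simplification. All names (`candidates`, `trigram_score`, `correct`, `min_weight`, `margin`) and all data values are hypothetical.

```python
from collections import Counter

def candidates(token, lexicon, profile, min_weight=0.01):
    """Lexicon words that a frequent channel error could have turned into `token`."""
    found = set()
    for (intended, observed), weight in profile.items():
        if weight < min_weight:
            continue  # ignore error patterns that are rare in this text
        start = token.find(observed)
        while start != -1:
            # Undo the assumed error at this position and check the lexicon.
            word = token[:start] + intended + token[start + len(observed):]
            if word != token and word in lexicon:
                found.add(word)
            start = token.find(observed, start + 1)
    return found

def trigram_score(tokens, i, word, trigrams):
    """Sum of the counts of the three trigrams covering position i."""
    padded = ["<s>", "<s>"] + tokens[:i] + [word] + tokens[i + 1:] + ["</s>", "</s>"]
    j = i + 2  # position of `word` in the padded sequence
    return sum(trigrams[tuple(padded[k:k + 3])] for k in range(j - 2, j + 1))

def correct(tokens, lexicon, profile, trigrams, margin=1):
    """Replace a lexical token only if it is suspicious and context favours a candidate."""
    out = list(tokens)
    for i, tok in enumerate(tokens):
        if tok not in lexicon:
            continue  # non-words are not false friends
        cands = candidates(tok, lexicon, profile)
        if not cands:
            continue  # the profile gives no reason to suspect this token
        best = max(sorted(cands), key=lambda w: trigram_score(out, i, w, trigrams))
        if trigram_score(out, i, best, trigrams) >= trigram_score(out, i, tok, trigrams) + margin:
            out[i] = best
    return out

# Hypothetical toy data: the OCR channel often reads "rn" as "m".
lexicon = {"the", "a", "is", "modern", "modem", "world", "connected"}
profile = {("rn", "m"): 0.05}
trigrams = Counter({("the", "modern", "world"): 5, ("a", "modem", "is"): 3})

fixed = correct(["the", "modem", "world"], lexicon, profile, trigrams)
kept = correct(["a", "modem", "is", "connected"], lexicon, profile, trigrams)
```

In this toy setting, "modem" is suspicious in both inputs because the profile makes "modern" a plausible intended word, but it is only replaced where the trigram counts favour the candidate by at least `margin`. Tokens for which the profile suggests no lexicon word are never touched, which is how the sketch limits infelicitous corrections.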


