Unsupervised estimation for noisy-channel models
Source: ICML; Vol. 227
Proceedings of the 24th International Conference on Machine Learning
Corvallis, Oregon
Pages: 665 - 672  
Year of Publication: 2007
ISBN:978-1-59593-793-3
Authors
Markos Mylonakis  University of Amsterdam, Amsterdam, Netherlands
Khalil Sima'an  University of Amsterdam, Amsterdam, Netherlands
Rebecca Hwa  University of Pittsburgh, Pittsburgh, PA
Sponsor
Machine Learning Journal
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1273496.1273580

ABSTRACT

Shannon's noisy-channel model, which describes how a corrupted message might be reconstructed, has been the cornerstone of much work in statistical language and speech processing. The model factors into two components: a language model that characterizes the original message and a channel model that describes the channel's corruptive process. The standard approach to estimating the parameters of the channel model is unsupervised maximum-likelihood estimation on the observed data, usually approximated with the Expectation-Maximization (EM) algorithm. In this paper we show that it is better to maximize the joint likelihood of the data at both ends of the noisy channel. We derive a corresponding bi-directional EM algorithm and show that it performs better than standard EM on two tasks: (1) translation using a probabilistic lexicon and (2) adaptation of a part-of-speech tagger between related languages.
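
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of standard (one-directional) EM for a noisy-channel model. It is not the paper's bi-directional algorithm, which the abstract does not specify. The toy setup is an assumption made for illustration: a fixed unigram language model P(x) over hidden source symbols, a channel table P(y|x), and a list of observed symbols y. The function name em_channel and all variable names are hypothetical.

```python
import math
from collections import Counter


def em_channel(observations, lm, channel, iterations=20):
    """Standard EM for the channel model P(y|x) with a fixed language model P(x).

    observations: list of observed (corrupted) symbols y
    lm:           dict x -> P(x), held fixed (illustrative assumption)
    channel:      dict x -> dict y -> P(y|x), strictly positive initial guess
    Returns the re-estimated channel and the log-likelihood before each update.
    """
    counts = Counter(observations)
    history = []
    for _ in range(iterations):
        # E-step: posterior P(x|y) is proportional to P(x) * P(y|x)
        expected = {x: Counter() for x in lm}
        loglik = 0.0
        for y, n in counts.items():
            joint = {x: lm[x] * channel[x].get(y, 0.0) for x in lm}
            marginal = sum(joint.values())  # P(y) under the current model
            loglik += n * math.log(marginal)
            for x, p in joint.items():
                expected[x][y] += n * p / marginal
        history.append(loglik)

        # M-step: renormalize expected counts into a new channel table
        new_channel = {}
        for x in lm:
            total = sum(expected[x].values())
            if total > 0:
                new_channel[x] = {y: c / total for y, c in expected[x].items()}
            else:
                new_channel[x] = dict(channel[x])  # no mass assigned; keep old row
        channel = new_channel
    return channel, history


# Toy usage: two hidden source symbols, two observable symbols.
lm = {"a": 0.7, "b": 0.3}
init = {"a": {"0": 0.6, "1": 0.4}, "b": {"0": 0.4, "1": 0.6}}
obs = list("0001011000")
estimated, loglik_history = em_channel(obs, lm, init)
```

The log-likelihood recorded in loglik_history should never decrease from one iteration to the next, which is the usual EM guarantee. Note that this objective uses only the observed end of the channel; the abstract's proposal is to maximize the joint likelihood of the data at both ends instead.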

