ABSTRACT
Shannon's noisy-channel model, which describes how a corrupted message might be reconstructed, has been the cornerstone of much work in statistical language and speech processing. The model factors into two components: a language model that characterizes the original message and a channel model that describes the channel's corruptive process. The standard approach to estimating the channel model's parameters is unsupervised maximum-likelihood estimation on the observed data, usually approximated with the Expectation-Maximization (EM) algorithm. In this paper we show that it is better to maximize the joint likelihood of the data at both ends of the noisy channel. We derive a corresponding bi-directional EM algorithm and show that it outperforms standard EM on two tasks: (1) translation using a probabilistic lexicon and (2) adaptation of a part-of-speech tagger between related languages.
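As a rough illustration of the baseline the abstract refers to, the sketch below runs standard EM for a channel model P(y|x) with a fixed language model P(x), then decodes with argmax over P(x) * P(y|x). It is not the paper's bi-directional algorithm. The names `em_channel`, `decode`, `toy_lm`, and `toy_channel`, along with all the toy probabilities, are assumptions made up for this example.

```python
from collections import defaultdict

def em_channel(observations, lm, channel, iterations=10):
    """Standard EM re-estimation of P(y|x), keeping P(x) fixed (illustrative sketch)."""
    for _ in range(iterations):
        counts = defaultdict(lambda: defaultdict(float))
        for y in observations:
            # E-step: posterior P(x|y) is proportional to P(x) * P(y|x).
            scores = {x: lm[x] * channel[x].get(y, 0.0) for x in lm}
            total = sum(scores.values())
            if total == 0.0:
                continue  # observation unexplained by the current model
            for x, s in scores.items():
                counts[x][y] += s / total
        # M-step: normalize expected counts into new channel probabilities.
        new_channel = {}
        for x in lm:
            norm = sum(counts[x].values())
            if norm > 0.0:
                new_channel[x] = {y: c / norm for y, c in counts[x].items()}
            else:
                new_channel[x] = dict(channel[x])
        channel = new_channel
    return channel

def decode(y, lm, channel):
    """Noisy-channel decoding: pick the source x maximizing P(x) * P(y|x)."""
    return max(lm, key=lambda x: lm[x] * channel[x].get(y, 0.0))

# Hypothetical toy model: two source symbols, two observed symbols.
toy_lm = {"a": 0.6, "b": 0.4}
toy_channel = {"a": {"u": 0.7, "v": 0.3}, "b": {"u": 0.2, "v": 0.8}}
learned = em_channel(["u", "u", "v"], toy_lm, toy_channel)
```

Here only the observed side (`"u"`, `"v"`) drives the updates, which is the one-sided likelihood that the abstract contrasts with maximizing the joint likelihood at both ends of the channel.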