ABSTRACT
This article presents a new hypothesis alignment method for combining outputs of multiple machine translation (MT) systems. An indirect hidden Markov model (IHMM) is proposed to address the synonym matching and word ordering issues in hypothesis alignment. Unlike traditional HMMs whose parameters are trained via maximum likelihood estimation (MLE), the parameters of the IHMM are estimated indirectly from a variety of sources including word semantic similarity, word surface similarity, and a distance-based distortion penalty. The IHMM-based method significantly outperforms the state-of-the-art, TER-based alignment model in our experiments on NIST benchmark datasets. Our combined SMT system using the proposed method achieved the best Chinese-to-English translation result in the constrained training track of the 2008 NIST Open MT Evaluation.
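As a rough illustration of the idea described above, the sketch below aligns a hypothesis to a backbone with an HMM whose parameters are set from similarity scores and a distance penalty rather than trained by MLE. Everything specific here is an assumption made for illustration and is not taken from the article: the linear interpolation weight `alpha`, the prefix-based surface similarity with sharpness `rho`, the distortion form `(1 + |d - 1|) ** -kappa`, the uniform initial state, and the optional `semantic` lookup table (hypothetical; without it, semantic similarity falls back to exact match).

```python
import math


def surface_similarity(a, b, rho=3.0):
    """Assumed surface score: exponential of the shared-prefix ratio."""
    if a == b:
        return 1.0
    prefix = 0
    for x, y in zip(a, b):
        if x != y:
            break
        prefix += 1
    ratio = prefix / max(len(a), len(b))
    return math.exp(rho * (ratio - 1.0))


def transition_probs(prev, length, kappa=2.0):
    """Assumed distance-based distortion: favors moving one step forward."""
    weights = [(1 + abs((j - prev) - 1)) ** -kappa for j in range(length)]
    total = sum(weights)
    return [w / total for w in weights]


def align(backbone, hypothesis, semantic=None, alpha=0.5):
    """Viterbi alignment: one backbone position per hypothesis word."""
    semantic = semantic or {}
    n = len(backbone)

    def emit(j, word):
        # Interpolate semantic and surface similarity (assumed combination).
        default = 1.0 if backbone[j] == word else 0.0
        sem = semantic.get((backbone[j], word), default)
        sur = surface_similarity(backbone[j], word)
        return max(alpha * sem + (1.0 - alpha) * sur, 1e-12)

    trans = [transition_probs(i, n) for i in range(n)]
    # Uniform initial distribution, so only emissions matter at step 0.
    scores = [math.log(emit(j, hypothesis[0])) for j in range(n)]
    backpointers = []
    for word in hypothesis[1:]:
        step, new_scores = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: scores[i] + math.log(trans[i][j]))
            step.append(best)
            new_scores.append(
                scores[best] + math.log(trans[best][j]) + math.log(emit(j, word))
            )
        backpointers.append(step)
        scores = new_scores

    # Trace back from the best final state.
    state = max(range(n), key=lambda j: scores[j])
    path = [state]
    for step in reversed(backpointers):
        state = step[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    print(align(["he", "went", "home"], ["home", "he", "went"]))
```

In this toy setup the distortion penalty discourages long jumps but does not forbid them, so a reordered hypothesis can still be aligned to the matching backbone words when the similarity evidence is strong enough.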