Improved Monolingual Hypothesis Alignment for Machine Translation System Combination
Source
ACM Transactions on Asian Language Information Processing (TALIP)
Volume 8, Issue 2 (May 2009)
Article No. 6
Year of Publication: 2009
ISSN: 1530-0226
Authors
Xiaodong He  Microsoft Research
Mei Yang  University of Washington
Jianfeng Gao  Microsoft Research
Patrick Nguyen  Microsoft Research
Robert Moore  Microsoft Research
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1526252.1526254

ABSTRACT

This article presents a new hypothesis alignment method for combining outputs of multiple machine translation (MT) systems. An indirect hidden Markov model (IHMM) is proposed to address the synonym matching and word ordering issues in hypothesis alignment. Unlike traditional HMMs whose parameters are trained via maximum likelihood estimation (MLE), the parameters of the IHMM are estimated indirectly from a variety of sources including word semantic similarity, word surface similarity, and a distance-based distortion penalty. The IHMM-based method significantly outperforms the state-of-the-art, TER-based alignment model in our experiments on NIST benchmark datasets. Our combined SMT system using the proposed method achieved the best Chinese-to-English translation result in the constrained training track of the 2008 NIST Open MT Evaluation.
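
The abstract names three sources for the IHMM parameters but gives no formulas, so the following is only an illustrative sketch of how such a model could be assembled, not the authors' implementation. Everything specific in it is an assumption: the toy synonym table SEMANTIC, the prefix-based surface similarity, the interpolation weight alpha, the smoothing factor rho, the distortion exponent k, and all function names. It aligns each word of a hypothesis to a position in a backbone sentence with Viterbi decoding and, for brevity, leaves out alignments to an empty word.

```python
import math

# Hypothetical toy scores standing in for a real semantic-similarity source.
SEMANTIC = {("car", "automobile"): 0.9, ("big", "large"): 0.8}


def semantic_sim(a, b):
    """Semantic similarity in [0, 1]; identical words score 1 (assumed)."""
    if a == b:
        return 1.0
    return SEMANTIC.get((a, b), SEMANTIC.get((b, a), 0.0))


def surface_sim(a, b, rho=8.0):
    """Surface similarity from the shared prefix length (an assumed form).

    Words are assumed to be non-empty. The result is always positive,
    which keeps the log-probabilities below finite.
    """
    prefix = 0
    for x, y in zip(a, b):
        if x != y:
            break
        prefix += 1
    ratio = prefix / max(len(a), len(b))
    return math.exp(rho * (ratio - 1.0))


def emission(backbone_word, hyp_word, alpha=0.5):
    """Interpolate the two similarity sources instead of training by MLE."""
    return (alpha * semantic_sim(backbone_word, hyp_word)
            + (1.0 - alpha) * surface_sim(backbone_word, hyp_word))


def transition(prev_i, i, n, k=2.0):
    """Distance-based distortion penalty, normalized over n backbone positions.

    A jump of +1 (monotone order) gets the highest probability.
    """
    def raw(d):
        return (1.0 + abs(d - 1)) ** -k

    z = sum(raw(j - prev_i) for j in range(n))
    return raw(i - prev_i) / z


def align(backbone, hyp):
    """Viterbi decoding: return one backbone index per hypothesis word."""
    n = len(backbone)
    # Start from a virtual position -1 so that index 0 is the monotone step.
    scores = [math.log(transition(-1, i, n))
              + math.log(emission(backbone[i], hyp[0])) for i in range(n)]
    backpointers = []
    for word in hyp[1:]:
        new_scores, ptrs = [], []
        for i in range(n):
            best = max(range(n),
                       key=lambda p: scores[p] + math.log(transition(p, i, n)))
            new_scores.append(scores[best]
                              + math.log(transition(best, i, n))
                              + math.log(emission(backbone[i], word)))
            ptrs.append(best)
        scores = new_scores
        backpointers.append(ptrs)
    # Trace the best path backwards from the best final state.
    state = max(range(n), key=lambda i: scores[i])
    path = [state]
    for ptrs in reversed(backpointers):
        state = ptrs[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    backbone = ["the", "big", "car", "arrived"]
    hypothesis = ["the", "automobile", "large", "arrived"]
    print(align(backbone, hypothesis))  # [0, 2, 1, 3]
```

In this toy run the synonym pairs are matched even though they share no surface form and appear in a different order, which is the kind of synonym matching and word reordering the abstract refers to. None of the parameters are learned from aligned data; in a real system the weights would presumably be tuned on held-out material.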

