Cross-lingual lexical triggers in statistical language modeling
Source: Theoretical Issues In Natural Language Processing
Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing - Volume 10
Pages: 17 - 24  
Year of Publication: 2003
Authors
Woosung Kim  The Johns Hopkins University, Baltimore, MD
Sanjeev Khudanpur  The Johns Hopkins University, Baltimore, MD
Publisher
Association for Computational Linguistics  Morristown, NJ, USA
DOI Bookmark: 10.3115/1119355.1119358

ABSTRACT

We propose new methods to take advantage of text in resource-rich languages to sharpen statistical language models in resource-deficient languages. We achieve this through an extension of the method of lexical triggers to the cross-language problem, and by developing a likelihood-based adaptation scheme for combining a trigger model with an N-gram model. We describe the application of such language models for automatic speech recognition. By exploiting a side-corpus of contemporaneous English news articles for adapting a static Chinese language model to transcribe Mandarin news stories, we demonstrate significant reductions in both perplexity and recognition errors. We also compare our cross-lingual adaptation scheme to monolingual language model adaptation, and to an alternate method for exploiting cross-lingual cues, via cross-lingual information retrieval and machine translation, proposed elsewhere.
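
As a rough illustration of the combination step mentioned in the abstract, the sketch below mixes a trigger-based word probability with a static N-gram probability and picks the mixing weight that maximizes the likelihood of some adaptation text. The abstract does not specify the form of the scheme, so the linear interpolation, the grid search, the function names, and the toy probabilities are all assumptions made for illustration, not the authors' method.

```python
import math

def interpolate(p_trigger, p_ngram, lam):
    """Linear mixture of a trigger-based and an N-gram probability (assumed form)."""
    return lam * p_trigger + (1.0 - lam) * p_ngram

def log_likelihood(pairs, lam):
    """Log-likelihood of a text under the mixture.

    pairs holds one (p_trigger, p_ngram) tuple per word; both values are
    assumed to be strictly positive so the logarithm is always defined.
    """
    return sum(math.log(interpolate(pt, pn, lam)) for pt, pn in pairs)

def best_lambda(pairs, grid_size=101):
    """Grid search over [0, 1] for the weight that maximizes the log-likelihood."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=lambda lam: log_likelihood(pairs, lam))

def perplexity(pairs, lam):
    """Per-word perplexity of the text under the mixture with weight lam."""
    return math.exp(-log_likelihood(pairs, lam) / len(pairs))

if __name__ == "__main__":
    # Hypothetical per-word probabilities, invented purely for this example.
    toy_pairs = [(0.20, 0.05), (0.10, 0.02), (0.01, 0.08), (0.15, 0.03)]
    lam = best_lambda(toy_pairs)
    print("chosen weight:", lam)
    print("N-gram only perplexity:", perplexity(toy_pairs, 0.0))
    print("mixture perplexity:", perplexity(toy_pairs, lam))
```

Because the grid includes the endpoints 0 and 1, the chosen weight can never do worse on the adaptation text than either model alone. Whether that gain carries over to unseen text depends on how the weight is estimated.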


