Lexical triggers and latent semantic analysis for cross-lingual language model adaptation
Source
ACM Transactions on Asian Language Information Processing (TALIP)
Volume 3, Issue 2 (June 2004)
Pages: 94-112
Year of Publication: 2004
ISSN: 1530-0226
ABSTRACT
In-domain texts for estimating statistical language models are not easily found for most languages of the world. We present two techniques to take advantage of in-domain text resources in other languages. First, we extend the notion of lexical triggers, which have been used monolingually for language model adaptation, to the cross-lingual problem, permitting the construction of sharper language models for a target-language document by drawing statistics from related documents in a resource-rich language. Next, we show that cross-lingual latent semantic analysis is similarly capable of extracting useful statistics for language modeling. Neither technique requires explicit translation capabilities between the two languages! We demonstrate significant reductions in both perplexity and word error rate on a Mandarin speech recognition task by using these techniques.
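As a rough illustration of the first technique, the sketch below scores candidate cross-lingual trigger pairs from a document-aligned corpus without using any translation lexicon. The abstract does not say how trigger pairs are selected, so the scoring rule here (mutual information between the presence of a source-language word and the presence of a target-language word across aligned document pairs) is an assumption, and the function name `trigger_scores` and the toy romanized tokens are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def trigger_scores(doc_pairs):
    """Score (source_word, target_word) pairs by the mutual information of
    their presence/absence across document-aligned pairs.

    doc_pairs: list of (source_tokens, target_tokens) tuples.
    Only pairs that co-occur in at least one aligned pair are scored.
    """
    n = len(doc_pairs)
    src_df, tgt_df, joint_df = Counter(), Counter(), Counter()
    for src_doc, tgt_doc in doc_pairs:
        src_set, tgt_set = set(src_doc), set(tgt_doc)
        src_df.update(src_set)                       # document frequency, source side
        tgt_df.update(tgt_set)                       # document frequency, target side
        joint_df.update(product(src_set, tgt_set))   # co-occurrence in aligned pairs

    scores = {}
    for (e, c), n_ec in joint_df.items():
        # 2x2 contingency table: (joint count, marginal for e-event, marginal for c-event)
        cells = [
            (n_ec, src_df[e], tgt_df[c]),                                      # e present, c present
            (src_df[e] - n_ec, src_df[e], n - tgt_df[c]),                      # e present, c absent
            (tgt_df[c] - n_ec, n - src_df[e], tgt_df[c]),                      # e absent, c present
            (n - src_df[e] - tgt_df[c] + n_ec, n - src_df[e], n - tgt_df[c]),  # both absent
        ]
        mi = 0.0
        for joint, marg_e, marg_c in cells:
            if joint > 0:  # empty cells contribute nothing
                mi += (joint / n) * math.log(joint * n / (marg_e * marg_c))
        scores[(e, c)] = mi
    return scores


# Hypothetical toy corpus: English tokens paired with romanized Mandarin tokens.
pairs = [
    (["bank", "loan"], ["yinhang", "daikuan"]),
    (["bank", "rate"], ["yinhang", "lilu"]),
    (["storm", "rain"], ["fengbao", "yu"]),
    (["storm", "wind"], ["fengbao", "feng"]),
]
scores = trigger_scores(pairs)
best = max(scores, key=scores.get)
```

In this toy setting the highest-scoring pairs are those whose words always appear together in aligned documents. How such scores would then be turned into adapted language model probabilities is a separate step that this sketch does not attempt.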