When a mismatch can be good: large vocabulary speech recognition trained with idealized tandem features
Source
Symposium on Applied Computing
Proceedings of the 2008 ACM symposium on Applied computing
Fortaleza, Ceara, Brazil
SESSION: Natural language processing and speech recognition
Pages 1574-1577
Year of Publication: 2008
ISBN: 978-1-59593-753-7
Authors
Arlo Faria, University of California at Berkeley, Berkeley, CA
Nelson Morgan, International Computer Science Institute, Berkeley, CA
ABSTRACT
This paper explores Tandem feature extraction in a large-vocabulary speech recognition system. In this framework, a multi-layer perceptron (MLP) estimates phone probabilities, which are then treated as acoustic observations in a traditional HMM-GMM system. To determine a lower bound on the error rate, we simulated an idealized classifier based on alignment of the reference transcriptions. This cheating experiment demonstrated a best-case scenario for Tandem feature extraction, highlighting the potential for dramatic system improvement. More importantly, we discovered a way to exploit the result without cheating: when the simulated classifier was used during training and an MLP classifier at test time, performance improved despite the mismatched Tandem features.
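As a rough illustration of the idea in the abstract, the following Python sketch simulates an idealized phone classifier from aligned frame labels and turns its outputs into Tandem-style features. It is only a sketch under stated assumptions: the function names, the `peak` smoothing value, and the log-plus-PCA post-processing (a common Tandem recipe) are illustrative choices not given in the abstract, and the random "MLP" posteriors are a stand-in for a real trained network.

```python
import numpy as np


def idealized_posteriors(frame_labels, num_phones, peak=0.9):
    """Simulate an idealized classifier from aligned phone labels.

    Assumption: each frame puts probability `peak` on its aligned phone and
    spreads the rest uniformly, so that the logarithm stays finite.
    """
    frame_labels = np.asarray(frame_labels)
    off_target = (1.0 - peak) / (num_phones - 1)
    post = np.full((len(frame_labels), num_phones), off_target)
    post[np.arange(len(frame_labels)), frame_labels] = peak
    return post


def fit_pca(log_post, num_dims):
    """Estimate a mean and a PCA projection from log-posterior vectors."""
    mean = log_post.mean(axis=0)
    cov = np.cov(log_post - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:num_dims]  # largest variance first
    return mean, eigvecs[:, order]


def tandem_features(posteriors, mean, projection):
    """Take log posteriors, center them, and project to fewer dimensions."""
    return (np.log(posteriors) - mean) @ projection


# Training side: features from the simulated (idealized) classifier.
train_labels = [0, 0, 2, 2, 1, 1, 2, 0]
train_post = idealized_posteriors(train_labels, num_phones=3)
mean, projection = fit_pca(np.log(train_post), num_dims=2)
train_feats = tandem_features(train_post, mean, projection)

# Test side: posteriors from a stand-in "MLP" (random softmax outputs here).
rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 3))
mlp_post = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
test_feats = tandem_features(mlp_post, mean, projection)
```

In this toy setup `train_feats` would feed HMM-GMM training and `test_feats` would be scored at recognition time, which mirrors the train/test mismatch the abstract describes. Reusing the training-side projection at test time is an illustrative choice, not a detail taken from the paper.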
CITED BY
Mei-Yuh Hwang, Gang Peng, Mari Ostendorf, Wen Wang, Arlo Faria, Aaron Heidel, Building a highly accurate Mandarin speech recognizer with language-independent technologies and language-dependent modules, IEEE Transactions on Audio, Speech, and Language Processing, v.17 n.7, p.1253-1262, September 2009