When a mismatch can be good: large vocabulary speech recognition trained with idealized tandem features
Source: Symposium on Applied Computing
Proceedings of the 2008 ACM Symposium on Applied Computing
Fortaleza, Ceará, Brazil
SESSION: Natural language processing and speech recognition
Pages 1574-1577  
Year of Publication: 2008
ISBN: 978-1-59593-753-7
Authors
Arlo Faria, University of California at Berkeley, Berkeley, CA
Nelson Morgan, International Computer Science Institute, Berkeley, CA
Sponsor
SIGAPP: ACM Special Interest Group on Applied Computing
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1363686.1364055

ABSTRACT

This paper explores Tandem feature extraction in a large-vocabulary speech recognition system. In this framework, a multi-layer perceptron (MLP) estimates phone probabilities, which are treated as acoustic observations in a traditional HMM-GMM system. To determine a lower bound on the error, we simulated an idealized classifier based on alignment of reference transcriptions. This cheating experiment demonstrated a best-case scenario for Tandem feature extraction, highlighting the potential for dramatic system improvement. More importantly, we discovered a way to exploit the result without cheating: using the simulated classifier during training and an MLP classifier at test time, the performance improved despite the mismatched Tandem features.
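
The mismatch described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy phone set, the smoothing floor of 1e-3, and the hand-written MLP outputs are all assumptions made for illustration. The log compression follows common Tandem practice; the decorrelation step that usually follows it is omitted for brevity.

```python
import math

def idealized_posteriors(aligned_phones, phone_set, floor=1e-3):
    """Simulate an idealized classifier from a frame-level alignment.

    Each frame puts almost all probability mass on the aligned phone.
    The small floor (an assumption) keeps the later log finite.
    """
    n = len(phone_set)
    index = {phone: i for i, phone in enumerate(phone_set)}
    frames = []
    for phone in aligned_phones:
        vec = [floor] * n
        vec[index[phone]] = 1.0 - floor * (n - 1)
        frames.append(vec)
    return frames

def tandem_features(posteriors):
    """Log-compress phone posteriors into features for an HMM-GMM system."""
    return [[math.log(p) for p in frame] for frame in posteriors]

# Hypothetical toy data: three phones, four frames.
phone_set = ["sil", "ah", "t"]
alignment = ["sil", "ah", "ah", "t"]  # from aligning a reference transcript

# Training side: features from the simulated (idealized) classifier.
ideal_post = idealized_posteriors(alignment, phone_set)
train_feats = tandem_features(ideal_post)

# Test side: features from a real MLP (values invented for illustration).
mlp_post = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
]
test_feats = tandem_features(mlp_post)
```

In this sketch the training features are much sharper than the test features, which is the mismatch the abstract refers to. Reference alignments are only needed for the training data, so no reference transcriptions are consulted at test time.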


