Matrix updates for perceptron training of continuous density hidden Markov models
Source: ACM International Conference Proceeding Series; Vol. 382
Proceedings of the 26th Annual International Conference on Machine Learning
Montreal, Quebec, Canada
Pages 153-160  
Year of Publication: 2009
ISBN:978-1-60558-516-1
Authors
Chih-Chieh Cheng  University of California, San Diego
Fei Sha  University of Southern California
Lawrence K. Saul  University of California, San Diego
Sponsors
MITACS
NSF
Microsoft Research
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1553374.1553394

ABSTRACT

In this paper, we investigate a simple, mistake-driven learning algorithm for discriminative training of continuous density hidden Markov models (CD-HMMs). Most CD-HMMs for automatic speech recognition use multivariate Gaussian emission densities (or mixtures thereof) parameterized in terms of their means and covariance matrices. For discriminative training of CD-HMMs, we reparameterize these Gaussian distributions in terms of positive semidefinite matrices that jointly encode their mean and covariance statistics. We show how to explore the resulting parameter space in CD-HMMs with perceptron-style updates that minimize the distance between Viterbi decodings and target transcriptions. We experiment with several forms of updates, systematically comparing the effects of different matrix factorizations, initializations, and averaging schemes on phone accuracies and convergence rates. We present experimental results for context-independent CD-HMMs trained in this way on the TIMIT speech corpus. Our results show that certain types of perceptron training yield consistently significant and rapid reductions in phone error rates.
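
The abstract does not spell out the reparameterization or the update rule, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the common construction in which a Gaussian with mean mu and covariance Sigma is packed into one (d+1) x (d+1) matrix Phi, so that the log density of a frame x equals -0.5 * z^T Phi z with z = [x; 1]. The function names (gaussian_to_phi, log_density, perceptron_step), the step size eta, and the per-frame form of the update are all hypothetical and chosen for illustration.

```python
import numpy as np


def gaussian_to_phi(mu, sigma):
    """Pack a Gaussian's mean and covariance into one (d+1)x(d+1) matrix.

    Assumed block layout (an illustration, not taken from the paper):
        Phi = [[ P,          -P mu              ],
               [ -(P mu)^T,  mu^T P mu + theta  ]]
    with P = inv(Sigma) and theta = log((2*pi)^d * det(Sigma)).
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    d = mu.shape[0]
    prec = np.linalg.inv(sigma)
    _, logdet = np.linalg.slogdet(sigma)
    theta = d * np.log(2.0 * np.pi) + logdet
    phi = np.empty((d + 1, d + 1))
    phi[:d, :d] = prec
    phi[:d, d] = -prec @ mu
    phi[d, :d] = -prec @ mu
    phi[d, d] = mu @ prec @ mu + theta
    return phi


def log_density(phi, x):
    """Score a frame x as -0.5 * z^T Phi z, where z is x with a 1 appended."""
    z = np.append(np.asarray(x, dtype=float), 1.0)
    return -0.5 * z @ phi @ z


def perceptron_step(phis, x, target_state, decoded_state, eta=0.01):
    """Hypothetical mistake-driven update for a single frame.

    If the decoded state disagrees with the target state, raise the
    target state's score and lower the decoded state's score. This plain
    additive step does NOT keep Phi positive semidefinite; enforcing that
    (for example through a matrix factorization) is outside this sketch.
    """
    if target_state == decoded_state:
        return
    z = np.append(np.asarray(x, dtype=float), 1.0)
    outer = np.outer(z, z)
    phis[target_state] -= eta * outer   # smaller z^T Phi z, higher score
    phis[decoded_state] += eta * outer  # larger z^T Phi z, lower score
```

In this sketch a frame only triggers an update when the decoded state and the target state disagree, which is what makes the rule mistake-driven. A full trainer would obtain the decoded states from Viterbi decoding over whole utterances and would also need some way to keep each matrix in the valid parameter space.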


