ABSTRACT
In this paper, we investigate a simple, mistake-driven learning algorithm for discriminative training of continuous density hidden Markov models (CD-HMMs). Most CD-HMMs for automatic speech recognition use multivariate Gaussian emission densities (or mixtures thereof) parameterized in terms of their means and covariance matrices. For discriminative training of CD-HMMs, we reparameterize these Gaussian distributions in terms of positive semidefinite matrices that jointly encode their mean and covariance statistics. We show how to explore the resulting parameter space in CD-HMMs with perceptron-style updates that minimize the distance between Viterbi decodings and target transcriptions. We experiment with several forms of updates, systematically comparing the effects of different matrix factorizations, initializations, and averaging schemes on phone accuracies and convergence rates. We present experimental results for context-independent CD-HMMs trained in this way on the TIMIT speech corpus. Our results show that certain types of perceptron training yield consistently significant and rapid reductions in phone error rates.
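To make the reparameterization and the mistake-driven update concrete, here is a minimal sketch. It rests on several assumptions that the abstract does not state: each Gaussian is encoded by a (d+1)x(d+1) matrix Phi so that the log-density of a frame x equals -0.5 * z^T Phi z with z = [x; 1]; covariances are diagonal to keep the code dependency-free; and the update is a plain per-frame gradient step on Phi. The names `gaussian_to_phi`, `score`, `perceptron_update`, and the step size `eta` are hypothetical and chosen for illustration only.

```python
import math


def gaussian_to_phi(mean, var):
    """Build a (d+1)x(d+1) matrix Phi from a diagonal-covariance Gaussian.

    Assumed construction: Phi = [[P, -P*mu], [-(P*mu)^T, mu^T P mu + theta]],
    with P the precision matrix and theta = log((2*pi)^d * det(Sigma)).
    """
    d = len(mean)
    phi = [[0.0] * (d + 1) for _ in range(d + 1)]
    corner = 0.0
    for i in range(d):
        prec = 1.0 / var[i]
        phi[i][i] = prec
        phi[i][d] = phi[d][i] = -prec * mean[i]
        corner += prec * mean[i] ** 2 + math.log(2.0 * math.pi * var[i])
    phi[d][d] = corner
    return phi


def score(phi, x):
    """Return -0.5 * z^T Phi z for the augmented vector z = [x; 1]."""
    z = list(x) + [1.0]
    n = len(z)
    return -0.5 * sum(z[i] * phi[i][j] * z[j] for i in range(n) for j in range(n))


def perceptron_update(phis, x, target, predicted, eta=0.01):
    """Mistake-driven update for one frame (illustrative, not the paper's exact rule).

    When the decoded state differs from the target state, raise the target
    state's score and lower the decoded state's score by a gradient step.
    This step alone does not keep Phi positive semidefinite.
    """
    if target == predicted:
        return  # no mistake, no update
    z = list(x) + [1.0]
    n = len(z)
    for i in range(n):
        for j in range(n):
            step = 0.5 * eta * z[i] * z[j]
            phis[target][i][j] -= step
            phis[predicted][i][j] += step


# Usage: two one-dimensional states; the frame is labeled "b" but decoded as "a".
phis = {"a": gaussian_to_phi([0.0], [1.0]), "b": gaussian_to_phi([1.0], [1.0])}
perceptron_update(phis, [0.2], target="b", predicted="a", eta=0.1)
```

Under these assumptions, `score` reproduces the Gaussian log-density exactly before any update. Because subtracting a multiple of z z^T can make a matrix indefinite, a full implementation would need an extra mechanism to preserve positive semidefiniteness, for example updating a factor of Phi rather than Phi itself, which is one reading of the matrix factorizations the abstract compares.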