|
ABSTRACT
For Hidden Markov Models (HMMs) with fully connected transition models, the three fundamental problems of evaluating the likelihood of an observation sequence, estimating an optimal state sequence for the observations, and learning the model parameters, all have quadratic time complexity in the number of states. We introduce a novel class of non-sparse Markov transition matrices called Dense-Mostly-Constant (DMC) transition matrices that allow us to derive new algorithms for solving the basic HMM problems in sub-quadratic time. We describe the DMC HMM model and algorithms and attempt to convey some intuition for their usage. Empirical results for these algorithms show dramatic speedups for all three problems. In terms of accuracy, the DMC model yields strong results and outperforms the baseline algorithms even in domains known to violate the DMC assumption.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Bahl, L. R., Jelinek, F., & Mercer, R. L. (1983). A maximum likelihood approach to continouous speech recognition. IEEE Trans Pattern Anal Machine Intell. (pp. 179--190).
|
| |
2
|
Baum, L. (1972). An inequality and associated maximization technique in statistical estimation of probabilistic functions of a Markov process. Inequalities, 3, 1--8.
|
| |
3
|
|
| |
4
|
El-Difrawy, P. B. S., & Ehrlich, D. (2002). Hidden Markov Models for DNA Sequencing. Proceedings of Workshop on Genomic Signal Processing and Statistics (GENSIPS).
|
| |
5
|
Felzenszwalb, P., Huttenlocher, D., & Kleinberg, J. (2003). Fast Algorithms for Large State Space HMMs with Applications to Web Usage Analysis. Advances in Neural Information Processing Systems (NIPS).
|
| |
6
|
Murphy, K., & Paskin, M. (2002). Lincar Time Inference in Hierarchical HMMs. Advances in Neural Information Processing Systems (NIPS).
|
| |
7
|
Rabiner, L. R. (1989). A tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proc. IEEE, 77, 257--285.
|
| |
8
|
Salakhutdinov, R., Roweis, S., & Ghahramani, Z. (2003). Optimization with EM and Expectation-Conjugate-Gradient. Proceedings, Intl. Conf. on Machine Learning (ICML).
|
| |
9
|
Seymore, K., McCallum, A., & Rosenfeld, R. (1999). Learning hidden Markov model structure for information extraction. AAAI'99 Workshop on Machine Learning for Information Extraction.
|
| |
10
|
Viterbi, A. J. (1967). Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, IT-13, 260 267.
|
|