ABSTRACT
A new hierarchical nonparametric Bayesian model is proposed for the problem of multitask learning (MTL) with sequential data. Sequential data are typically modeled with a hidden Markov model (HMM), for which one often must choose an appropriate model structure (number of states) before learning. Here we model sequential data from each task with an infinite hidden Markov model (iHMM), avoiding the problem of model selection. The MTL for iHMMs is implemented by imposing a nested Dirichlet process (nDP) prior on the base distributions of the iHMMs. The nDP-iHMM MTL method allows us to perform task-level clustering and data-level clustering simultaneously, with which the learning for individual iHMMs is enhanced and between-task similarities are learned. Learning and inference for the nDP-iHMM MTL are based on a Gibbs sampler. The effectiveness of the framework is demonstrated using synthetic data as well as real music data.
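To make the two levels of clustering concrete, the following is a minimal sketch of a nested Dirichlet process prior built from truncated stick-breaking weights. It illustrates the general nDP construction only, not the authors' nDP-iHMM model or their Gibbs sampler. The function names, the concentration parameters `alpha` and `beta`, and the truncation levels are all assumptions introduced for illustration.

```python
import random


def stick_breaking(concentration, truncation, rng):
    """Truncated stick-breaking weights for a Dirichlet process.

    Hypothetical helper: the finite truncation is an assumption of this sketch.
    """
    weights = []
    remaining = 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, concentration)  # fraction of the remaining stick
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last stick takes the leftover mass, so weights sum to 1
    return weights


def nested_dp_prior(num_tasks, alpha, beta, task_trunc, atom_trunc, rng):
    """Draw task-level cluster labels under a truncated nested DP prior.

    Outer level: weights over candidate task-level distributions.
    Inner level: each candidate is itself a DP draw, represented here only by
    its weights over data-level clusters (atoms are omitted in this sketch).
    """
    outer = stick_breaking(alpha, task_trunc, rng)
    inner = [stick_breaking(beta, atom_trunc, rng) for _ in range(task_trunc)]
    # Tasks that receive the same label share one entire inner distribution.
    labels = rng.choices(range(task_trunc), weights=outer, k=num_tasks)
    return labels, outer, inner


if __name__ == "__main__":
    rng = random.Random(0)
    labels, outer, inner = nested_dp_prior(
        num_tasks=6, alpha=1.0, beta=1.0, task_trunc=4, atom_trunc=5, rng=rng
    )
    print("task-level cluster label per task:", labels)
```

Tasks with the same label would share all data-level clusters, which is the sense in which task-level and data-level clustering happen together. In the model described above, the shared object would be the base distribution of an iHMM rather than the bare weight vector used here.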