|
ABSTRACT
In this paper we present a family of algorithms that can simultaneously align and cluster sets of multidimensional curves defined on a discrete time grid. Our approach uses the Expectation-Maximization (EM) algorithm to recover both the mean curve shapes for each cluster, and the most likely shifts, offsets, and cluster memberships for each curve. We demonstrate how Bayesian estimation methods can improve the results for small sample sizes by enforcing smoothness in the cluster mean curves. We evaluate the methodology on two real-world data sets, time-course gene expression data and storm trajectory data. Experimental results show that models that incorporate curve alignment systematically provide improvements in predictive power and within-cluster variance on test data sets. The proposed approach provides a non-parametric, computationally efficient, and robust methodology for clustering broad classes of curve data.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
J. Aach and G. Church. Aligning gene expression time series with time warping algorithms. BIOINF: Bioinformatics, 17, 2001.
|
| |
2
|
J. D. Banfield and A. E. Raftery. Model-based Gaussian and non-Gaussian clustering. Biometrics, 49:803--821, 1993.
|
 |
3
|
Ziv Bar-Joseph , Georg Gerber , David K. Gifford , Tommi S. Jaakkola , Itamar Simon, A new approach to analyzing gene expression time series data, Proceedings of the sixth annual international conference on Computational biology, p.39-48, April 18-21, 2002, Washington, DC, USA
[doi> 10.1145/565196.565202]
|
| |
4
|
R. Blender, K. Fraedrich, and F. Lunkeit. Identification of cyclone-track regimes in the North Atlantic. Quart J. Royal Meteor. Soc., 123:727--741, 1997.
|
| |
5
|
D. Chudova, S. Gaffney, and P. Smyth. Probabilistic models for joint clustering and time-warping of multi-dimensional curves. In Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI-2003), San Francisco, CA, 2003. Morgan Kaufmann.
|
| |
6
|
D. Chudova, S. Gaffney, E. Mjolsness, and P. Smyth. Mixture models for translation-invariant clustering of sets of multi-dimensional curves. Technical Report TR-03-09, School of Computer Science, University of California, Irvine, 2003.
|
| |
7
|
|
| |
8
|
Robert G. Cowell , Steffen L. Lauritzen , A. Philip David , David J. Spiegelhalter , V. Nair , J. Lawless , M. Jordan , David J. Spiegelhater, Probabilistic Networks and Expert Systems, Springer-Verlag New York, Inc., Secaucus, NJ, 1999
|
| |
9
|
A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Stat. Soc. B, 39:1--38, 1977.
|
| |
10
|
W. S. DeSarbo and W. L. Cron. A maximum likelihood methodology for clusterwise linear regression. Journal of Classification, 5(1):249--282, 1988.
|
| |
11
|
J. Diebolt and C. P. Robert. Estimation of finite mixture distributions through Bayesian sampling. Journal of the Royal Statistical Society B, 2:363--375, 1994.
|
| |
12
|
M. B. Eisen, P. T. Spellman, P. O. Brown, and D. Botstein. Cluster analysis and display of genome-wide expression patterns. Science, 95(25):14863--68, 1998.
|
| |
13
|
C. Fraley and A. E. Raftery. Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association, 97(458):611--631, 2002.
|
| |
14
|
|
 |
15
|
|
| |
16
|
S. J. Gaffney, A. Robertson, and P. Smyth. Clustering of extra-tropical cyclone trajectories using mixtures of regression models. In Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Fourth Workshop on Mining Scientific Datasets, 2001.
|
| |
17
|
S. J. Gaffney and P. Smyth. Curve clustering with random effects regression mixtures. In C. M. Bishop and B. J. Frey, editors, Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, Key West FL, January 3--6 2003.
|
| |
18
|
S. Gold, A Rangarajan, C.-P. Lu, S. Pappu, and E. Mjolsness. New algorithms for 2D and 3D point matching: pose estimation and correspondence. Pattern Recognition, 31(8):1019--1031, 1998.
|
| |
19
|
Steven Gold , Anand Rangarajan , Eric Mjolsness, Learning with preknowledge: clustering with point and graph matching distance measures, Neural Computation, v.8 n.4, p.787-804, May 1996
|
| |
20
|
J. A. Hartigan and M. A. Wong. Algorithm AS 136: a K-means clustering algorithm. Appl. Stat., 28:100--108, 1978.
|
| |
21
|
G. M. James and C. A. Sugar. Clustering for sparsely sampled functional data. Journal of the American Statistical Association, to appear, 2003.
|
| |
22
|
P. N. Jones and G. J. McLachlan. Fitting finite mixture models in a regression context. Austral. J. Statist., 34(2):233--240, 1992.
|
| |
23
|
A. Kneip and T. Gasser. Statistical tools to analyze data representing a sample of curves. Annals of Statistics, 20(3):1266--1305, 1992.
|
| |
24
|
N. M. Laird and J. H. Ware. Random effects models for longitudinal data. Biometrics, 38:963--974, 1982.
|
| |
25
|
P. J. Lenk and W. S. DeSarbo. Bayesian inference for finite mixtures of generalized linear models with random effects. Psychometrika, 65(1):93--119, 2000.
|
| |
26
|
J. S. Liu. Monte Carlo Strategies in Scientific Computing. Stats. Springer, 2001.
|
| |
27
|
J. Mateos, A. Katsaggelos, and R. Molina. A Bayesian approach for the estimation and transmission of regularization parameters for reducing blocking artifacts. IEEE Transactions on Medical Imaging, 9(7):1200--1215, July 2000.
|
| |
28
|
G. J. McLachlan and K. E. Basford. Mixture Models: Inference and Applications to Clustering. Marcel Dekker, New York, 1988.
|
| |
29
|
G. J. McLachlan, R. W. Bean, and D. Peel. A mixture model-based approach to the clustering of microarray expression data. Bioinformatics, 18(3):413--422, March 2002.
|
| |
30
|
E. U. Mumcuoglu, R. M. Leahy, and S. R. Cherry. Bayesian reconstruction of PET images: methodology and performance analysis. Phys. Med. Biol., 41:1777--1807, 1996.
|
| |
31
|
|
| |
32
|
J. O. Ramsay and X. Li. Curve registration. J. Royal Stat. Soc. B, 60:351--363, 1998.
|
| |
33
|
J. O. Ramsay and B. W. Silverman. Functional Data Analysis. Springer-Verlag, New York, NY, 1997.
|
| |
34
|
S. Richardson and P. J. Green. On Bayesian analysis of mixtures with an unknown number of components (with discussion). Journal of the Royal Statistical Society, Series B, 59(4):731--792, 1997.
|
| |
35
|
|
| |
36
|
P. T. Spellman, G. Sherlock, M. Q. Zhang, V. R. Iyer, K. Anders, M. B. Eisen, P. O. Brown, D. Botstein, and B. Futcher. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molec. Biol. Cell, 9(12):3273--3297, December 1998.
|
| |
37
|
K. Wang and T. Gasser. Alignment of curves by dynamic time warping. Annals of Statistics, 25(3):1251--1276, 1997.
|
| |
38
|
K. Wang and T. Gasser. Synchronizing sample curves nonparametrically. Annals of Statistics, 27:439--460, 1999.
|
| |
39
|
K. Y. Yeung, C. Fraley, A. Murua, A. E. Raftery, and W. L. Ruzzo. Model-based clustering and data transformations for gene expression data. Bioinformatics, 17(10):977--987, 2001.
|
CITED BY 8
|
|
|
|
|
|
|
|
Rohit Singh , Nathan Palmer , David Gifford , Bonnie Berger , Ziv Bar-Joseph, Active learning for sampling in time-series experiments with application to gene expression analysis, Proceedings of the 22nd international conference on Machine learning, p.832-839, August 07-11, 2005, Bonn, Germany
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Michèle Sebag , Nicolas Tarrisson , Olivier Teytaud , Julien Lefevre , Sylvain Baillet, A multi-objective multi-modal optimization approach for mining stable spatio-temporal patterns, Proceedings of the 19th international joint conference on Artificial intelligence, p.859-864, July 30-August 05, 2005, Edinburgh, Scotland
|
|