ACM Home Page
Please provide us with feedback. Feedback
Translation-invariant mixture models for curve clustering
Full text PdfPdf (689 KB)
Source International Conference on Knowledge Discovery and Data Mining archive
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining table of contents
Washington, D.C.
SESSION: Research track table of contents
Pages: 79 - 88  
Year of Publication: 2003
ISBN:1-58113-737-0
Authors
Darya Chudova  University of California, Irvine, CA
Scott Gaffney  University of California, Irvine, CA
Eric Mjolsness  University of California, Irvine, CA
Padhraic Smyth  University of California, Irvine, CA
Sponsors
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 10,   Downloads (12 Months): 83,   Citation Count: 8
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/956750.956763
What is a DOI?

ABSTRACT

In this paper we present a family of algorithms that can simultaneously align and cluster sets of multidimensional curves defined on a discrete time grid. Our approach uses the Expectation-Maximization (EM) algorithm to recover both the mean curve shapes for each cluster, and the most likely shifts, offsets, and cluster memberships for each curve. We demonstrate how Bayesian estimation methods can improve the results for small sample sizes by enforcing smoothness in the cluster mean curves. We evaluate the methodology on two real-world data sets, time-course gene expression data and storm trajectory data. Experimental results show that models that incorporate curve alignment systematically provide improvements in predictive power and within-cluster variance on test data sets. The proposed approach provides a non-parametric, computationally efficient, and robust methodology for clustering broad classes of curve data.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
J. Aach and G. Church. Aligning gene expression time series with time warping algorithms. BIOINF: Bioinformatics, 17, 2001.
 
2
J. D. Banfield and A. E. Raftery. Model-based Gaussian and non-Gaussian clustering. Biometrics, 49:803--821, 1993.
3
 
4
R. Blender, K. Fraedrich, and F. Lunkeit. Identification of cyclone-track regimes in the North Atlantic. Quart J. Royal Meteor. Soc., 123:727--741, 1997.
 
5
D. Chudova, S. Gaffney, and P. Smyth. Probabilistic models for joint clustering and time-warping of multi-dimensional curves. In Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI-2003), San Francisco, CA, 2003. Morgan Kaufmann.
 
6
D. Chudova, S. Gaffney, E. Mjolsness, and P. Smyth. Mixture models for translation-invariant clustering of sets of multi-dimensional curves. Technical Report TR-03-09, School of Computer Science, University of California, Irvine, 2003.
 
7
 
8
 
9
A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Stat. Soc. B, 39:1--38, 1977.
 
10
W. S. DeSarbo and W. L. Cron. A maximum likelihood methodology for clusterwise linear regression. Journal of Classification, 5(1):249--282, 1988.
 
11
J. Diebolt and C. P. Robert. Estimation of finite mixture distributions through Bayesian sampling. Journal of the Royal Statistical Society B, 2:363--375, 1994.
 
12
M. B. Eisen, P. T. Spellman, P. O. Brown, and D. Botstein. Cluster analysis and display of genome-wide expression patterns. Science, 95(25):14863--68, 1998.
 
13
C. Fraley and A. E. Raftery. Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association, 97(458):611--631, 2002.
 
14
15
 
16
S. J. Gaffney, A. Robertson, and P. Smyth. Clustering of extra-tropical cyclone trajectories using mixtures of regression models. In Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Fourth Workshop on Mining Scientific Datasets, 2001.
 
17
S. J. Gaffney and P. Smyth. Curve clustering with random effects regression mixtures. In C. M. Bishop and B. J. Frey, editors, Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, Key West FL, January 3--6 2003.
 
18
S. Gold, A Rangarajan, C.-P. Lu, S. Pappu, and E. Mjolsness. New algorithms for 2D and 3D point matching: pose estimation and correspondence. Pattern Recognition, 31(8):1019--1031, 1998.
 
19
Steven Gold , Anand Rangarajan , Eric Mjolsness, Learning with preknowledge: clustering with point and graph matching distance measures, Neural Computation, v.8 n.4, p.787-804, May 1996
 
20
J. A. Hartigan and M. A. Wong. Algorithm AS 136: a K-means clustering algorithm. Appl. Stat., 28:100--108, 1978.
 
21
G. M. James and C. A. Sugar. Clustering for sparsely sampled functional data. Journal of the American Statistical Association, to appear, 2003.
 
22
P. N. Jones and G. J. McLachlan. Fitting finite mixture models in a regression context. Austral. J. Statist., 34(2):233--240, 1992.
 
23
A. Kneip and T. Gasser. Statistical tools to analyze data representing a sample of curves. Annals of Statistics, 20(3):1266--1305, 1992.
 
24
N. M. Laird and J. H. Ware. Random effects models for longitudinal data. Biometrics, 38:963--974, 1982.
 
25
P. J. Lenk and W. S. DeSarbo. Bayesian inference for finite mixtures of generalized linear models with random effects. Psychometrika, 65(1):93--119, 2000.
 
26
J. S. Liu. Monte Carlo Strategies in Scientific Computing. Stats. Springer, 2001.
 
27
J. Mateos, A. Katsaggelos, and R. Molina. A Bayesian approach for the estimation and transmission of regularization parameters for reducing blocking artifacts. IEEE Transactions on Medical Imaging, 9(7):1200--1215, July 2000.
 
28
G. J. McLachlan and K. E. Basford. Mixture Models: Inference and Applications to Clustering. Marcel Dekker, New York, 1988.
 
29
G. J. McLachlan, R. W. Bean, and D. Peel. A mixture model-based approach to the clustering of microarray expression data. Bioinformatics, 18(3):413--422, March 2002.
 
30
E. U. Mumcuoglu, R. M. Leahy, and S. R. Cherry. Bayesian reconstruction of PET images: methodology and performance analysis. Phys. Med. Biol., 41:1777--1807, 1996.
 
31
 
32
J. O. Ramsay and X. Li. Curve registration. J. Royal Stat. Soc. B, 60:351--363, 1998.
 
33
J. O. Ramsay and B. W. Silverman. Functional Data Analysis. Springer-Verlag, New York, NY, 1997.
 
34
S. Richardson and P. J. Green. On Bayesian analysis of mixtures with an unknown number of components (with discussion). Journal of the Royal Statistical Society, Series B, 59(4):731--792, 1997.
 
35
 
36
P. T. Spellman, G. Sherlock, M. Q. Zhang, V. R. Iyer, K. Anders, M. B. Eisen, P. O. Brown, D. Botstein, and B. Futcher. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molec. Biol. Cell, 9(12):3273--3297, December 1998.
 
37
K. Wang and T. Gasser. Alignment of curves by dynamic time warping. Annals of Statistics, 25(3):1251--1276, 1997.
 
38
K. Wang and T. Gasser. Synchronizing sample curves nonparametrically. Annals of Statistics, 27:439--460, 1999.
 
39
K. Y. Yeung, C. Fraley, A. Murua, A. E. Raftery, and W. L. Ruzzo. Model-based clustering and data transformations for gene expression data. Bioinformatics, 17(10):977--987, 2001.

CITED BY  8

Collaborative Colleagues:
Darya Chudova: colleagues
Scott Gaffney: colleagues
Eric Mjolsness: colleagues
Padhraic Smyth: colleagues