Clustering time series from ARMA models with clipped data
Source: International Conference on Knowledge Discovery and Data Mining
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Seattle, WA, USA
Session: Research track papers
Pages: 49 - 58  
Year of Publication: 2004
ISBN:1-58113-888-1
Authors
A. J. Bagnall  University of East Anglia, Norwich, England
G. J. Janacek  University of East Anglia, Norwich, England
Sponsors
SIGMOD: ACM Special Interest Group on Management of Data
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1014052.1014061

ABSTRACT

Clustering time series is a problem that has applications in a wide variety of fields, and has recently attracted a large amount of research. In this paper we focus on clustering data derived from Autoregressive Moving Average (ARMA) models using k-means and k-medoids algorithms with the Euclidean distance between estimated model parameters. We justify our choice of clustering technique and distance metric by reproducing results obtained in related research. Our research aim is to assess the effects on the clustering of time series of discretising data into binary sequences that record whether each value is above or below the median, a process known as clipping. It is known that the fitted AR parameters of clipped data tend asymptotically to the parameters for unclipped data. We exploit this result to demonstrate that for long series the clustering accuracy when using clipped data from the class of ARMA models is not significantly different to that achieved with unclipped data. Next we show that if the data contains outliers then using clipped data produces significantly better clusterings. We then demonstrate that clipped series require much less memory and that operations such as distance calculations can be much faster. Finally, we demonstrate these advantages on three real-world data sets.
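As an illustration of the ideas in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes NumPy, uses a plain Yule-Walker fit as a stand-in for whatever estimator the paper uses, and simulates AR(1) series with arbitrary coefficients (0.8 and -0.5) chosen only for the demonstration. The helper names clip_series, fit_ar and simulate_ar1 are hypothetical. It clips each series at its median, fits AR(2) coefficients to the clipped data, compares series by the Euclidean distance between coefficient vectors, and packs the clipped series into bits to show the storage difference. Coefficients fitted this way to a clipped series need not equal those fitted to the raw series; the sketch only checks that series from the same model end up closer together than series from different models.

```python
import numpy as np


def clip_series(x):
    """Clip a series: 1 where a value is above the series median, else 0."""
    x = np.asarray(x, dtype=float)
    return (x > np.median(x)).astype(np.uint8)


def fit_ar(x, order):
    """Estimate AR(order) coefficients by solving the Yule-Walker equations.

    Assumption: a simple stand-in estimator, not necessarily the paper's.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Biased sample autocovariances at lags 0..order.
    r = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(order + 1)])
    # Toeplitz matrix of autocovariances at lags |i - j|.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])


def simulate_ar1(phi, n, rng):
    """Simulate x[t] = phi * x[t-1] + e[t] with standard normal noise."""
    e = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = e[0]
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x


rng = np.random.default_rng(0)
n = 5000
a1 = simulate_ar1(0.8, n, rng)   # two series from the same model
a2 = simulate_ar1(0.8, n, rng)
b1 = simulate_ar1(-0.5, n, rng)  # one series from a different model

# Fit AR(2) coefficients to the clipped version of each series.
params = {
    name: fit_ar(clip_series(s), order=2)
    for name, s in [("a1", a1), ("a2", a2), ("b1", b1)]
}

# Euclidean distance between fitted parameter vectors.
same = np.linalg.norm(params["a1"] - params["a2"])
diff = np.linalg.norm(params["a1"] - params["b1"])
print("same-model distance:", same)
print("different-model distance:", diff)

# A clipped series needs one bit per observation once packed.
packed = np.packbits(clip_series(a1))
print("raw bytes:", a1.nbytes, "packed clipped bytes:", packed.nbytes)
```

The parameter vectors in params could be passed to any k-means or k-medoids routine that works with Euclidean distance. Because clipping keeps only the side of the median each value falls on, a single extreme outlier changes at most the position of the median and one bit, which is one intuition for the robustness result stated in the abstract.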

