ACM Home Page
Please provide us with feedback. Feedback
Clustering of streaming time series is meaningless
Full text PdfPdf (649 KB)
Source Data Mining And Knowledge Discovery archive
Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery table of contents
San Diego, California
SESSION: Data streams II table of contents
Pages: 56 - 65  
Year of Publication: 2003
Authors
Jessica Lin  University of California - Riverside, Riverside, CA
Eamonn Keogh  University of California - Riverside, Riverside, CA
Wagner Truppel  University of California - Riverside, Riverside, CA
Sponsor
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 12,   Downloads (12 Months): 128,   Citation Count: 2
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/882082.882096
What is a DOI?

ABSTRACT

Time series data is perhaps the most frequently encountered type of data examined by the data mining community. Clustering is perhaps the most frequently used data mining algorithm, being useful in it's own right as an exploratory technique, and also as a subroutine in more complex data mining algorithms such as rule discovery, indexing, summarization, anomaly detection, and classification. Given these two facts, it is hardly surprising that time series clustering has attracted much attention. The data to be clustered can be in one of two formats: many individual time series, or a single time series, from which individual time series are extracted with a sliding window. Given the recent explosion of interest in streaming data and online algorithms, the latter case has received much attention.In this work we make a surprising claim. Clustering of streaming time series is completely meaningless. More concretely, clusters extracted from streaming time series are forced to obey a certain constraint that is pathologically unlikely to be satisfied by any dataset, and because of this, the clusters extracted by any clustering algorithm are essentially random. While this constraint can be intuitively demonstrated with a simple illustration and is simple to prove, it has never appeared in the literature.We can justify calling our claim surprising, since it invalidates the contribution of dozens of previously published papers. We will justify our claim with a theorem, illustrative examples, and a comprehensive set of experiments on reimplementations of previous work. Although the primary contribution of our work is to draw attention to the fact that an apparent solution to an important problem is incorrect and should no longer be used, we also introduce a novel method which, based on the concept of time series motifs, is able to meaningfully cluster some streaming time series datasets.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
2
3
 
4
 
5
 
6
British Iris Society, Species Group Staff. (1997). A Guide to Species Irises: Their Identification and Cultivation. Cambridge University Press. March, 1997.
 
7
Cotofrei, P. (2002). Statistical Temporal Rules. In proceedings of the 15th Conference on Computational Statistics - Short Communications and Posters, Berlin, Germany, Aug 24--28.
 
8
 
9
Das, G., Lin, K., Mannila, H., Renganathan, G. amp; Smyth, P. (1998). Rule Discovery from Time Series. In proceedings of the 4th Int'l Conference on Knowledge Discovery and Data Mining. New York, NY, Aug 27--31. pp 16--22.
 
10
Fisher, R. A. (1936). The Use of Multiple Measures in Taxonomic Problems. Annals of Eugenics. Vol. 7, No. 2, pp 179--188.
 
11
Fu, T. C., Chung, F. L., Ng, V. amp; Luk, R. (2001). Pattern Discovery from Stock Time Series Using Self-Organizing Maps. Workshop Notes of KDD2001 Workshop on Temporal Data Mining. San Francisco, CA, Aug 26--29. pp 27--37.
12
 
13
 
14
 
15
 
16
 
17
Hetland, M. L. amp; Sætrom, P. (2002). Temporal Rules Discovery Using Genetic Programming and Specialized Hardware. In proceedings of the 4th Int'l Conference on Recent Advances in Soft Computing. Nottingham, UK, Dec 12--13.
 
18
19
 
20
 
21
 
22
Kendall, M. (1976) Time-Series. 2nd Edition. Charles Griffin and Company, Ltd., London.
 
23
Keogh, E. (2002). The UCR Time Series Data Mining Archive {http://www.cs.ucr.edu/~eamonn/TSDMA/index.html}. Riverside CA. University of California - Computer Science amp; Engineering Department.
 
24
Keogh, E. (2002). Exact Indexing of Dynamic Time Warping. In proceedings of the 28th International Conference on Very Large Data Bases. Hong Kong, Aug 20--23. pp 406--417.
 
25
Keogh, E., Chakrabarti, K., Pazzani, M amp; Mehrotra, S. (2001). Dimensionality Reduction for Fast Similarity Search in Large Time Series Databases. Journal of Knowledge and Information Systems. Vol. 3, No. 3, pp. 263--286.
26
27
 
28
Lin, J. Keogh, E. Patel, P. amp; Lonardi, S. (2002). Finding motifs in time series. In the 2nd Workshop on Temporal Data Mining, at the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. July 23 -- 26, 2002. Edmonton, Alberta, Canada.
 
29
Mantegna., R. N. (1999). Hierarchical Structure in Financial Markets. European. Physical Journal. B11, pp. 193--197.
 
30
Mori, T. amp; Uehara, K. (2001). Extraction of Primitive Motion and Discovery of Association Rules from Human Motion. In proceedings of the 10th IEEE Int'l Workshop on Robot and Human Communication, Bordeaux-Paris, France, Sept 18--21. pp 200--206.
31
 
32
 
33
Radhakrishnan, N., Wilson, J. D. amp; Loizou, P. C. (2000). An Alternate Partitioning Technique to Quantify the Regularity of Complex Time Series. International Journal of Bifurcation and Chaos, Vol. 10, No. 7. World Scientific Publishing. pp 1773--1779.
 
34
Reinert, G., Schbath, S. amp; Waterman, M. S. (2000). Probabilistic and statistical properties of words: An overview. J. Comput. Bio., Vol. 7, pp 1--46.
 
35
 
36
Sarker, B. K., Mori, T. amp; Uehara, K. (2002). Parallel Algorithms for Mining Association Rules in Time Series Data. CS24-2002-1 Tech report.
 
37
Schittenkopf, C., Tino, P. amp; Dorffner, G. (2000). The Benefit of Information Reduction for Trading Strategies. Report Series for Adaptive Information Systems and Management in Economics and Management Science, July. Report #45.
 
38
Steinback, M., Tan, P. N., Kumar, V., Klooster, S. amp; Potter, C. (2002). Temporal Data Mining for the Discovery and Analysis of Ocean Climate Indices. In the 2nd Workshop on Temporal Data Mining, at the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Edmonton, Alberta, Canada. July 23.
 
39
Timmermann, A., Sullivan, R. amp; White, H. (1998). The Dangers of Data-Driven Inference: The Case of Calendar Effects in Stock Returns. FMG Discussion Papers dp0304, Financial Markets Group and ESRC.
 
40
Tino, P., Schittenkopf, C. amp; Dorffner, G. (2000). Temporal Pattern Recognition in Noisy Non-stationary Time Series Based on Quantization into Symbolic Streams: Lessons Learned from Financial Volatility Trading. Report Series for Adaptive Information Systems and Management in Economics and Management Science, July. Report #46.
 
41
Truppel, Keogh, Lin (2003). A Hidden Constraint When Clustering Streaming Time Series. UCR Tech Report.
 
42
 
43
 
44
Walker, J. (2001). HotBits: Genuine Random Numbers Generated by Radioactive Decay. <u>www.fourmilab.ch/hotbits/</u>
 
45
Yairi, T., Kato, Y. amp; Hori, K. (2001). Fault Detection by Mining Association Rules in House-keeping Data. In proceedings of the 6th International Symposium on Artificial Intelligence, Robotics and Automation in Space. Montreal, Canada, June 18--21.


Collaborative Colleagues:
Jessica Lin: colleagues
Eamonn Keogh: colleagues
Wagner Truppel: colleagues