|
ABSTRACT
Time series data is perhaps the most frequently encountered type of data examined by the data mining community. Clustering is perhaps the most frequently used data mining algorithm, being useful in it's own right as an exploratory technique, and also as a subroutine in more complex data mining algorithms such as rule discovery, indexing, summarization, anomaly detection, and classification. Given these two facts, it is hardly surprising that time series clustering has attracted much attention. The data to be clustered can be in one of two formats: many individual time series, or a single time series, from which individual time series are extracted with a sliding window. Given the recent explosion of interest in streaming data and online algorithms, the latter case has received much attention.In this work we make a surprising claim. Clustering of streaming time series is completely meaningless. More concretely, clusters extracted from streaming time series are forced to obey a certain constraint that is pathologically unlikely to be satisfied by any dataset, and because of this, the clusters extracted by any clustering algorithm are essentially random. While this constraint can be intuitively demonstrated with a simple illustration and is simple to prove, it has never appeared in the literature.We can justify calling our claim surprising, since it invalidates the contribution of dozens of previously published papers. We will justify our claim with a theorem, illustrative examples, and a comprehensive set of experiments on reimplementations of previous work. Although the primary contribution of our work is to draw attention to the fact that an apparent solution to an important problem is incorrect and should no longer be used, we also introduce a novel method which, based on the concept of time series motifs, is able to meaningfully cluster some streaming time series datasets.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
Rakesh Agrawal , Tomasz Imieliński , Arun Swami, Mining association rules between sets of items in large databases, Proceedings of the 1993 ACM SIGMOD international conference on Management of data, p.207-216, May 25-28, 1993, Washington, D.C., United States
|
 |
3
|
Ziv Bar-Joseph , Georg Gerber , David K. Gifford , Tommi S. Jaakkola , Itamar Simon, A new approach to analyzing gene expression time series data, Proceedings of the sixth annual international conference on Computational biology, p.39-48, April 18-21, 2002, Washington, DC, USA
[doi> 10.1145/565196.565202]
|
| |
4
|
|
| |
5
|
|
| |
6
|
British Iris Society, Species Group Staff. (1997). A Guide to Species Irises: Their Identification and Cultivation. Cambridge University Press. March, 1997.
|
| |
7
|
Cotofrei, P. (2002). Statistical Temporal Rules. In proceedings of the 15th Conference on Computational Statistics - Short Communications and Posters, Berlin, Germany, Aug 24--28.
|
| |
8
|
|
| |
9
|
Das, G., Lin, K., Mannila, H., Renganathan, G. amp; Smyth, P. (1998). Rule Discovery from Time Series. In proceedings of the 4th Int'l Conference on Knowledge Discovery and Data Mining. New York, NY, Aug 27--31. pp 16--22.
|
| |
10
|
Fisher, R. A. (1936). The Use of Multiple Measures in Taxonomic Problems. Annals of Eugenics. Vol. 7, No. 2, pp 179--188.
|
| |
11
|
Fu, T. C., Chung, F. L., Ng, V. amp; Luk, R. (2001). Pattern Discovery from Stock Time Series Using Self-Organizing Maps. Workshop Notes of KDD2001 Workshop on Temporal Data Mining. San Francisco, CA, Aug 26--29. pp 27--37.
|
 |
12
|
Martin Gavrilov , Dragomir Anguelov , Piotr Indyk , Rajeev Motwani, Mining the stock market (extended abstract): which measure is best?, Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, p.487-496, August 20-23, 2000, Boston, Massachusetts, United States
[doi> 10.1145/347090.347189]
|
| |
13
|
|
| |
14
|
|
| |
15
|
|
| |
16
|
Sherri Harms , Dan Li , Jitender Deogun , Tsegaye Tadesse, Efficient rule discovery in a geo-spatial decision support system, Proceedings of the 2002 annual national conference on Digital government research, p.1-7, May 19-22, 2002, Los Angeles, California
|
| |
17
|
Hetland, M. L. amp; Sætrom, P. (2002). Temporal Rules Discovery Using Genetic Programming and Specialized Hardware. In proceedings of the 4th Int'l Conference on Recent Advances in Soft Computing. Nottingham, UK, Dec 12--13.
|
| |
18
|
|
 |
19
|
|
| |
20
|
|
| |
21
|
|
| |
22
|
Kendall, M. (1976) Time-Series. 2nd Edition. Charles Griffin and Company, Ltd., London.
|
| |
23
|
Keogh, E. (2002). The UCR Time Series Data Mining Archive {http://www.cs.ucr.edu/~eamonn/TSDMA/index.html}. Riverside CA. University of California - Computer Science amp; Engineering Department.
|
| |
24
|
Keogh, E. (2002). Exact Indexing of Dynamic Time Warping. In proceedings of the 28th International Conference on Very Large Data Bases. Hong Kong, Aug 20--23. pp 406--417.
|
| |
25
|
Keogh, E., Chakrabarti, K., Pazzani, M amp; Mehrotra, S. (2001). Dimensionality Reduction for Fast Similarity Search in Large Time Series Databases. Journal of Knowledge and Information Systems. Vol. 3, No. 3, pp. 263--286.
|
 |
26
|
|
 |
27
|
Chung-Sheng Li , Philip S. Yu , Vittorio Castelli, MALM: a framework for mining sequence database at multiple abstraction levels, Proceedings of the seventh international conference on Information and knowledge management, p.267-272, November 02-07, 1998, Bethesda, Maryland, United States
[doi> 10.1145/288627.288666]
|
| |
28
|
Lin, J. Keogh, E. Patel, P. amp; Lonardi, S. (2002). Finding motifs in time series. In the 2nd Workshop on Temporal Data Mining, at the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. July 23 -- 26, 2002. Edmonton, Alberta, Canada.
|
| |
29
|
Mantegna., R. N. (1999). Hierarchical Structure in Financial Markets. European. Physical Journal. B11, pp. 193--197.
|
| |
30
|
Mori, T. amp; Uehara, K. (2001). Extraction of Primitive Motion and Discovery of Association Rules from Human Motion. In proceedings of the 10th IEEE Int'l Workshop on Robot and Human Communication, Bordeaux-Paris, France, Sept 18--21. pp 200--206.
|
 |
31
|
|
| |
32
|
|
| |
33
|
Radhakrishnan, N., Wilson, J. D. amp; Loizou, P. C. (2000). An Alternate Partitioning Technique to Quantify the Regularity of Complex Time Series. International Journal of Bifurcation and Chaos, Vol. 10, No. 7. World Scientific Publishing. pp 1773--1779.
|
| |
34
|
Reinert, G., Schbath, S. amp; Waterman, M. S. (2000). Probabilistic and statistical properties of words: An overview. J. Comput. Bio., Vol. 7, pp 1--46.
|
| |
35
|
|
| |
36
|
Sarker, B. K., Mori, T. amp; Uehara, K. (2002). Parallel Algorithms for Mining Association Rules in Time Series Data. CS24-2002-1 Tech report.
|
| |
37
|
Schittenkopf, C., Tino, P. amp; Dorffner, G. (2000). The Benefit of Information Reduction for Trading Strategies. Report Series for Adaptive Information Systems and Management in Economics and Management Science, July. Report #45.
|
| |
38
|
Steinback, M., Tan, P. N., Kumar, V., Klooster, S. amp; Potter, C. (2002). Temporal Data Mining for the Discovery and Analysis of Ocean Climate Indices. In the 2nd Workshop on Temporal Data Mining, at the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Edmonton, Alberta, Canada. July 23.
|
| |
39
|
Timmermann, A., Sullivan, R. amp; White, H. (1998). The Dangers of Data-Driven Inference: The Case of Calendar Effects in Stock Returns. FMG Discussion Papers dp0304, Financial Markets Group and ESRC.
|
| |
40
|
Tino, P., Schittenkopf, C. amp; Dorffner, G. (2000). Temporal Pattern Recognition in Noisy Non-stationary Time Series Based on Quantization into Symbolic Streams: Lessons Learned from Financial Volatility Trading. Report Series for Adaptive Information Systems and Management in Economics and Management Science, July. Report #46.
|
| |
41
|
Truppel, Keogh, Lin (2003). A Hidden Constraint When Clustering Streaming Time Series. UCR Tech Report.
|
| |
42
|
|
| |
43
|
|
| |
44
|
Walker, J. (2001). HotBits: Genuine Random Numbers Generated by Radioactive Decay. <u>www.fourmilab.ch/hotbits/</u>
|
| |
45
|
Yairi, T., Kato, Y. amp; Hori, K. (2001). Fault Detection by Mining Association Rules in House-keeping Data. In proceedings of the 6th International Symposium on Artificial Intelligence, Robotics and Automation in Space. Montreal, Canada, June 18--21.
|
|