ABSTRACT
In the last decade there has been an explosion of interest in mining time series data. Literally hundreds of papers have introduced new algorithms to index, classify, cluster and segment time series. In this work we make the following claim. Much of this work has very little utility because the contribution made (speed in the case of indexing, accuracy in the case of classification and clustering, model accuracy in the case of segmentation) offers an amount of "improvement" that would have been completely dwarfed by the variance that would have been observed by testing on many real-world datasets, or the variance that would have been observed by changing minor (unstated) implementation details. To illustrate our point, we have undertaken the most exhaustive set of time series experiments ever attempted, re-implementing the contribution of more than two dozen papers, and testing them on 50 real-world, highly diverse datasets. Our empirical results strongly support our assertion, and suggest the need for a set of time series benchmarks and more careful empirical evaluation in the data mining community.