On the need for time series data mining benchmarks: a survey and empirical demonstration
Source: International Conference on Knowledge Discovery and Data Mining
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Edmonton, Alberta, Canada
SESSION: Streams and time series
Pages: 102-111
Year of Publication: 2002
ISBN:1-58113-567-X
Authors
Eamonn Keogh  University of California - Riverside, Riverside, CA
Shruti Kasetty  University of California - Riverside, Riverside, CA
Sponsors
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
AAAI
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/775047.775062

ABSTRACT

In the last decade there has been an explosion of interest in mining time series data. Literally hundreds of papers have introduced new algorithms to index, classify, cluster and segment time series. In this work we make the following claim. Much of this work has very little utility because the contributions made (speed in the case of indexing, accuracy in the case of classification and clustering, model accuracy in the case of segmentation) offer an amount of "improvement" that would have been completely dwarfed by the variance that would have been observed by testing on many real-world datasets, or the variance that would have been observed by changing minor (unstated) implementation details.

To illustrate our point, we have undertaken the most exhaustive set of time series experiments ever attempted, re-implementing the contributions of more than two dozen papers and testing them on 50 real-world, highly diverse datasets. Our empirical results strongly support our assertion, and suggest the need for a set of time series benchmarks and more careful empirical evaluation in the data mining community.


