ACM Home Page
Please provide us with feedback. Feedback
Locally adaptive dimensionality reduction for indexing large time series databases
Full text PdfPdf (300 KB)
Source International Conference on Management of Data archive
Proceedings of the 2001 ACM SIGMOD international conference on Management of data table of contents
Santa Barbara, California, United States
Pages: 151 - 162  
Year of Publication: 2001
ISBN:1-58113-332-4
Also published in ...
Authors
Eamonn Keogh  Department of Information and Computer Science, University of California, Irvine, California
Kaushik Chakrabarti  Department of Information and Computer Science, University of California, Irvine, California
Michael Pazzani  Department of Information and Computer Science, University of California, Irvine, California
Sharad Mehrotra  Department of Information and Computer Science, University of California, Irvine, California
Sponsor
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 15,   Downloads (12 Months): 103,   Citation Count: 96
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/375663.375680
What is a DOI?

ABSTRACT

Similarity search in large time series databases has attracted much research interest recently. It is a difficult problem because of the typically high dimensionality of the data.. The most promising solutions involve performing dimensionality reduction on the data, then indexing the reduced data with a multidimensional index structure. Many dimensionality reduction techniques have been proposed, including Singular Value Decomposition (SVD), the Discrete Fourier transform (DFT), and the Discrete Wavelet Transform (DWT). In this work we introduce a new dimensionality reduction technique which we call Adaptive Piecewise Constant Approximation (APCA). While previous techniques (e.g., SVD, DFT and DWT) choose a common representation for all the items in the database that minimizes the global reconstruction error, APCA approximates each time series by a set of constant value segments of varying lengths such that their individual reconstruction errors are minimal. We show how APCA can be indexed using a multidimensional index structure. We propose two distance measures in the indexed space that exploit the high fidelity of APCA for fast searching: a lower bounding Euclidean distance approximation, and a non-lower bounding, but very tight Euclidean distance approximation and show how they can support fast exact searching, and even faster approximate searching on the same index structure. We theoretically and empirically compare APCA to all the other techniques and demonstrate its superiority.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
 
3
 
4
Bay, S. D. (2000). The UCI KDD Archive {http://kdd.ics.uci.edu}. Irvine, CA: University of California, Department of Information and Computer Science.
5
 
6
 
7
 
8
Chakrabarti, K., Ortega-Binderberger, M., Porkaew, K & Mehrotra, S. (2000) Similar shape retrieval in MARS. Proceeding of IEEE International Conference on Multimedia and Expo.
 
9
 
10
11
 
12
Das, G., Lin, K. Mannila, H., Renganathan, G., & Smyth, P. (1998). Rule discovery from time series. Proceedings of the 3rd International Conference of Knowledge Discovery and Data Mining. pp 16-22.
 
13
Debregeas, A. & Hebrail, G. (1998). Interactive interpretation of Kohonen maps applied to curves. Proceedings of the 4th International Conference of Knowledge Discovery and Data Mining. pp 179-183.
 
14
 
15
16
17
18
 
19
20
 
21
 
22
23
 
24
Keogh, E,. Chakrabarti, K,. Pazzani, M. & Mehrotra (2000) Dimensionality reduction for fast similarity search in large time series databases. Journal of Knowledge and Information Systems.
25
 
26
Keogh, E., & Pazzani, M. (1998). An enhanced representation of time series which allows fast and accurate classification, clustering and relevance feedback. Proceedings of the 4th International Conference of Knowledge Discovery and Data Mining. pp 239-241, AAAI Press.
 
27
Keogh, E., & Smyth, P. (1997). A probabilistic approach to fast pattern matching in time series databases. Proceedings of the 3rd International Conference of Knowledge Discovery and Data Mining. pp 24-20.
28
 
29
30
31
 
32
Moody, G. (2000). MIT-BIH Database Distribution
 
33
 
34
 
35
Pavlidis, T. (1976). Waveform segmentation through functional approximation. IEEE Transcations on Computers, Vol C-22, NO. 7 July.
 
36
37
38
 
39
40
41
 
42
 
43
Shevchenko, M. (2000). {http://www.iki.rssi.ru/} Space Research Institute. Moscow, Russia.
 
44
 
45
 
46
 
47
Weigend, A. (1994). The Santa Fe Time Series Competition Data
 
48
Welch. D. & Quinn. P (1999). http://wwwmacho.mcmaster.ca/Project/Overview/status.html
49
50
 
51
 
52

CITED BY  96

Collaborative Colleagues:
Eamonn Keogh: colleagues
Kaushik Chakrabarti: colleagues
Michael Pazzani: colleagues
Sharad Mehrotra: colleagues