|
ABSTRACT
Similarity search in large time series databases has attracted much research interest recently. It is a difficult problem because of the typically high dimensionality of the data.. The most promising solutions involve performing dimensionality reduction on the data, then indexing the reduced data with a multidimensional index structure. Many dimensionality reduction techniques have been proposed, including Singular Value Decomposition (SVD), the Discrete Fourier transform (DFT), and the Discrete Wavelet Transform (DWT). In this work we introduce a new dimensionality reduction technique which we call Adaptive Piecewise Constant Approximation (APCA). While previous techniques (e.g., SVD, DFT and DWT) choose a common representation for all the items in the database that minimizes the global reconstruction error, APCA approximates each time series by a set of constant value segments of varying lengths such that their individual reconstruction errors are minimal. We show how APCA can be indexed using a multidimensional index structure. We propose two distance measures in the indexed space that exploit the high fidelity of APCA for fast searching: a lower bounding Euclidean distance approximation, and a non-lower bounding, but very tight Euclidean distance approximation and show how they can support fast exact searching, and even faster approximate searching on the same index structure. We theoretically and empirically compare APCA to all the other techniques and demonstrate its superiority.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
| |
3
|
Rakesh Agrawal , King-Ip Lin , Harpreet S. Sawhney , Kyuseok Shim, Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases, Proceedings of the 21th International Conference on Very Large Data Bases, p.490-501, September 11-15, 1995
|
| |
4
|
Bay, S. D. (2000). The UCI KDD Archive {http://kdd.ics.uci.edu}. Irvine, CA: University of California, Department of Information and Computer Science.
|
 |
5
|
Kristin P. Bennett , Usama Fayyad , Dan Geiger, Density-based indexing for approximate nearest-neighbor queries, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, p.233-243, August 15-18, 1999, San Diego, California, United States
[doi> 10.1145/312129.312236]
|
| |
6
|
|
| |
7
|
|
| |
8
|
Chakrabarti, K., Ortega-Binderberger, M., Porkaew, K & Mehrotra, S. (2000) Similar shape retrieval in MARS. Proceeding of IEEE International Conference on Multimedia and Expo.
|
| |
9
|
|
| |
10
|
|
 |
11
|
|
| |
12
|
Das, G., Lin, K. Mannila, H., Renganathan, G., & Smyth, P. (1998). Rule discovery from time series. Proceedings of the 3rd International Conference of Knowledge Discovery and Data Mining. pp 16-22.
|
| |
13
|
Debregeas, A. & Hebrail, G. (1998). Interactive interpretation of Kohonen maps applied to curves. Proceedings of the 4th International Conference of Knowledge Discovery and Data Mining. pp 179-183.
|
| |
14
|
|
| |
15
|
|
 |
16
|
Christos Faloutsos , M. Ranganathan , Yannis Manolopoulos, Fast subsequence matching in time-series databases, Proceedings of the 1994 ACM SIGMOD international conference on Management of data, p.419-429, May 24-27, 1994, Minneapolis, Minnesota, United States
|
 |
17
|
|
 |
18
|
Joseph M. Hellerstein , Elias Koutsoupias , Christos H. Papadimitriou, On the analysis of indexing schemes, Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.249-256, May 11-15, 1997, Tucson, Arizona, United States
[doi> 10.1145/263661.263688]
|
| |
19
|
|
 |
20
|
|
| |
21
|
|
| |
22
|
|
 |
23
|
K. V. Ravi Kanth , Divyakant Agrawal , Ambuj Singh, Dimensionality reduction for similarity searching in dynamic databases, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.166-176, June 01-04, 1998, Seattle, Washington, United States
|
| |
24
|
Keogh, E,. Chakrabarti, K,. Pazzani, M. & Mehrotra (2000) Dimensionality reduction for fast similarity search in large time series databases. Journal of Knowledge and Information Systems.
|
 |
25
|
|
| |
26
|
Keogh, E., & Pazzani, M. (1998). An enhanced representation of time series which allows fast and accurate classification, clustering and relevance feedback. Proceedings of the 4th International Conference of Knowledge Discovery and Data Mining. pp 239-241, AAAI Press.
|
| |
27
|
Keogh, E., & Smyth, P. (1997). A probabilistic approach to fast pattern matching in time series databases. Proceedings of the 3rd International Conference of Knowledge Discovery and Data Mining. pp 24-20.
|
 |
28
|
Flip Korn , H. V. Jagadish , Christos Faloutsos, Efficiently supporting ad hoc queries in large datasets of time sequences, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.289-300, May 11-15, 1997, Tucson, Arizona, United States
|
| |
29
|
|
 |
30
|
Chung-Sheng Li , Philip S. Yu , Vittorio Castelli, MALM: a framework for mining sequence database at multiple abstraction levels, Proceedings of the seventh international conference on Information and knowledge management, p.267-272, November 02-07, 1998, Bethesda, Maryland, United States
[doi> 10.1145/288627.288666]
|
 |
31
|
Woong-Kee Loh , Sang-Wook Kim , Kyu-Young Whang, Index interpolation: an approach to subsequence matching supporting normalization transform in time-series databases, Proceedings of the ninth international conference on Information and knowledge management, p.480-487, November 06-11, 2000, McLean, Virginia, United States
[doi> 10.1145/354756.354856]
|
| |
32
|
Moody, G. (2000). MIT-BIH Database Distribution
|
| |
33
|
|
| |
34
|
|
| |
35
|
Pavlidis, T. (1976). Waveform segmentation through functional approximation. IEEE Transcations on Computers, Vol C-22, NO. 7 July.
|
| |
36
|
|
 |
37
|
|
 |
38
|
Yunyao Qu , Changzhou Wang , X. Sean Wang, Supporting fast search in time series for movement patterns in multiple scales, Proceedings of the seventh international conference on Information and knowledge management, p.251-258, November 02-07, 1998, Bethesda, Maryland, United States
[doi> 10.1145/288627.288664]
|
| |
39
|
|
 |
40
|
Nick Roussopoulos , Stephen Kelley , Frédéric Vincent, Nearest neighbor queries, Proceedings of the 1995 ACM SIGMOD international conference on Management of data, p.71-79, May 22-25, 1995, San Jose, California, United States
|
 |
41
|
|
| |
42
|
|
| |
43
|
Shevchenko, M. (2000). {http://www.iki.rssi.ru/} Space Research Institute. Moscow, Russia.
|
| |
44
|
|
| |
45
|
|
| |
46
|
|
| |
47
|
Weigend, A. (1994). The Santa Fe Time Series Competition Data
|
| |
48
|
Welch. D. & Quinn. P (1999). http://wwwmacho.mcmaster.ca/Project/Overview/status.html
|
 |
49
|
Yi-Leh Wu , Divyakant Agrawal , Amr El Abbadi, A comparison of DFT and DWT based similarity search in time-series databases, Proceedings of the ninth international conference on Information and knowledge management, p.488-495, November 06-11, 2000, McLean, Virginia, United States
[doi> 10.1145/354756.354857]
|
 |
50
|
Daniel Wu , Ambuj Singh , Divyakant Agrawal , Amr El Abbadi , Terence R. Smith, Efficient retrieval for browsing large image databases, Proceedings of the fifth international conference on Information and knowledge management, p.11-18, November 12-16, 1996, Rockville, Maryland, United States
[doi> 10.1145/238355.238365]
|
| |
51
|
|
| |
52
|
|
CITED BY 96
|
|
|
|
|
Hiromitsu Shimakawa , Hiroyuki Yamahara , Yusuke Imayama , Masato Ushijima , Shinsuke Azuma, Pattern refinement with model data fusion to predict exchange rate movement, Design and application of hybrid intelligent systems, IOS Press, Amsterdam, The Netherlands, 2003
|
|
|
|
|
|
Michail Vlachos , Carlotta Domeniconi , Dimitrios Gunopulos , George Kollios , Nick Koudas, Non-linear dimensionality reduction techniques for classification and visualization, Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, July 23-26, 2002, Edmonton, Alberta, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Jessica Lin , Eamonn Keogh , Stefano Lonardi , Bill Chiu, A symbolic representation of time series, with implications for streaming algorithms, Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, June 13-13, 2003, San Diego, California
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Jessica Lin , Eamonn Keogh , Stefano Lonardi , Jeffrey P. Lankford , Donna M. Nystrom, Visually mining and monitoring massive time series, Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, August 22-25, 2004, Seattle, WA, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Huanmei Wu , Betty Salzberg , Gregory C Sharp , Steve B Jiang , Hiroki Shirato , David Kaeli, Subsequence matching on structured time series data, Proceedings of the 2005 ACM SIGMOD international conference on Management of data, June 14-16, 2005, Baltimore, Maryland
|
|
|
|
|
|
|
|
|
|
|
|
Michail Vlachos , Christopher Meek , Zografoula Vagena , Dimitrios Gunopulos, Identifying similarities, periodicities and bursts for online search queries, Proceedings of the 2004 ACM SIGMOD international conference on Management of data, June 13-18, 2004, Paris, France
|
|
|
|
|
|
|
|
|
Gautam Das , Vagelis Hristidis , Nishant Kapoor , S. Sudarshan, Ordering the attributes of query results, Proceedings of the 2006 ACM SIGMOD international conference on Management of data, June 27-29, 2006, Chicago, IL, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Aris Anagnostopoulos , Michail Vlachos , Marios Hadjieleftheriou , Eamonn Keogh , Philip S. Yu, Global distance-based segmentation of trajectories, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Eamonn Keogh , Themistoklis Palpanas , Victor B. Zordan , Dimitrios Gunopulos , Marc Cardle, Indexing large human-motion databases, Proceedings of the Thirtieth international conference on Very large data bases, p.780-791, August 31-September 03, 2004, Toronto, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Elaine P. Sousa , Caetano Traina, Jr. , Agma J. Traina , Leejay Wu , Christos Faloutsos, A fast and effective method to find correlations among attributes in databases, Data Mining and Knowledge Discovery, v.14 n.3, p.367-407, June 2007
|
|
|
Carlotta Domeniconi , Dimitrios Gunopulos , Sheng Ma , Bojun Yan , Muna Al-Razgan , Dimitris Papadopoulos, Locally adaptive metrics for clustering high dimensional data, Data Mining and Knowledge Discovery, v.14 n.1, p.63-97, February 2007
|
|
|
Khanh Vu , Kien A. Hua , Hao Cheng , Sheau-Dong Lang, A non-linear dimensionality-reduction technique for fast similarity search in large databases, Proceedings of the 2006 ACM SIGMOD international conference on Management of data, June 27-29, 2006, Chicago, IL, USA
|
|
|
|
|
|
|
|
|
|
|
|
Jonathan S. Anstey , Dennis K. Peters , Chris Dawson, An improved feature extraction technique for high volume time series data, Proceedings of the Fourth conference on IASTED International Conference: Signal Processing, Pattern Recognition, and Applications, p.74-81, February 14-16, 2007, Innsbruck, Austria
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Qiuxia Chen , Lei Chen , Xiang Lian , Yunhao Liu , Jeffrey Xu Yu, Indexable PLA for efficient similarity search, Proceedings of the 33rd international conference on Very large data bases, September 23-27, 2007, Vienna, Austria
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Tiancheng Zhang , Dejun Yue , Yu Gu , Yi Wang , Ge Yu, Adaptive correlation analysis in stream time series with sliding windows, Computers & Mathematics with Applications, v.57 n.6, p.937-948, March, 2009
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
B. Aditya Prakash , Nicholas Valler , David Andersen , Michalis Faloutsos , Christos Faloutsos, BGP-lens: patterns and anomalies in internet routing updates, Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, June 28-July 01, 2009, Paris, France
|
|
|
|
|
|
|
|
|
Sorabh Gandhi , Suman Nath , Subhash Suri , Jie Liu, GAMPS: compressing multi sensor data by grouping and amplitude scaling, Proceedings of the 35th SIGMOD international conference on Management of data, June 29-July 02, 2009, Providence, Rhode Island, USA
|
|