|
ABSTRACT
Data arriving in time order (a data stream) arises in fields including physics, finance, medicine, and music, to name a few. Often the data comes from sensors (in physics and medicine for example) whose data rates continue to improve dramatically as sensor technology improves. Further, the number of sensors is increasing, so correlating data between sensors becomes ever more critical in order to distill knowlege from the data. In many applications such as finance, recent correlations are of far more interest than long-term correlation, so correlation over sliding windows (windowed correlation) is the desired operation. Fast response is desirable in many applications (e.g., to aim a telescope at an activity of interest or to perform a stock trade). These three factors -- data size, windowed correlation, and fast response -- motivate this work.Previous work [10, 14] showed how to compute Pearson correlation using Fast Fourier Transforms and Wavelet transforms, but such techniques don't work for time series in which the energy is spread over many frequency components, thus resembling white noise. For such "uncooperative" time series, this paper shows how to combine several simple techniques -- sketches (random projections), convolution, structured random vectors, grid structures, and combinatorial design -- to achieve high performance windowed Pearson correlation over a variety of data sets.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
S. Mallat and Z. Zhang, Matching Pursuit With Time-Frequency Dictionaries, IEEE Transactions on Signal Processing, 1993.
|
| |
3
|
|
| |
4
|
R. Cole, D. Shasha and X. J. Zhao, Fast Window Correlations Over Uncooperative Time Series, Department of Computer Science, New York University, New York, NY, Technical Report, 2005.
|
| |
5
|
C. Alexander, Market Models: A Guide to Financial Data Analysis, John Wiley & Sons, 2001.
|
| |
6
|
Wharton Research Data Services(WRDS), http://wrds.wharton.upenn.edu/
|
| |
7
|
E. Keogh and T. Folias, The UCR Time Series Data Mining Archive. Riverside CA. University of California - Computer Science & Engineering Department, http://www.cs.ucr.edu/~eamonn/TSDMA/index.html, 2002.
|
| |
8
|
B. Efron and R. J. Tibshirani, An Introduction to the Bootstrap, Chapman & Hall/CRC, 1994.
|
| |
9
|
|
| |
10
|
|
 |
11
|
|
| |
12
|
|
 |
13
|
Eyal Kushilevitz , Rafail Ostrovsky , Yuval Rabani, Efficient search for approximate nearest neighbor in high dimensional spaces, Proceedings of the thirtieth annual ACM symposium on Theory of computing, p.614-623, May 24-26, 1998, Dallas, Texas, United States
[doi> 10.1145/276698.276877]
|
| |
14
|
Y. Zhu and D. Shasha, StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time, VLDB, Hong Kong, China, August, 2002.
|
| |
15
|
|
 |
16
|
|
| |
17
|
|
| |
18
|
G. Cormode, P. Indyk, N. Koudas and S. Muthukrishnan, Fast mining of massive tabular data via approximate distance computations, ICDE, 2002.
|
| |
19
|
W. Johnson and J. Lindenstrauss, Extensions of Lipschitz mapping into hilbert space, Contemporary Mathematics, 26, 189--206, 1984.
|
 |
20
|
|
| |
21
|
E. Keogh, Exact indexing of dynamic time warping, VLDB, 2002.
|
| |
22
|
|
| |
23
|
|
 |
24
|
Eamonn Keogh , Kaushik Chakrabarti , Michael Pazzani , Sharad Mehrotra, Locally adaptive dimensionality reduction for indexing large time series databases, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.151-162, May 21-24, 2001, Santa Barbara, California, United States
|
 |
25
|
Flip Korn , H. V. Jagadish , Christos Faloutsos, Efficiently supporting ad hoc queries in large datasets of time sequences, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.289-300, May 11-15, 1997, Tucson, Arizona, United States
|
 |
26
|
Yi-Leh Wu , Divyakant Agrawal , Amr El Abbadi, A comparison of DFT and DWT based similarity search in time-series databases, Proceedings of the ninth international conference on Information and knowledge management, p.488-495, November 06-11, 2000, McLean, Virginia, United States
[doi> 10.1145/354756.354857]
|
| |
27
|
I. Popivanov and R. Miller, Similarity Search Over Time Series Data Using Wavelets, ICDE, 2002.
|
| |
28
|
K. Chan and A. W. Fu, Efficient Time Series Matching by Wavelets, ICDE, 1999.
|
 |
29
|
|
| |
30
|
|
| |
31
|
E. Drinea, P. Drineas and P. Huggins, A Randomized Singular Value Decomposition Algorithm for Image Processing, Panhellenic Conference on Informatics (PCI), 2001.
|
 |
32
|
Gurmeet Singh Manku , Sridhar Rajagopalan , Bruce G. Lindsay, Random sampling techniques for space efficient online computation of order statistics of large datasets, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.251-262, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
 |
33
|
|
| |
34
|
|
 |
35
|
Christos Faloutsos , M. Ranganathan , Yannis Manolopoulos, Fast subsequence matching in time-series databases, Proceedings of the 1994 ACM SIGMOD international conference on Management of data, p.419-429, May 24-27, 1994, Minneapolis, Minnesota, United States
|
| |
36
|
E. Keogh, K. Chakrabarti, M. Pazzani and S. Mehrotra, Dimensionality Reduction for fast similarity search in large time series databases, Knowledge and Information Systems, 3, 263--286, 2000.
|
|