BRAID: stream mining through group lag correlations
Source: International Conference on Management of Data
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Baltimore, Maryland
SESSION: Research papers: stream and sequence mining
Pages: 599 - 610
Year of Publication: 2005
ISBN: 1-59593-060-4
Authors
Yasushi Sakurai  NTT Cyber Space Laboratories
Spiros Papadimitriou  Carnegie Mellon University
Christos Faloutsos  Carnegie Mellon University
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1066157.1066226

ABSTRACT

The goal is to monitor multiple numerical streams and determine which pairs are correlated with a lag, as well as the value of each such lag. Lag correlations (and anti-correlations) are frequent and of great practical interest. For example, a decrease in interest rates typically precedes an increase in house sales by a few months; higher amounts of fluoride in drinking water may lead to fewer dental cavities some years later. Additional settings include network analysis, sensor monitoring, financial data analysis, and moving object tracking. Such data streams are often correlated (or anti-correlated), but with an unknown lag.

We propose BRAID, a method to detect lag correlations between data streams. BRAID handles data streams of semi-infinite length incrementally, quickly, and with small resource consumption. We also provide a theoretical analysis, based on Nyquist's sampling theorem, which shows that BRAID can estimate lag correlations with little, and often no, error. Our experiments on real and realistic data show that BRAID detects the correct lag perfectly most of the time (the largest relative error was about 1%), while being up to 40,000 times faster than the naive implementation.
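
To make the notion of a lag correlation concrete, the following is a minimal Python sketch of a brute-force baseline. For each candidate lag it computes the Pearson correlation between one series and a shifted copy of the other, and keeps the lag with the largest absolute correlation (so anti-correlations are found as well). The function names `pearson` and `best_lag`, the `max_lag` parameter, and the choice of Pearson correlation over a fixed in-memory window are assumptions made for illustration. This is not BRAID itself, which the abstract describes as incremental, and it may differ from the naive implementation used in the paper's experiments.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_lag(x, y, max_lag):
    """Brute-force search over lags 0..max_lag (hypothetical helper, not BRAID).

    Compares x[t] with y[t + lag], i.e. it assumes y trails x, and returns
    the (lag, correlation) pair with the largest absolute correlation.
    """
    best = (0, 0.0)
    for lag in range(max_lag + 1):
        n = len(x) - lag
        if n < 2:
            break  # too few overlapping points to correlate
        r = pearson(x[:n], y[lag:lag + n])
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best


# Illustrative data: y is a copy of x delayed by 7 time ticks.
x = [math.sin(2 * math.pi * t / 50) for t in range(200)]
y = [math.sin(2 * math.pi * (t - 7) / 50) for t in range(200)]
print(best_lag(x, y, max_lag=20))
```

Each call recomputes every correlation from scratch, so the cost grows with both the window length and the number of candidate lags. That cost is what makes a brute-force approach impractical for semi-infinite streams.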


