| Time-decaying aggregates in out-of-order streams |
| Full text |
Pdf
(300 KB)
|
Source
|
Symposium on Principles of Database Systems
archive
Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
table of contents
Vancouver, Canada
SESSION: Streaming data & best paper award
table of contents
Pages: 89-98
Year of Publication: 2008
ISBN:978-1-60558-152-1
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 5, Downloads (12 Months): 92, Citation Count: 1
|
|
|
ABSTRACT
Processing large data streams is now a major topic in data management. The data involved can be truly massive, and the required analyses complex. In a stream of sequential events such as stock feeds, sensor readings, or IP traffic measurements, data tuples pertaining to recent events are typically more important than older ones. This can be formalized via time-decay functions, which assign weights to data based on the age of data. Decay functions such as sliding windows and exponential decay have been studied under the assumption of well-ordered arrivals, i.e., data arrives in non-decreasing order of time stamps. However, data quality issues are prevalent in massive streams (due to network asynchrony and delays etc.), and correct arrival order is not guaranteed. We focus on the computation of decayed aggregates such as range queries, quantiles, and heavy hitters on out-of-order streams, where elements do not necessarily arrive in increasing order of timestamps. Existing techniques such as Exponential Histograms and Waves are unable to handle out-of-order streams. We give the first deterministic algorithms for approximating these aggregates under popular decay functions such as sliding window and polynomial decay. We study the overhead of allowing out-of-order arrivals when compared to well-ordered arrivals, both analytically and experimentally. Our experiments confirm that these algorithms can be applied in practice, and compare the relative performance of different approaches for handling out-of-order arrivals.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
D. Abadi , D. Carney , U. Çetintemel , M. Cherniack , C. Convey , C. Erwin , E. Galvez , M. Hatoun , A. Maskey , A. Rasin , A. Singer , M. Stonebraker , N. Tatbul , Y. Xing , R. Yan , S. Zdonik, Aurora: a data stream management system, Proceedings of the 2003 ACM SIGMOD international conference on Management of data, June 09-12, 2003, San Diego, California
[doi> 10.1145/872757.872855]
|
| |
2
|
|
 |
3
|
|
 |
4
|
Brian Babcock , Shivnath Babu , Mayur Datar , Rajeev Motwani , Jennifer Widom, Models and issues in data stream systems, Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 03-05, 2002, Madison, Wisconsin
[doi> 10.1145/543613.543615]
|
 |
5
|
Brain Babcock , Mayur Datar , Rajeev Motwani , Liadan O'Callaghan, Maintaining variance and k-medians over data stream windows, Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.234-243, June 09-11, 2003, San Diego, California
[doi> 10.1145/773153.773176]
|
| |
6
|
|
| |
7
|
C. Busch and S. Tirthapura. A deterministic algorithm for summarizing asynchronous streams over a sliding window. In STACS, 2007.
|
 |
8
|
|
 |
9
|
|
 |
10
|
Graham Cormode , Theodore Johnson , Flip Korn , S. Muthukrishnan , Oliver Spatscheck , Divesh Srivastava, Holistic UDAFs at streaming speeds, Proceedings of the 2004 ACM SIGMOD international conference on Management of data, June 13-18, 2004, Paris, France
[doi> 10.1145/1007568.1007575]
|
 |
11
|
Graham Cormode , Flip Korn , S. Muthukrishnan , Divesh Srivastava, Space- and time-efficient deterministic algorithms for biased quantiles over data streams, Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 26-28, 2006, Chicago, IL, USA
[doi> 10.1145/1142351.1142389]
|
| |
12
|
G. Cormode, F. Korn, and S. Tirthapura. Exponentially Decayed Aggregates on Data Streams. In ICDE, 2008.
|
| |
13
|
|
 |
14
|
|
| |
15
|
Mayur Datar , Aristides Gionis , Piotr Indyk , Rajeev Motwani, Maintaining stream statistics over sliding windows: (extended abstract), Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms, p.635-644, January 06-08, 2002, San Francisco, California
|
| |
16
|
P. Gibbons and S. Tirthapura. Distributed streams algorithms for sliding windows. Theory of Computing Systems, 37:457--478, 2004.
|
| |
17
|
J. Hershberger, N. Shrivastava, S. Suri, and C. Toth. Adaptive spatial partitioning for multidimensional data streams. In ISAAC, 2004.
|
| |
18
|
T. Kopelowitz and E. Porat. Improved Algorithms for Polynomial Time-Decay and Time-Decay with Additive error. In ICTCS, 2005.
|
 |
19
|
|
| |
20
|
|
| |
21
|
J. Misra and D. Gries. Finding repeated elements. Science of Computer Programming, 2:143--152, 1982.
|
| |
22
|
|
| |
23
|
J. I. Munro and M. Paterson. Selection and sorting with limited storage. Theor. Comput. Sci., 12:315--323, 1980.
|
| |
24
|
|
 |
25
|
Nisheeth Shrivastava , Chiranjeeb Buragohain , Divyakant Agrawal , Subhash Suri, Medians and beyond: new aggregation techniques for sensor networks, Proceedings of the 2nd international conference on Embedded networked sensor systems, November 03-05, 2004, Baltimore, MD, USA
[doi> 10.1145/1031495.1031524]
|
 |
26
|
|
| |
27
|
|
|