|
ABSTRACT
Users often do not require a complete answer to their query but rather only a sample. They expect the sample to be either the largest possible or the most representative (or both) given the resources available. We call the query processing techniques that deliver such results 'approximate'. Processing of queries to streams of data is said to be 'progressive' when it can continuously produce results as data arrives. In this paper, we are interested in the progressive and approximate processing of queries to data streams when processing is limited to main memory. In particular, we study one of the main building blocks of such processing: the progressive approximate join. We devise and present several novel progressive approximate join algorithms. We empirically evaluate the performance of our algorithms and compare them with algorithms based on existing techniques. In particular we study the trade-off between maximization of throughput and maximization of representativeness of the sample.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
Brian Babcock , Shivnath Babu , Mayur Datar , Rajeev Motwani , Jennifer Widom, Models and issues in data stream systems, Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 03-05, 2002, Madison, Wisconsin
[doi> 10.1145/543613.543615]
|
 |
3
|
|
| |
4
|
Don Carney , Uǧur Çetintemel , Mitch Cherniack , Christian Convey , Sangdon Lee , Greg Seidman , Michael Stonebraker , Nesime Tatbul , Stan Zdonik, Monitoring streams: a new class of data management applications, Proceedings of the 28th international conference on Very Large Data Bases, p.215-226, August 20-23, 2002, Hong Kong, China
|
 |
5
|
|
| |
6
|
W. G. Cochran. Sampling Techniques, 3rd Edition. John Wiley, 1977.
|
 |
7
|
|
| |
8
|
|
| |
9
|
|
 |
10
|
|
| |
11
|
C. J. Hahn, S. G. Warren, and J. London. Edited synoptic cloud reports from ships and land stations over the globe, 1982--1991, http://cdiac.esd.ornl.gov/ftp/ndp026b, 1996.
|
| |
12
|
J. Lin. Divergence measures based on the shannon entropy. IEEE Transactions on Information Theory, 37(1):145--151, 1991.
|
| |
13
|
|
| |
14
|
|
| |
15
|
F. Olken. Random Sampling from Databases. Ph.D. dissertation, Computer Science, University of California, 1993.
|
| |
16
|
|
 |
17
|
Yufei Tao , Man Lung Yiu , Dimitris Papadias , Marios Hadjieleftheriou , Nikos Mamoulis, RPJ: producing fast join results on streams through rate-based optimization, Proceedings of the 2005 ACM SIGMOD international conference on Management of data, June 14-16, 2005, Baltimore, Maryland
[doi> 10.1145/1066157.1066200]
|
| |
18
|
W. H. Tok, S. Bressan, and M.-L. Lee. RRPJ : Result-rate based progressive relational join. In DASFAA, pages 43--54, 2007.
|
| |
19
|
T. Urhan and M. J. Franklin. XJoin: Getting fast answers from slow and bursty networks. Technical Report CS-TR-3994, University of Maryland, 1999.
|
 |
20
|
|
|