ABSTRACT
A bottom-k sketch is a summary of a set of items with nonnegative weights, from which approximate aggregates over the set can be computed. Bottom-k sketches are obtained by assigning each item in a ground set an independent random rank, drawn from a probability distribution that depends on the item's weight. For each subset of interest, the bottom-k sketch consists of the k items with the smallest ranks, together with their ranks. Bottom-k sketches have numerous applications. To facilitate their deployment, we develop and analyze data structures and estimators for bottom-k sketches. Our novel estimators and algorithms show that bottom-k sketches are a superior alternative to other sketching methods, both in the efficiency of obtaining the sketches and in the accuracy of the estimates derived from them.
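The construction described above can be illustrated with a short sketch. It assumes exponentially distributed ranks with rate equal to the item's weight (one common weight-dependent choice; the abstract does not fix the distribution). As an example aggregate it estimates a subset weight using rank-conditioning adjusted weights, an illustrative Horvitz-Thompson-style estimator rather than necessarily the estimators developed here. The function names `bottom_k_sample` and `estimate_subset_weight` are hypothetical.

```python
import heapq
import math
import random


def bottom_k_sample(weights, k, seed=0):
    """Return the k+1 smallest (rank, item) pairs, sorted by rank.

    Assumption: each positive-weight item gets an independent rank drawn
    from an exponential distribution with rate equal to its weight, so
    heavier items tend to receive smaller ranks. The first k pairs form the
    bottom-k sketch; the (k+1)-st rank is kept as the estimator threshold.
    Zero-weight items are skipped since they never affect a weight sum.
    """
    rng = random.Random(seed)
    ranked = ((rng.expovariate(w), item) for item, w in weights.items() if w > 0)
    return heapq.nsmallest(k + 1, ranked, key=lambda pair: pair[0])


def estimate_subset_weight(weights, sample, k, predicate):
    """Estimate the total weight of the items that satisfy `predicate`.

    Illustrative rank-conditioning estimator: conditioned on all other ranks,
    a sketched item with weight w was included with probability
    1 - exp(-w * tau), where tau is the (k+1)-st smallest rank. Dividing w
    by that probability gives an unbiased per-item contribution.
    """
    sketch = sample[:k]
    if len(sample) <= k:
        # At most k positive-weight items exist, so the sketch holds all of them.
        return float(sum(weights[item] for _, item in sketch if predicate(item)))
    tau = sample[k][0]
    total = 0.0
    for _, item in sketch:
        if predicate(item):
            w = weights[item]
            total += w / (1.0 - math.exp(-w * tau))
    return total
```

For example, `estimate_subset_weight(weights, bottom_k_sample(weights, 5), 5, lambda name: name.startswith("a"))` estimates the weight of the items whose names start with "a" from only five sampled items. Because the predicate is applied after sketching, a single sketch can answer queries about many different subsets.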