|
ABSTRACT
(MATH) A vector A of length N is defined implicitly, via a stream of updates of the form "add 5 to A3." We give a sketching algorithm, that constructs a small sketch from the stream of updates, and a reconstruction algorithm, that produces a B-bucket piecewise-constant representation (histogram) H for A from the sketch, such that ||A—H||&xie;(1+&egr;)||A—Hopt||, where the error ||A—H|| is either $\ell_1$ (absolute) or $\ell_2$ (root-mean-square) error. The time to process a single update, time to reconstruct the histogram, and size of the sketch are each bounded by poly(B,log(N),log||A,1/&egr;. Our result is obtained in two steps. First we obtain what we call a robust histogram approximation for A, a histogram such that adding a small number of buckets does not help improve the representation quality significantly. From the robust histogram, we cull a histogram of desired accruacy and B buckets in the second step. This technique also provides similar results for Haar wavelet representations, under $\ell_2$ error. Our results have applications in summarizing data distributions fast and succinctly even in distributed settings.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
|
| |
3
|
|
| |
4
|
|
| |
5
|
|
| |
6
|
|
| |
7
|
A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, M. Strauss. QuickSAND: Quick Summary and Analysis of Network Data DIMACS Technical Report 2001-43.
|
| |
8
|
S. Guha, N. Koudas. Approximating a Data Stream for Querying and Estimation: Algorithms and Performance Evaluation. ICDE 2002.
|
 |
9
|
|
| |
10
|
|
| |
11
|
H. V. Jagadish , Nick Koudas , S. Muthukrishnan , Viswanath Poosala , Kenneth C. Sevcik , Torsten Suel, Optimal Histograms with Quality Guarantees, Proceedings of the 24rd International Conference on Very Large Data Bases, p.275-286, August 24-27, 1998
|
 |
12
|
Ju-Hong Lee , Deok-Hwan Kim , Chin-Wan Chung, Multi-dimensional selectivity estimation using compressed histogram information, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.205-214, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
| |
13
|
|
| |
14
|
M. Naor, O. Reingold. Private communication, March, 1999.
|
 |
15
|
|
| |
16
|
V. Poosala. Histograms for selecitivty estimation. PhD Thesis, U. Wisconsin, Madison. 1997.
|
 |
17
|
|
CITED BY 55
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Matthew Roughan , Subhabrata Sen , Oliver Spatscheck , Nick Duffield, Class-of-service mapping for QoS: a statistical signature-based approach to IP traffic classification, Proceedings of the 4th ACM SIGCOMM conference on Internet measurement, October 25-27, 2004, Taormina, Sicily, Italy
|
|
|
Brian Babcock , Shivnath Babu , Mayur Datar , Rajeev Motwani , Jennifer Widom, Models and issues in data stream systems, Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 03-05, 2002, Madison, Wisconsin
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
A. R. Calderbank , A. Gilbert , K. Levchenko , S. Muthukrishnan , M. Strauss, Improved range-summable random variable construction algorithms, Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms, January 23-25, 2005, Vancouver, British Columbia
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Joan Feigenbaum , Sampath Kannan , Andrew McGregor , Siddharth Suri , Jian Zhang, Graph distances in the streaming model: the value of space, Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms, January 23-25, 2005, Vancouver, British Columbia
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
A. C. Gilbert , S. Guha , P. Indyk , S. Muthukrishnan , M. Strauss, Near-optimal sparse fourier representations via sampling, Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, May 19-21, 2002, Montreal, Quebec, Canada
|
|
|
Mark Daly , Yitzhak Mandelbaum , David Walker , Mary Fernández , Kathleen Fisher , Robert Gruber , Xuan Zheng, PADS: an end-to-end system for processing ad hoc data, Proceedings of the 2006 ACM SIGMOD international conference on Management of data, June 27-29, 2006, Chicago, IL, USA
|
|
|
|
|
|
|
|
|
Wei Wang , Haifeng Jiang , Hongjun Lu , Jeffrey Xu Yu, Bloom histogram: path selectivity estimation for XML data with updates, Proceedings of the Thirtieth international conference on Very large data bases, p.240-251, August 31-September 03, 2004, Toronto, Canada
|
|
|
|
|
|
|
|
|
|
|
|
Anna C. Gilbert , Yannis Kotidis , S. Muthukrishnan , Martin J. Strauss, How to summarize the universe: dynamic maintenance of quantiles, Proceedings of the 28th international conference on Very Large Data Bases, p.454-465, August 20-23, 2002, Hong Kong, China
|
|
|
|
|
|
Bo Sheng , Chiu Chiang Tan , Qun Li , Weizhen Mao, Finding popular categories for RFID tags, Proceedings of the 9th ACM international symposium on Mobile ad hoc networking and computing, May 26-30, 2008, Hong Kong, Hong Kong, China
|
|
|
Robert Schweller , Zhichun Li , Yan Chen , Yan Gao , Ashish Gupta , Yin Zhang , Peter A. Dinda , Ming-Yang Kao , Gokhan Memik, Reversible sketches: enabling monitoring and analysis over high-speed data streams, IEEE/ACM Transactions on Networking (TON), v.15 n.5, p.1059-1072, October 2007
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|