|
ABSTRACT
Histograms have been used widely to capture data distribution, to represent the data by a small number of step functions. Dynamic programming algorithms which provide optimal construction of these histograms exist, albeit running in quadratic time and linear space. In this paper we provide linear time construction of 1 + &egr; approximation of optimal histograms, running in polylogarithmic space.Our results extend to the context of data-streams, and in fact generalize to give 1 + &egr; approximation of several problems in data-streams which require partitioning the index set into intervals. The only assumptions required are that the cost of an interval is monotonic under inclusion (larger interval has larger cost) and that the cost can be computed or approximated in small space. This exhibits a nice class of problems for which we can have near optimal data-stream algorithms.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Swarup Acharya , Phillip B. Gibbons , Viswanath Poosala , Sridhar Ramaswamy, Join synopses for approximate query answering, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.275-286, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
 |
2
|
Swarup Acharya , Phillip B. Gibbons , Viswanath Poosala , Sridhar Ramaswamy, The Aqua approximate query answering system, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.574-576, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
 |
3
|
|
 |
4
|
Surajit Chaudhuri , Rajeev Motwani , Vivek Narasayya, Random sampling for histogram construction: how much is enough?, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.436-447, June 01-04, 1998, Seattle, Washington, United States
|
| |
5
|
|
| |
6
|
P. Flajolet and G. N. Martin. Probabilistic counting. Proceedings of 24th Annual IEEE Symposium on Foundations of Computer Science, pages 76-82, 1983.
|
| |
7
|
|
| |
8
|
Jim Gray , Adam Bosworth , Andrew Layman , Hamid Pirahesh, Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total, Proceedings of the Twelfth International Conference on Data Engineering, p.152-159, February 26-March 01, 1996
|
| |
9
|
M. R. Henzinger, P .Raghavan, and S. Rajagopalan. Computing on data streams. Technical Report 1998-011, Digital Equipment Corporation, Systems Research Center, May, 1998.
|
| |
10
|
|
| |
11
|
|
 |
12
|
|
| |
13
|
|
 |
14
|
|
| |
15
|
H. V. Jagadish , Nick Koudas , S. Muthukrishnan , Viswanath Poosala , Kenneth C. Sevcik , Torsten Suel, Optimal Histograms with Quality Guarantees, Proceedings of the 24rd International Conference on Very Large Data Bases, p.275-286, August 24-27, 1998
|
| |
16
|
|
 |
17
|
Nick Koudas , S. Muthukrishnan , Divesh Srivastava, Optimal histograms for hierarchical range queries (extended abstract), Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.196-204, May 15-18, 2000, Dallas, Texas, United States
[doi> 10.1145/335168.335223]
|
 |
18
|
Gurmeet Singh Manku , Sridhar Rajagopalan , Bruce G. Lindsay, Approximate medians and other quantiles in one pass and with limited memory, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.426-435, June 01-04, 1998, Seattle, Washington, United States
|
 |
19
|
Gurmeet Singh Manku , Sridhar Rajagopalan , Bruce G. Lindsay, Random sampling techniques for space efficient online computation of order statistics of large datasets, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.251-262, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
 |
20
|
Yossi Matias , Jeffrey Scott Vitter , Min Wang, Wavelet-based histograms for selectivity estimation, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.448-459, June 01-04, 1998, Seattle, Washington, United States
|
| |
21
|
J. I. Munro and M. S. Paterson. Selection and sorting with limited storage. Theoretical Computer Science, pages 315-323, 1980.
|
| |
22
|
M. Muralikrishna and D. J. DeWitt. Equi-depth histograms for estimating selectivity factors for multidimension al queries. Proceedings of ACM SIGMOD, 1998.
|
| |
23
|
|
| |
24
|
|
| |
25
|
|
| |
26
|
|
 |
27
|
Viswanath Poosala , Peter J. Haas , Yannis E. Ioannidis , Eugene J. Shekita, Improved histograms for selectivity estimation of range predicates, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.294-305, June 04-06, 1996, Montreal, Quebec, Canada
|
 |
28
|
|
 |
29
|
Jeffrey Scott Vitter , Min Wang , Bala Iyer, Data cube approximation and histograms via wavelets, Proceedings of the seventh international conference on Information and knowledge management, p.96-104, November 02-07, 1998, Bethesda, Maryland, United States
[doi> 10.1145/288627.288645]
|
CITED BY 65
|
|
|
|
|
|
|
|
Mayur Datar , Aristides Gionis , Piotr Indyk , Rajeev Motwani, Maintaining stream statistics over sliding windows: (extended abstract), Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms, p.635-644, January 06-08, 2002, San Francisco, California
|
|
|
|
|
|
Ziv Bar-Yosseff , Ravi Kumar , D. Sivakumar, Reductions in streaming algorithms, with an application to counting triangles in graphs, Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms, p.623-632, January 06-08, 2002, San Francisco, California
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Miklós Ajtai , T. S. Jayram , Ravi Kumar , D. Sivakumar, Approximate counting of inversions in a data stream, Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, May 19-21, 2002, Montreal, Quebec, Canada
|
|
|
|
|
|
|
|
|
|
|
|
Graham Cormode , Theodore Johnson , Flip Korn , S. Muthukrishnan , Oliver Spatscheck , Divesh Srivastava, Holistic UDAFs at streaming speeds, Proceedings of the 2004 ACM SIGMOD international conference on Management of data, June 13-18, 2004, Paris, France
|
|
|
|
|
|
Brian Babcock , Shivnath Babu , Mayur Datar , Rajeev Motwani , Jennifer Widom, Models and issues in data stream systems, Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 03-05, 2002, Madison, Wisconsin
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Joan Feigenbaum , Sampath Kannan , Andrew McGregor , Siddharth Suri , Jian Zhang, Graph distances in the streaming model: the value of space, Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms, January 23-25, 2005, Vancouver, British Columbia
|
|
|
|
|
|
|
|
|
Amit Deshpande , Luis Rademacher , Santosh Vempala , Grant Wang, Matrix approximation and projective clustering via volume sampling, Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm, p.1117-1126, January 22-26, 2006, Miami, Florida
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Arvind Arasu , Brian Babcock , Shivnath Babu , Jon McAlister , Jennifer Widom, Characterizing memory requirements for queries over continuous data streams, Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 03-05, 2002, Madison, Wisconsin
|
|
|
Anna C. Gilbert , Sudipto Guha , Piotr Indyk , Yannis Kotidis , S. Muthukrishnan , Martin J. Strauss, Fast, small-space algorithms for approximate histogram maintenance, Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, May 19-21, 2002, Montreal, Quebec, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Luciana S. Buriol , Gereon Frahling , Stefano Leonardi , Alberto Marchetti-Spaccamela , Christian Sohler, Counting triangles in data streams, Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 26-28, 2006, Chicago, IL, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Graham Cormode , Flip Korn , S. Muthukrishnan , Divesh Srivastava, Finding hierarchical heavy hitters in data streams, Proceedings of the 29th international conference on Very large data bases, p.464-475, September 09-12, 2003, Berlin, Germany
|
|
|
|
|
|
Graham Cormode , Mayur Datar , Piotr Indyk , S. Muthukrishnan, Comparing data streams using Hamming norms (how to zero in), Proceedings of the 28th international conference on Very Large Data Bases, p.335-345, August 20-23, 2002, Hong Kong, China
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Bin Zhou , Yi Han , Jian Pei , Bin Jiang , Yufei Tao , Yan Jia, Continuous privacy preserving publishing of data streams, Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, March 24-26, 2009, Saint Petersburg, Russia
|
|
|
|
|
|
|
|
|
|
|