| Space-efficient online computation of quantile summaries |
| Full text |
Pdf
(233 KB)
|
| Source
|
International Conference on Management of Data
archive
Proceedings of the 2001 ACM SIGMOD international conference on Management of data
table of contents
Santa Barbara, California, United States
Pages: 58 - 66
Year of Publication: 2001
ISBN:1-58113-332-4
Also published in ...
|
|
Authors
|
|
Michael Greenwald
|
Computer & Information Science Department, University of Pennsylvania, 200 South 33rd Street, Philadelphia, PA
|
|
Sanjeev Khanna
|
Computer & Information Science Department, University of Pennsylvania, 200 South 33rd Street, Philadelphia, PA
|
|
| Sponsor |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 7, Downloads (12 Months): 111, Citation Count: 87
|
|
|
ABSTRACT
An ∈-approximate quantile summary of a sequence of N elements is a data structure that can answer quantile queries about the sequence to within a precision of ∈N.
We present a new online algorithm for computing∈-approximate quantile summaries of very large data sequences. The algorithm has a worst-case space requirement of &Ogr;(1÷∈ log(∈N)). This improves upon the previous best result of &Ogr;(1÷∈ log2(∈N)). Moreover, in contrast to earlier deterministic algorithms, our algorithm does not require a priori knowledge of the length of the input sequence.
Finally, the actual space bounds obtained on experimental data are significantly better than the worst case guarantees of our algorithm as well as the observed space requirements of earlier algorithms.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Rakesh Agrawal and Arun Swami. A one-pass space-efficient algorithm for finding quantiles. Proc. 7th Int. Conf. Management of Data, COMAD, 28-30 December 1995.
|
| |
2
|
|
 |
3
|
Surajit Chaudhuri , Rajeev Motwani , Vivek Narasayya, Random sampling for histogram construction: how much is enough?, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.436-447, June 01-04, 1998, Seattle, Washington, United States
|
| |
4
|
|
| |
5
|
|
 |
6
|
|
| |
7
|
I. Pohl. A minimum storage algorithm for computing the median. IBM Research Report RC 2701, November 1969.
|
 |
8
|
Gurmeet Singh Manku , Sridhar Rajagopalan , Bruce G. Lindsay, Approximate medians and other quantiles in one pass and with limited memory, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.426-435, June 01-04, 1998, Seattle, Washington, United States
|
 |
9
|
Gurmeet Singh Manku , Sridhar Rajagopalan , Bruce G. Lindsay, Random sampling techniques for space efficient online computation of order statistics of large datasets, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.251-262, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
| |
10
|
J. I. Munro and M.S. Paterson. Selection and sorting with limited storage. Theoretical Computer Science, vol. 12: 315-323; 1980.
|
| |
11
|
M.S. Paterson. Progress in selection. Technical Report, University ofWarwick, Coventry, UK, 1997.
|
| |
12
|
Viswanath Poosala, Venkatesh Ganti, and Yannis E. Ioannidis. Approximate query answering using histograms. Bulletin of the IEEE Technical Committee on Data Engineering, 22(4):6-15, December 1999.
|
 |
13
|
Viswanath Poosala , Peter J. Haas , Yannis E. Ioannidis , Eugene J. Shekita, Improved histograms for selectivity estimation of range predicates, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.294-305, June 04-06, 1996, Montreal, Quebec, Canada
|
CITED BY 87
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Xuemin Lin , Jian Xu , Qing Zhang , Hongjun Lu , Jeffrey Xu Yu , Xiaofang Zhou , Yidong Yuan, Approximate Processing of Massive Continuous Quantile Queries over High-Speed Data Streams, IEEE Transactions on Knowledge and Data Engineering, v.18 n.5, p.683-698, May 2006
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Amitabha Bagchi , Amitabh Chaudhary , David Eppstein , Michael T. Goodrich, Deterministic sampling and range counting in geometric data streams, Proceedings of the twentieth annual symposium on Computational geometry, June 08-11, 2004, Brooklyn, New York, USA
|
|
|
|
|
|
Graham Cormode , Theodore Johnson , Flip Korn , S. Muthukrishnan , Oliver Spatscheck , Divesh Srivastava, Holistic UDAFs at streaming speeds, Proceedings of the 2004 ACM SIGMOD international conference on Management of data, June 13-18, 2004, Paris, France
|
|
|
Nisheeth Shrivastava , Chiranjeeb Buragohain , Divyakant Agrawal , Subhash Suri, Medians and beyond: new aggregation techniques for sensor networks, Proceedings of the 2nd international conference on Embedded networked sensor systems, November 03-05, 2004, Baltimore, MD, USA
|
|
|
|
|
|
|
|
|
Brian Babcock , Shivnath Babu , Mayur Datar , Rajeev Motwani , Jennifer Widom, Models and issues in data stream systems, Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 03-05, 2002, Madison, Wisconsin
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Rohit Ananthakrishna , Abhinandan Das , Johannes Gehrke , Flip Korn , S. Muthukrishnan , Divesh Srivastava, Efficient Approximation of Correlated Sums on Data Streams, IEEE Transactions on Knowledge and Data Engineering, v.15 n.3, p.569-572, March 2003
|
|
|
|
|
|
|
|
|
|
|
|
Jiawei Han , Yixin Chen , Guozhu Dong , Jian Pei , Benjamin W. Wah , Jianyong Wang , Y. Dora Cai, Stream Cube: An Architecture for Multi-Dimensional Analysis of Data Streams, Distributed and Parallel Databases, v.18 n.2, p.173-197, September 2005
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Haixun Wang , Jian Yin , Jian Pei , Philip S. Yu , Jeffrey Xu Yu, Suppressing model overfitting in mining concept-drifting data streams, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
|
|
|
Graham Cormode , Flip Korn , S. Muthukrishnan , Divesh Srivastava, Space- and time-efficient deterministic algorithms for biased quantiles over data streams, Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 26-28, 2006, Chicago, IL, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Zhang Longbo , Li Zhanhuai , Zhao Yiqiang , Yu Min , Zhang Yang, A priority random sampling algorithm for time-based sliding windows over weighted streaming data, Proceedings of the 2007 ACM symposium on Applied computing, March 11-15, 2007, Seoul, Korea
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Graham Cormode , Flip Korn , S. Muthukrishnan , Divesh Srivastava, Finding hierarchical heavy hitters in data streams, Proceedings of the 29th international conference on Very large data bases, p.464-475, September 09-12, 2003, Berlin, Germany
|
|
|
Yixin Chen , Guozhu Dong , Jiawei Han , Benjamin W. Wah , Jianyong Wang, Multi-dimensional regression analysis of time-series data streams, Proceedings of the 28th international conference on Very Large Data Bases, p.323-334, August 20-23, 2002, Hong Kong, China
|
|
|
|
|
|
|
|
|
Anna C. Gilbert , Yannis Kotidis , S. Muthukrishnan , Martin J. Strauss, How to summarize the universe: dynamic maintenance of quantiles, Proceedings of the 28th international conference on Very Large Data Bases, p.454-465, August 20-23, 2002, Hong Kong, China
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Yixin Chen , Guozhu Dong , Jiawei Han , Jian Pei , Benjamin W. Wah , Jianyong Wang, Regression Cubes with Lossless Compression and Aggregation, IEEE Transactions on Knowledge and Data Engineering, v.18 n.12, p.1585-1599, December 2006
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Ke Yi , Feifei Li , Graham Cormode , Marios Hadjieleftheriou , George Kollios , Divesh Srivastava, Small synopses for group-by query verification on outsourced data streams, ACM Transactions on Database Systems (TODS), v.34 n.3, p.1-42, August 2009
|
|
|
|
|