| Sampling algorithms: lower bounds and applications |
| Full text |
Pdf
(276 KB)
|
| Source
|
Annual ACM Symposium on Theory of Computing
archive
Proceedings of the thirty-third annual ACM symposium on Theory of computing
table of contents
Hersonissos, Greece
Pages: 266 - 275
Year of Publication: 2001
ISBN:1-58113-349-9
|
|
Authors
|
|
| Sponsor |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 5, Downloads (12 Months): 54, Citation Count: 15
|
|
|
ABSTRACT
We develop a framework to study probabilistic sampling algorithms that approximate general functions of the form \genfunc, where \domain and \range are arbitrary sets. Our goal is to obtain lower bounds on the query complexity of functions, namely the number of input variables x_i that any sampling algorithm needs to query to approximate f(x_1,\ldots,x_n).We define two quantitative properties of functions --- the it block sensitivity and the minimum Hellinger distance --- that give us techniques to prove lower bounds on the query complexity. These techniques are quite general, easy to use, yet powerful enough to yield tight results. Our applications include the mean and higher statistical moments, the median and other selection functions, and the frequency moments, where we obtain lower bounds that are close to the corresponding upper bounds.We also point out some connections between sampling and streaming algorithms and lossy compression schemes.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
| |
3
|
|
| |
4
|
J. O. Berger. Statistical Decision Theory and Bayesian Analysis. Springer-Verlag, 1985.
|
 |
5
|
Andrei Z. Broder , Moses Charikar , Alan M. Frieze , Michael Mitzenmacher, Min-wise independent permutations (extended abstract), Proceedings of the thirtieth annual ACM symposium on Theory of computing, p.327-336, May 24-26, 1998, Dallas, Texas, United States
[doi> 10.1145/276698.276781]
|
| |
6
|
H. Buhrman and R. de Wolf. Complexity measures and decision tree complexity: A survey, 1999. Available at http://www.cwi.nl/rdewolf.
|
| |
7
|
|
| |
8
|
|
 |
9
|
Moses Charikar , Surajit Chaudhuri , Rajeev Motwani , Vivek Narasayya, Towards estimation error guarantees for distinct values, Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.268-279, May 15-18, 2000, Dallas, Texas, United States
[doi> 10.1145/335168.335230]
|
| |
10
|
H. Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. American Mathematical Society, 23:493-507, 1952.
|
| |
11
|
|
| |
12
|
P. Diaconis. Group Representation in Probability and Statistics. IMS Lecture Series 11, Institute of Mathematical Statistics, 1999.
|
| |
13
|
|
| |
14
|
J. Feigenbaum , S. Kannan , M. Strauss , M. Viswanathan, Testing and spot-checking of data streams (extended abstract), Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms, p.165-174, January 09-11, 2000, San Francisco, California, United States
|
 |
15
|
|
| |
16
|
O. Goldreich and D. Ron. On testing expansion in bounded-degree graphs. Electronic Colloquium on Computational Complexity (ECCC), 2000. TR00-020.
|
| |
17
|
|
| |
18
|
W. Hoeding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58:13-30, 1963.
|
| |
19
|
|
| |
20
|
|
| |
21
|
L. Le Cam and G. Lo Yang. Asymptotics in Statistics - Some Basic Concepts, pages 24-30. Springer-Verlag, 1990.
|
 |
22
|
|
| |
23
|
|
| |
24
|
|
 |
25
|
|
| |
26
|
D. Siegmund. Sequential Analysis - Tests and Confidence Intervals. Springer-Verlag, 1985.
|
 |
27
|
|
| |
28
|
|
| |
29
|
|
| |
30
|
A.-C. Yao. Probabilistic computations: toward a unified measure of complexity. InProceedings of the 18th IEEE Annual Symposium on Foundations of Computer Science (FOCS), pages 222-227, 1977.
|
CITED BY 15
|
|
|
|
|
|
|
|
Suman Nath , Phillip B. Gibbons , Srinivasan Seshan , Zachary R. Anderson, Synopsis diffusion for robust aggregation in sensor networks, Proceedings of the 2nd international conference on Embedded networked sensor systems, November 03-05, 2004, Baltimore, MD, USA
|
|
|
Brian Babcock , Shivnath Babu , Mayur Datar , Rajeev Motwani , Jennifer Widom, Models and issues in data stream systems, Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 03-05, 2002, Madison, Wisconsin
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Fabio Picconi , Nishkam Ravi , Marco Gruteser , Liviu Iftode, Probabilistic validation of aggregated data in vehicular ad-hoc networks, Proceedings of the 3rd international workshop on Vehicular ad hoc networks, September 29-29, 2006, Los Angeles, CA, USA
|
|
|
Tamás Sarlós , Adrás A. Benczúr , Károly Csalogány , Dániel Fogaras , Balázs Rácz, To randomize or not to randomize: space optimal summaries for hyperlink analysis, Proceedings of the 15th international conference on World Wide Web, May 23-26, 2006, Edinburgh, Scotland
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|