ACM Home Page
Please provide us with feedback. Feedback
Lower bounds for randomized read/write stream algorithms
Full text PdfPdf (294 KB)
Source
Annual ACM Symposium on Theory of Computing archive
Proceedings of the thirty-ninth annual ACM symposium on Theory of computing table of contents
San Diego, California, USA
SESSION: Session 13B table of contents
Pages: 689 - 698  
Year of Publication: 2007
ISBN:978-1-59593-631-8
Authors
Paul Beame  University of Washington, Seattle, WA
T. S. Jayram  IBM Almaden Research Center, San Jose, CA
Atri Rudra  University of Washington, Seattle, WA
Sponsors
ACM: Association for Computing Machinery
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 11,   Downloads (12 Months): 58,   Citation Count: 3
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1250790.1250891
What is a DOI?

ABSTRACT

Motivated by the capabilities of modern storage architectures, we consider the following generalization of the data stream model where the algorithm has sequential access to multiple streams. Unlike the data stream model, where the stream is read only, in this new model (introduced in [8,9]) the algorithms can also write onto streams. There is no limit on the size of the streams but the number of passes made on the streams is restricted. On the other hand, the amount of internal memory used by the algorithm is scarce, similar to data stream model.

We resolve the main open problem in [7] of proving lower bounds in this model for algorithms that are allowed to have 2-sided error. Previously, such lower bounds were shown only for deterministic and 1-sided error randomized algorithms [9,7]. We consider the classical set disjointness problemthat has proved to be invaluable for deriving lower bounds for many other problems involving data streams and other randomized models of computation. For this problem, we show a near-linear lower bound on the size of the internal memory used by a randomized algorithm with 2-sided error that is allowed to have o(log N/log log N) passes over the streams. This bound is almost optimal sincethere is a simple algorithm that can solve this problem using logarithmic memory if the number of passes over the streams.

Applications include near-linear lower bounds onthe internal memory for well-known problems in the literature:(1) approximately counting the number of distinct elements in the input (F0);(2) approximating the frequency of the mod of an input sequence(F*);(3) computing the join of two relations; and (4) deciding if some node of an XML document matches an XQuery (or XPath) query. Our techniques involve a novel direct-sum type of argument that yields lower bounds for many other problems. Our results asymptotically improve previously known bounds for any problem even in deterministic and 1-sided error models of computation.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
[2] L. Babai, P. Frankl, and J. Simon. Complexity classes in communication complexity theory. In 27th Annual Symposium on Foundations of Computer Science (FOCS), pages 337-347, Toronto, Ontario, Oct. 1986. IEEE.
3
4
 
5
 
6
7
 
8
[8] M. Grohe, C. Koch, and N. Schweikardt. Tight lower bounds for query processing on streaming and external memory data external memory. In Proceedings of the 32nd International Colloquium on Automata, Languages and Programming (ICALP), LNCS3580, pages 1076-1088, 2005.
9
 
10
 
11
 
12
 
13
14
 
15


Collaborative Colleagues:
Paul Beame: colleagues
T. S. Jayram: colleagues
Atri Rudra: colleagues