ACM Home Page
Please provide us with feedback. Feedback
Spectral bloom filters
Full text PdfPdf (323 KB)
Source International Conference on Management of Data archive
Proceedings of the 2003 ACM SIGMOD international conference on Management of data table of contents
San Diego, California
SESSION: Statistics table of contents
Pages: 241 - 252  
Year of Publication: 2003
ISBN:1-58113-634-X
Authors
Saar Cohen  Tel Aviv University
Yossi Matias  Tel Aviv University
Sponsor
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 9,   Downloads (12 Months): 118,   Citation Count: 24
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/872757.872787
What is a DOI?

ABSTRACT

A Bloom Filter is a space-efficient randomized data structure allowing membership queries over sets with certain allowable errors. It is widely used in many applications which take advantage of its ability to compactly represent a set, and filter out effectively any element that does not belong to the set, with small error probability. This paper introduces the Spectral Bloom Filter (SBF), an extension of the original Bloom Filter to multi-sets, allowing the filtering of elements whose multiplicities are below a threshold given at query time. Using memory only slightly larger than that of the original Bloom Filter, the SBF supports queries on the multiplicities of individual keys with a guaranteed, small error probability. The SBF also supports insertions and deletions over the data set. We present novel methods for reducing the probability and magnitude of errors. We also present an efficient data structure and algorithms to build it incrementally and maintain it over streaming data, as well as over materialized data with arbitrary insertions and deletions. The SBF does not assume any a priori filtering threshold and effectively and efficiently maintains information over the entire data-set, allowing for ad-hoc queries with arbitrary parameters and enabling a range of new applications.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
 
3
A. Broder and M. Mitzenmacher. Network applications of Bloom Filters: A survey. In Proc. of Allerton Conference, 2002.
 
4
A. Z. Broder. Personal communication.
 
5
S. Cohen and Y. Matias. Spectral bloom filters, Technical Report. Tel Aviv University, 2003.
 
6
 
7
P. Elias. Universal codeword sets and representations of the integers. IEEE Transactions on Information Theory, 21(2):194--202, 1975.
8
 
9
L. Fan, P. Cao, and J. Almeida. A prototype implementation of summary-cache enhanced icp in squid 1.1.14. www.cs.wisc.edu/~cao/sc-icp.html.
10
 
11
12
13
 
14
15
16
 
17
 
18
 
19
G. S. Manku and R. Motwani. Approximate frequency counts over data streams. In Proc. of the 28th International Conference on Very Large Data Bases, VLDB, 2002.
 
20
Y. Matias. Bloom Histograms, July 2001.
21
 
22
S. Rhea and J. Kubiatowicz. Probabilistic location and routing. In Proc. of the 21st Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM), 2002.
 
23
Squid Web Proxy Cache. http://www.squid-cache.org.

CITED BY  24

Collaborative Colleagues:
Saar Cohen: colleagues
Yossi Matias: colleagues