ACM Home Page
Please provide us with feedback. Feedback
Why go logarithmic if we can go linear?: Towards effective distinct counting of search traffic
Full text PdfPdf (277 KB)
Source ACM International Conference Proceeding Series; Vol. 261 archive
Proceedings of the 11th international conference on Extending database technology: Advances in database technology table of contents
Nantes, France
SESSION: Industrial sessions: Industrial 1 table of contents
Pages 618-629  
Year of Publication: 2008
ISBN:978-1-59593-926-5
Authors
Ahmed Metwally  Ask.com, Oakland, CA
Divyakant Agrawal  Ask.com, Oakland, CA
Amr El Abbadi  University of California at Santa Barbara, Santa Barbara, CA
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 2,   Downloads (12 Months): 70,   Citation Count: 1
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1353343.1353418
What is a DOI?

ABSTRACT

Estimating the number of distinct elements in a large multiset has several applications, and hence has attracted active research in the past two decades. Several sampling and sketching algorithms have been proposed to accurately solve this problem. The goal of the literature has always been to estimate the number of distinct elements while using minimal resources. However, in some modern applications, the accuracy of the estimate is of primal importance, and businesses are willing to trade more resources for better accuracy. Throughout our experience with building a distinct count system at a major search engine, Ask.com, we reviewed the literature of approximating distinct counts, and compared most algorithms in the literature. We deduced that Linear Counting, one of the least used algorithms, has unique and impressive advantages when the accuracy of the distinct count is critical to the business. For other estimators to attain comparable accuracy, they need more space than Linear Counting. We have supported our analytical results through comprehensive experiments. The experimental results highly favor Linear Counting when the number of distinct elements is large and the error tolerance is low.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
 
3
 
4
5
 
6
T. Bohman, C. Cooper, and A. Frieze. Min-Wise Independent Linear Permutations. Electronic Journal of Combinatorics, 7:R26, 2000.
7
8
 
9
 
10
11
 
12
 
13
M. Durand and P. Flajolet. Loglog Counting of Large Cardinalities (extended abstract). In Proceedings of the 11th ESA European Symposium on Algorithms, pages 605--617, 2003.
14
 
15
 
16
 
17
É. Fusy and F. Giroire. Estimating the Number of Active Flows in a Data Stream over a Sliding Window. In Proceedings of the 4th SIAM Workshop on Analytic Algorithmics and Combinatorics, 2007.
 
18
S. Ganguly. Counting Distinct Items over Update Streams. In Proceedings of the 16th ISAAC International Symposium on Algorithms and Computation, pages 505--514, 2005.
19
 
20
21
22
 
23
F. Giroire. Order Statistics and Estimating Cardinalities of Massive Datasets. In Proceedings of the 6th DMTCS Discrete Mathematics and Theoretical Computer Science, pages 157--166, 2005.
 
24
 
25
 
26
P. Haas and L. Stokes. Estimating the Number of Classes in a Finite Population. Journal of the American Statistical Association, 93:1475--1487, 1998.
 
27
28
29
30
31
 
32
33
 
34
 
35
 
36
 
37
 
38
F. Olken and D. Rotem. Random Sampling from Databases: A Survey. Journal of Statistics and Computing, 5:25--42, 1995.
 
39
G. Özsoyoglu, K. Du, A. Tjahjana, W.-C. Hou, and D. Y. Rowland. On Estimating COUNT, SUM, and AVERAGE Relational Algebra Queries. In Proceedings of the 2nd DEXA International Conference on Database and Expert Systems Applications, pages 406--412, 1991.
40
41
42
 
43


Collaborative Colleagues:
Ahmed Metwally: colleagues
Divyakant Agrawal: colleagues
Amr El Abbadi: colleagues