Estimating the number of frequent itemsets in a large database
Source: Extending Database Technology; Vol. 360
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Saint Petersburg, Russia
SESSION: Research sessions: Data mining
Pages 505-516  
Year of Publication: 2009
ISBN:978-1-60558-422-5
Authors
Ruoming Jin  Kent State University
Scott McCallen  Kent State University
Yuri Breitbart  Kent State University
Dave Fuhry  Kent State University
Dong Wang  Kent State University
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1516360.1516420

ABSTRACT

Estimating the number of frequent itemsets for a minimal support α in a large dataset is of great interest from both theoretical and practical perspectives. However, computing not only the number of frequent itemsets, but even the number of maximal frequent itemsets, is #P-complete. In this study, we provide a theoretical investigation of the sampling estimator, and we discover and prove several fundamental but rather surprising properties of it. We also propose a novel algorithm to estimate the number of frequent itemsets without using sampling. Our detailed experimental results demonstrate the accuracy and efficiency of the proposed approach.
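
As a rough illustration of the sampling idea mentioned in the abstract, the following Python sketch draws a random sample of transactions, counts the itemsets that are frequent in that sample at the same relative support α, and uses that count as the estimate. This is an assumption about the general approach, not the authors' estimator or algorithm: the function names, the level-wise counting routine, and the toy transactions are all hypothetical.

```python
import random
from itertools import combinations


def count_frequent_itemsets(transactions, alpha):
    """Count non-empty itemsets whose relative support is at least alpha.

    Uses a simple level-wise (Apriori-style) search; suitable only for
    small illustrative inputs.
    """
    n = len(transactions)
    if n == 0:
        return 0

    def is_frequent(itemset):
        support = sum(1 for t in transactions if itemset <= t)
        return support / n >= alpha

    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items if is_frequent(frozenset([i]))]
    total, k = 0, 1
    while current:
        total += len(current)
        frequent = set(current)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        current = [c for c in candidates if is_frequent(c)]
    return total


def sampling_estimate(transactions, alpha, sample_size, seed=0):
    """Hypothetical sampling estimator: the frequent-itemset count of a
    uniform random sample (without replacement) at the same alpha."""
    rng = random.Random(seed)
    sample = rng.sample(transactions, sample_size)
    return count_frequent_itemsets(sample, alpha)


# Toy data (hypothetical): five transactions over items a, b, c.
toy = [frozenset(t) for t in (["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a"])]
print(count_frequent_itemsets(toy, 0.4))   # exact count on the toy data: 6
print(sampling_estimate(toy, 0.4, 3))      # estimate from a 3-transaction sample
```

The estimate generally differs from the exact count because support in a sample fluctuates around its true value; the properties of that discrepancy are what the abstract says the paper investigates.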

