|
ABSTRACT
Estimating the number of frequent itemsets for minimal support α in a large dataset is of great interest from both theoretical and practical perspectives. However, finding not only the number of frequent itemsets, but even the number of maximal frequent itemsets, is #P-complete. In this study, we provide a theoretical investigation on the sampling estimator. We discover and prove several fundamental but also rather surprising properties of the sampling estimator. We also propose a novel algorithm to estimate the number of frequent itemsets without using sampling. Our detailed experimental results have shown the accuracy and efficiency of our proposed approach.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Frequent itemset mining data repository. http://fimi.cs.helsinki.fi/data.
|
| |
2
|
Rakesh Agrawal , Heikki Mannila , Ramakrishnan Srikant , Hannu Toivonen , A. Inkeri Verkamo, Fast discovery of association rules, Advances in knowledge discovery and data mining, American Association for Artificial Intelligence, Menlo Park, CA, 1996
|
| |
3
|
|
 |
4
|
Tom Brijs , Gilbert Swinnen , Koen Vanhoof , Geert Wets, Using association rules for product assortment decisions: a case study, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, p.254-260, August 15-18, 1999, San Diego, California, United States
[doi> 10.1145/312129.312241]
|
 |
5
|
Sergey Brin , Rajeev Motwani , Craig Silverstein, Beyond market baskets: generalizing association rules to correlations, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.265-276, May 11-15, 1997, Tucson, Arizona, United States
|
| |
6
|
George Casella and Roger L. Berger. Statistical Inference, 2nd. Edition. DUXBURY Publishers, 2001.
|
 |
7
|
Deepayan Chakrabarti , Spiros Papadimitriou , Dharmendra S. Modha , Christos Faloutsos, Fully automatic cross-associations, Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, August 22-25, 2004, Seattle, WA, USA
[doi> 10.1145/1014052.1014064]
|
 |
8
|
|
 |
9
|
|
| |
10
|
Karolien Geurts, Geert Wets, Tom Brijs, and Koen Vanhoof. Profiling high frequency accident locations using association rules. In Proceedings of the 82d Annual Transportation Research Board, 2003.
|
 |
11
|
Dimitrios Gunopulos , Roni Khardon , Heikki Mannila , Sanjeev Saluja , Hannu Toivonen , Ram Sewak Sharma, Discovering all most specific sentences, ACM Transactions on Database Systems (TODS), v.28 n.2, p.140-174, June 2003
[doi> 10.1145/777943.777945]
|
 |
12
|
Jiawei Han , Jian Pei , Yiwen Yin, Mining frequent patterns without candidate generation, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.1-12, May 15-18, 2000, Dallas, Texas, United States
|
| |
13
|
|
| |
14
|
E. L. Lehmann and George Casella. Theory of Point Estimation, 2nd Edition. Springer-Verlag, 1998.
|
| |
15
|
|
| |
16
|
|
| |
17
|
|
| |
18
|
|
 |
19
|
Takeaki Uno , Masashi Kiyomi , Hiroki Arimura, LCM ver.3: collaboration of array, bitmap and prefix tree for frequent itemset mining, Proceedings of the 1st international workshop on open source data mining: frequent pattern mining implementations, p.77-86, August 21-21, 2005, Chicago, Illinois
[doi> 10.1145/1133905.1133916]
|
| |
20
|
Craig Utley. Microsoft sql server 9.0 technical articles: Introduction to sql server 2005 data mining. http://technet.microsoft.com/en-us/library/ms345131.aspx.
|
 |
21
|
|
| |
22
|
|
|