| Probabilistic frequent itemset mining in uncertain databases |
| Full text |
Mov
(22:38),
Pdf
(888 KB)
|
Source
|
International Conference on Knowledge Discovery and Data Mining
archive
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
table of contents
Paris, France
SESSION: Research track papers
table of contents
Pages 119-128
Year of Publication: 2009
ISBN:978-1-60558-495-9
|
|
Authors
|
|
Thomas Bernecker
|
Ludwig-Maximilians-Universität, Munich, Germany
|
|
Hans-Peter Kriegel
|
Ludwig-Maximilians-Universität, Munich, Germany
|
|
Matthias Renz
|
Ludwig-Maximilians-Universität, Munich, Germany
|
|
Florian Verhein
|
Ludwig-Maximilians-Universität, Munich, Germany
|
|
Andreas Zuefle
|
Ludwig-Maximilians-Universität, Munich, Germany
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 75, Downloads (12 Months): 245, Citation Count: 0
|
|
|
ABSTRACT
Probabilistic frequent itemset mining in uncertain transaction databases semantically and computationally differs from traditional techniques applied to standard "certain" transaction databases. The consideration of existential uncertainty of item(sets), indicating the probability that an item(set) occurs in a transaction, makes traditional techniques inapplicable. In this paper, we introduce new probabilistic formulations of frequent itemsets based on possible world semantics. In this probabilistic context, an itemset X is called frequent if the probability that X occurs in at least minSup transactions is above a given threshold τ. To the best of our knowledge, this is the first approach addressing this problem under possible worlds semantics. In consideration of the probabilistic formulations, we present a framework which is able to solve the Probabilistic Frequent Itemset Mining (PFIM) problem efficiently. An extensive experimental evaluation investigates the impact of our proposed techniques and shows that our approach is orders of magnitude faster than straight-forward approaches.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Parag Agrawal , Omar Benjelloun , Anish Das Sarma , Chris Hayworth , Shubha Nabar , Tomoe Sugihara , Jennifer Widom, Trio: a system for data, uncertainty, and lineage, Proceedings of the 32nd international conference on Very large data bases, September 12-15, 2006, Seoul, Korea
|
| |
2
|
R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD'94), Minneapolis, MN, 1994.
|
| |
3
|
|
| |
4
|
|
| |
5
|
Chun Kit Chui and Ben Kao. A decremental approach for mining frequent itemsets from uncertain data. In The 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pages 64--75, 2008.
|
| |
6
|
Chun Kit Chui, Ben Kao, and Edward Hung. Mining frequent itemsets from uncertain data. In 11th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2007, Nanjing, China, pages 47--58, 2007.
|
| |
7
|
|
| |
8
|
Karolien Geurts, Geert Wets, Tom Brijs, and Koen Vanhoof. Profiling high frequency accident locations using association rules. In Proceedings of the 82nd Annual Transportation Research Board, Washington DC. (USA), January 12-16, page 18pp, 2003.
|
 |
9
|
|
| |
10
|
H.-P. Kriegel, P. Kunath, M. Pfeifle, and M. Renz. "Probabilistic Similarity Join on Uncertain Data". In Proc. 11th Int. Conf. on Database Systems for Advanced Applications, Singapore, pp. 295--309, 2006.
|
| |
11
|
|
| |
12
|
C. Re, N. Dalvi, and D. Suciu. "Efficient top-k query evaluation on probalistic databases". In Proc. 23rd Int. Conf. on Data Engineering, Istanbul, Turkey, 2007.
|
| |
13
|
P. Sen and A. Deshpande. "Representing and querying correlated tuples in probabilistic databases". In Proc. 23rd Int. Conf. on Data Engineering, Istanbul, Turkey, 2007.
|
| |
14
|
M.A. Soliman, I.F. Ilyas, and K. Chen-Chuan Chang. "Top-k Query Processing in Uncertain Databases". In Proc. 23rd Int. Conf. on Data Engineering, Istanbul, Turkey, pages 896--905, 2007.
|
| |
15
|
|
 |
16
|
|
| |
17
|
|
 |
18
|
|
|