|
ABSTRACT
All frequent itemset mining algorithms rely heavily on the monotonicity principle for pruning. This principle allows for excluding candidate itemsets from the expensive counting phase. In this paper, we present sound and complete deduction rules to derive bounds on the support of an itemset. Based on these deduction rules, we construct a condensed representation of all frequent itemsets, by removing those itemsets for which the support can be derived, resulting in the so called Non-Derivable Itemsets (NDI) representation. We also present connections between our proposal and recent other proposals for condensed representations of frequent itemsets. Experiments on real-life datasets show the effectiveness of the NDI representation, making the search for frequent non-derivable itemsets a useful and tractable alternative to mining all frequent itemsets.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Rakesh Agrawal , Tomasz Imieliński , Arun Swami, Mining association rules between sets of items in large databases, Proceedings of the 1993 ACM SIGMOD international conference on Management of data, p.207-216, May 25-28, 1993, Washington, D.C., United States
|
| |
2
|
|
 |
3
|
|
 |
4
|
|
| |
5
|
Bonferroni C (1936) Teoria statistica della classi e calcolo della probabilitá. Publicazioni del R. Instituto Superiore di Scienze Economiche e Commerciali di Firenze 8:1-62.
|
| |
6
|
|
| |
7
|
|
| |
8
|
|
 |
9
|
|
| |
10
|
|
| |
11
|
Calders T (2003a) Axiomatization and deduction rules for the frequency of itemsets. Ph. D. thesis, University of Antwerp, Belgium.
|
| |
12
|
Calders T (2003b) Deducing bounds on the support of itemsets. In: Database technologies for data mining, vol 2682 of LNCS, pp 214-233, Springer.
|
| |
13
|
|
| |
14
|
Calders T, Goethals B (2003) Minimal k-free representations of frequent sets. In: Lavrac N, Gamberger D, Blockeel H, Todorovski L (eds) Proc. PKDD Int. Conf. Principles of Data Mining and Knowledge Discovery, vol 2838 of Lecture Notes in Computer Science, pp 71-82. Springer-Verlag.
|
| |
15
|
Calders T, Goethals B (2005a) Depth-first non-derivable itemset mining. In: Proc. SIAM Int. Conf. on Data Mining.
|
| |
16
|
Calders T, Goethals B (2005b) Quick inclusion-exclusion. In: Proceedings ECML-PKDD 2005 Workshop Knowledge Discovery in Inductive Databases, vol 3933 of LNCS, pp 86-103. Springer.
|
| |
17
|
Dexters N, Calders T (2004) Theoretical bounds on the size of condensed representations. In: Proceedings ECML-PKDD 2004 Workshop Knowledge Discovery in Inductive Databases, pp 25-36.
|
| |
18
|
Dobra A (2002) Statistical tools for disclosure limitation in multi-way contingency tables. Ph. D. thesis, Department of Statistics, Carnegie Mellon University.
|
| |
19
|
Dobra A, Fienberg S (2000) Bounds for cell entries in contingency tables given marginal totals and decomposable graphs. Proc Nat Acad Sci 97(22):11885-11892.
|
| |
20
|
Dobra A, Fienberg SE (2001) Bounds for cell entries in contingency tables induced by fixed marginal totals. UNECE Stat J 18:363-371.
|
| |
21
|
Fienberg SE (1998) Fréchet and bonferroni bounds for multi-way tables of counts with applications to disclosure limitation. In: Statistical data protection (SDP-98), pp 115-129. Eurostat.
|
| |
22
|
Fréchet M (1951) Sur les tableaux de correlation dont les marges sont donnés. Ann Univ Lyon Sect A, Series 3 14:53-77.
|
| |
23
|
Galambos J, Simonelli I (1996) Bonferroni-type inequalities with applications. Springer.
|
| |
24
|
Goethals B, Muhonen J, Toivonen H (2005) Nonderivable association rules. In: Proc. SIAM Int. Conf. on Data Mining.
|
 |
25
|
|
| |
26
|
Groth D, Robertson E (2001) Discovering frequent itemsets in the presence of highly frequent items. In: In Proceedings Workshop on Rule Based DataMining, in Conjunction with the 14th International Conference On Applications of Prolog.
|
 |
27
|
Jiawei Han , Jian Pei , Yiwen Yin, Mining frequent patterns without candidate generation, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.1-12, May 15-18, 2000, Dallas, Texas, United States
|
| |
28
|
|
| |
29
|
Jaroszewicz S, Simivici DA, Rosenberg I (2002) An inclusion-exclusion result for boolean polynomials and its applications in data mining. In: Proc. of the Discrete Mathematics in Data Mining Workshop, SIAM Datamining Conference.
|
| |
30
|
Jordan C, (1927) The foundations of the theory of probability. Mat Phys Lapok 34:109-136.
|
| |
31
|
Kahn J, Linial N, Samorodnitsky A (1996) Inclusion-exclusion: Exact and approximate. Combinatorica 16:465-477.
|
| |
32
|
|
| |
33
|
|
| |
34
|
|
| |
35
|
Mannila H, Toivonen H (1996) Multiple uses of frequent sets and condensed representations. In: Proc. KDD Int. Conf. Knowledge Discovery in Databases.
|
| |
36
|
Melkman AA, Shimony SE (1997) A note on approximate inclusion-exclusion. Discrete Appl Math 73:23-26.
|
| |
37
|
|
| |
38
|
Jian Pei , Guozhu Dong , Wei Zou , Jiawei Han, Mining Condensed Frequent-Pattern Bases, Knowledge and Information Systems, v.6 n.5, p.570-594, September 2004
|
| |
39
|
Pei J, Han J, Mao R (2000) Closet: an efficient algorithm for mining frequent closed itemsets. In: ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, Dallas, TX.
|
| |
40
|
|
| |
41
|
Zaki M, Hsiao C (1999) ChARM: an efficient algorithm for closed association rule mining. In: Technical Report 99-10, Computer Science, Rensselaer Polytechnic Institute.
|
| |
42
|
Zaki M, Parthasarathy S, Ogihara M, Li W(1997) New algorithms for fast discovery of association rules. In: Heckerman D, Mannila H, Pregibon D (eds), Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, pp 283-286. AAAI Press.
|
CITED BY 10
|
|
|
|
|
|
|
|
|
|
|
Ruoming Jin , Muad Abu-Ata , Yang Xiang , Ning Ruan, Effective and efficient itemset pattern summarization: regression-based approaches, Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2008, Las Vegas, Nevada, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Sami Hanhijärvi , Markus Ojala , Niko Vuokko , Kai Puolamäki , Nikolaj Tatti , Heikki Mannila, Tell me something I don't know: randomization strategies for iterative data mining, Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, June 28-July 01, 2009, Paris, France
|
|
|
|
|