ACM Home Page
Please provide us with feedback. Feedback
Non-derivable itemset mining
Source Data Mining and Knowledge Discovery archive
Volume 14 ,  Issue 1  (February 2007) table of contents
Pages: 171 - 206  
Year of Publication: 2007
ISSN:1384-5810
Authors
Toon Calders  Eindhoven Technical University, Eindhoven, The Netherlands and University of Antwerp, Antwerp, Belgium
Bart Goethals  University of Antwerp, Antwerp, Belgium
Publisher
Kluwer Academic Publishers  Hingham, MA, USA
Bibliometrics
Downloads (6 Weeks): n/a,   Downloads (12 Months): n/a,   Citation Count: 10
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: 10.1007/s10618-006-0054-6

ABSTRACT

All frequent itemset mining algorithms rely heavily on the monotonicity principle for pruning. This principle allows for excluding candidate itemsets from the expensive counting phase. In this paper, we present sound and complete deduction rules to derive bounds on the support of an itemset. Based on these deduction rules, we construct a condensed representation of all frequent itemsets, by removing those itemsets for which the support can be derived, resulting in the so called Non-Derivable Itemsets (NDI) representation. We also present connections between our proposal and recent other proposals for condensed representations of frequent itemsets. Experiments on real-life datasets show the effectiveness of the NDI representation, making the search for frequent non-derivable itemsets a useful and tractable alternative to mining all frequent itemsets.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
3
4
 
5
Bonferroni C (1936) Teoria statistica della classi e calcolo della probabilitá. Publicazioni del R. Instituto Superiore di Scienze Economiche e Commerciali di Firenze 8:1-62.
 
6
 
7
 
8
9
 
10
 
11
Calders T (2003a) Axiomatization and deduction rules for the frequency of itemsets. Ph. D. thesis, University of Antwerp, Belgium.
 
12
Calders T (2003b) Deducing bounds on the support of itemsets. In: Database technologies for data mining, vol 2682 of LNCS, pp 214-233, Springer.
 
13
 
14
Calders T, Goethals B (2003) Minimal k-free representations of frequent sets. In: Lavrac N, Gamberger D, Blockeel H, Todorovski L (eds) Proc. PKDD Int. Conf. Principles of Data Mining and Knowledge Discovery, vol 2838 of Lecture Notes in Computer Science, pp 71-82. Springer-Verlag.
 
15
Calders T, Goethals B (2005a) Depth-first non-derivable itemset mining. In: Proc. SIAM Int. Conf. on Data Mining.
 
16
Calders T, Goethals B (2005b) Quick inclusion-exclusion. In: Proceedings ECML-PKDD 2005 Workshop Knowledge Discovery in Inductive Databases, vol 3933 of LNCS, pp 86-103. Springer.
 
17
Dexters N, Calders T (2004) Theoretical bounds on the size of condensed representations. In: Proceedings ECML-PKDD 2004 Workshop Knowledge Discovery in Inductive Databases, pp 25-36.
 
18
Dobra A (2002) Statistical tools for disclosure limitation in multi-way contingency tables. Ph. D. thesis, Department of Statistics, Carnegie Mellon University.
 
19
Dobra A, Fienberg S (2000) Bounds for cell entries in contingency tables given marginal totals and decomposable graphs. Proc Nat Acad Sci 97(22):11885-11892.
 
20
Dobra A, Fienberg SE (2001) Bounds for cell entries in contingency tables induced by fixed marginal totals. UNECE Stat J 18:363-371.
 
21
Fienberg SE (1998) Fréchet and bonferroni bounds for multi-way tables of counts with applications to disclosure limitation. In: Statistical data protection (SDP-98), pp 115-129. Eurostat.
 
22
Fréchet M (1951) Sur les tableaux de correlation dont les marges sont donnés. Ann Univ Lyon Sect A, Series 3 14:53-77.
 
23
Galambos J, Simonelli I (1996) Bonferroni-type inequalities with applications. Springer.
 
24
Goethals B, Muhonen J, Toivonen H (2005) Nonderivable association rules. In: Proc. SIAM Int. Conf. on Data Mining.
25
 
26
Groth D, Robertson E (2001) Discovering frequent itemsets in the presence of highly frequent items. In: In Proceedings Workshop on Rule Based DataMining, in Conjunction with the 14th International Conference On Applications of Prolog.
27
 
28
 
29
Jaroszewicz S, Simivici DA, Rosenberg I (2002) An inclusion-exclusion result for boolean polynomials and its applications in data mining. In: Proc. of the Discrete Mathematics in Data Mining Workshop, SIAM Datamining Conference.
 
30
Jordan C, (1927) The foundations of the theory of probability. Mat Phys Lapok 34:109-136.
 
31
Kahn J, Linial N, Samorodnitsky A (1996) Inclusion-exclusion: Exact and approximate. Combinatorica 16:465-477.
 
32
 
33
 
34
 
35
Mannila H, Toivonen H (1996) Multiple uses of frequent sets and condensed representations. In: Proc. KDD Int. Conf. Knowledge Discovery in Databases.
 
36
Melkman AA, Shimony SE (1997) A note on approximate inclusion-exclusion. Discrete Appl Math 73:23-26.
 
37
 
38
Jian Pei , Guozhu Dong , Wei Zou , Jiawei Han, Mining Condensed Frequent-Pattern Bases, Knowledge and Information Systems, v.6 n.5, p.570-594, September 2004
 
39
Pei J, Han J, Mao R (2000) Closet: an efficient algorithm for mining frequent closed itemsets. In: ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, Dallas, TX.
 
40
 
41
Zaki M, Hsiao C (1999) ChARM: an efficient algorithm for closed association rule mining. In: Technical Report 99-10, Computer Science, Rensselaer Polytechnic Institute.
 
42
Zaki M, Parthasarathy S, Ogihara M, Li W(1997) New algorithms for fast discovery of association rules. In: Heckerman D, Mannila H, Pregibon D (eds), Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, pp 283-286. AAAI Press.

CITED BY  10

Collaborative Colleagues:
Toon Calders: colleagues
Bart Goethals: colleagues