ABSTRACT
A new model to evaluate dependencies in data mining problems is presented and discussed. The well-known concept of the association rule is replaced by the new definition of dependence value, which is a single real number uniquely associated with a given itemset. Knowledge of dependence values is sufficient to describe all the dependencies characterizing a given data mining problem. The dependence value of an itemset is the difference between the occurrence probability of the itemset and a corresponding “maximum independence estimate.” This estimate can be determined as a function of the joint probabilities of the subsets of the itemset being considered by maximizing a suitable entropy function. It is therefore possible to separate, in an itemset of cardinality k, the dependence inherited from its subsets of cardinality (k − 1) from the specific inherent dependence of that itemset. The absolute value of the difference between the probability p(i) of the event i that indicates the presence of the itemset {a,b,...} and its maximum independence estimate is constant for any combination of values of ⟨a,b,...⟩. In addition, the Boolean function specifying the combination of values for which the dependence is positive is a parity function, so the determination of such combinations is immediate. The model appears to be simple and powerful.
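As a small illustration of the definition above, the following sketch computes dependence values for the simplest case, an itemset of two items. It assumes that for k = 2 the maximum independence estimate reduces to the product of the single-item probabilities, since that product is the maximum-entropy joint distribution once the marginals are fixed. The function name `dependence_values` and the toy `baskets` data are hypothetical and are not taken from the paper; larger itemsets would need an actual entropy maximization, which is not shown here.

```python
from itertools import product


def dependence_values(transactions, a, b):
    """Dependence value of every presence/absence combination of items a and b.

    Returns a dict mapping (a_present, b_present) to
    p(combination) - p(a part) * p(b part).
    Assumption: for two items the maximum independence estimate is the
    product of the marginal probabilities.
    """
    n = len(transactions)
    p_a = sum(a in t for t in transactions) / n
    p_b = sum(b in t for t in transactions) / n
    result = {}
    for va, vb in product((True, False), repeat=2):
        # Observed joint probability of this presence/absence combination.
        joint = sum((a in t) == va and (b in t) == vb for t in transactions) / n
        # Marginal probability of each item being present or absent.
        marg_a = p_a if va else 1 - p_a
        marg_b = p_b if vb else 1 - p_b
        result[(va, vb)] = joint - marg_a * marg_b
    return result


# Hypothetical toy transactions, used only for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread"},
    {"milk"},
    {"eggs"},
    {"bread", "milk", "eggs"},
]

for combo, value in dependence_values(baskets, "bread", "milk").items():
    print(combo, round(value, 4))
```

On this toy data the four values share one absolute value, and the sign is positive exactly when an even number of the two items is absent, which is the parity behaviour the abstract describes.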