Maximally informative k-itemsets and their efficient discovery
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Philadelphia, PA, USA
SESSION: Research track papers
Pages: 237 - 244
Year of Publication: 2006
ISBN: 1-59593-339-5

Authors
Arno J. Knobbe (Kiminkii, Houten, The Netherlands & Utrecht University, Utrecht, The Netherlands)
Eric K. Y. Ho (Kiminkii, Houten, The Netherlands)

Bibliometrics
Downloads (6 Weeks): 8, Downloads (12 Months): 67, Citation Count: 2

ABSTRACT
In this paper we present a new approach to mining binary data. We treat each binary feature (item) as a means of distinguishing two sets of examples. Our interest is in selecting from the total set of items an itemset of specified size, such that the database is partitioned with as uniform a distribution over the parts as possible. To achieve this goal, we propose the use of joint entropy as a quality measure for itemsets, and refer to optimal itemsets of cardinality k as maximally informative k-itemsets. We claim that this approach both maximises distinctive power and minimises redundancy within the feature set. A number of algorithms are presented for computing optimal itemsets efficiently.
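
The quality measure described in the abstract can be illustrated with a minimal sketch. It assumes the database is a list of equal-length 0/1 tuples and that an itemset is a tuple of column indices. The function names `joint_entropy` and `best_k_itemset` and the toy data are hypothetical and do not come from the paper. The search shown is a plain exhaustive enumeration, not one of the efficient algorithms the authors present.

```python
from collections import Counter
from itertools import combinations
from math import log2


def joint_entropy(rows, itemset):
    """Joint entropy (in bits) of the selected item columns.

    Each distinct combination of item values defines one part of the
    partition; the entropy is highest when the parts are equally sized.
    """
    n = len(rows)
    counts = Counter(tuple(row[i] for i in itemset) for row in rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def best_k_itemset(rows, k):
    """Exhaustive search (illustrative only): the k-itemset with
    maximal joint entropy among all k-subsets of the items."""
    n_items = len(rows[0])
    return max(combinations(range(n_items), k),
               key=lambda s: joint_entropy(rows, s))


# Hypothetical toy database: 4 examples, 3 binary items.
# Item 1 is an exact copy of item 0, so pairing them adds no information.
rows = [
    (0, 0, 0),
    (0, 0, 1),
    (1, 1, 0),
    (1, 1, 1),
]

print(joint_entropy(rows, (0, 1)))  # redundant pair: two parts of size 2
print(joint_entropy(rows, (0, 2)))  # complementary pair: four parts of size 1
print(best_k_itemset(rows, 2))
```

On this toy data the redundant pair (0, 1) splits the examples into only two parts, while (0, 2) splits them into four equal parts, so the entropy-based measure prefers the latter. Exhaustive enumeration grows combinatorially with the number of items, which is why the abstract stresses efficient algorithms.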
CITED BY 2
Hannes Heikinheimo, Eino Hinkkanen, Heikki Mannila, Taneli Mielikäinen, Jouni K. Seppänen, Finding low-entropy sets and trees from binary data, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, August 12-15, 2007, San Jose, California, USA