ACM Home Page
Please provide us with feedback. Feedback
Finding low-entropy sets and trees from binary data
Full text MovMov (13:06),  PdfPdf (947 KB)
Source
International Conference on Knowledge Discovery and Data Mining archive
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining table of contents
San Jose, California, USA
SESSION: Research track papers table of contents
Pages: 350 - 359  
Year of Publication: 2007
ISBN:978-1-59593-609-7
Authors
Hannes Heikinheimo  Helsinki University of Technology
Eino Hinkkanen  University of Helsinki
Heikki Mannila  University of Helsinki
Taneli Mielikäinen  University of Helsinki and Nokia Research Center
Jouni K. Seppänen  Helsinki University of Technology
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 7,   Downloads (12 Months): 115,   Citation Count: 2
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1281192.1281232
What is a DOI?

ABSTRACT

The discovery of subsets with special properties from binary data hasbeen one of the key themes in pattern discovery. Pattern classes suchas frequent itemsets stress the co-occurrence of the value 1 in the data. While this choice makes sense in the context of sparse binary data, it disregards potentially interesting subsets of attributes that have some other type of dependency structure.

We consider the problem of finding all subsets of attributes that have low complexity. The complexity is measured by either the entropy of the projection of the data on the subset, or the entropy of the data for the subset when modeled using a Bayesian tree, with downward or upward pointing edges. We show that the entropy measure on sets has a monotonicity property, and thus a levelwise approach can find all low-entropy itemsets. We also show that the tree-based measures are bounded above by the entropy of the corresponding itemset, allowing similar algorithms to be used for finding low-entropy trees. We describe algorithms for finding all subsets satisfying an entropy condition. We give an extensive empirical evaluation of the performance of the methods both on synthetic and on real data. We also discuss the search for high-entropy subsets and the computation of the Vapnik-Chervonenkis dimension of the data.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
 
3
 
4
 
5
Bringmann, B., Zimmermann, A., De Raedt, L., and Nijssen, S. Don't be afraid of simpler patterns. In PKDD (2006), pp. 55--66.
 
6
Chow, C. K., and Liu, C. N. Approximating discrete probability distributions with dependence trees. IEEE Trans. Info. Theory 14, 3 (1968), 462--467.
 
7
 
8
 
9
10
11
 
12
Hastie, T., Tibshirani, R., and Friedman, J. The Elements of Statistical Learning. Springer, 2001.
 
13
 
14
Heikinheimo, H., Mannila, H., and Seppänen, J. K. Finding trees from unordered 0-1 data. In PKDD (2006), pp. 175--186.
 
15
Huber, P. J. Projection pursuit. The Annals of Statistics 13, 2 (June 1985), 435--475.
16
 
17
Koivisto, M. Advances in exact Bayesian structure discovery in Bayesian networks. In UAI (2006), pp. 241--248.
 
18
19
20
 
21
 
22
Siebes, A., Vreeken, J., and van Leeuwen, M. Item sets that compress. In SDM (2006), J. Ghosh, D. Lambert, D. B. Skillicorn, and J. Srivastava, Eds., SIAM.
 
23
van Leeuwen, M., Vreeken, J., and Siebes, A. Compression picks item sets that matter. In PKDD (2006), pp. 585--592.
 
24
Zimmermann, A., and De Raedt, L. CorClass: Correlated association rule mining for classification. In Discovery Science (2004), E. Suzuki and S. Arikawa, Eds., vol. 3245 of Lecture Notes in Computer Science, Springer, pp. 60--72.


Collaborative Colleagues:
Hannes Heikinheimo: colleagues
Eino Hinkkanen: colleagues
Heikki Mannila: colleagues
Taneli Mielikäinen: colleagues
Jouni K. Seppänen: colleagues