Summarizing itemset patterns using probabilistic models
Source: International Conference on Knowledge Discovery and Data Mining
Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Philadelphia, PA, USA
POSTER SESSION: Research track posters
Pages: 730 - 735  
Year of Publication: 2006
ISBN:1-59593-339-5
Authors
Chao Wang  Ohio State University, Columbus, OH
Srinivasan Parthasarathy  Ohio State University, Columbus, OH
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1150402.1150495

ABSTRACT

In this paper, we propose a novel probabilistic approach to summarize frequent itemset patterns. Such techniques are useful for summarization, post-processing, and end-user interpretation, particularly for problems where the resulting set of patterns is huge. In our approach, items in the dataset are modeled as random variables. We then construct a Markov Random Field (MRF) on these variables based on frequent itemsets and their occurrence statistics. The summarization proceeds in a level-wise iterative fashion. Occurrence statistics of itemsets at the lowest level are used to construct an initial MRF. Statistics of itemsets at the next level can then be inferred from the model. We use those patterns whose occurrence cannot be accurately inferred from the model to augment the model in an iterative manner, repeating the procedure until all frequent itemsets can be modeled. The resulting MRF model affords a concise and useful representation of the original collection of itemsets. An extensive empirical study on real datasets shows that the new approach can effectively summarize a large number of itemsets and typically significantly outperforms extant approaches.
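
The level-wise loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a toy setting in which the joint distribution over all items is enumerated explicitly and fitted by iterative proportional fitting, as a stand-in for MRF learning and inference. The function names (support, fit_maxent, estimate, summarize), the parameters min_support and tolerance, and the EXAMPLE data are all hypothetical and chosen for illustration only. Refitting the model once per level is likewise an assumption, since the abstract does not specify when the model is refitted.

```python
from itertools import combinations, product


def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def fit_maxent(items, constraints, sweeps=200):
    """Fit a joint distribution over binary item indicators so that
    P(all items of S present) matches the target for each constrained S.
    Brute force over 2^n states (toy sizes only), fitted by iterative
    proportional fitting."""
    index = {item: pos for pos, item in enumerate(items)}
    states = list(product((0, 1), repeat=len(items)))
    prob = dict.fromkeys(states, 1.0 / len(states))
    for _ in range(sweeps):
        for itemset, target in constraints.items():
            covered = {s for s in states if all(s[index[i]] for i in itemset)}
            mass = sum(prob[s] for s in covered)
            inside = target / mass if mass > 0 else 0.0
            outside = (1 - target) / (1 - mass) if mass < 1 else 0.0
            for s in states:
                prob[s] *= inside if s in covered else outside
    return prob, index


def estimate(prob, index, itemset):
    """Support of `itemset` as inferred from the fitted model."""
    return sum(p for s, p in prob.items() if all(s[index[i]] for i in itemset))


def summarize(transactions, min_support=0.2, tolerance=0.1):
    """Return the itemsets (with supports) kept in the model.
    Frequent itemsets are found by brute-force enumeration here; a real
    pipeline would take them from a frequent itemset miner."""
    all_items = sorted(set().union(*transactions))
    # Lowest level: frequent single items seed the initial model.
    constraints = {}
    for item in all_items:
        s = support(transactions, frozenset([item]))
        if s >= min_support:
            constraints[frozenset([item])] = s
    model_items = sorted(i for itemset in constraints for i in itemset)
    level = 2
    while True:
        frequent = {}
        for combo in combinations(model_items, level):
            s = support(transactions, frozenset(combo))
            if s >= min_support:
                frequent[frozenset(combo)] = s
        if not frequent:
            break
        # Infer next-level statistics from the current model.
        prob, index = fit_maxent(model_items, constraints)
        for itemset, true_support in frequent.items():
            if abs(estimate(prob, index, itemset) - true_support) > tolerance:
                # Poorly predicted pattern: add it to the model.
                constraints[itemset] = true_support
        level += 1
    return constraints


# Hypothetical toy data: a and b co-occur often, c is roughly independent.
EXAMPLE = [frozenset(t) for t in
           ["abc", "abc", "ab", "ab", "a", "b", "c", "c", "c", ""]]

if __name__ == "__main__":
    for itemset, s in summarize(EXAMPLE).items():
        print(sorted(itemset), s)
```

On this toy data all seven non-empty itemsets are frequent at min_support=0.2, yet only the three single items and the pair {a, b} need to be stored, because the remaining supports are inferred from the model within the tolerance. The exponential state enumeration is only workable for a handful of items.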



