Effective and efficient itemset pattern summarization: regression-based approaches
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Las Vegas, Nevada, USA
SESSION: Research papers
Pages 399-407  
Year of Publication: 2008
ISBN: 978-1-60558-193-4
Authors
Ruoming Jin  Kent State University, Kent, OH, USA
Muad Abu-Ata  Kent State University, Kent, OH, USA
Yang Xiang  Kent State University, Kent, OH, USA
Ning Ruan  Kent State University, Kent, OH, USA
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 16,   Downloads (12 Months): 203,   Citation Count: 2
DOI: http://doi.acm.org/10.1145/1401890.1401941

ABSTRACT

In this paper, we propose a set of novel regression-based approaches to effectively and efficiently summarize frequent itemset patterns. Specifically, we show that the problem of minimizing the restoration error for a set of itemsets based on a probabilistic model corresponds to a nonlinear regression problem. We show that, under certain conditions, this nonlinear regression problem can be transformed into a linear regression problem. We propose two new methods, k-regression and tree-regression, to partition the entire collection of frequent itemsets in order to minimize the restoration error. The k-regression approach, which employs a K-means-type clustering method, guarantees that the total restoration error reaches a local minimum. The tree-regression approach employs a decision-tree-style top-down partitioning process. In addition, we discuss alternative ways to estimate the frequency of the itemsets covered by the k representative itemsets. The experimental evaluation on both real and synthetic datasets demonstrates that our approaches significantly improve summarization performance in terms of both accuracy (restoration error) and computational cost.
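
The K-means-type alternation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes, purely for illustration, that each itemset is encoded as a binary item-indicator row, that the frequency of an itemset within a group is modeled as a product of per-item factors (an independence-style model), and that taking logarithms turns this into an ordinary least-squares problem. The function names (`fit_log_linear`, `predict`, `k_regression`), the random initialization, and the squared error on log-frequencies are all hypothetical choices.

```python
import numpy as np

def fit_log_linear(X, y):
    """Least-squares fit of y ~ X @ w + b, where y holds log-frequencies.
    Returns the coefficient vector with the intercept as its last entry."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Predicted log-frequency for each row (itemset) of X."""
    return X @ coef[:-1] + coef[-1]

def k_regression(X, y, k, n_iter=50, seed=0):
    """Alternate between (1) fitting one linear model per group and
    (2) moving each itemset to the group whose model restores it best.
    Assumed setup: X is an (n_itemsets, n_items) 0/1 matrix, y is log-frequency."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    labels = rng.integers(0, k, size=n)  # assumption: random initial partition
    history = []                         # total squared error after each reassignment
    models = []
    for _ in range(n_iter):
        models = []
        for c in range(k):
            members = labels == c
            if not members.any():
                # Empty group: refit it on one randomly chosen itemset.
                members = np.zeros(n, dtype=bool)
                members[rng.integers(n)] = True
            models.append(fit_log_linear(X[members], y[members]))
        # Squared restoration error of every itemset under every group model.
        errors = np.stack([(predict(m, X) - y) ** 2 for m in models], axis=1)
        new_labels = errors.argmin(axis=1)
        history.append(errors[np.arange(n), new_labels].sum())
        if np.array_equal(new_labels, labels):
            break  # no itemset changed group: a local minimum of this objective
        labels = new_labels
    return labels, models, history
```

In this sketch, a restored frequency would be obtained as `np.exp(predict(models[labels[i]], X[i:i+1]))`. Because each refit and each reassignment can only lower or preserve the total squared error, the recorded `history` is non-increasing, which mirrors the local-minimum guarantee the abstract states for k-regression. The tree-regression variant and the paper's actual error measure are not reproduced here.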

