Effective and efficient itemset pattern summarization: regression-based approaches
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Las Vegas, Nevada, USA
Session: Research papers
Pages 399-407
Year of Publication: 2008
ISBN: 978-1-60558-193-4
Authors
Ruoming Jin, Kent State University, Kent, OH, USA
Muad Abu-Ata, Kent State University, Kent, OH, USA
Yang Xiang, Kent State University, Kent, OH, USA
Ning Ruan, Kent State University, Kent, OH, USA
ABSTRACT
In this paper, we propose a set of novel regression-based approaches to summarize frequent itemset patterns effectively and efficiently. Specifically, we show that minimizing the restoration error for a set of itemsets under a probabilistic model corresponds to a nonlinear regression problem, and that under certain conditions this nonlinear regression problem can be transformed into a linear one. We propose two new methods, K-regression and tree-regression, which partition the entire collection of frequent itemsets so as to minimize the restoration error. The K-regression approach employs a K-means-type clustering method and guarantees that the total restoration error reaches a local minimum. The tree-regression approach employs a decision-tree-type top-down partitioning process. In addition, we discuss alternative ways to estimate the frequency of the itemsets covered by the K representative itemsets. Experimental evaluation on both real and synthetic datasets demonstrates that our approaches significantly improve summarization performance in terms of both accuracy (restoration error) and computational cost.
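The K-means-style alternation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on assumptions the abstract does not spell out: each group of itemsets is modeled by an independence model, so an itemset's frequency is the product of per-item probabilities and its logarithm is linear in the item-indicator vector; the error is measured in log space; and the names `fit_log_linear`, `k_regression`, and `restore` are hypothetical.

```python
import numpy as np

def fit_log_linear(X, freq):
    """Least-squares fit of log(freq) ~ X @ w, with no intercept.

    Assumption: under an independence model, freq = prod_i p_i ** x_i,
    so log(freq) is linear in the 0/1 item indicators and w_i = log(p_i).
    """
    w, *_ = np.linalg.lstsq(X, np.log(freq), rcond=None)
    return w

def restore(X, w):
    """Restored frequency of each itemset (row of X) under one model."""
    return np.exp(X @ w)

def k_regression(X, freq, k, n_iter=20, seed=0):
    """Alternate between fitting one regression per group and reassigning
    each itemset to the group whose model restores it best."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    y = np.log(freq)
    labels = rng.integers(0, k, size=n)  # random initial partition
    for _ in range(n_iter):
        weights = []
        for c in range(k):
            mask = labels == c
            if not mask.any():
                # Empty group: fit it on one randomly chosen itemset.
                mask = np.zeros(n, dtype=bool)
                mask[rng.integers(n)] = True
            weights.append(fit_log_linear(X[mask], freq[mask]))
        W = np.stack(weights)                # shape (k, n_items)
        err = (X @ W.T - y[:, None]) ** 2    # shape (n, k), log-space error
        new_labels = err.argmin(axis=1)
        converged = np.array_equal(new_labels, labels)
        labels = new_labels
        if converged:
            break
    total_error = err[np.arange(n), labels].sum()
    return labels, W, total_error
```

Here `X` is a 0/1 matrix with one row per frequent itemset and one column per item, and `freq` holds the strictly positive itemset frequencies. As with K-means, the result depends on the initial partition, so in practice one might run several seeds and keep the partition with the lowest total error.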