| Succinct summarization of transactional databases: an overlapped hyperrectangle scheme |
| Full text |
Pdf
(266 KB)
|
Source
|
International Conference on Knowledge Discovery and Data Mining
archive
Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
table of contents
Las Vegas, Nevada, USA
SESSION: Research papers
table of contents
Pages 758-766
Year of Publication: 2008
ISBN:978-1-60558-193-4
|
|
Authors
|
|
Yang Xiang
|
Kent State University, Kent, OH, USA
|
|
Ruoming Jin
|
Kent State University, Kent, OH, USA
|
|
David Fuhry
|
Kent State University, Kent, OH, USA
|
|
Feodor F. Dragan
|
Kent State University, Kent, OH, USA
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 21, Downloads (12 Months): 165, Citation Count: 1
|
|
|
ABSTRACT
Transactional data are ubiquitous. Several methods, including frequent itemsets mining and co-clustering, have been proposed to analyze transactional databases. In this work, we propose a new research problem to succinctly summarize transactional databases. Solving this problem requires linking the high level structure of the database to a potentially huge number of frequent itemsets. We formulate this problem as a set covering problem using overlapped hyperrectangles; we then prove that this problem and its several variations are NP-hard. We develop an approximation algorithm HYPER which can achieve a ln(k) + 1 approximation ratio in polynomial time. We propose a pruning strategy that can significantly speed up the processing of our algorithm. Additionally, we propose an efficient algorithm to further summarize the set of hyperrectangles by allowing false positive conditions. A detailed study using both real and synthetic datasets shows the effectiveness and efficiency of our approaches in summarizing transactional databases.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
|
 |
3
|
Rakesh Agrawal , Johannes Gehrke , Dimitrios Gunopulos , Prabhakar Raghavan, Automatic subspace clustering of high dimensional data for data mining applications, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.94-105, June 01-04, 1998, Seattle, Washington, United States
|
| |
4
|
Christan Borgelt. Apriori implementation. http://fuzzy.cs.Uni-Magdeburg.de/ borgelt/Software. Version 4.08.
|
| |
5
|
|
| |
6
|
V. Chvátal. A greedy heuristic for the set-covering problem. Math. Oper. Res, 4:233--235, 1979.
|
| |
7
|
|
| |
8
|
|
| |
9
|
David Johnson , Shankar Krishnan , Jatin Chhugani , Subodh Kumar , Suresh Venkatasubramanian, Compressing large boolean matrices using reordering techniques, Proceedings of the Thirtieth international conference on Very large data bases, p.13-23, August 31-September 03, 2004, Toronto, Canada
|
| |
10
|
D. Pisinger Kellerer, Hans; U. Pferschy. Knapsack Problems. Springer Verlag, 2005.
|
| |
11
|
Laks V. S. Lakshmanan , Raymond T. Ng , Christine Xing Wang , Xiaodong Zhou , Theodore J. Johnson, The generalized MDL approach for summarization, Proceedings of the 28th international conference on Very Large Data Bases, p.766-777, August 20-23, 2002, Hong Kong, China
|
 |
12
|
|
| |
13
|
|
| |
14
|
|
| |
15
|
Arno Siebes, Jilles Vreeken, and Matthijs van Leeuwen. Item sets that compress. In SDM, 2006.
|
 |
16
|
|
| |
17
|
Matthijs van Leeuwen, Jilles Vreeken, and Arno Siebes. Compression picks item sets that matter. In PKDD, pages 585--592, 2006.
|
 |
18
|
|
| |
19
|
|
|