ACM Home Page
Please provide us with feedback. Feedback
Succinct summarization of transactional databases: an overlapped hyperrectangle scheme
Full text PdfPdf (266 KB)
Source
International Conference on Knowledge Discovery and Data Mining archive
Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining table of contents
Las Vegas, Nevada, USA
SESSION: Research papers table of contents
Pages 758-766  
Year of Publication: 2008
ISBN:978-1-60558-193-4
Authors
Yang Xiang  Kent State University, Kent, OH, USA
Ruoming Jin  Kent State University, Kent, OH, USA
David Fuhry  Kent State University, Kent, OH, USA
Feodor F. Dragan  Kent State University, Kent, OH, USA
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 21,   Downloads (12 Months): 165,   Citation Count: 1
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1401890.1401981
What is a DOI?

ABSTRACT

Transactional data are ubiquitous. Several methods, including frequent itemsets mining and co-clustering, have been proposed to analyze transactional databases. In this work, we propose a new research problem to succinctly summarize transactional databases. Solving this problem requires linking the high level structure of the database to a potentially huge number of frequent itemsets. We formulate this problem as a set covering problem using overlapped hyperrectangles; we then prove that this problem and its several variations are NP-hard. We develop an approximation algorithm HYPER which can achieve a ln(k) + 1 approximation ratio in polynomial time. We propose a pruning strategy that can significantly speed up the processing of our algorithm. Additionally, we propose an efficient algorithm to further summarize the set of hyperrectangles by allowing false positive conditions. A detailed study using both real and synthetic datasets shows the effectiveness and efficiency of our approaches in summarizing transactional databases.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
3
 
4
Christan Borgelt. Apriori implementation. http://fuzzy.cs.Uni-Magdeburg.de/ borgelt/Software. Version 4.08.
 
5
 
6
V. Chvátal. A greedy heuristic for the set-covering problem. Math. Oper. Res, 4:233--235, 1979.
 
7
 
8
 
9
 
10
D. Pisinger Kellerer, Hans; U. Pferschy. Knapsack Problems. Springer Verlag, 2005.
 
11
12
 
13
 
14
 
15
Arno Siebes, Jilles Vreeken, and Matthijs van Leeuwen. Item sets that compress. In SDM, 2006.
16
 
17
Matthijs van Leeuwen, Jilles Vreeken, and Arno Siebes. Compression picks item sets that matter. In PKDD, pages 585--592, 2006.
18
 
19


Collaborative Colleagues:
Yang Xiang: colleagues
Ruoming Jin: colleagues
David Fuhry: colleagues
Feodor F. Dragan: colleagues