ACM Home Page
Please provide us with feedback. Feedback
Dwarf: shrinking the PetaCube
Full text PdfPdf (1.38 MB)
Source International Conference on Management of Data archive
Proceedings of the 2002 ACM SIGMOD international conference on Management of data table of contents
Madison, Wisconsin
SESSION: Research sessions: compression table of contents
Pages: 464 - 475  
Year of Publication: 2002
ISBN:1-58113-497-5
Authors
Yannis Sismanis  University of Maryland, College Park
Antonios Deligiannakis  University of Maryland, College Park
Nick Roussopoulos  University of Maryland, College Park
Yannis Kotidis  AT&T Labs-Research
Sponsor
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 5,   Downloads (12 Months): 49,   Citation Count: 36
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/564691.564745
What is a DOI?

ABSTRACT

Dwarf is a highly compressed structure for computing, storing, and querying data cubes. Dwarf identifies prefix and suffix structural redundancies and factors them out by coalescing their store. Prefix redundancy is high on dense areas of cubes but suffix redundancy is significantly higher for sparse areas. Putting the two together fuses the exponential sizes of high dimensional full cubes into a dramatically condensed data structure. The elimination of suffix redundancy has an equally dramatic reduction in the computation of the cube because recomputation of the redundant suffixes is avoided. This effect is multiplied in the presence of correlation amongst attributes in the cube. A Petabyte 25-dimensional cube was shrunk this way to a 2.3GB Dwarf Cube, in less than 20 minutes, a 1:400000 storage reduction ratio. Still, Dwarf provides 100% precision on cube queries and is a self-sufficient structure which requires no access to the fact table. What makes Dwarf practical is the automatic discovery,in a single pass over the fact table, of the prefix and suffix redundancies without user involvement or knowledge of the value distributions.This paper describes the Dwarf structure and the Dwarf cube construction algorithm. Further optimizations are then introduced for improving clustering and query performance. Experiments with the current implementation include comparisons on detailed measurements with real and synthetic datasets against previously published techniques. The comparisons show that Dwarfs by far out-perform these techniques on all counts: storage space, creation time, query response time, and updates of cubes.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
2
 
3
{Bla} Jock A. Blackard. The Forest CoverType Dataset. ftp://ftp.ics.uci.edu/pub/machine-learning-databases/covtype.
 
4
5
 
6
{BS98} D. Barbara and M. Sullivan. A Space-Efficient way to support Approximate Multidimensional Databases. Technical report, ISSE-TR-98-03, George Mason University, 1998.
 
7
{Cou98} Olap Council. APB-1 Benchmark. http://www.olapcouncil.org/research/bmarkco.htm, 1998.
 
8
{DANR96} P. M. Deshpande, S. Agarwal, J. F. Naughton, and R. Ramakrishnan. Computation of multidimensional aggregates. Technical Report 1314, University of Wisconsin - Madison, 1996.
9
 
10
 
11
 
12
13
 
14
15
16
 
17
{HWL} C. Hahn, S. Warren, and J. London. Edited synoptic cloud reports from ships and land stations over the globe. http://cdiac.esd.ornl.gov/cdiac/ndps/ndp026b.html.
 
18
{JS97} T. Johnson and D. Shasha. Some Approaches to Index Design for Cube Forests. Data Engineering Bulletin, 20(1), March 1997.
19
20
 
21
 
22
{RSDK01} N. Roussopoulos, J. Sismanis, A. Deligiannakis, and Y. Kotidis. The Dwarf Structure for Creating, Storing, and Querying Highly Compressed Data Cubes. Application to U.S. patent office submitted, June 2001.
 
23
{SAG96} S. Sarawagi, R. Agrawal, and A. Gupta. On computing the data cube. Technical Report RJ10026, IBM Almaden Research Center, San Jose, CA, 1996.
 
24
 
25
{SDRK02} Y. Sismanis, A. Deligiannakis, N. Roussopoulos, and Y. Kotidis. Dwarf: Shrinking the PetaCube. Technical Report CS-TR 4342, University of Maryland, College Park, February 2002.
26
27
 
28
{WLFY02} W. Wang, H. Lu, J. Feng, and J. Xu Yu. Condensed Cube: An Effective Approach to Reducing Data Cube Size. In ICDE, 2002.
29

CITED BY  37

Collaborative Colleagues:
Yannis Sismanis: colleagues
Antonios Deligiannakis: colleagues
Nick Roussopoulos: colleagues
Yannis Kotidis: colleagues