ACM Home Page
Please provide us with feedback. Feedback
Dictionary-based order-preserving string compression for main memory column stores
Full text PdfPdf (473 KB)
Source
International Conference on Management of Data archive
Proceedings of the 35th SIGMOD international conference on Management of data table of contents
Providence, Rhode Island, USA
SESSION: Research session 8: column stores table of contents
Pages 283-296  
Year of Publication: 2009
ISBN:978-1-60558-551-2
Authors
Carsten Binnig  ETH Zurich, Zurich, Switzerland
Stefan Hildenbrand  ETH Zurich, Zurich, Switzerland
Franz Färber  SAP AG, Walldorf, Germany
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): n/a,   Downloads (12 Months): n/a,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1559845.1559877
What is a DOI?

ABSTRACT

Column-oriented database systems [19, 23] perform better than traditional row-oriented database systems on analytical workloads such as those found in decision support and business intelligence applications. Moreover, recent work [1, 24] has shown that lightweight compression schemes significantly improve the query processing performance of these systems. One such a lightweight compression scheme is to use a dictionary in order to replace long (variable-length) values of a certain domain with shorter (fixedlength) integer codes. In order to further improve expensive query operations such as sorting and searching, column-stores often use order-preserving compression schemes.

In contrast to the existing work, in this paper we argue that orderpreserving dictionary compression does not only pay off for attributes with a small fixed domain size but also for long string attributes with a large domain size which might change over time. Consequently, we introduce new data structures that efficiently support an order-preserving dictionary compression for (variablelength) string attributes with a large domain size that is likely to change over time. The main idea is that we model a dictionary as a table that specifies a mapping from string-values to arbitrary integer codes (and vice versa) and we introduce a novel indexing approach that provides efficient access paths to such a dictionary while compressing the index data. Our experiments show that our data structures are as fast as (or in some cases even faster than) other state-of-the-art data structures for dictionaries while being less memory intensive.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
 
3
 
4
N. Askitis and J. Zobel. Cache-Conscious Collision Resolution in String Hash Tables. In SPIRE, pages 91--102, 2005.
5
 
6
7
8
 
9
 
10
 
11
 
12
 
13
14
15
 
16
17
18
 
19
20
 
21
 
22
 
23
M. Zukowski, P. A. Boncz, N. Nes, and S. Héman. MonetDB/X100 -- A DBMS In The CPU Cache. IEEE Data Eng. Bull., 28(2):17--22, 2005.
 
24

Collaborative Colleagues:
Carsten Binnig: colleagues
Stefan Hildenbrand: colleagues
Franz Färber: colleagues