|
ABSTRACT
Over the last decades, improvements in CPU speed have outpaced improvements in main memory and disk access rates by orders of magnitude, enabling the use of data compression techniques to improve the performance of database systems. Previous work describes the benefits of compression for numerical attributes, where data is stored in compressed format on disk. Despite the abundance of string-valued attributes in relational schemas there is little work on compression for string attributes in a database context. Moreover, none of the previous work suitably addresses the role of the query optimizer: During query execution, data is either eagerly decompressed when it is read into main memory, or data lazily stays compressed in main memory and is decompressed on demand only
In this paper, we present an effective approach for database compression based on lightweight, attribute-level compression techniques. We propose a IIierarchical Dictionary Encoding strategy that intelligently selects the most effective compression method for string-valued attributes. We show that eager and lazy decompression strategies produce sub-optimal plans for queries involving compressed string attributes. We then formalize the problem of compression-aware query optimization and propose one provably optimal and two fast heuristic algorithms for selecting a query plan for relational schemas with compressed attributes; our algorithms can easily be integrated into existing cost-based query optimizers. Experiments using TPC-H data demonstrate the impact of our string compression methods and show the importance of compression-aware query optimization. Our approach results in up to an order speed up over existing approaches.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Transact on processing performance council TPC-H benchmark, http://www.tpc.org 1999.
|
| |
2
|
Predator DMBS. http://www.cs.cornel l.edu/database/predator, Cornel l Univ., Computer Science Dept.,2000.
|
| |
3
|
|
| |
4
|
|
| |
5
|
C.Blake and C.Merz.UCI repository of machine learning databases. http://www.ics.uci.edu/~mlearn/MLRepository.html 1998.
|
| |
6
|
|
 |
7
|
|
| |
8
|
|
| |
9
|
J.G.Cleary and I.H.W tten.Data compression using adaptive coding and partial string matching. IEEE Trans. on Communications COM-32(4),pages 396 -402,April 1984.
|
 |
10
|
|
| |
11
|
S.J.Eggers,F.Olken,and A.Shoshani.A compress on techn que for large statist cal data-bases. In Proc.of VLDB pages 424 -434,1981.
|
| |
12
|
|
| |
13
|
|
 |
14
|
|
| |
15
|
G.Graefe and L.Shapiro.Data compression and database performance.In ACM/IEEE-CS Symp. On Applied Computing pages 22 -27,April 1991.
|
 |
16
|
|
 |
17
|
|
| |
18
|
D.Hu .man.A method for the construct on of m nimum-redundanc codes.In Proc. IRE, 40(9), pages 1098 -1101,Sept.1952.
|
| |
19
|
|
 |
20
|
|
| |
21
|
|
 |
22
|
|
| |
23
|
|
| |
24
|
G.Ray,J.R.Harista,and S.Seshadri.Database compression:A performance enhancement tool.In the 7th Int'l Conf. on Management of Data (COMAD), Pune,India,1995.
|
 |
25
|
|
 |
26
|
P. Griffiths Selinger , M. M. Astrahan , D. D. Chamberlin , R. A. Lorie , T. G. Price, Access path selection in a relational database management system, Proceedings of the 1979 ACM SIGMOD international conference on Management of data, May 30-June 01, 1979, Boston, Massachusetts
[doi> 10.1145/582095.582099]
|
| |
27
|
D.Severance.A pract t oner 's guide to database compression.Information Systems 8(1),pages 51 -62, 1983.
|
| |
28
|
T.Welch.A technique for high-performance data compression.IEEE Computer 17(6),pages 8 -19,June 1984.
|
 |
29
|
|
| |
30
|
|
 |
31
|
|
| |
32
|
J.Ziv and A.Lempel.On the complexity of .nite sequences.IEEE Trans. on Information Theory, 22(1),pages 75 -81,1976.
|
| |
33
|
J.Ziv and A.Lempel.A universal algorithm for sequent al data compression.IEEE Trans. on Information Theory, 22(1),pages 337 -343,1977.
|
CITED BY 14
|
|
|
|
|
|
|
|
|
|
|
Hu Cao , Ouri Wolfson , Goce Trajcevski, Spatio-temporal data reduction with deterministic error bounds, Proceedings of the 2003 joint workshop on Foundations of mobile computing, p.33-42, September 19, 2003, San Diego, CA, USA
|
|
|
|
|
|
|
|
|
|
|
|
Andrei Arion , Angela Bonifati , Gianni Costa , Sandra D'Aguanno , Ioana Manolescu , Andrea Pugliese, XQueC: pushing queries to compressed XML data, Proceedings of the 29th international conference on Very large data bases, p.1065-1068, September 09-12, 2003, Berlin, Germany
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Chad Verbowski , Emre Kiciman , Arunvijay Kumar , Brad Daniels , Shan Lu , Juhan Lee , Yi-Min Wang , Roussi Roussev, Flight data recorder: monitoring persistent-state interactions to improve systems management, Proceedings of the 7th symposium on Operating systems design and implementation, November 06-08, 2006, Seattle, Washington
|
|
|
|
|