Query processing techniques for solid state drives
Source
International Conference on Management of Data
Proceedings of the 35th SIGMOD international conference on Management of data
Providence, Rhode Island, USA
SESSION: Research session 2: databases on modern hardware
Pages 59-72
Year of Publication: 2009
ISBN: 978-1-60558-551-2
Authors
Dimitris Tsirogiannis  University of Toronto, Toronto, ON, Canada
Stavros Harizopoulos  HP Labs, Palo Alto, CA, USA
Mehul A. Shah  Hewlett Packard Laboratories, Palo Alto, CA, USA
Janet L. Wiener  Hewlett Packard Laboratories, Palo Alto, CA, USA
Goetz Graefe  Hewlett Packard Laboratories, Palo Alto, CA, USA
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1559845.1559854

ABSTRACT

Solid state drives perform random reads more than 100x faster than traditional magnetic hard disks, while offering comparable sequential read and write bandwidth. Because of their potential to speed up applications, as well as their reduced power consumption, these new drives are expected to gradually replace hard disks as the primary permanent storage media in large data centers. However, although they may benefit applications that stress random reads immediately, they may not improve database applications, especially those running long data analysis queries. Database query processing engines have been designed around the speed mismatch between random and sequential I/O on hard disks and their algorithms currently emphasize sequential accesses for disk-resident data.

In this paper, we investigate data structures and algorithms that leverage fast random reads to speed up selection, projection, and join operations in relational query processing. We first demonstrate how a column-based layout within each page reduces the amount of data read during selections and projections. We then introduce FlashJoin, a general pipelined join algorithm that minimizes accesses to base and intermediate relational data. FlashJoin's binary join kernel accesses only the join attributes, producing partial results in the form of a join index. Subsequently, its fetch kernel retrieves the attributes for later nodes in the query plan as they are needed. FlashJoin significantly reduces memory and I/O requirements for each join in the query. We implemented these techniques inside Postgres and experimented with an enterprise SSD. Our techniques improved query runtimes by up to 6x for queries ranging from simple relational scans and joins to full TPC-H queries.
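
The two ideas in the paragraph above can be illustrated with a minimal in-memory sketch. It is not the authors' Postgres implementation: the names `to_column_page`, `join_kernel`, and `fetch_kernel`, the sample relations, and the use of Python lists in place of SSD-resident pages are all assumptions made for illustration. The sketch groups each relation's values by attribute, joins by reading only the join attribute to produce a join index of row-id pairs, and then fetches the remaining attributes by row id, the kind of random access that is cheap on an SSD.

```python
from collections import defaultdict


def to_column_page(rows):
    """Regroup row-oriented records into one value list per attribute.

    A toy stand-in for a column-based layout within a page: a scan that
    needs one attribute can touch only that attribute's list.
    """
    columns = defaultdict(list)
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return dict(columns)


def join_kernel(left_page, right_page, left_key, right_key):
    """Hash join that reads only the join attributes.

    Returns a join index: a list of (left_row_id, right_row_id) pairs.
    """
    hash_table = defaultdict(list)
    for right_rid, key in enumerate(right_page[right_key]):
        hash_table[key].append(right_rid)

    join_index = []
    for left_rid, key in enumerate(left_page[left_key]):
        for right_rid in hash_table.get(key, []):
            join_index.append((left_rid, right_rid))
    return join_index


def fetch_kernel(join_index, left_page, right_page, left_cols, right_cols):
    """Fetch only the requested attributes, by row id, for each joined pair."""
    for left_rid, right_rid in join_index:
        out = {col: left_page[col][left_rid] for col in left_cols}
        out.update({col: right_page[col][right_rid] for col in right_cols})
        yield out


# Hypothetical sample data, not taken from the paper.
orders = to_column_page([
    {"order_id": 1, "cust_id": 10, "total": 250.0},
    {"order_id": 2, "cust_id": 20, "total": 75.5},
    {"order_id": 3, "cust_id": 10, "total": 12.0},
])
customers = to_column_page([
    {"cust_id": 10, "name": "Ada", "city": "Toronto"},
    {"cust_id": 20, "name": "Lin", "city": "Palo Alto"},
    {"cust_id": 30, "name": "Sam", "city": "Providence"},
])

join_index = join_kernel(orders, customers, "cust_id", "cust_id")
results = list(fetch_kernel(join_index, orders, customers, ["total"], ["name"]))
print(join_index)  # [(0, 0), (1, 1), (2, 0)]
print(results)
```

In this sketch the join step never touches `total`, `name`, or `city`; those values are read only for rows that survive the join, and only when a later step asks for them.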


