A data warehouse appliance for the mass market
Source
International Conference on Management of Data
Proceedings of the 35th SIGMOD international conference on Management of data
Providence, Rhode Island, USA
SESSION: Industrial session 2: exploiting new hardware
Pages 879-880  
Year of Publication: 2009
ISBN:978-1-60558-551-2
Author
Ravi Krishnamurthy  Kickfire, Santa Clara, CA, USA
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1559845.1559939

ABSTRACT

The vast majority of data warehouses hold less than a few terabytes of data, and their performance on complex queries in traditional database systems is often unsatisfactory. Vendors have announced data warehouse appliances (HP Oracle Exadata Storage Server, HP Neoview, Netezza, etc.) to address this burgeoning need. Most of these build a large parallel database system by scaling out commodity machines and/or push filters into the disk retrieval system to reduce the data brought into memory, along the lines pioneered by research projects such as Gamma and Bubba and by other prior database machine research. These approaches deliver performance by deploying many CPUs, large amounts of memory, and large numbers of disk heads and disk space; in effect, they extract performance by underutilizing the resources, albeit very inexpensive commodity resources.

In contrast, we propose a database system in a box (i.e., a single system) that can deliver high performance for complex queries while using far fewer resources (memory, disks, etc.); that is, better resource utilization and therefore lower cost. The approach uses a column store (pioneered in the Bubba project), which 1) reduces the need for a large number of disk heads (i.e., I/O bandwidth) and 2) reduces the amount of memory needed for memory-resident query execution. With the disk I/O problem mitigated by the column store and memory, the Von Neumann bottleneck becomes the dominant limiting factor. Database researchers have pursued this problem in the context of cache-conscious query execution. Unfortunately, traditional CPUs provide limited control to "page" data into the cache and retain it there, which makes it hard to leverage the cache effectively.
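
As a rough illustration of the first column-store effect mentioned above (less I/O bandwidth), the following back-of-the-envelope sketch compares the bytes a scan must read under a row layout and a column layout. The table shape (100 million rows, 20 fixed-width 8-byte columns) and the query touching 3 columns are hypothetical numbers chosen for illustration, not figures from the paper, and the sketch ignores compression, indexes, and page overhead.

```python
# Back-of-the-envelope comparison of scan volume: row store vs. column store.
# All numbers are hypothetical and chosen only for illustration.

def bytes_scanned_row_store(num_rows, num_cols, bytes_per_value):
    """A row store reads whole rows, so every column is fetched."""
    return num_rows * num_cols * bytes_per_value

def bytes_scanned_column_store(num_rows, cols_referenced, bytes_per_value):
    """A column store reads only the columns the query references."""
    return num_rows * cols_referenced * bytes_per_value

rows = 100_000_000        # assumed table size
total_cols = 20           # assumed number of columns in the table
cols_in_query = 3         # assumed number of columns the query touches
value_width = 8           # assumed fixed width of each value, in bytes

row_bytes = bytes_scanned_row_store(rows, total_cols, value_width)
col_bytes = bytes_scanned_column_store(rows, cols_in_query, value_width)
reduction = row_bytes / col_bytes

print(f"row store scan:    {row_bytes / 1e9:.1f} GB")
print(f"column store scan: {col_bytes / 1e9:.1f} GB")
print(f"reduction factor:  {reduction:.2f}x")
```

Under these assumptions the scan volume shrinks by the ratio of total columns to referenced columns, which is the sense in which a column store lowers both the I/O bandwidth and the memory needed to keep the working set resident.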

Our approach is to leverage a custom dataflow machine coupled with a large memory, thereby practically eliminating the Von Neumann bottleneck. Besides mitigating this bottleneck, exploiting fine-grained pipelined and operator parallelism in hardware provides a significant performance improvement. The result is a low-cost, high-performance database appliance for the vast majority of the data warehouse market. Kickfire has shown that such an appliance can deliver both better price/performance and better raw performance than competing approaches. Note that this high-performance appliance does not preclude scale-out; it can itself be scaled out to a much larger database in the future.