|
ABSTRACT
As more and more query processing work can be done in main memory access is becoming a significant cost component of database operations. Recent database research has shown that most of the memory stalls are due to second-level cache data misses and first-level instruction cache misses. While a lot of research has focused on reducing the data cache misses, relatively little research has been done on improving the instruction cache performance of database systems.We first answer the question "Why does a database system incur so many instruction cache misses?" We demonstrate that current demand-pull pipelined query execution engines suffer from significant instruction cache thrashing between different operators. We propose techniques to buffer database operations during query execution to avoid instruction cache thrashing. We implement a new light-weight "buffer" operator and study various factors which may affect the cache performance. We also introduce a plan refinement algorithm that considers the query plan and decides whether it is beneficial to add additional "buffer" operators and where to put them. The benefit is mainly from better instruction locality and better hardware branch prediction. Our techniques can be easily integrated into current database systems without significant changes. Our experiments in a memory-resident PostgreSQL database system show that buffering techniques can reduce the number of instruction cache misses by up to 80% and improve query performance by up to 15%.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
LMbench - Tools for Performance Analysis. http://www.bitmover.com/lmbench/.
|
| |
2
|
|
| |
3
|
|
| |
4
|
|
 |
5
|
|
| |
6
|
|
| |
7
|
I.-C. K. Chen, C.-C. Lee, and T. N. Mudge. Instruction prefetching using branch prediction information. In Proceedings of International Symposium on Microarchitecture, 1997.
|
 |
8
|
Shimin Chen , Phillip B. Gibbons , Todd C. Mowry, Improving index performance through prefetching, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.235-246, May 21-24, 2001, Santa Barbara, California, United States
|
 |
9
|
|
 |
10
|
|
| |
11
|
G. Graefe. Volcano, an extensible and parallel query evaluation system. IEEE Transactions on knowledge and data enginnering, 6(6):934--944, 1994.
|
| |
12
|
L. M. Haas , W. Chang , G. M. Lohman , J. McPherson , P. F. Wilms , G. Lapis , B. Lindsay , H. Pirahesh , M. J. Carey , E. Shekita, Starburst Mid-Flight: As the Dust Clears, IEEE Transactions on Knowledge and Data Engineering, v.2 n.1, p.143-160, March 1990
[doi> 10.1109/69.50910]
|
 |
13
|
Amir H. Hashemi , David R. Kaeli , Brad Calder, Efficient procedure mapping using cache line coloring, Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation, p.171-182, June 16-18, 1997, Las Vegas, Nevada, United States
|
| |
14
|
G. Hinton, D. Sager, M. Upton, D. Boggs, D. Carmean, A. Kyker, and P. Roussel. The microarchitecture of the Pentium 4 processor. Intel Technology Journal, 1st Quarter, 2001.
|
| |
15
|
Intel Corp. VTune performance analyzer. http://www.intel.com/software/products/vtune/.
|
| |
16
|
Intel Inc. IA32 intel architecture optimization reference manual. 2003.
|
 |
17
|
Kimberly Keeton , David A. Patterson , Yong Qiang He , Roger C. Raphael , Walter E. Baker, Performance characterization of a Quad Pentium Pro SMP using OLTP workloads, Proceedings of the 25th annual international symposium on Computer architecture, p.15-26, June 27-July 02, 1998, Barcelona, Spain
|
 |
18
|
Kihong Kim , Sang K. Cha , Keunjoo Kwon, Optimizing multidimensional index trees for main memory access, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.139-150, May 21-24, 2001, Santa Barbara, California, United States
|
| |
19
|
|
 |
20
|
Ann Marie Grizzaffi Maynard , Colette M. Donnelly , Bret R. Olszewski, Contrasting characteristics and cache performance of technical and multi-user commercial workloads, Proceedings of the sixth international conference on Architectural support for programming languages and operating systems, p.145-156, October 05-07, 1994, San Jose, California, United States
|
| |
21
|
|
 |
22
|
|
 |
23
|
|
| |
24
|
Alex Ramirez , Josep Ll. Larriba-Pey , Carlos Navarro , Xavi Serrano , Mateo Valero , Josep Torrellas, Optimization of Instruction Fetch for Decision Support Workloads, Proceedings of the 1999 International Conference on Parallel Processing, p.238, September 21-24, 1999
|
 |
25
|
Alex Ramirez , Luiz André Barroso , Kourosh Gharachorloo , Robert Cohn , Josep Larriba-Pey , P. Geoffrey Lowney , Mateo Valero, Code layout optimizations for transaction processing workloads, Proceedings of the 28th annual international symposium on Computer architecture, p.155-164, June 30-July 04, 2001, Göteborg, Sweden
|
| |
26
|
|
 |
27
|
|
 |
28
|
P. Griffiths Selinger , M. M. Astrahan , D. D. Chamberlin , R. A. Lorie , T. G. Price, Access path selection in a relational database management system, Proceedings of the 1979 ACM SIGMOD international conference on Management of data, May 30-June 01, 1979, Boston, Massachusetts
[doi> 10.1145/582095.582099]
|
| |
29
|
|
| |
30
|
|
| |
31
|
|
| |
32
|
Transaction Processing Performance Council. TPC Benchmark H. Available via http://www.tpc.com/tpch/.
|
| |
33
|
J. Zhou and K. A. Ross. Buffering accesses to memory-resident index structures. In Proceedings of VLDB Conference, 2003.
|
CITED BY 16
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Ryan Johnson , Stavros Harizopoulos , Nikos Hardavellas , Kivanc Sabirli , Ippokratis Pandis , Anastasia Ailamaki , Naju G. Mancheril , Babak Falsafi, To share or not to share?, Proceedings of the 33rd international conference on Very large data bases, September 23-27, 2007, Vienna, Austria
|
|
|
|
|
|
|
|
|
|
|