ACM Home Page
Please provide us with feedback. Feedback
Call graph prefetching for database applications
Full text PdfPdf (702 KB)
Source ACM Transactions on Computer Systems (TOCS) archive
Volume 21 ,  Issue 4  (November 2003) table of contents
Pages: 412 - 444  
Year of Publication: 2003
ISSN:0734-2071
Authors
Murali Annavaram  Intel Corporation, Santa Clara, CA
Jignesh M. Patel  The University of Michigan, Ann Arbor, MI
Edward S. Davidson  The University of Michigan, Ann Arbor, MI
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 8,   Downloads (12 Months): 66,   Citation Count: 3
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/945506.945509
What is a DOI?

ABSTRACT

With the continuing technological trend of ever cheaper and larger memory, most data sets in database servers will soon be able to reside in main memory. In this configuration, the performance bottleneck is likely to be the gap between the processing speed of the CPU and the memory access latency. Previous work has shown that database applications have large instruction and data footprints and hence do not use processor caches effectively. In this paper, we propose Call Graph Prefetching (CGP), an N instruction prefetching technique that analyzes the call graph of a database system and prefetches instructions from the function that is deemed likely to be called next. CGP capitalizes on the highly predictable function call sequences that are typical of database systems. CGP can be implemented either in software or in hardware. The software-based CGP (CGP_S) uses profile information to build a call graph, and uses the predictable call sequences in the call graph to determine which function to prefetch next. The hardware-based CGP(CGP_H) uses a hardware table, called the Call Graph History Cache (CGHC), to dynamically store sequences of functions invoked during program execution, and uses that stored history when choosing which functions to prefetch.We evaluate the performance of CGP on sets of Wisconsin and TPC-H queries, as well as on CPU-2000 benchmarks. For most CPU-2000 applications the number of instruction cache (I-cache) misses were very few even without any prefetching, obviating the need for CGP. On the other hand, the database workloads do suffer a significant number of I-cache misses; CGP_S improves their performance by 23% and CGP_H by 26% over a baseline system that has already been highly tuned for efficient I-cache usage by using the OM tool. CGP, with or without OM, reduces the I-cache miss stall time by about 50% relative to O5+OM, taking us about half way from an already highly tuned baseline system toward perfect I-cache performance.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
 
3
4
5
 
6
 
7
 
8
Burger, D. and Austin, T. 1997. The SimpleScalar Tool Set. Tech. Rep. 1342, University of Wisconsin-Madison, Computer ScienceDepartment. June.
9
 
10
11
12
 
13
 
14
15
 
16
 
17
Intel Web Site: http://www.developer.intel.com/drg/mmx/appnotes/perfmon.htm Survey of Pentium Processor Performance Monitoring Capabilities & Tools.
18
19
 
20
21
22
23
24
25
 
26
 
27
 
28
Romer, T., Voelker, G., Lee, D., Wolman, A., Wong, W., Levy, H., Bershad, B., and Chen, B. 1997. Instrumentation and Optimization of Win32/Intel Executables Using Etch. In USENIX Windows NT Workshop. 1--7.
 
29
Rupley, J., Annavaram, M., Devale, J., Diep, T., and Black, B. 2002. Comparing and Contrasting a Commercial OLTP Workload with CPU2000 on IPF. In the 5th Workshop on Workload Characterization.
 
30
 
31
Smith, A. 1978. Sequential Program Prefetching in Memory Hierarchies. IEEE Comput. 11, 2 (December), 7--21.
 
32
 
33
 
34
Srivastava, A. and Eustace, A. 1994. ATOM: A System for Building Customized Program Analysis Tools. Tech. Rep. 94/2, Digital Western Research Laboratory. March.
 
35
Srivastava, A. and Wall, D. 1992. A Practical System for Intermodule Code Optimization at Link-Time. Tech. Rep. 92/6, Digital Western Research Laboratory. June.
 
36
 
37
TPC. 1999. TPC Benchmark H Standard Specification (Decision Support). In Revision 1.1.0.


Collaborative Colleagues:
Murali Annavaram: colleagues
Jignesh M. Patel: colleagues
Edward S. Davidson: colleagues