Code layout optimizations for transaction processing workloads
Source: International Symposium on Computer Architecture
Proceedings of the 28th Annual International Symposium on Computer Architecture
Göteborg, Sweden
Pages: 155 - 164  
Year of Publication: 2001
ISBN:0-7695-1162-7
Authors
Alex Ramirez  Computer Architecture Department, Universitat Politecnica de Catalunya
Luiz André Barroso  Western Research Laboratory, Compaq Computer Corporation
Kourosh Gharachorloo  Western Research Laboratory, Compaq Computer Corporation
Robert Cohn  Alpha Development Group, Compaq Computer Corporation
Josep Larriba-Pey  Computer Architecture Department, Universitat Politecnica de Catalunya
P. Geoffrey Lowney  Alpha Development Group, Compaq Computer Corporation
Mateo Valero  Computer Architecture Department, Universitat Politecnica de Catalunya
Sponsors
SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE-CS\TCCA: TC on Computer Architecture
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/379240.379260

ABSTRACT

Commercial applications such as databases and Web servers constitute the most important market segment for high-performance servers. Among these applications, on-line transaction processing (OLTP) workloads provide a challenging set of requirements for system designs since they often exhibit inefficient executions dominated by a large memory stall component. This behavior arises from large instruction and data footprints and high communication miss rates. A number of recent studies have characterized the behavior of commercial workloads and proposed architectural features to improve their performance. However, there has been little research on the impact of software and compiler-level optimizations for improving the behavior of such workloads.

This paper provides a detailed study of profile-driven compiler optimizations to improve the code layout in commercial workloads with large instruction footprints. Our compiler algorithms are implemented in the context of Spike, an executable optimizer for the Alpha architecture. Our experiments use the Oracle commercial database engine running an OLTP workload, with results generated using both full system simulations and actual runs on Alpha multiprocessors. Our results show that code layout optimizations can provide a major improvement in the instruction cache behavior, providing a 55% to 65% reduction in the application misses for 64-128K caches. Our analysis shows that this improvement primarily arises from longer sequences of consecutively executed instructions and more reuse of cache lines before they are replaced. We also show that the majority of application instruction misses are caused by self-interference. However, code layout optimizations significantly reduce the amount of self-interference, thus elevating the relative importance of interference with operating system code. Finally, we show that better code layout can also provide substantial improvements in the behavior of other memory system components such as the instruction TLB and the unified second-level cache. The overall performance impact of our code layout optimizations is an improvement of 1.33 times in the execution time of our workload.
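
To make the idea of profile-driven code layout concrete, the following is a minimal Python sketch of one common technique, greedy basic-block chaining along the hottest control-flow edges. It is not the algorithm implemented in Spike, which the abstract does not specify. The function names `chain_basic_blocks` and `fallthrough_fraction`, the block names, and the edge counts are all hypothetical. The second function estimates what fraction of profiled control transfers become fall-throughs, a rough proxy for the "longer sequences of consecutively executed instructions" mentioned above.

```python
from collections import defaultdict


def chain_basic_blocks(edge_counts, entry):
    """Greedily order basic blocks so that hot edges become fall-throughs.

    edge_counts: dict mapping (src, dst) block names to a profiled
                 execution count (hypothetical profile data).
    entry:       name of the entry block, which is kept first in the layout.
    """
    blocks = {entry}
    for src, dst in edge_counts:
        blocks.update((src, dst))
    # Every block starts out as a chain of length one.
    chain_of = {b: [b] for b in blocks}

    # Visit edges from hottest to coldest; ties are broken by edge name.
    hottest_first = sorted(edge_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for (src, dst), _count in hottest_first:
        a, b = chain_of[src], chain_of[dst]
        # Merge only when src ends one chain and dst begins another,
        # so that dst is placed immediately after src.
        if a is not b and a[-1] == src and b[0] == dst and dst != entry:
            a.extend(b)
            for blk in b:
                chain_of[blk] = a

    # Index the surviving chains by their head block.
    chains = {chain_of[b][0]: chain_of[b] for b in blocks}
    weight = defaultdict(int)
    for (src, _dst), count in edge_counts.items():
        weight[chain_of[src][0]] += count
    # Entry chain first, then the remaining chains from hottest to coldest.
    heads = sorted(chains, key=lambda h: (h != entry, -weight[h], h))
    return [blk for h in heads for blk in chains[h]]


def fallthrough_fraction(layout, edge_counts):
    """Fraction of profiled transfers whose target directly follows the source."""
    pos = {b: i for i, b in enumerate(layout)}
    total = sum(edge_counts.values())
    hits = sum(c for (s, d), c in edge_counts.items() if pos[d] == pos[s] + 1)
    return hits / total if total else 0.0


# Hypothetical profile: A branches mostly to B, rarely to C; both reach D.
profile = {("A", "B"): 90, ("A", "C"): 10, ("B", "D"): 90, ("C", "D"): 10}
original = ["A", "C", "B", "D"]
optimized = chain_basic_blocks(profile, "A")
print(optimized, fallthrough_fraction(original, profile),
      fallthrough_fraction(optimized, profile))
```

In this toy profile the rarely executed block is moved out of the hot path, so more of the profiled transfers fall through to the next block in memory. A real optimizer would also order procedures and account for cache geometry, which this sketch ignores.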


