Hardware Support for Control Transfers in Code Caches
Source: Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Page: 253  
Year of Publication: 2003
ISBN:0-7695-2043-X
Authors
Ho-Seop Kim  Department of Electrical and Computer Engineering, University of Wisconsin - Madison
James E. Smith  Department of Electrical and Computer Engineering, University of Wisconsin - Madison
Sponsor
SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
Publisher
IEEE Computer Society  Washington, DC, USA

ABSTRACT

Many dynamic optimization and/or binary translation systems hold optimized/translated superblocks in a code cache. Conventional code caching systems suffer from overheads when control is transferred from one cached superblock to another, especially via register-indirect jumps. The basic problem is that instruction addresses in the code cache are different from those in the original program binary. Therefore, performance for register-indirect jumps depends on the ability to translate efficiently from source binary PC values to code cache PC values.

We analyze several key aspects of superblock chaining and find that a conventional baseline code cache with software jump target prediction results in 14.6% IPC loss versus the original binary. We identify the inability to use a conventional return address stack as the most significant performance limiter in code cache systems. We introduce a modified software prediction technique that reduces the IPC loss to 11.4%. This technique is based on a technique used in threaded code interpreters.

A number of hardware mechanisms, including a specialized return address stack and a hardware cache for translated jump target addresses, are studied for efficiently supporting register-indirect jumps. Once all the chaining overheads are removed by these support mechanisms, a superblock-based code cache improves performance due to a better branch prediction rate, improved I-cache locality, and increased chances of straight-line fetches. Simulation results show a 7.7% IPC improvement over a current generation 4-way superscalar processor.
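The address translation problem described in the abstract can be illustrated with a toy model. The sketch below is not the authors' implementation: the class names (`CodeCache`, `IndirectJumpSite`), the dictionary-based map, and the single inlined prediction per jump site are all assumptions made for illustration. It shows only the general idea of software jump target prediction, where a register-indirect jump first compares its source PC against an embedded predicted value and falls back to a slower map lookup on a mismatch.

```python
class CodeCache:
    """Toy model (hypothetical): maps source-binary PCs to code-cache PCs."""

    def __init__(self):
        self.translation_map = {}  # source PC -> code cache PC
        self.lookups = 0           # number of slow-path map lookups performed

    def add_superblock(self, source_pc, cache_pc):
        """Record that the superblock starting at source_pc lives at cache_pc."""
        self.translation_map[source_pc] = cache_pc

    def lookup(self, source_pc):
        """Slow path: full map lookup. Returns None if not yet translated."""
        self.lookups += 1
        return self.translation_map.get(source_pc)


class IndirectJumpSite:
    """A register-indirect jump with one inlined predicted target (assumed design)."""

    def __init__(self, cache, predicted_source_pc):
        self.cache = cache
        self.predicted_source_pc = predicted_source_pc
        # The translated address of the prediction is embedded at the jump site.
        self.predicted_cache_pc = cache.translation_map.get(predicted_source_pc)

    def resolve(self, target_source_pc):
        """Translate the jump's source PC into a code cache PC."""
        # Fast path: the register value matches the embedded prediction.
        if (target_source_pc == self.predicted_source_pc
                and self.predicted_cache_pc is not None):
            return self.predicted_cache_pc
        # Slow path: consult the translation map.
        return self.cache.lookup(target_source_pc)


cache = CodeCache()
cache.add_superblock(0x1000, 0x8000)
cache.add_superblock(0x2000, 0x8100)
site = IndirectJumpSite(cache, predicted_source_pc=0x1000)

hit = site.resolve(0x1000)       # prediction matches, no map lookup
miss = site.resolve(0x2000)      # prediction fails, falls back to the map
unknown = site.resolve(0x3000)   # target not translated yet
```

In this model a correct prediction avoids the map lookup entirely, which is the cost the abstract attributes to transfers between cached superblocks; the hardware mechanisms studied in the paper are not modeled here.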



