ABSTRACT
Many dynamic optimization and/or binary translation systems hold optimized/translated superblocks in a code cache. Conventional code caching systems suffer from overheads when control is transferred from one cached superblock to another, especially via register-indirect jumps. The basic problem is that instruction addresses in the code cache are different from those in the original program binary. Therefore, performance for register-indirect jumps depends on the ability to translate efficiently from source binary PC values to code cache PC values.

We analyze several key aspects of superblock chaining and find that a conventional baseline code cache with software jump target prediction results in a 14.6% IPC loss versus the original binary. We identify the inability to use a conventional return address stack as the most significant performance limiter in code cache systems. We introduce a modified software prediction technique that reduces the IPC loss to 11.4%; it is based on a technique used in threaded code interpreters.

A number of hardware mechanisms, including a specialized return address stack and a hardware cache for translated jump target addresses, are studied for efficiently supporting register-indirect jumps. Once these support mechanisms remove all the chaining overheads, a superblock-based code cache improves performance due to a better branch prediction rate, improved I-cache locality, and increased chances of straight-line fetches. Simulation results show a 7.7% IPC improvement over a current-generation 4-way superscalar processor.
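The PC translation problem and software jump target prediction can be illustrated with a small model. This is a minimal sketch, not the authors' implementation. It assumes that each register-indirect jump site carries a single inlined prediction (one source target plus its code cache address), and that a Python dictionary stands in for the source-PC-to-code-cache-PC translation table. The class names `CodeCache` and `IndirectJumpSite`, the 0x10000 cache base address, and the 0x40-byte superblock size are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CodeCache:
    """Toy model (hypothetical): maps source-binary PCs to code cache PCs."""
    translations: Dict[int, int] = field(default_factory=dict)
    next_cache_pc: int = 0x10000  # hypothetical base address of the code cache
    lookups: int = 0              # number of slow-path translation-table lookups

    def translate(self, src_pc: int) -> int:
        """Slow path: find (or create) the cached superblock for src_pc."""
        self.lookups += 1
        if src_pc not in self.translations:
            # Assume each new superblock occupies 0x40 bytes in the cache.
            self.translations[src_pc] = self.next_cache_pc
            self.next_cache_pc += 0x40
        return self.translations[src_pc]


@dataclass
class IndirectJumpSite:
    """One register-indirect jump with a single-entry software target prediction."""
    predicted_src: Optional[int] = None
    predicted_cache_pc: Optional[int] = None
    hits: int = 0
    misses: int = 0

    def dispatch(self, src_target: int, cache: CodeCache) -> int:
        """Return the code cache PC to jump to for a runtime source target."""
        # Fast path: compare the runtime source target with the predicted one;
        # on a match, jump straight to the chained superblock.
        if src_target == self.predicted_src:
            self.hits += 1
            return self.predicted_cache_pc
        # Slow path: translate through the table, then update the prediction.
        self.misses += 1
        cache_pc = cache.translate(src_target)
        self.predicted_src, self.predicted_cache_pc = src_target, cache_pc
        return cache_pc


if __name__ == "__main__":
    cache = CodeCache()
    site = IndirectJumpSite()
    for target in [0x400100, 0x400100, 0x400100, 0x400200, 0x400100]:
        site.dispatch(target, cache)
    print(site.hits, site.misses, cache.lookups)  # → 2 3 3
```

With this target sequence, the single-entry prediction hits twice and falls back to the translation table three times. This shows how the scheme degrades once a jump alternates between targets: each miss pays for a lookup that the original binary never executes.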