ABSTRACT
This paper presents a comprehensive investigation of dynamic selection for long atomic traces. It introduces a classification of trace selection methods and discusses existing and novel dynamic selection approaches, including loop unrolling, procedure inlining, and incremental merging of traces based on dynamic bias. The paper empirically analyzes a number of selection schemes in an idealized framework. Observations based on the SPEC-CPU2000 benchmarks show that: (a) selection based on dynamic bias is necessary to achieve the best performance across all benchmarks; (b) the best selection scheme is specific to the benchmark and to the maximum trace length; (c) simple selection, based on program structure information only, is sufficient to achieve the best performance for several benchmarks. Consequently, two alternatives for the trace selection mechanism are established: (a) a "best performance" approach relying on complex dynamic criteria; (b) a "value" approach that provides the best performance (and potentially the best power consumption) based on simpler static criteria. Another emerging alternative advocates adaptive mechanisms that adjust the selection criteria.