ABSTRACT
This paper presents CHeckpointed Early Resource RecYcling (Cherry), a hybrid mode of execution based on ROB and checkpointing that decouples resource recycling from instruction retirement. Resources are recycled early, resulting in more efficient utilization. Cherry relies on state checkpointing and rollback to service exceptions for instructions whose resources have been recycled. Cherry leverages the ROB to (1) not require in-order execution as a fallback mechanism, (2) allow memory replay traps and branch mispredictions without rolling back to the Cherry checkpoint, and (3) quickly fall back to conventional out-of-order execution without rolling back to the checkpoint or flushing the pipeline.

We present a Cherry implementation with early recycling at three different points of the execution engine: the load queue, the store queue, and the register file. We report average speedups of 1.06 and 1.26 in SPECint and SPECfp applications, respectively, relative to an aggressive conventional architecture. We also describe how Cherry and speculative multithreading can be combined and complement each other.
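As a rough illustration of why decoupling recycling from retirement can improve utilization, the toy model below compares how many queue entries are held at once when an entry is released at in-order retirement versus as soon as its instruction completes. This is only a sketch under simplifying assumptions and not the paper's simulator: one instruction is dispatched per cycle, "early" release happens at completion (the abstract does not state the actual release conditions), exceptions and checkpoint rollback are not modeled, and the function name `peak_occupancy` is hypothetical.

```python
def peak_occupancy(latencies, early_recycling):
    """Peak number of entries held at once in a toy in-order-retirement model.

    Assumption: instruction i is dispatched at cycle i and finishes executing
    at cycle i + latencies[i]. It holds one entry from dispatch until release.
    """
    release_times = []
    last_retire = 0
    for i, latency in enumerate(latencies):
        done = i + latency
        # In-order retirement: an instruction cannot retire before its elders.
        retire = max(done, last_retire)
        last_retire = retire
        # Early recycling (idealized): release at completion, not retirement.
        release_times.append(done if early_recycling else retire)

    peak = 0
    for cycle in range(max(release_times) + 1):
        held = sum(1 for i, rel in enumerate(release_times) if i <= cycle < rel)
        peak = max(peak, held)
    return peak


# One long-latency operation followed by seven short ones.
latencies = [20, 1, 1, 1, 1, 1, 1, 1]
print(peak_occupancy(latencies, early_recycling=False))  # 8
print(peak_occupancy(latencies, early_recycling=True))   # 2
```

In this example the long-latency instruction blocks retirement, so the conventional policy keeps all eight entries occupied, while the idealized early-release policy never holds more than two.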
REVIEW
Reviewer: Ronaldo A. L. Goncalves
This paper proposes combining previous techniques, based on early recycling, to make better use of resources (registers) in superscalar processors. The new technique was tested on both a load/store queue and a reorder buffer.