Simultaneous speculative threading: a novel pipeline architecture implemented in Sun's ROCK processor
Source
International Symposium on Computer Architecture
Proceedings of the 36th annual international symposium on Computer architecture
Austin, TX, USA
SESSION: Speculative threading and parallelization
Pages 484-495
Year of Publication: 2009
ISBN: 978-1-60558-526-0
Authors
Shailender Chaudhry, Sun Microsystems, Inc., Santa Clara, CA, USA
Robert Cypher, Sun Microsystems, Inc., Santa Clara, CA, USA
Magnus Ekman, Sun Microsystems, Inc., Santa Clara, CA, USA
Martin Karlsson, Sun Microsystems, Inc., Santa Clara, CA, USA
Anders Landin, Sun Microsystems, Inc., Santa Clara, CA, USA
Sherman Yip, Sun Microsystems, Inc., Santa Clara, CA, USA
Håkan Zeffer, Sun Microsystems, Inc., Santa Clara, CA, USA
Marc Tremblay, Sun Microsystems, Inc., Santa Clara, CA, USA
ABSTRACT
This paper presents Simultaneous Speculative Threading (SST), which is a technique for creating high-performance area- and power-efficient cores for chip multiprocessors. SST hardware dynamically extracts two threads of execution from a single sequential program (one consisting of a load miss and its dependents, and the other consisting of the instructions that are independent of the load miss) and executes them in parallel. SST uses an efficient checkpointing mechanism to eliminate the need for complex and power-inefficient structures such as register renaming logic, reorder buffers, memory disambiguation buffers, and large issue windows. Simulations of certain SST implementations show 18% better per-thread performance on commercial benchmarks than larger and higher-powered out-of-order cores. Sun Microsystems' ROCK processor, which is the first processor to use SST cores, has been implemented and is scheduled to be commercially available in 2009.
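The split described in the abstract (a load miss and its dependents in one thread, the independent instructions in the other) can be illustrated with a small software sketch. This is not the ROCK hardware mechanism. It is a toy model that assumes dependences are tracked through registers only and ignores memory dependences, checkpointing, and replay. The names `Instr`, `split_on_load_miss`, and the sample program are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    name: str
    dest: Optional[str]        # register written, or None
    srcs: tuple                # registers read
    misses: bool = False       # True for a load that misses in the cache


def split_on_load_miss(program):
    """Partition a sequential program into (ahead, deferred) lists.

    deferred: load misses plus every instruction that depends on one,
              directly or transitively, through a register.
    ahead:    the remaining instructions, kept in program order.
    """
    not_ready = set()  # registers whose value is waiting on a load miss
    ahead, deferred = [], []
    for ins in program:
        if ins.misses or any(s in not_ready for s in ins.srcs):
            deferred.append(ins)
            if ins.dest is not None:
                not_ready.add(ins.dest)
        else:
            ahead.append(ins)
            # An independent write makes the register usable again.
            if ins.dest is not None:
                not_ready.discard(ins.dest)
    return ahead, deferred


# Hypothetical five-instruction program; only the first load misses.
program = [
    Instr("load r1 <- [a]", "r1", (), misses=True),
    Instr("add r2 <- r1, r3", "r2", ("r1", "r3")),
    Instr("load r4 <- [b]", "r4", ()),
    Instr("mul r5 <- r4, r4", "r5", ("r4",)),
    Instr("sub r6 <- r2, r5", "r6", ("r2", "r5")),
]
ahead, deferred = split_on_load_miss(program)
```

In this sketch the second load and the multiply can run ahead, while the missing load, the add, and the subtract are deferred until the miss returns.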