Threaded multiple path execution
Source
International Symposium on Computer Architecture
Proceedings of the 25th annual international symposium on Computer architecture
Barcelona, Spain
Pages: 238 - 249
Year of Publication: 1998
ISBN:0-8186-8491-7
Authors
Steven Wallace, Department of Computer Science and Engineering, University of California, San Diego
Brad Calder, Department of Computer Science and Engineering, University of California, San Diego
Dean M. Tullsen, Department of Computer Science and Engineering, University of California, San Diego
Publisher
IEEE Computer Society
Washington, DC, USA
ABSTRACT
This paper presents Threaded Multi-Path Execution (TME), which exploits existing hardware on a Simultaneous Multithreading (SMT) processor to speculatively execute multiple paths of execution. When there are fewer threads in an SMT processor than hardware contexts, threaded multi-path execution uses spare contexts to fetch and execute code along the less likely path of hard-to-predict branches.
This paper describes the hardware mechanisms needed to enable an SMT processor to efficiently spawn speculative threads for threaded multi-path execution. It also describes the Mapping Synchronization Bus, which enables the spawning of these multiple paths. It examines policies for deciding which branches to fork and for managing competition between primary and alternate path threads for critical resources. Our results show that, for programs with a high misprediction rate, TME increases the single-program performance of an SMT processor with eight thread contexts by 14%-23% on average, depending on the misprediction penalty.
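The forking idea in the abstract can be illustrated with a toy model. The sketch below is not the paper's hardware design: the class names, the integer confidence score, the threshold of 2, and the promotion step in `resolve` are all assumptions made for illustration. It only shows the general shape of the decision: a branch is forked when it looks hard to predict and a spare context is available, and the spare context starts on the path the predictor did not choose.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Context:
    """One hardware thread context (hypothetical model)."""
    ident: int
    role: str = "idle"              # "idle", "primary", or "alternate"
    start_pc: Optional[int] = None


@dataclass
class SmtCore:
    """Toy SMT core with a fixed number of hardware contexts."""
    num_contexts: int = 8
    confidence_threshold: int = 2   # assumed cutoff, not taken from the paper
    contexts: List[Context] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.contexts = [Context(i) for i in range(self.num_contexts)]

    def _idle_context(self) -> Optional[Context]:
        return next((c for c in self.contexts if c.role == "idle"), None)

    def start_primary(self, pc: int) -> Context:
        ctx = self._idle_context()
        if ctx is None:
            raise RuntimeError("no idle context available")
        ctx.role, ctx.start_pc = "primary", pc
        return ctx

    def maybe_fork(self, confidence: int, predicted_taken: bool,
                   taken_pc: int, fallthrough_pc: int) -> Optional[Context]:
        """Spawn an alternate-path thread for a low-confidence branch."""
        if confidence >= self.confidence_threshold:
            return None             # branch looks predictable; do not fork
        ctx = self._idle_context()
        if ctx is None:
            return None             # no spare context to use
        ctx.role = "alternate"
        # The alternate thread follows the path the predictor did NOT choose.
        ctx.start_pc = fallthrough_pc if predicted_taken else taken_pc
        return ctx

    def resolve(self, alternate: Context, mispredicted: bool) -> None:
        """Assumed resolution step: keep the path that turned out correct."""
        if mispredicted:
            for c in self.contexts:
                if c.role == "primary":
                    c.role, c.start_pc = "idle", None
            alternate.role = "primary"
        else:
            alternate.role, alternate.start_pc = "idle", None
```

With two contexts, for example, `SmtCore(num_contexts=2)` can run one primary thread and fork at most one alternate path; a further low-confidence branch is simply not forked until a context is freed.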
CITED BY 31
Juan L. Aragón , José González , Antonio González , James E. Smith, Dual path instruction processing, Proceedings of the 16th international conference on Supercomputing, June 22-26, 2002, New York, New York, USA
Jamison D. Collins , Hong Wang , Dean M. Tullsen , Christopher Hughes , Yong-Fong Lee , Dan Lavery , John P. Shen, Speculative precomputation: long-range prefetching of delinquent loads, ACM SIGARCH Computer Architecture News, v.29 n.2, p.14-25, May 2001
Kevin Skadron , Pritpal S. Ahuja , Margaret Martonosi , Douglas W. Clark, Improving prediction for procedure returns with return-address-stack repair mechanisms, Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture, p.259-271, November 1998, Dallas, Texas, United States
Kevin Skadron , Pritpal S. Ahuja , Margaret Martonosi , Douglas W. Clark, Branch Prediction, Instruction-Window Size, and Cache Size: Performance Trade-Offs and Simulation Techniques, IEEE Transactions on Computers, v.48 n.11, p.1260-1281, November 1999
Christos D. Antonopoulos , Xiaoning Ding , Andrey Chernikov , Filip Blagojevic , Dimitrios S. Nikolopoulos , Nikos Chrisochoides, Multigrain parallel Delaunay Mesh generation: challenges and opportunities for multithreaded architectures, Proceedings of the 19th annual international conference on Supercomputing, June 20-22, 2005, Cambridge, Massachusetts
Christos D. Antonopoulos , Filip Blagojevic , Andrey N. Chernikov , Nikos P. Chrisochoides , Dimitrios S. Nikolopoulos, Algorithm, software, and hardware optimizations for Delaunay mesh generation on simultaneous multithreaded architectures, Journal of Parallel and Distributed Computing, v.69 n.7, p.601-612, July, 2009