Simultaneous subordinate microthreading (SSMT)
Source
International Symposium on Computer Architecture
Proceedings of the 26th annual international symposium on Computer architecture
Atlanta, Georgia, United States
Pages: 186 - 195
Year of Publication: 1999
ISBN: 0-7695-0170-2
Authors
Robert S. Chappell, EECS Department (ACAL), The University of Michigan, Ann Arbor, Michigan
Jared Stark, EECS Department (ACAL), The University of Michigan, Ann Arbor, Michigan
Sangwook P. Kim, Intel Corporation, Santa Clara, CA
Steven K. Reinhardt, EECS Department (ACAL), The University of Michigan, Ann Arbor, Michigan
Yale N. Patt, EECS Department (ACAL), The University of Michigan, Ann Arbor, Michigan
Publisher
IEEE Computer Society, Washington, DC, USA
Bibliometrics
Downloads (6 Weeks): 13, Downloads (12 Months): 39, Citation Count: 58
ABSTRACT
Current work in Simultaneous Multithreading provides little benefit to programs that aren't partitioned into threads. We propose Simultaneous Subordinate Microthreading (SSMT) to correct this by spawning subordinate threads that perform optimizations on behalf of the single primary thread. These threads, written in microcode, are issued and executed concurrently with the primary thread. They directly manipulate the microarchitecture to improve the primary thread's branch prediction accuracy, cache hit rate, and prefetch effectiveness. All contribute to the performance of the primary thread. This paper introduces SSMT and discusses its potential to increase performance. We illustrate its usefulness with an SSMT machine that executes subordinate microthreads to improve the branch prediction of the primary thread. We show simulation results for the SPECint95 benchmarks.
CITED BY 58
Brian Fahs , Satarupa Bose , Matthew Crum , Brian Slechta , Francesco Spadini , Tony Tung , Sanjay J. Patel , Steven S. Lumetta, Performance characterization of a hardware mechanism for dynamic optimization, Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture, December 01-05, 2001, Austin, Texas
Jamison D. Collins , Hong Wang , Dean M. Tullsen , Christopher Hughes , Yong-Fong Lee , Dan Lavery , John P. Shen, Speculative precomputation: long-range prefetching of delinquent loads, ACM SIGARCH Computer Architecture News, v.29 n.2, p.14-25, May 2001
Perry H. Wang , Jamison D. Collins , Hong Wang , Dongkeun Kim , Bill Greene , Kai-Ming Chan , Aamir B. Yunus , Terry Sych , Stephen F. Moore , John P. Shen, Helper Threads via Virtual Multithreading, IEEE Micro, v.24 n.6, p.74-82, November 2004
Tor M. Aamodt , Pedro Marcuello , Paul Chow , Antonio González , Per Hammarlund , Hong Wang , John P. Shen, A framework for modeling and optimization of prescient instruction prefetch, ACM SIGMETRICS Performance Evaluation Review, v.31 n.1, June 2003
Perry H. Wang , Jamison D. Collins , Hong Wang , Dongkeun Kim , Bill Greene , Kai-Ming Chan , Aamir B. Yunus , Terry Sych , Stephen F. Moore , John P. Shen, Helper threads via virtual multithreading on an experimental itanium® 2 processor-based platform, ACM SIGPLAN Notices, v.39 n.11, November 2004
Marco Galluzzi , Valentín Puente , Adrián Cristal , Ramón Beivide , José-Ángel Gregorio , Mateo Valero, A first glance at Kilo-instruction based multiprocessors, Proceedings of the 1st conference on Computing frontiers, April 14-16, 2004, Ischia, Italy
Dongkeun Kim , Steve Shih-wei Liao , Perry H. Wang , Juan del Cuvillo , Xinmin Tian , Xiang Zou , Hong Wang , Donald Yeung , Milind Girkar , John P. Shen, Physical Experimentation with Prefetching Helper Threads on Intel's Hyper-Threaded Processors, Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, p.27, March 20-24, 2004, Palo Alto, California
Tanausú Ramírez , Alex Pajuelo , Oliverio J. Santana , Mateo Valero, Kilo-instruction processors, runahead and prefetching, Proceedings of the 3rd conference on Computing frontiers, May 03-05, 2006, Ischia, Italy
Jiwei Lu , Abhinav Das , Wei-Chung Hsu , Khoa Nguyen , Santosh G. Abraham, Dynamic Helper Threaded Prefetching on the Sun UltraSPARC CMP Processor, Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture, p.93-104, November 12-16, 2005, Barcelona, Spain
Ronald D. Barnes , Erik M. Nystrom , John W. Sias , Sanjay J. Patel , Nacho Navarro , Wen-mei W. Hwu, Beating in-order stalls with "flea-flicker" two-pass pipelining, Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture, p.387, December 03-05, 2003
Akihiro Yamamoto , Yusuke Tanaka , Hideki Ando , Toshio Shimada, Data prefetching and address pre-calculation through instruction pre-execution with two-step physical register deallocation, Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture, p.33-40, September 16-16, 2007, Brasov, Romania
Ronald D. Barnes , John W. Sias , Erik M. Nystrom , Sanjay J. Patel , Jose (Nacho) Navarro , Wen-mei W. Hwu, Beating In-Order Stalls with "Flea-Flicker" Two-Pass Pipelining, IEEE Transactions on Computers, v.55 n.1, p.18-33, January 2006