Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor
Source
International Symposium on Computer Architecture
Proceedings of the 23rd annual international symposium on Computer architecture
Philadelphia, Pennsylvania, United States
Pages: 191 - 202
Year of Publication: 1996
ISBN:0-89791-786-3
Authors
Dean M. Tullsen, Dept of Computer Science and Engineering, University of Washington, Box 352350, Seattle, WA
Susan J. Eggers, Dept of Computer Science and Engineering, University of Washington, Box 352350, Seattle, WA
Joel S. Emer, Digital Equipment Corporation, HLO2-3/J3, 77 Reed Road, Hudson, MA
Henry M. Levy, Dept of Computer Science and Engineering, University of Washington, Box 352350, Seattle, WA
Jack L. Lo, Dept of Computer Science and Engineering, University of Washington, Box 352350, Seattle, WA
Rebecca L. Stamm, Digital Equipment Corporation, HLO2-3/J3, 77 Reed Road, Hudson, MA
Bibliometrics
Downloads (6 Weeks): 22, Downloads (12 Months): 124, Citation Count: 150
ABSTRACT
Simultaneous multithreading is a technique that permits multiple independent threads to issue multiple instructions each cycle. In previous work we demonstrated the performance potential of simultaneous multithreading, based on a somewhat idealized model. In this paper we show that the throughput gains from simultaneous multithreading can be achieved without extensive changes to a conventional wide-issue superscalar, either in hardware structures or sizes. We present an architecture for simultaneous multithreading that achieves three goals: (1) it minimizes the architectural impact on the conventional superscalar design, (2) it has minimal performance impact on a single thread executing alone, and (3) it achieves significant throughput gains when running multiple threads. Our simultaneous multithreading architecture achieves a throughput of 5.4 instructions per cycle, a 2.5-fold improvement over an unmodified superscalar with similar hardware resources. This speedup is enhanced by an advantage of multithreading previously unexploited in other architectures: the ability to favor for fetch and issue those threads most efficiently using the processor each cycle, thereby providing the "best" instructions to the processor.
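The fetch-selection idea in the last sentence of the abstract can be illustrated with a short sketch. The Python below is illustrative only and is not the mechanism specified in the abstract: it assumes that "most efficiently using the processor" is approximated by having the fewest instructions still waiting in the front end, and the names `ThreadState`, `in_flight`, `stalled`, and `pick_fetch_threads` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ThreadState:
    tid: int                # hardware thread (context) id
    in_flight: int          # assumed metric: instructions fetched but not yet issued
    stalled: bool = False   # assumed flag: thread cannot be fetched from this cycle


def pick_fetch_threads(threads, max_threads=2):
    """Return the ids of up to `max_threads` threads to fetch from this cycle.

    Threads that cannot fetch are skipped. The rest are ranked by how few
    instructions they have waiting, on the assumption that a short backlog
    means the thread is moving through the pipeline efficiently.
    Ties are broken by thread id so the result is deterministic.
    """
    ready = [t for t in threads if not t.stalled]
    ready.sort(key=lambda t: (t.in_flight, t.tid))
    return [t.tid for t in ready[:max_threads]]


if __name__ == "__main__":
    contexts = [
        ThreadState(tid=0, in_flight=12),
        ThreadState(tid=1, in_flight=3),
        ThreadState(tid=2, in_flight=7, stalled=True),
        ThreadState(tid=3, in_flight=5),
    ]
    print(pick_fetch_threads(contexts))
```

Re-evaluating the ranking every cycle is what lets the choice track changing thread behavior; a fixed round-robin order would not distinguish a thread that is clogging the front end from one that is draining it.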
CITED BY 150
Nicholas Mitchell , Larry Carter , Jeanne Ferrante , Dean Tullsen, ILP versus TLP on SMT, Proceedings of the 1999 ACM/IEEE conference on Supercomputing (CDROM), p.37-es, November 14-19, 1999, Portland, Oregon, United States
Jack L. Lo , Luiz André Barroso , Susan J. Eggers , Kourosh Gharachorloo , Henry M. Levy , Sujay S. Parekh, An analysis of database workload performance on simultaneous multithreaded processors, ACM SIGARCH Computer Architecture News, v.26 n.3, p.39-50, June 1998
Jack L. Lo , Joel S. Emer , Henry M. Levy , Rebecca L. Stamm , Dean M. Tullsen , S. J. Eggers, Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading, ACM Transactions on Computer Systems (TOCS), v.15 n.3, p.322-354, Aug. 1997
Keith I. Farkas , Paul Chow , Norman P. Jouppi , Zvonko Vranesic, The multicluster architecture: reducing cycle time through partitioning, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.149-159, December 01-03, 1997, Research Triangle Park, North Carolina, United States
Stefanos Kaxiras , Girija Narlikar , Alan D. Berenbaum , Zhigang Hu, Comparing power consumption of an SMT and a CMP DSP for mobile phone workloads, Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems, November 16-17, 2001, Atlanta, Georgia, USA
Jack L. Lo , Susan J. Eggers , Henry M. Levy , Sujay S. Parekh , Dean M. Tullsen, Tuning compiler optimizations for simultaneous multithreading, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.114-124, December 01-03, 1997, Research Triangle Park, North Carolina, United States
Jared Stark , Paul Racunas , Yale N. Patt, Reducing the performance impact of instruction cache misses by writing instructions into the reservation stations out-of-order, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.34-43, December 01-03, 1997, Research Triangle Park, North Carolina, United States
Parthasarathy Ranganathan , Vijay S. Pai , Sarita V. Adve, Using speculative retirement and larger instruction windows to narrow the performance gap between memory consistency models, Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures, p.199-210, June 23-25, 1997, Newport, Rhode Island, United States
Jamison D. Collins , Hong Wang , Dean M. Tullsen , Christopher Hughes , Yong-Fong Lee , Dan Lavery , John P. Shen, Speculative precomputation: long-range prefetching of delinquent loads, ACM SIGARCH Computer Architecture News, v.29 n.2, p.14-25, May 2001
Roger Espasa , Federico Ardanaz , Joel Emer , Stephen Felix , Julio Gago , Roger Gramunt , Isaac Hernandez , Toni Juan , Geoff Lowney , Matthew Mattina , André Seznec, Tarantula: a vector extension to the alpha architecture, ACM SIGARCH Computer Architecture News, v.30 n.2, May 2002
Susan J. Eggers , Joel S. Emer , Henry M. Levy , Jack L. Lo , Rebecca L. Stamm , Dean M. Tullsen, Simultaneous Multithreading: A Platform for Next-Generation Processors, IEEE Micro, v.17 n.5, p.12-19, September 1997
Tor M. Aamodt , Pedro Marcuello , Paul Chow , Antonio González , Per Hammarlund , Hong Wang , John P. Shen, A framework for modeling and optimization of prescient instruction prefetch, ACM SIGMETRICS Performance Evaluation Review, v.31 n.1, June 2003
Francisco J. Cazorla , Peter M.W. Knijnenburg , Rizos Sakellariou , Enrique Fernández , Alex Ramirez , Mateo Valero, Predictable performance in SMT processors, Proceedings of the 1st conference on Computing frontiers, April 14-16, 2004, Ischia, Italy
Teresa Monreal , Antonio González , Mateo Valero , José González , Victor Viñals, Delaying physical register allocation through virtual-physical registers, Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture, p.186-192, November 16-18, 1999, Haifa, Israel
Francisco J. Cazorla , Alex Ramirez , Mateo Valero , Peter M. W. Knijnenburg , Rizos Sakellariou , Enrique Fernández, QoS for High-Performance SMT Processors in Embedded Systems, IEEE Micro, v.24 n.4, p.24-31, July 2004
Francisco J. Cazorla , Peter M. W. Knijnenburg , Rizos Sakellariou , Enrique Fernández , Alex Ramirez , Mateo Valero, Architectural support for real-time task scheduling in SMT processors, Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems, September 24-27, 2005, San Francisco, California, USA
Jason Cong , Ashok Jagannathan , Glenn Reinman , Yuval Tamir, Understanding the energy efficiency of SMT and CMP with multiclustering, Proceedings of the 2005 international symposium on Low power electronics and design, August 08-10, 2005, San Diego, CA, USA
Ali El-Haj-Mahmoud , Ahmed S. AL-Zawawi , Aravindh Anantaraman , Eric Rotenberg, Virtual multiprocessor: an analyzable, high-performance architecture for real-time computing, Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems, September 24-27, 2005, San Francisco, California, USA
Nirmal R. Saxena , Santiago Fernandez-Gomez , Wei-Je Huang , Subhasish Mitra , Shu-Yi Yu , Edward J. McCluskey, Dependable Computing and Online Testing in Adaptive and Configurable Systems, IEEE Design & Test, v.17 n.1, p.29-41, January 2000
Dong Lan , Ji Zhenzhou , Suixiufeng Suixiufeng , Hu Mingzeng , Cui Guangzuo, A SMT-ARM simulator and performance evaluation, Proceedings of the 5th WSEAS International Conference on Software Engineering, Parallel and Distributed Systems, p.208-210, February 15-17, 2006, Madrid, Spain
Francisco J. Cazorla , Alex Ramirez , Mateo Valero , Enrique Fernandez, Dynamically Controlled Resource Allocation in SMT Processors, Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture, p.171-182, December 04-08, 2004, Portland, Oregon
Eric Tune , Rakesh Kumar , Dean M. Tullsen , Brad Calder, Balanced Multithreading: Increasing Throughput via a Low Cost Multithreading Hierarchy, Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture, p.183-194, December 04-08, 2004, Portland, Oregon
Michael Schulte , John Glossner , Sanjay Jinturkar , Mayan Moudgill , Suman Mamidi , Stamatis Vassiliadis, A Low-Power Multithreaded Processor for Software Defined Radio, Journal of VLSI Signal Processing Systems, v.43 n.2-3, p.143-159, June 2006
David W. Oehmke , Nathan L. Binkert , Trevor Mudge , Steven K. Reinhardt, How to Fake 1000 Registers, Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture, p.7-18, November 12-16, 2005, Barcelona, Spain
Qiong Cai , José González , Ryan Rakvic , Grigorios Magklis , Pedro Chaparro , Antonio González, Meeting points: using thread criticality to adapt multicore hardware to parallel regions, Proceedings of the 17th international conference on Parallel architectures and compilation techniques, October 25-29, 2008, Toronto, Ontario, Canada
Lan Dong , Xiufeng Sui, A multithreading embedded architecture, Proceedings of the 7th conference on Data networks, communications, computers, p.152-154, November 07-09, 2008, Bucharest, Romania
Hongzhou Chen , Lingdi Ping , Xuezeng Pan , Kuijun Lu , Xiaoning Jiang, A swarm-inspired resource distribution for SMT processors, Proceedings of the 3rd International Conference on Bio-Inspired Models of Network, Information and Computing Sytems, November 25-28, 2008, Hyogo, Japan
Lan Dong , Yang Yang, An approach on distributed and shared dynamic cache partition, Proceedings of the 7th conference on Data networks, communications, computers, p.155-157, November 07-09, 2008, Bucharest, Romania
Josefa Díaz , J. Ignacio Hidalgo , Francisco Fernández , Oscar Garnica , Sonia López, Improving SMT performance: an application of genetic algorithms to configure resizable caches, Proceedings of the 11th annual conference companion on Genetic and evolutionary computation conference, July 08-12, 2009, Montreal, Québec, Canada