ABSTRACT
Hardware multithreading is becoming a generally applied technique in the next generation of microprocessors. Several multithreaded processors have been announced by industry or are already in production in the areas of high-performance microprocessors, media processors, and network processors.

A multithreaded processor is able to pursue two or more threads of control in parallel within the processor pipeline. The contexts of two or more threads of control are often stored in separate on-chip register sets. Unused instruction slots, which arise from latencies during the pipelined execution of single-threaded programs by a contemporary microprocessor, are filled by instructions of other threads within a multithreaded processor. The execution units are multiplexed between the thread contexts that are loaded in the register sets.

Underutilization of a superscalar processor due to missing instruction-level parallelism can be overcome by simultaneous multithreading, where a processor can issue multiple instructions from multiple threads each cycle. Simultaneous multithreaded processors combine the multithreading technique with a wide-issue superscalar processor to utilize a larger part of the issue bandwidth by issuing instructions from different threads simultaneously.

Explicit multithreaded processors are multithreaded processors that apply processes or operating system threads in their hardware thread slots. These processors optimize the throughput of multiprogramming workloads rather than single-thread performance. We distinguish these processors from implicit multithreaded processors that utilize thread-level speculation by speculatively executing compiler- or machine-generated threads of control that are part of a single sequential program.

This survey paper explains and classifies the explicit multithreading techniques in research and in commercial microprocessors.
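The slot-filling idea described in the abstract can be illustrated with a toy model. The sketch below is not taken from the survey: the function names, the issue width of 4, and the per-cycle "ready instruction" counts are all hypothetical, and the model ignores dependences, carry-over of unissued instructions, and every real pipeline detail. It only counts how many issue slots get filled when (a) one thread runs alone, (b) the pipeline picks one non-stalled thread per cycle, and (c) instructions from several threads share the slots of a single cycle, as in simultaneous multithreading.

```python
from typing import List

# Toy model (an assumption for illustration): ready[c] is the number of
# instructions a thread could issue in cycle c; 0 models a latency stall.

def issued_single(ready: List[int], width: int) -> int:
    """Slots filled by one thread on a superscalar core of the given width."""
    return sum(min(width, r) for r in ready)

def issued_one_thread_per_cycle(threads: List[List[int]], width: int) -> int:
    """Each cycle, issue from the first non-stalled thread in round-robin order."""
    n, cycles = len(threads), len(threads[0])
    total = 0
    for c in range(cycles):
        for k in range(n):
            t = (c + k) % n  # rotate the starting thread every cycle
            if threads[t][c] > 0:
                total += min(width, threads[t][c])
                break  # only one thread may issue in this cycle
    return total

def issued_smt(threads: List[List[int]], width: int) -> int:
    """Each cycle, fill the issue slots with instructions from all threads."""
    cycles = len(threads[0])
    return sum(min(width, sum(t[c] for t in threads)) for c in range(cycles))

# Hypothetical workloads: two threads observed over six cycles.
thread_a = [2, 0, 0, 1, 3, 0]
thread_b = [1, 2, 0, 0, 1, 2]
WIDTH = 4
slots = WIDTH * len(thread_a)

for name, issued in [
    ("single thread", issued_single(thread_a, WIDTH)),
    ("one thread per cycle", issued_one_thread_per_cycle([thread_a, thread_b], WIDTH)),
    ("simultaneous", issued_smt([thread_a, thread_b], WIDTH)),
]:
    print(f"{name}: {issued}/{slots} issue slots used")
```

In this made-up workload, switching between threads recovers cycles in which one thread is stalled, while sharing the slots within a cycle also recovers slots that a single thread leaves empty because it lacks instruction-level parallelism. The numbers carry no weight beyond the example.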