ABSTRACT
Chip multiprocessing has become an exciting new direction for system designers to deliver increased performance by exploiting CMOS scaling. We discuss key design decisions facing the system architect of a chip multiprocessor and describe how these choices were made in the design of the Cell Broadband Engine.

An important decision is whether to base system performance on thread-level parallelism alone, or to complement thread-level parallelism with other forms of parallelism. Depending on workload characteristics, providing parallelism at the processor core level may increase overall system efficiency.

Parallelism is also key to utilizing available memory bandwidth more efficiently, by overlapping and interleaving multiple accesses to system memory. By interleaving the access streams of multiple threads, memory-level parallelism can be increased to allow better memory interface utilization. In addition, compute-transfer parallelism (CTP) offers a new form of parallelism: memory transfers are initiated under software control without stalling the requesting thread.

We describe how the Cell Broadband Engine uses parallelism at all levels of the system abstraction to deliver a quantum leap in application performance, and how the Cell Synergistic Memory Flow engine exploits compute-transfer level parallelism by providing efficient block transfer capabilities.
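The idea behind compute-transfer parallelism can be illustrated with a minimal double-buffering sketch. This is not Cell code and not the authors' implementation: a one-worker thread pool stands in for a block transfer engine, and `fetch_block` and `sum_with_double_buffering` are hypothetical names introduced only for illustration. The point is that the transfer of the next block is started before the current block is processed, so computation and transfer can overlap.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_block(memory, start, size):
    """Hypothetical stand-in for a software-initiated block transfer."""
    return memory[start:start + size]


def sum_with_double_buffering(memory, block_size):
    """Sum `memory` block by block, prefetching the next block while computing."""
    total = 0
    # Assumption: a single worker thread plays the role of the transfer engine.
    with ThreadPoolExecutor(max_workers=1) as transfer_engine:
        # Start the first transfer before entering the compute loop.
        pending = transfer_engine.submit(fetch_block, memory, 0, block_size)
        for start in range(0, len(memory), block_size):
            # Wait only for the block that is needed right now.
            current = pending.result()
            next_start = start + block_size
            if next_start < len(memory):
                # Initiate the next transfer without waiting for it to finish.
                pending = transfer_engine.submit(
                    fetch_block, memory, next_start, block_size
                )
            # Compute on the current block while the next one is in flight.
            total += sum(current)
    return total
```

For example, `sum_with_double_buffering(list(range(10)), 4)` processes blocks of 4, 4, and 2 elements and returns the same result as `sum(range(10))`.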