The Cray BlackWidow: a highly scalable vector multiprocessor
Source
Conference on High Performance Networking and Computing
Proceedings of the 2007 ACM/IEEE conference on Supercomputing - Volume 00
Reno, Nevada
SESSION: System architecture
Article No. 17
Year of Publication: 2007
ISBN: 978-1-59593-764-3
Authors
Dennis Abts, Cray Inc., Chippewa Falls, Wisconsin
Abdulla Bataineh, Cray Inc., Chippewa Falls, Wisconsin
Steve Scott, Cray Inc., Chippewa Falls, Wisconsin
Greg Faanes, Cray Inc., Chippewa Falls, Wisconsin
Jim Schwarzmeier, Cray Inc., Chippewa Falls, Wisconsin
Eric Lundberg, Cray Inc., Chippewa Falls, Wisconsin
Tim Johnson, Cray Inc., Chippewa Falls, Wisconsin
Mike Bye, Cray Inc., Chippewa Falls, Wisconsin
Gerald Schwoerer, Cray Inc., Chippewa Falls, Wisconsin
ABSTRACT
This paper describes the system architecture of the Cray BlackWidow scalable vector multiprocessor. The BlackWidow system is a distributed shared memory (DSM) architecture that is scalable to 32K processors, each with a 4-way dispatch scalar execution unit and an 8-pipe vector unit capable of 20.8 Gflops for 64-bit operations and 41.6 Gflops for 32-bit operations at the prototype operating frequency of 1.3 GHz. Global memory is directly accessible with processor loads and stores and is globally coherent. The system supports thousands of outstanding references to hide remote memory latencies, and provides a rich suite of built-in synchronization primitives. Each BlackWidow node is implemented as a 4-way SMP with up to 128 Gbytes of DDR2 main memory capacity. The system supports common programming models such as MPI and OpenMP, as well as global address space languages such as UPC and CAF. We describe the system architecture and microarchitecture of the processor, memory controller, and router chips. We give preliminary performance results and discuss design tradeoffs.
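The peak rates quoted in the abstract can be checked with simple arithmetic. The sketch below is illustrative only and rests on assumptions that the abstract does not state: that each vector pipe completes two 64-bit floating-point operations per cycle (for example, one add and one multiply), that 32-bit operations run at twice that rate, and that "32K" means 32,768 processors. All function and constant names are hypothetical.

```python
# Back-of-the-envelope check of the figures quoted in the abstract.
# From the abstract: 8 vector pipes, 1.3 GHz prototype clock, up to 32K
# processors, 4 processors per node, up to 128 Gbytes per node.
PIPES = 8
CLOCK_GHZ = 1.3
PROCESSORS_PER_NODE = 4
NODE_MEMORY_GBYTES = 128

# ASSUMPTIONS (not stated in the abstract):
FLOPS_PER_PIPE_PER_CYCLE_64 = 2  # e.g. one add plus one multiply per pipe
FLOPS_PER_PIPE_PER_CYCLE_32 = 4  # 32-bit assumed to run at twice the rate
MAX_PROCESSORS = 32 * 1024       # "32K" read as 32,768


def peak_gflops(pipes, flops_per_pipe_per_cycle, clock_ghz):
    """Peak rate of one processor in Gflops."""
    return pipes * flops_per_pipe_per_cycle * clock_ghz


def system_peak_tflops(processors, gflops_per_processor):
    """Aggregate peak rate in Tflops (1 Tflops = 1000 Gflops)."""
    return processors * gflops_per_processor / 1000.0


def max_memory_tbytes(processors, processors_per_node, node_gbytes):
    """Aggregate memory in Tbytes (1 Tbyte = 1024 Gbytes) at full node capacity."""
    nodes = processors // processors_per_node
    return nodes * node_gbytes / 1024.0


if __name__ == "__main__":
    p64 = peak_gflops(PIPES, FLOPS_PER_PIPE_PER_CYCLE_64, CLOCK_GHZ)
    p32 = peak_gflops(PIPES, FLOPS_PER_PIPE_PER_CYCLE_32, CLOCK_GHZ)
    print(f"64-bit peak per processor: {p64:.1f} Gflops")
    print(f"32-bit peak per processor: {p32:.1f} Gflops")
    print(f"System peak at {MAX_PROCESSORS} processors: "
          f"{system_peak_tflops(MAX_PROCESSORS, p64):.1f} Tflops")
    print(f"Maximum memory: "
          f"{max_memory_tbytes(MAX_PROCESSORS, PROCESSORS_PER_NODE, NODE_MEMORY_GBYTES):.0f} Tbytes")
```

Under these assumptions the per-processor results match the 20.8 and 41.6 Gflops stated in the abstract. The system-level totals are derived upper bounds, not figures reported by the authors.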
CITED BY 4
Sanjeev Kumar, Daehyun Kim, Mikhail Smelyanskiy, Yen-Kuang Chen, Jatin Chhugani, Christopher J. Hughes, Changkyu Kim, Victor W. Lee, Anthony D. Nguyen, Atomic Vector Operations on Chip Multiprocessors, ACM SIGARCH Computer Architecture News, v.36 n.3, p.441-452, June 2008
Akihiro Musa, Yoshiei Sato, Takashi Soga, Koki Okabe, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi, A shared cache for a chip multi vector processor, Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture, p.24-29, October 26, 2008, Toronto, Canada