The Cray BlackWidow: a highly scalable vector multiprocessor
Source
Conference on High Performance Networking and Computing
Proceedings of the 2007 ACM/IEEE conference on Supercomputing - Volume 00
Reno, Nevada
SESSION: System architecture
Article No. 17  
Year of Publication: 2007
ISBN:978-1-59593-764-3
Authors
Dennis Abts  Cray Inc., Chippewa Falls, Wisconsin
Abdulla Bataineh  Cray Inc., Chippewa Falls, Wisconsin
Steve Scott  Cray Inc., Chippewa Falls, Wisconsin
Greg Faanes  Cray Inc., Chippewa Falls, Wisconsin
Jim Schwarzmeier  Cray Inc., Chippewa Falls, Wisconsin
Eric Lundberg  Cray Inc., Chippewa Falls, Wisconsin
Tim Johnson  Cray Inc., Chippewa Falls, Wisconsin
Mike Bye  Cray Inc., Chippewa Falls, Wisconsin
Gerald Schwoerer  Cray Inc., Chippewa Falls, Wisconsin
Sponsors
IEEE-CS\DATC : IEEE Computer Society
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1362622.1362646
ABSTRACT

This paper describes the system architecture of the Cray BlackWidow scalable vector multiprocessor. The BlackWidow system is a distributed shared memory (DSM) architecture that is scalable to 32K processors, each with a 4-way dispatch scalar execution unit and an 8-pipe vector unit capable of 20.8 Gflops for 64-bit operations and 41.6 Gflops for 32-bit operations at the prototype operating frequency of 1.3 GHz. Global memory is directly accessible with processor loads and stores and is globally coherent. The system supports thousands of outstanding references to hide remote memory latencies, and provides a rich suite of built-in synchronization primitives. Each BlackWidow node is implemented as a 4-way SMP with up to 128 Gbytes of DDR2 main memory capacity. The system supports common programming models such as MPI and OpenMP, as well as global address space languages such as UPC and CAF. We describe the system architecture and microarchitecture of the processor, memory controller, and router chips. We give preliminary performance results and discuss design tradeoffs.
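
The peak rates quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: it divides the stated peak rates by the stated 1.3 GHz clock to get floating-point operations per cycle, then scales up to a node and to a maximum-size system. It assumes that "32K processors" means 32 * 1024, that every node is fully populated with 4 processors and 128 Gbytes, and that system peak is a simple sum of per-processor peaks. The function and key names are hypothetical.

```python
# Peak-rate arithmetic using only figures stated in the abstract.
# Integer units (MHz, Mflops) keep the divisions exact.

CLOCK_MHZ = 1300             # prototype frequency: 1.3 GHz
PEAK64_MFLOPS = 20800        # 20.8 Gflops for 64-bit operations
PEAK32_MFLOPS = 41600        # 41.6 Gflops for 32-bit operations
VECTOR_PIPES = 8             # 8-pipe vector unit
CPUS_PER_NODE = 4            # each node is a 4-way SMP
NODE_MEM_GBYTES = 128        # maximum DDR2 capacity per node
MAX_CPUS = 32 * 1024         # ASSUMPTION: "32K" read as 32 * 1024


def flops_per_cycle(peak_mflops, clock_mhz=CLOCK_MHZ):
    """Operations per clock cycle implied by a peak rate."""
    return peak_mflops // clock_mhz


def summary():
    """Derived figures; the system-level ones are theoretical upper bounds."""
    fpc64 = flops_per_cycle(PEAK64_MFLOPS)
    fpc32 = flops_per_cycle(PEAK32_MFLOPS)
    nodes = MAX_CPUS // CPUS_PER_NODE
    return {
        "flops_per_cycle_64": fpc64,
        "flops_per_cycle_32": fpc32,
        "flops_per_pipe_per_cycle_64": fpc64 // VECTOR_PIPES,
        "flops_per_pipe_per_cycle_32": fpc32 // VECTOR_PIPES,
        "node_peak64_mflops": CPUS_PER_NODE * PEAK64_MFLOPS,
        "max_nodes": nodes,
        "system_peak64_mflops": MAX_CPUS * PEAK64_MFLOPS,
        "system_max_memory_gbytes": nodes * NODE_MEM_GBYTES,
    }


if __name__ == "__main__":
    for name, value in summary().items():
        print(f"{name}: {value}")
```

Under these assumptions the stated rates work out to 16 operations per cycle for 64-bit data and 32 for 32-bit data, or 2 and 4 per vector pipe respectively. The abstract does not say how each pipe achieves this, so the sketch makes no claim about it.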

