ABSTRACT
The CRAY T3E is a scalable shared-memory multiprocessor based on the DEC Alpha 21164 microprocessor. The system includes a number of novel architectural features designed to tolerate latency, enhance scalability, and deliver high performance on scientific and engineering codes. Among these are stream buffers, which detect and prefetch down small-stride reference streams; E-registers, which provide latency hiding and non-unit-stride access capabilities; barrier and fetch_and_op synchronization support; and a scalable, high-bandwidth interconnection network.

This paper reports our experiences with the CRAY T3E and presents a variety of performance measurements. Section 2 provides a brief overview of the system architecture. Section 3 describes the latency-hiding features (caches, stream buffers, and E-registers) in more detail, assesses their performance impact, and discusses coding techniques for using them. Section 4 presents single-processor performance results. Finally, Section 5 discusses system scalability.
CITED BY 23
John B. Drake, Steve Hammond, Rodney James, Patrick H. Worley, Performance tuning and evaluation of a parallel community climate model, Proceedings of the 1999 ACM/IEEE conference on Supercomputing (CDROM), p.34-es, November 14-19, 1999, Portland, Oregon, United States
A. Chien, M. Lauria, R. Pennington, M. Showerman, G. Iannello, M. Buchanan, K. Connelly, L. Giannini, G. Koeni, S. Krishnamurthy, Q. Liu, S. Pakin, G. Sampemane, Design and Evaluation of an HPVM-Based Windows NT Supercomputer, International Journal of High Performance Computing Applications, v.13 n.3, p.201-219, August 1999