ABSTRACT
Advances in microarchitecture, packaging, and manufacturing processes enable designers to build new systems with higher performance and scalability. Using microbenchmark techniques, we contrast the memory and communication performance of two generations of the HP/Convex Exemplar scalable parallel processing system. The SPP1000 and SPP2000 differ significantly in architecture and implementation, but maintain upward binary compatibility. The SPP2000 uses manufacturing and packaging advances to obtain shorter system interconnects with wider data paths and improved functionality, which reduces the latency and increases the bandwidth of remote communication. Although memory latency is not significantly improved, the newer out-of-order execution processors, coupled with nonblocking caches, achieve much higher memory bandwidth. The SPP2000 has a richer system interconnect topology that allows it to scale to a larger number of processors. The SPP2000 also introduces innovations in its coherence protocols that improve synchronization and communication performance. This paper characterizes the performance effects of these changes and identifies remaining inefficiencies in the cache coherence protocol and the node configuration that future systems should address.
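One way to see how memory bandwidth can rise sharply while memory latency stays roughly flat is Little's law: sustained bandwidth is approximately the number of outstanding misses times the bytes fetched per miss, divided by the latency. The sketch assumes that each miss fetches one cache line and that outstanding misses overlap fully. The function name `sustained_bandwidth_gbs` and all numbers in it are hypothetical illustrations, not measurements of the SPP1000 or SPP2000.

```python
def sustained_bandwidth_gbs(outstanding_misses: int, line_bytes: int, latency_ns: float) -> float:
    """Estimate sustained memory bandwidth with Little's law.

    Assumes (hypothetically) that every outstanding miss fetches one cache
    line and that the misses fully overlap, so
        bandwidth = concurrency * bytes per miss / latency.
    Bytes per nanosecond equals GB/s (10**9 bytes per second).
    """
    if outstanding_misses < 1:
        raise ValueError("need at least one outstanding miss")
    if line_bytes <= 0 or latency_ns <= 0:
        raise ValueError("line size and latency must be positive")
    return outstanding_misses * line_bytes / latency_ns


# Hypothetical values chosen only for illustration, not taken from the paper.
LATENCY_NS = 500.0  # identical memory latency in both cases
LINE_BYTES = 64     # bytes transferred per cache miss

# A blocking cache allows only one miss in flight at a time.
blocking = sustained_bandwidth_gbs(1, LINE_BYTES, LATENCY_NS)
# A nonblocking cache feeding an out-of-order core can overlap many misses.
nonblocking = sustained_bandwidth_gbs(10, LINE_BYTES, LATENCY_NS)

print(f"blocking: {blocking:.3f} GB/s, nonblocking: {nonblocking:.3f} GB/s")
```

With latency held fixed, the estimate scales linearly with the number of misses kept in flight, which is the mechanism the abstract credits for the higher bandwidth. This is an idealized estimate: real systems saturate earlier on limits such as memory banks or interconnect bandwidth, and latency itself tends to grow under load.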