| The benefits of clustering in shared address space multiprocessors: an applications-driven investigation |
| Source
|
Conference on High Performance Networking and Computing
Proceedings of the 1995 ACM/IEEE conference on Supercomputing (CDROM)
San Diego, California, United States
Article No. 60
Year of Publication: 1995
ISBN:0-89791-816-9
|
|
Authors
|
|
Andrew Erlichson
|
Computer Systems Lab, Stanford University, Stanford, CA
|
|
Basem A. Nayfeh
|
Computer Systems Lab, Stanford University, Stanford, CA
|
|
Jaswinder P. Singh
|
Department of Computer Science, Princeton University, Princeton, NJ
|
|
Kunle Olukotun
|
Computer Systems Lab, Stanford University, Stanford, CA
|
|
ABSTRACT
Clustering processors together at a level of the memory hierarchy in shared address space multiprocessors appears to be an attractive technique from several standpoints: Resources are shared, packaging technologies are exploited, and processors within a cluster can share data more effectively. We investigate the performance benefits that can be obtained by clustering on a range of important scientific and engineering applications in moderate to large scale cache-coherent machines with small degrees of clustering (up to one eighth of the total number of processors in a cluster). We find that, except for applications with near-neighbor communication topologies, this degree of clustering is not very effective in reducing the inherent communication to computation ratios. Clustering is more useful in reducing the number of remote capacity misses in unstructured applications, and can improve performance substantially when small first-level caches are clustered in these cases. This suggests that clustering at the first-level cache might be useful in highly integrated, relatively fine-grained environments. For less integrated machines such as current distributed shared memory multiprocessors, our results suggest that clustering at the first-level caches is not very useful in improving application performance; however, our results also suggest that in a machine with long interprocessor communication latencies, clustering further away from the processor can provide performance benefits.