Application and architectural bottlenecks in large scale distributed shared memory machines
Source: International Symposium on Computer Architecture
Proceedings of the 23rd Annual International Symposium on Computer Architecture
Philadelphia, Pennsylvania, United States
Pages: 134 - 145  
Year of Publication: 1996
ISBN:0-89791-786-3
Authors
Chris Holt  Computer Systems Laboratory, Stanford University, Stanford, CA
Jaswinder Pal Singh  Department of Computer Science, Princeton University, Princeton, NJ
John Hennessy  Computer Systems Laboratory, Stanford University, Stanford, CA
Sponsors
IEEE-CS\TCCA : TC on Computer Architecture
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 15,   Downloads (12 Months): 34,   Citation Count: 13
DOI: http://doi.acm.org/10.1145/232973.232988

ABSTRACT

Many of the programming challenges encountered in small to moderate-scale hardware cache-coherent shared memory machines have been extensively studied. While work remains to be done, the basic techniques needed to efficiently program such machines have been well explored. Recently, a number of researchers have presented architectural techniques for scaling a cache-coherent shared address space to much larger processor counts. In this paper, we examine the extent to which applications can achieve reasonable performance on such large-scale, cache-coherent, distributed shared address space machines, by determining the problem sizes needed to achieve a reasonable level of efficiency. We also look at how much programming effort and optimization is needed to achieve high efficiency, beyond that needed at small processor counts. For each application, we discuss the main architectural bottlenecks that prevent smaller problem sizes or less optimized programs from achieving good efficiency. Our results show that while some applications either do not scale or must be heavily optimized to do so, for most of the applications we studied it is not necessary to heavily modify the code or restructure algorithms to scale well up to several hundred processors, once the basic load-balancing and data-locality techniques that are also needed for small-scale systems are applied. Programs written with some care perform well without substantially compromising the ease-of-programming advantage of a shared address space, and the problem sizes required to achieve good performance are surprisingly small. It is important to be careful about how data structures and layouts interact with system granularities, but these optimizations are usually needed for moderate-scale machines as well.


