Impact of modern memory subsystems on cache optimizations for stencil computations
Source: Memory System Performance archive
Proceedings of the 2005 Workshop on Memory System Performance
Chicago, Illinois
SESSION: Hardware
Pages: 36-43
Year of Publication: 2005
ISBN: 1-59593-147-3
Authors
Shoaib Kamil, Lawrence Berkeley National Laboratory, Berkeley, CA
Parry Husbands, Lawrence Berkeley National Laboratory, Berkeley, CA
Leonid Oliker, Lawrence Berkeley National Laboratory, Berkeley, CA
John Shalf, Lawrence Berkeley National Laboratory, Berkeley, CA
Katherine Yelick, Lawrence Berkeley National Laboratory, Berkeley, CA
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1111583.1111589

ABSTRACT

In this work we investigate the impact of evolving memory system features, such as large on-chip caches, automatic prefetching, and the growing distance to main memory, on 3D stencil computations. These calculations form the basis for a wide range of scientific applications, from simple Jacobi iterations to complex multigrid and block-structured adaptive PDE solvers. First, we develop a simple benchmark to evaluate the effectiveness of prefetching in cache-based memory systems. Next, we present a small parameterized probe and validate its use as a proxy for general stencil computations on three modern microprocessors. We then derive an analytical memory cost model for quantifying cache-blocking behavior and demonstrate its effectiveness in predicting stencil-computation performance. Overall, the results demonstrate that recent trends in memory system organization have reduced the efficacy of traditional cache-blocking optimizations.
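To make the terms "3D stencil computation" and "cache blocking" concrete, here is a minimal sketch of one Jacobi sweep of a 7-point stencil, written once as a plain loop nest and once tiled in the two inner dimensions. It is not the authors' probe or cost model: the flat row-major grid layout, the weights `alpha` and `beta`, and the tile sizes `tj` and `tk` are illustrative assumptions. Pure Python only shows the loop order; it will not reproduce the cache effect itself.

```python
"""Sketch: one Jacobi sweep of a 3D 7-point stencil, naive vs. cache-blocked.

Assumptions (illustrative, not from the paper): an n x n x n grid stored as a
flat list in row-major order (k is the unit-stride index), fixed boundary
values, and hypothetical tile sizes tj x tk.
"""


def idx(i: int, j: int, k: int, n: int) -> int:
    """Flat index of point (i, j, k) in a row-major n x n x n grid."""
    return (i * n + j) * n + k


def stencil_point(src, i, j, k, n, alpha, beta):
    """7-point update: weighted centre plus weighted sum of six neighbours."""
    return alpha * src[idx(i, j, k, n)] + beta * (
        src[idx(i - 1, j, k, n)] + src[idx(i + 1, j, k, n)]
        + src[idx(i, j - 1, k, n)] + src[idx(i, j + 1, k, n)]
        + src[idx(i, j, k - 1, n)] + src[idx(i, j, k + 1, n)]
    )


def sweep_naive(src, n, alpha, beta):
    """Update every interior point, walking the grid one i-plane at a time.

    For large n, reusing the i-1 and i+1 neighbours needs roughly three
    full n x n planes of src to stay in cache.
    """
    dst = list(src)  # boundary points are copied unchanged
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                dst[idx(i, j, k, n)] = stencil_point(src, i, j, k, n, alpha, beta)
    return dst


def sweep_blocked(src, n, alpha, beta, tj, tk):
    """Same update, tiled in (j, k).

    Each tj x tk tile is swept through the full i extent, so in a compiled
    code only about three tile-sized planes (plus a one-point halo) of src
    need to stay in cache at once.
    """
    dst = list(src)
    for jj in range(1, n - 1, tj):
        for kk in range(1, n - 1, tk):
            for i in range(1, n - 1):
                for j in range(jj, min(jj + tj, n - 1)):
                    for k in range(kk, min(kk + tk, n - 1)):
                        dst[idx(i, j, k, n)] = stencil_point(src, i, j, k, n, alpha, beta)
    return dst


if __name__ == "__main__":
    n = 12
    grid = [float((p * 7) % 13) for p in range(n ** 3)]
    a = sweep_naive(grid, n, alpha=-6.0, beta=1.0)
    b = sweep_blocked(grid, n, alpha=-6.0, beta=1.0, tj=4, tk=5)
    # Blocking only reorders independent updates, so the results match exactly.
    print(a == b)  # prints True
```

Because a Jacobi sweep reads only from `src` and writes only to `dst`, tiling changes the traversal order but not the result. Whether the smaller working set actually pays off depends on the memory system; the abstract reports that recent trends in memory system organization have reduced the benefit of this kind of blocking.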


