Impact of modern memory subsystems on cache optimizations for stencil computations
Source
Memory System Performance
Proceedings of the 2005 workshop on Memory system performance
Chicago, Illinois
SESSION: Hardware
Pages: 36-43
Year of Publication: 2005
ISBN: 1-59593-147-3
Authors
Shoaib Kamil, Lawrence Berkeley National Laboratory, Berkeley, CA
Parry Husbands, Lawrence Berkeley National Laboratory, Berkeley, CA
Leonid Oliker, Lawrence Berkeley National Laboratory, Berkeley, CA
John Shalf, Lawrence Berkeley National Laboratory, Berkeley, CA
Katherine Yelick, Lawrence Berkeley National Laboratory, Berkeley, CA
Bibliometrics
Downloads (6 Weeks): 8, Downloads (12 Months): 38, Citation Count: 8
ABSTRACT
In this work we investigate the impact of evolving memory system features, such as large on-chip caches, automatic prefetch, and the growing distance to main memory, on 3D stencil computations. These calculations form the basis for a wide range of scientific applications, from simple Jacobi iterations to complex multigrid and block-structured adaptive PDE solvers. First, we develop a simple benchmark to evaluate the effectiveness of prefetching in cache-based memory systems. Next, we present a small parameterized probe and validate its use as a proxy for general stencil computations on three modern microprocessors. We then derive an analytical memory cost model for quantifying cache-blocking behavior and demonstrate its effectiveness in predicting stencil-computation performance. Overall, the results demonstrate that recent trends in memory system organization have reduced the efficacy of traditional cache-blocking optimizations.
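For readers unfamiliar with the terms in the abstract, the following is a minimal sketch of what a 3D stencil sweep and its cache-blocked variant look like. It is not the paper's benchmark, probe, or cost model. The function names, the plain six-neighbour Jacobi average, the choice of which loops to tile, and the block sizes `bj` and `bk` are all assumptions made for illustration. Python is used only to show the loop restructuring; it says nothing about performance.

```python
def idx(i, j, k, ny, nz):
    """Flat index of grid point (i, j, k); k is the unit-stride dimension."""
    return (i * ny + j) * nz + k


def update_point(src, i, j, k, ny, nz):
    """Average of the six face neighbours (an assumed, simple 7-point stencil)."""
    return (
        src[idx(i - 1, j, k, ny, nz)] + src[idx(i + 1, j, k, ny, nz)]
        + src[idx(i, j - 1, k, ny, nz)] + src[idx(i, j + 1, k, ny, nz)]
        + src[idx(i, j, k - 1, ny, nz)] + src[idx(i, j, k + 1, ny, nz)]
    ) / 6.0


def jacobi_naive(src, nx, ny, nz):
    """One sweep over the interior in plain i, j, k order; boundaries are copied."""
    dst = list(src)
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                dst[idx(i, j, k, ny, nz)] = update_point(src, i, j, k, ny, nz)
    return dst


def jacobi_blocked(src, nx, ny, nz, bj=4, bk=4):
    """Same sweep with the j and k loops tiled into bj x bk blocks (hypothetical sizes)."""
    dst = list(src)
    for jj in range(1, ny - 1, bj):
        for kk in range(1, nz - 1, bk):
            # Within one tile, walk the full i extent so neighbouring planes
            # of the tile are touched close together in time.
            for i in range(1, nx - 1):
                for j in range(jj, min(jj + bj, ny - 1)):
                    for k in range(kk, min(kk + bk, nz - 1)):
                        dst[idx(i, j, k, ny, nz)] = update_point(src, i, j, k, ny, nz)
    return dst


def make_grid(nx, ny, nz):
    """Deterministic test grid with small floating-point values."""
    return [float((i * 31 + j * 17 + k * 7) % 13)
            for i in range(nx) for j in range(ny) for k in range(nz)]
```

Both functions perform the same arithmetic on every interior point and differ only in traversal order, so they produce identical grids. Blocking is purely a memory-access optimization, which is why its benefit depends on the cache and prefetch behavior the abstract discusses.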
CITED BY 8
Shoaib Kamil, Kaushik Datta, Samuel Williams, Leonid Oliker, John Shalf, Katherine Yelick, Implicit and explicit optimizations for stencil computations, Proceedings of the 2006 workshop on Memory system performance and correctness, October 22, 2006, San Jose, California
Samuel Williams, John Shalf, Leonid Oliker, Shoaib Kamil, Parry Husbands, Katherine Yelick, The potential of the cell processor for scientific computing, Proceedings of the 3rd conference on Computing frontiers, May 03-05, 2006, Ischia, Italy
Samuel Williams, John Shalf, Leonid Oliker, Shoaib Kamil, Parry Husbands, Katherine Yelick, Scientific computing Kernels on the cell processor, International Journal of Parallel Programming, v.35 n.3, p.263-298, June 2007
Kaushik Datta, Mark Murphy, Vasily Volkov, Samuel Williams, Jonathan Carter, Leonid Oliker, David Patterson, John Shalf, Katherine Yelick, Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures, Proceedings of the 2008 ACM/IEEE conference on Supercomputing, November 15-21, 2008, Austin, Texas
Mauricio Araya-Polo, Félix Rubio, Raúl de la Cruz, Mauricio Hanzich, José María Cela, Daniele Paolo Scarpazza, 3D seismic imaging through reverse-time migration on homogeneous and heterogeneous multi-core processors, Scientific Programming, v.17 n.1-2, p.185-198, January 2009