Optimizing for parallelism and data locality
Source: Proceedings of the 6th International Conference on Supercomputing
Washington, D. C., United States
Pages: 323-334
Year of Publication: 1992
ISBN: 0-89791-485-6
Authors: Ken Kennedy, Kathryn S. McKinley
Sponsor: SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher: ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/143369.143427

ABSTRACT

Previous research has used program transformation to introduce parallelism and to exploit data locality. Unfortunately, these two objectives have usually been considered independently. This work explores the trade-offs between effectively utilizing parallelism and the memory hierarchy on shared-memory multiprocessors. We present a simple, but surprisingly accurate, memory model to determine cache line reuse both from multiple accesses to the same memory location and from consecutive memory accesses. The model is used in memory optimizing and loop parallelization algorithms that effectively exploit data locality and parallelism in concert. We demonstrate the efficacy of this approach with very encouraging experimental results.
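
The kind of memory model the abstract describes can be illustrated with a small sketch. The code below is not taken from the paper; it is a simplified illustration that assumes a cost of one cache line for a reference that does not change across the innermost loop (reuse of the same location), a fraction of a line per iteration for small strides (reuse from consecutive accesses), and one line per iteration otherwise. The function names `ref_lines` and `loop_cost`, the line size of 4 elements, the problem size of 64, and the column-major matrix multiply used as input are all assumptions made for the example.

```python
from math import ceil, prod

def ref_lines(stride, trip, line_elems):
    """Estimated cache lines one array reference touches over `trip`
    iterations of the innermost loop (hypothetical simplified model)."""
    stride = abs(stride)
    if stride == 0:
        return 1                                  # same location every iteration
    if stride < line_elems:
        return ceil(trip * stride / line_elems)   # consecutive accesses share lines
    return trip                                   # a new line on every iteration

def loop_cost(refs, trips, inner, line_elems):
    """Estimated lines touched by the whole nest if `inner` is innermost."""
    outer_iters = prod(t for name, t in trips.items() if name != inner)
    inner_lines = sum(
        ref_lines(strides.get(inner, 0), trips[inner], line_elems)
        for strides in refs.values()
    )
    return inner_lines * outer_iters

# Assumed example: C(i,j) = C(i,j) + A(i,k) * B(k,j) on N x N column-major
# arrays. Each entry maps a loop variable to the element stride it induces.
N = 64
LINE_ELEMS = 4  # assumed cache line size, in array elements
trips = {"i": N, "j": N, "k": N}
refs = {
    "C(i,j)": {"i": 1, "j": N},
    "A(i,k)": {"i": 1, "k": N},
    "B(k,j)": {"k": 1, "j": N},
}

costs = {loop: loop_cost(refs, trips, loop, LINE_ELEMS) for loop in trips}
# Place the most expensive loop outermost and the cheapest innermost.
order = sorted(costs, key=costs.get, reverse=True)
print(costs)
print(order)
```

Under these assumptions the sketch ranks `i` as the cheapest innermost loop, because two of the three references then walk down a column with unit stride and the third is invariant. A parallelizing compiler would still have to weigh such an ordering against which loops can run in parallel, which is the trade-off the abstract refers to.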


Collaborative Colleagues:
Ken Kennedy: colleagues
Kathryn S. McKinley: colleagues