Data locality enhancement by memory reduction
Source: International Conference on Supercomputing
Proceedings of the 15th International Conference on Supercomputing
Sorrento, Italy
Pages: 50 - 64  
Year of Publication: 2001
ISBN:1-58113-410-X
Authors
Yonghong Song  Sun Microsystems, Inc., 901 San Antonio Rd., Palo Alto, CA
Rong Xu  Department of Computer Sciences, Purdue University, West Lafayette, IN
Cheng Wang  Department of Computer Sciences, Purdue University, West Lafayette, IN
Zhiyuan Li  Department of Computer Sciences, Purdue University, West Lafayette, IN
Sponsor
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/377792.377806

ABSTRACT

In this paper, we propose memory reduction as a new approach to data locality enhancement. Under this approach, we use the compiler to reduce the size of the data repeatedly referenced in a collection of nested loops. Between their reuses, the data will more likely remain in higher-speed memory devices, such as the cache. Specifically, we present an optimal algorithm to combine loop shifting, loop fusion and array contraction to reduce the temporary array storage required to execute a collection of loops. When applied to 20 benchmark programs, our technique reduces the memory requirement, counting both the data and the code, by 51% on average. The transformed programs gain a speedup of 1.40 on average, due to the reduced footprint and, consequently, the improved data locality.
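
To make the three transformations named in the abstract concrete, the following is a minimal hand-written sketch in Python. It is not the paper's algorithm, which selects the shifts optimally for a whole collection of loops. It only shows the effect on one pair of loops. The function names `unfused` and `fused_contracted` and the arrays `a`, `b`, `c`, `t` are assumptions made for illustration. The consumer loop reads `t[i + 1]`, so the two loops cannot be fused directly. Shifting the consumer by one iteration makes fusion legal, and the temporary array `t` then contracts to two scalars.

```python
def unfused(a, b):
    """Two separate loops communicating through a temporary array t."""
    n = len(a)
    t = [0] * n                      # temporary array: n elements
    for i in range(n):
        t[i] = a[i] + b[i]           # producer loop
    c = [0] * max(n - 1, 0)
    for i in range(n - 1):
        c[i] = t[i] + t[i + 1]       # consumer loop reads t[i] and t[i + 1]
    return c


def fused_contracted(a, b):
    """Same computation after loop shifting, loop fusion and array contraction.

    The consumer is shifted by one iteration so that, in the fused loop,
    iteration i runs producer iteration i and consumer iteration i - 1.
    Only the two most recent values of t are live, so t shrinks to the
    scalars prev and cur (hypothetical names chosen for this sketch).
    """
    n = len(a)
    c = [0] * max(n - 1, 0)
    prev = 0
    for i in range(n):
        cur = a[i] + b[i]            # producer iteration i
        if i >= 1:
            c[i - 1] = prev + cur    # consumer iteration i - 1 (shifted)
        prev = cur                   # slide the two-element window
    return c
```

In this sketch the temporary storage drops from `n` elements to two scalars, which is the kind of footprint reduction the abstract credits for the improved data locality.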

