ACM Home Page
Please provide us with feedback. Feedback
The Potential of Computation Regrouping for Improving Locality
Full text PdfPdf (167 KB)
Source Conference on High Performance Networking and Computing archive
Proceedings of the 2004 ACM/IEEE conference on Supercomputing table of contents
Page: 13  
Year of Publication: 2004
ISBN:0-7695-2153-3
Authors
Chen Ding  University of Rochester
Maksim Orlovich  University of Rochester
Sponsor
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
IEEE Computer Society  Washington, DC, USA
Bibliometrics
Downloads (6 Weeks): 1,   Downloads (12 Months): 21,   Citation Count: 2
Additional Information:

abstract   references   cited by   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: 10.1109/SC.2004.58

ABSTRACT

Improving program locality has become increasingly important on modern computer systems. An effective strategy is to group computations on the same data so that once the data are loaded into cache, the program performs all their operations before the data are evicted. However, computation regrouping is difficult to automate for programs with complex data and control structures. This paper studies the potential of locality improvement through trace-driven computation regrouping. First, it shows that maximizing the locality is different from maximizing the parallelism or maximizing the cache utilization. The problem is NP-hard even without considering data dependences and cache organization. Then the paper describes a tool that performs constrained computation regrouping on program traces. The new tool is unique because it measures the exact control dependences and applies complete memory renaming and re-allocation. Using the tool, the paper measures the potential locality improvement in a set of commonly used benchmark programs written in C.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
3
 
4
[4] L. A. Belady. A study of replacement algorithms for a virtual-storage computer. IBM Systems Journal, 5(2):78-101, 1966.
5
 
6
 
7
 
8
9
10
11
 
12
13
14
15
16
 
17
 
18
 
19
[19] H. Han and C. W. Tseng. Locality optimizations for adaptive irregular scientific codes. Technical report, Department of Computer Science, University of Maryland, College Park, 2000.
 
20
21
22
 
23
[23] K. Kennedy and K. S. McKinley. Typed fusion with applications to parallel and sequential code generation. Technical Report TR93-208, Dept. of Computer Science, Rice University, Aug. 1993. (also available as CRPC-TR94370).
24
25
26
 
27
[27] R. L. Mattson, J. Gecsei, D. Slutz, and I. L. Traiger. Evaluation techniques for storage hierarchies. IBM System Journal, 9(2):78-117, 1970.
28
29
 
30
 
31
[31] A. Nicolau and J. A. Fisher. Measuring the parallelism available for very long instruction word architecture. IEEE Transactions on Computers, 33(11), 1984.
 
32
 
33
 
34
[34] X. Shen, Y. Zhong, and C. Ding. Regression-based multi-model prediction of data reuse signature. In Proceedings of the 4th Annual Symposium of the Las Alamos Computer Science Institute, Sante Fe, New Mexico, November 2003.
35
36
37
 
38
[38] K. B. Theobald, G. R. Gao, and L. J. Hendren. On the limits of program parallelism and its smoothability. Technical Report ACAPS Memo 40, School of Computer Science, McGill University, June 1992.
39
 
40
41
42
 
43
[43] Y. Zhong, C. Ding, and K. Kennedy. Reuse distance analysis for scientific programs. In Proceedings of Workshop on Languages, Compilers, and Run-time Systems for Scalable Computers, Washington DC, March 2002.
 
44

Collaborative Colleagues:
Chen Ding: colleagues
Maksim Orlovich: colleagues