Push vs. pull: data movement for linked data structures
Source: International Conference on Supercomputing
Proceedings of the 14th International Conference on Supercomputing
Santa Fe, New Mexico, United States
Pages: 176 - 186  
Year of Publication: 2000
ISBN:1-58113-270-0
Authors
Chia-Lin Yang  Department of Computer Science, Duke University, Durham, NC
Alvin R. Lebeck  Department of Computer Science, Duke University, Durham, NC
Sponsor
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/335231.335248

ABSTRACT

As the performance gap between the CPU and main memory continues to grow, techniques to hide memory latency are essential to deliver a high-performance computer system. Prefetching can often overlap memory latency with computation for array-based numeric applications. However, prefetching for pointer-intensive applications remains a challenging problem. Prefetching linked data structures (LDS) is difficult because the address sequence of an LDS traversal does not present the same arithmetic regularity as that of array-based applications, and the data dependence of pointer dereferences can serialize the address generation process.

In this paper, we propose a cooperative hardware/software mechanism to reduce memory access latencies for linked data structures. Instead of relying on the past address history to predict future accesses, we identify the load instructions that traverse the LDS and execute them ahead of the actual computation. To overcome the serial nature of LDS address generation, we attach a prefetch controller to each level of the memory hierarchy and push, rather than pull, data to the CPU. Our simulations, using four pointer-intensive applications, show that the push model can achieve between 4% and 30% larger reductions in execution time compared to the pull model.
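The contrast the abstract draws between array and LDS traversal can be sketched in a few lines of C. This is only an illustration of the address-generation dependence, not code from the paper: the `node` type and the functions `sum_array`, `sum_list`, and `link_in_order` are hypothetical names chosen for this example, and nothing here models the proposed prefetch controllers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type for illustration; not taken from the paper. */
struct node {
    int value;
    struct node *next;
};

/* Array traversal: the address of a[i + 1] is the address of a[i] plus a
   fixed stride, so it can be computed without waiting for a[i] to arrive. */
long sum_array(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* LDS traversal: the address of the next node is stored inside the current
   node, so each load must complete before the next address is known. */
long sum_list(const struct node *head)
{
    long total = 0;
    for (const struct node *p = head; p != NULL; p = p->next)
        total += p->value;
    return total;
}

/* Link the nodes of a caller-provided pool in a caller-chosen order, to mimic
   a list whose nodes are not laid out sequentially in memory. */
struct node *link_in_order(struct node *pool, const size_t *order, size_t n)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        pool[order[i]].next = (i + 1 < n) ? &pool[order[i + 1]] : NULL;
    return &pool[order[0]];
}
```

In `sum_list`, the load of `p->next` is the kind of traversal load the abstract refers to: it forms a chain in which every address depends on the data returned by the previous load, whereas the addresses in `sum_array` follow an arithmetic sequence.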


