Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors
Source: International Symposium on Computer Architecture
Proceedings of the 28th Annual International Symposium on Computer Architecture
Göteborg, Sweden
Pages: 40 - 51  
Year of Publication: 2001
ISBN:0-7695-1162-7
Author
Chi-Keung Luk  VSSAD/Alpha Development Group, Compaq Computer Corporation
Sponsors
SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE-CS\TCCA: TC on Computer Architecture
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 14,   Downloads (12 Months): 45,   Citation Count: 53
DOI: http://doi.acm.org/10.1145/379240.379250

ABSTRACT

Hardly predictable data addresses in many irregular applications have rendered prefetching ineffective. In many cases, the only accurate way to predict these addresses is to directly execute the code that generates them. As multithreaded architectures become increasingly popular, one attractive approach is to use idle threads on these machines to perform pre-execution—essentially a combined act of speculative address generation and prefetching—to accelerate the main thread. In this paper, we propose such a pre-execution technique for simultaneous multithreading (SMT) processors. By using software to control pre-execution, we are able to handle some of the most important access patterns that are typically difficult to prefetch. Compared with existing work on pre-execution, our technique is significantly simpler to implement (e.g., no integration of pre-execution results, no need of shortening programs for pre-execution, and no need of special hardware to copy register values upon thread spawns). Consequently, only minimal extensions to SMT machines are required to support our technique. Despite its simplicity, our technique offers an average speedup of 24% in a set of irregular applications, which is a 19% speedup over state-of-the-art software-controlled prefetching.
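
As a rough illustration of why running the address-generating code ahead of the main thread can help, the following is a minimal toy timing model, not the paper's mechanism or its simulator. It assumes a linked-list traversal in which every node misses in the cache, a helper thread that chases the same pointers but does no other work, and no contention between the two threads. The function name `chase_cycles` and all parameters (`num_nodes`, `miss_latency`, `work_per_node`, `pre_execute`) are hypothetical names chosen for this sketch.

```python
def chase_cycles(num_nodes, miss_latency, work_per_node, pre_execute):
    """Cycles for the main thread to traverse a linked list in a toy model.

    Assumptions (illustrative only): every node misses in the cache, the
    helper thread starts at cycle 0, and the threads do not contend for
    resources. Parameter values are hypothetical, not measurements.
    """
    now = 0           # current cycle of the main thread
    helper_ready = 0  # cycle at which the helper's latest load arrives
    for _ in range(num_nodes):
        if pre_execute:
            # The helper only chases pointers, so its loads complete
            # back to back, one miss latency apart.
            helper_ready += miss_latency
            # The main thread stalls only if that data has not arrived yet.
            now = max(now, helper_ready)
        else:
            # No helper: the main thread pays the full miss on every node.
            now += miss_latency
        # Per-node computation done by the main thread only.
        now += work_per_node
    return now


if __name__ == "__main__":
    baseline = chase_cycles(100, miss_latency=100, work_per_node=50, pre_execute=False)
    helped = chase_cycles(100, miss_latency=100, work_per_node=50, pre_execute=True)
    print(f"baseline={baseline} cycles, with pre-execution={helped} cycles")
```

In this model the helper overlaps the main thread's per-node work with the next node's miss, so the benefit disappears when `work_per_node` is zero: a helper that chases a single pointer chain cannot run faster than the chain's serial misses allow.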


