A quantitative framework for automated pre-execution thread selection
Source: International Symposium on Microarchitecture
Proceedings of the 35th Annual ACM/IEEE International Symposium on Microarchitecture (MICRO-35)
Istanbul, Turkey
Session: Multithreading II
Pages: 430-441
Year of Publication: 2002
ISSN: 1072-4451; ISBN: 0-7695-1859-1
Authors
Amir Roth, University of Pennsylvania
Gurindar S. Sohi, University of Wisconsin-Madison
Sponsors
SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
IEEE TC-uArch
Publisher
IEEE Computer Society Press  Los Alamitos, CA, USA
Bibliometrics
Citation Count: 8

ABSTRACT

Pre-execution attacks cache misses for which address-prediction-driven prefetching fails. In pre-execution, copies of cache-miss computations are isolated from the main program and launched as separate threads called p-threads whenever the processor anticipates an upcoming miss. P-thread selection is the task of deciding which computations should execute as p-threads and when they should be launched so that total execution time is minimized. It is central to the success of pre-execution.

We introduce a framework for automated static p-thread selection, a static p-thread being one whose dynamic instances are repeatedly launched during the course of program execution. Our approach is to formalize the problem quantitatively and then apply standard techniques to solve it analytically. The framework has two novel components. The slice tree is a data structure that compactly represents a set of static p-threads and the relationships among them. Aggregate advantage is a formula that uses raw program statistics and computation structure to assign each candidate static p-thread a numeric score based on estimated latency tolerance and overhead aggregated over its expected dynamic executions.

We use the framework to select p-threads that cover L2 misses and study its effectiveness under different conditions via detailed simulation. We measure the effect of constraining p-thread length, locally optimizing p-threads, using different program samples as the statistical basis for selection, and varying several machine parameters. Our framework responds to these changes in an intuitive way. We also validate that aggregate advantage correctly models actual pre-execution.
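The abstract names the slice tree and aggregate advantage but does not define them. What follows is a simplified illustration under stated assumptions, not the authors' formulation. It assumes the tree is rooted at the problem load, that each root-to-node path read in reverse is one candidate static p-thread (so candidates share common suffixes, which is one way a tree can represent many p-threads compactly), and that a candidate's score is total latency hidden minus total launch overhead: misses covered * cycles tolerated per miss - launches * overhead per launch. The `SliceNode` class, the `aggregate_advantage` and `best_pthread` helpers, and every per-node statistic are hypothetical. The sketch picks the highest-scoring candidate under a p-thread length limit, which is one of the constraints the abstract says was studied.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SliceNode:
    """One instruction in a hypothetical, simplified slice tree.

    The root is the problem load. Each child is an earlier instruction its
    parent depends on. The root-to-node path, reversed, is one candidate p-thread.
    """
    pc: int                   # static instruction address (illustrative)
    launches: int             # assumed: dynamic launches if the p-thread starts here
    misses_covered: int       # assumed: dynamic misses those launches would cover
    latency_tolerated: float  # assumed: cycles of miss latency hidden per covered miss
    overhead: float           # assumed: cycles of overhead per launch
    children: List["SliceNode"] = field(default_factory=list)


def aggregate_advantage(node: SliceNode) -> float:
    """Assumed simplified score: total latency hidden minus total overhead."""
    return node.misses_covered * node.latency_tolerated - node.launches * node.overhead


def best_pthread(root: SliceNode, max_len: int) -> Optional[Tuple[float, List[int]]]:
    """Return (score, pcs in execution order) of the best candidate with length <= max_len."""
    best: Optional[Tuple[float, List[int]]] = None
    stack = [(root, [root.pc])]  # paths are stored load-first (backward order)
    while stack:
        node, path = stack.pop()
        if len(path) > max_len:
            # Too long; skipping here also skips all descendants, which are longer still.
            continue
        score = aggregate_advantage(node)
        if best is None or score > best[0]:
            best = (score, list(reversed(path)))  # execution order ends at the load
        for child in node.children:
            stack.append((child, path + [child.pc]))
    return best


# Hypothetical statistics for one problem load at pc 0x40.
root = SliceNode(0x40, 1000, 1000, 0.0, 1.0, [
    SliceNode(0x3C, 1000, 800, 20.0, 2.0, [
        SliceNode(0x30, 1000, 800, 60.0, 4.0, [
            SliceNode(0x20, 5000, 800, 100.0, 8.0),
        ]),
    ]),
    SliceNode(0x38, 200, 200, 30.0, 2.0),
])

print(best_pthread(root, max_len=4))  # (44000.0, [48, 60, 64])
print(best_pthread(root, max_len=2))  # (14000.0, [60, 64])
```

In this made-up example, extending the best p-thread from 0x30 back to 0x20 hides more latency per miss, but the earlier start point launches five times as often, so the added overhead lowers its score. Tightening the length limit to 2 forces a shorter, lower-scoring choice.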

