ABSTRACT
Pre-execution attacks cache misses for which address-prediction-driven prefetching fails. In pre-execution, copies of cache miss computations are isolated from the main program and launched as separate threads, called p-threads, whenever the processor anticipates an upcoming miss. P-thread selection is the task of deciding which computations should execute as p-threads and when they should be launched so that total execution time is minimized. It is central to the success of pre-execution.

We introduce a framework for automated static p-thread selection, a static p-thread being one whose dynamic instances are repeatedly launched during the course of program execution. Our approach is to formalize the problem quantitatively and then apply standard techniques to solve it analytically. The framework has two novel components. The slice tree is a data structure that compactly represents a set of static p-threads and the relationships among them. Aggregate advantage is a formula that uses raw program statistics and computation structure to assign each candidate static p-thread a numeric score based on estimated latency tolerance and overhead, aggregated over its expected dynamic executions.

We use the framework to select p-threads that cover L2 misses and study its effectiveness under different conditions via detailed simulation. We measure the effect of constraining p-thread length, locally optimizing p-threads, using different program samples as a statistical basis for selection, and varying several machine parameters. Our framework responds to these changes in an intuitive way. We also validate that aggregate advantage correctly models actual pre-execution.
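The two components named above can be pictured with a small sketch. The abstract does not give the slice tree layout or the aggregate advantage formula, so everything below is an illustrative assumption: the class `SliceNode`, its fields (`launches`, `misses_covered`, `lead_cycles`), the functions `score` and `select`, and the default parameters are all hypothetical. The sketch assumes the tree is rooted at the missing load, that each node stands for the candidate p-thread running from that node down to the root, and that a score can be approximated as latency tolerated on covered misses minus execution overhead summed over all launches.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)
class SliceNode:
    """Hypothetical slice tree node: the path from this node to the root
    (the missing load) is treated as one candidate static p-thread."""
    pc: int               # assumed instruction address of the trigger
    launches: int         # times the trigger was seen in the program sample
    misses_covered: int   # launches that actually led to the root miss
    lead_cycles: int      # assumed main-thread cycles from trigger to miss
    parent: Optional["SliceNode"] = field(default=None, repr=False)
    children: List["SliceNode"] = field(default_factory=list, repr=False)

    def add_child(self, pc: int, launches: int, misses_covered: int,
                  lead_cycles: int) -> "SliceNode":
        child = SliceNode(pc, launches, misses_covered, lead_cycles, parent=self)
        self.children.append(child)
        return child

    def length(self) -> int:
        """Number of instructions on the path from this node to the root."""
        n, node = 1, self
        while node.parent is not None:
            n, node = n + 1, node.parent
        return n

    def walk(self) -> Iterator["SliceNode"]:
        """Yield this node and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def score(node: SliceNode, miss_latency: int = 200,
          cost_per_insn: float = 1.0) -> float:
    """Simplified stand-in for an aggregate score (not the paper's formula):
    latency hidden on covered misses minus overhead over every launch."""
    tolerated = min(miss_latency, node.lead_cycles)
    benefit = node.misses_covered * tolerated
    overhead = node.launches * node.length() * cost_per_insn
    return benefit - overhead


def select(root: SliceNode, **params) -> SliceNode:
    """Pick the highest-scoring candidate in the tree."""
    return max(root.walk(), key=lambda n: score(n, **params))


# Made-up statistics for one missing load and three candidate triggers.
root = SliceNode(pc=0x40, launches=1000, misses_covered=400, lead_cycles=0)
a = root.add_child(0x3C, launches=1000, misses_covered=400, lead_cycles=60)
b = a.add_child(0x30, launches=500, misses_covered=400, lead_cycles=180)
c = a.add_child(0x34, launches=500, misses_covered=0, lead_cycles=150)
best = select(root)
```

With these made-up numbers the longer candidate `b` wins: it is launched less often than `a` and starts earlier, so it hides more of the miss latency per covered miss, while `c` is launched just as often as `b` but never leads to the miss and so carries only overhead.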