| A framework for modeling and optimization of prescient instruction prefetch |
| Source
|
Joint International Conference on Measurement and Modeling of Computer Systems
Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
San Diego, CA, USA
SESSION: Processor evaluation
Pages: 13 - 24
Year of Publication: 2003
ISBN: 1-58113-664-1
|
|
Authors
|
|
Tor M. Aamodt
|
Intel Labs, Santa Clara, CA and University of Toronto, Canada
|
|
Pedro Marcuello
|
Universitat Politècnica de Catalunya, Spain
|
|
Paul Chow
|
University of Toronto, Canada
|
|
Antonio González
|
Universitat Politècnica de Catalunya, Spain
|
|
Per Hammarlund
|
Intel Corp., Hillsboro, OR
|
|
Hong Wang
|
Intel Labs, Santa Clara, CA
|
|
John P. Shen
|
Intel Labs, Santa Clara, CA
|
|
ABSTRACT
This paper describes a framework for modeling macroscopic program behavior and applies it to optimizing prescient instruction prefetch, a novel technique that uses helper threads to improve single-threaded application performance by performing judicious and timely instruction prefetch. A helper thread is initiated when the main thread encounters a spawn point, and it prefetches instructions starting at a distant target point. The target identifies a code region that tends to incur I-cache misses and that the main thread is likely to execute soon, even though the intervening control flow may be unpredictable. The optimization of spawn-target pair selection is formulated by modeling program behavior as a Markov chain based on profile statistics. Execution paths are treated as stochastic outcomes, and aspects of program behavior are summarized via path expression mappings. Mappings are presented for computing reaching and posterior probability, path length mean and variance, and expected path footprint. These are used with Tarjan's fast path algorithm to efficiently estimate the benefit of spawn-target pair selections. Using this framework, we propose a spawn-target pair selection algorithm for prescient instruction prefetch. This algorithm has been implemented and evaluated for the Itanium Processor Family architecture. A limit study finds speedups of 4.8% to 17% on an in-order simultaneous multithreading processor with eight contexts, over next-line and streaming I-prefetch, for a set of benchmarks with high I-cache miss rates. The framework in this paper is potentially applicable to other thread speculation techniques.
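To make the Markov-chain formulation concrete, the following is a minimal sketch, not the paper's method: it uses plain fixed-point iteration in place of Tarjan's fast path algorithm and path expression mappings. The function name `reach_stats`, the five-block control-flow graph, and its branch probabilities are all hypothetical, chosen only for illustration. The sketch estimates two of the quantities named in the abstract for a candidate spawn-target pair: the reaching probability (the chance that the target is executed after the spawn point, before the region exits) and the mean path length, in block transitions, over the paths that do reach the target.

```python
def reach_stats(edges, spawn, target, sweeps=200):
    """Estimate (reaching probability, mean path length) from spawn to target.

    edges maps each basic block to {successor: branch probability}.
    Blocks with no successors are treated as region exits (absorbing).
    NOTE: simple fixed-point iteration, an assumption of this sketch;
    the paper instead evaluates path expressions with Tarjan's algorithm.
    """
    nodes = set(edges) | {y for succ in edges.values() for y in succ}
    # h[x]: probability that a walk starting at x reaches the target.
    # m[x]: expected number of transitions, counted only on walks that reach it.
    h = {x: 0.0 for x in nodes}
    m = {x: 0.0 for x in nodes}
    h[target] = 1.0
    for _ in range(sweeps):
        for x in nodes:
            succ = edges.get(x)
            if x == target or not succ:
                continue  # target and exit blocks keep their boundary values
            new_h = sum(p * h[y] for y, p in succ.items())
            new_m = sum(p * (h[y] + m[y]) for y, p in succ.items())
            h[x], m[x] = new_h, new_m
    mean_len = m[spawn] / h[spawn] if h[spawn] > 0 else float("nan")
    return h[spawn], mean_len


# Hypothetical profile: block B loops on itself half the time,
# and block C usually leaves the region without reaching D.
cfg = {
    "A": {"B": 0.5, "C": 0.5},
    "B": {"B": 0.5, "D": 0.5},
    "C": {"D": 0.2, "EXIT": 0.8},
    "D": {"EXIT": 1.0},
}
rp, mean_len = reach_stats(cfg, spawn="A", target="D")
print(f"reaching probability = {rp:.3f}, mean path length = {mean_len:.3f}")
```

In this toy graph, a spawn at A with target D has a reaching probability of 0.6 and a conditional mean path length of about 2.83 transitions. A selection algorithm in the spirit of the abstract would favor pairs whose reaching probability is high and whose path length leaves the helper thread enough time to prefetch before the main thread arrives.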