| A framework for modeling and optimization of prescient instruction prefetch |
| Full text |
Pdf
(445 KB)
|
| Source
|
ACM SIGMETRICS Performance Evaluation Review
archive
Volume 31 , Issue 1 (June 2003)
table of contents
SESSION: Processor evaluation
table of contents
Pages: 13 - 24
Year of Publication: 2003
ISSN:0163-5999
Also published in ...
|
|
Authors
|
|
Tor M. Aamodt
|
Intel Labs, Santa Clara, CA and University of Toronto, Canada
|
|
Pedro Marcuello
|
Universitat Politécnica de Catalunya, Spain
|
|
Paul Chow
|
University of Toronto, Canada
|
|
Antonio González
|
Universitat Politécnica de Catalunya, Spain
|
|
Per Hammarlund
|
Intel Corp., Hillsboro, OR
|
|
Hong Wang
|
Intel Labs, Santa Clara, CA
|
|
John P. Shen
|
Intel Labs, Santa Clara, CA
|
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 2, Downloads (12 Months): 30, Citation Count: 2
|
|
|
ABSTRACT
This paper describes a framework for modeling macroscopic program behavior and applies it to optimizing prescient instruction prefetch -- novel technique that uses helper threads to improve single-threaded application performance by performing judicious and timely instruction prefetch. A helper thread is initiated when the main thread encounters a spawn point, and prefetches instructions starting at a distant target point. The target identifies a code region tending to incur I-cache misses that the main thread is likely to execute soon, even though intervening control flow may be unpredictable. The optimization of spawn-target pair selections is formulated by modeling program behavior as a Markov chain based on profile statistics. Execution paths are considered stochastic outcomes, and aspects of program behavior are summarized via path expression mappings. Mappings for computing reaching, and posteriori probability; path length mean, and variance; and expected path footprint are presented. These are used with Tarjan's fast path algorithm to efficiently estimate the benefit of spawn-target pair selections. Using this framework we propose a spawn-target pair selection algorithm for prescient instruction prefetch. This algorithm has been implemented, and evaluated for the Itanium Processor Family architecture. A limit study finds 4.8%to 17% speedups on an in-order simultaneous multithreading processor with eight contexts, over nextline and streaming I-prefetch for a set of benchmarks with high I-cache miss rates. The framework in this paper is potentially applicable to other thread speculation techniques.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
|
 |
3
|
Robert S. Chappell , Jared Stark , Sangwook P. Kim , Steven K. Reinhardt , Yale N. Patt, Simultaneous subordinate microthreading (SSMT), Proceedings of the 26th annual international symposium on Computer architecture, p.186-195, May 01-04, 1999, Atlanta, Georgia, United States
|
 |
4
|
Robert S. Chappell , Francis Tseng , Adi Yoaz , Yale N. Patt, Difficult-path branch prediction using subordinate microthreads, Proceedings of the 29th annual international symposium on Computer architecture, p.307, May 25-29, 2002, Anchorage, Alaska
|
| |
5
|
|
 |
6
|
Jamison D. Collins , Hong Wang , Dean M. Tullsen , Christopher Hughes , Yong-Fong Lee , Dan Lavery , John P. Shen, Speculative precomputation: long-range prefetching of delinquent loads, Proceedings of the 28th annual international symposium on Computer architecture, p.14-25, June 30-July 04, 2001, Göteborg, Sweden
|
| |
7
|
M. Dubois and Y. Song. Assisted execution. Technical Report CENG 98-25, Department of EE-Systems, University of Southern California, Oct. 1998.
|
| |
8
|
J. Emer. Simultaneous multithreading: Multiplying alpha's performance. Microprocessor Forum, Oct. 1999.
|
 |
9
|
|
| |
10
|
G. Hinton and J. Shen. Intel's multi-threading technology. Microprocessor Forum, Oct. 2001.
|
| |
11
|
Jerry Huck , Dale Morris , Jonathan Ross , Allan Knies , Hans Mulder , Rumi Zahir, Introducing the IA-64 Architecture, IEEE Micro, v.20 n.5, p.12-23, September 2000
[doi> 10.1109/40.877947]
|
| |
12
|
Intel Corporation. Special Issue on Intel Hyper-Threading Technology in Pentium® 4 Processors. Intel Technology Journal. Q1 2002.
|
 |
13
|
Steve S.W. Liao , Perry H. Wang , Hong Wang , Gerolf Hoflehner , Daniel Lavery , John P. Shen, Post-pass binary adaptation for software-based speculative precomputation, Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation, June 17-19, 2002, Berlin, Germany
|
 |
14
|
|
| |
15
|
|
 |
16
|
|
 |
17
|
|
| |
18
|
|
| |
19
|
|
| |
20
|
|
 |
21
|
|
 |
22
|
|
 |
23
|
|
 |
24
|
Dean M. Tullsen , Susan J. Eggers , Joel S. Emer , Henry M. Levy , Jack L. Lo , Rebecca L. Stamm, Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor, Proceedings of the 23rd annual international symposium on Computer architecture, p.191-202, May 22-24, 1996, Philadelphia, Pennsylvania, United States
|
 |
25
|
|
| |
26
|
|
 |
27
|
|
 |
28
|
|
Peer to Peer - Readers of this Article have also read:
-
Data structures for quadtree approximation and compression
Communications of the ACM
28, 9
Hanan Samet
-
A hierarchical single-key-lock access control using the Chinese remainder theorem
Proceedings of the 1992 ACM/SIGAPP Symposium on Applied computing
Kim S. Lee
, Huizhu Lu
, D. D. Fisher
-
An intelligent component database for behavioral synthesis
Proceedings of the 27th ACM/IEEE Design Automation Conference on
Gwo-Dong Chen
, Daniel D. Gajski
-
The GemStone object database management system
Communications of the ACM
34, 10
Paul Butterworth
, Allen Otis
, Jacob Stein
-
Putting innovation to work: adoption strategies for multimedia communication systems
Communications of the ACM
34, 12
Ellen Francik
, Susan Ehrlich Rudman
, Donna Cooper
, Stephen Levine
|