A framework for modeling and optimization of prescient instruction prefetch
Source ACM SIGMETRICS Performance Evaluation Review
Volume 31, Issue 1 (June 2003)
SESSION: Processor evaluation
Pages: 13-24
Year of Publication: 2003
ISSN:0163-5999
Authors
Tor M. Aamodt  Intel Labs, Santa Clara, CA and University of Toronto, Canada
Pedro Marcuello  Universitat Politècnica de Catalunya, Spain
Paul Chow  University of Toronto, Canada
Antonio González  Universitat Politècnica de Catalunya, Spain
Per Hammarlund  Intel Corp., Hillsboro, OR
Hong Wang  Intel Labs, Santa Clara, CA
John P. Shen  Intel Labs, Santa Clara, CA
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/885651.781030

ABSTRACT

This paper describes a framework for modeling macroscopic program behavior and applies it to optimizing prescient instruction prefetch, a novel technique that uses helper threads to improve single-threaded application performance by performing judicious and timely instruction prefetch. A helper thread is initiated when the main thread encounters a spawn point, and it prefetches instructions starting at a distant target point. The target identifies a code region that tends to incur I-cache misses and that the main thread is likely to execute soon, even though intervening control flow may be unpredictable. The optimization of spawn-target pair selection is formulated by modeling program behavior as a Markov chain based on profile statistics. Execution paths are treated as stochastic outcomes, and aspects of program behavior are summarized via path expression mappings. Mappings are presented for computing reaching and posteriori probabilities, path length mean and variance, and expected path footprint. These are used with Tarjan's fast path algorithm to efficiently estimate the benefit of spawn-target pair selections. Using this framework, we propose a spawn-target pair selection algorithm for prescient instruction prefetch. The algorithm has been implemented and evaluated for the Itanium Processor Family architecture. A limit study finds 4.8% to 17% speedups over next-line and streaming I-prefetch on an in-order simultaneous multithreading processor with eight contexts, for a set of benchmarks with high I-cache miss rates. The framework in this paper is potentially applicable to other thread speculation techniques.
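To make the Markov-chain view concrete, the following is a minimal sketch of one quantity the abstract names, the reaching probability from a spawn point to a target. It is not the paper's method: the paper evaluates path expressions with Tarjan's fast path algorithm, whereas this sketch substitutes a direct linear solve for the first-passage probability, which is only a stand-in for the paper's definition. The function name `reaching_probability`, the dictionary layout, and the five-block control-flow graph are all hypothetical, and the sketch assumes every block can eventually reach either the target or a block with no successors.

```python
from fractions import Fraction


def reaching_probability(transitions, spawn, target):
    """Probability that a walk starting at `spawn` ever visits `target`.

    `transitions` maps a block name to {successor: probability}; blocks
    with no entry (or an empty entry) are treated as program exits.
    Illustrative only: a plain linear solve, not the path-expression
    evaluation described in the abstract.
    """
    if spawn == target:
        return Fraction(1)

    # Unknowns are the blocks that are neither the target (value 1)
    # nor an exit (value 0).
    blocks = set(transitions)
    for succs in transitions.values():
        blocks.update(succs)
    unknowns = sorted(b for b in blocks if b != target and transitions.get(b))
    index = {b: i for i, b in enumerate(unknowns)}
    n = len(unknowns)
    if spawn not in index:
        return Fraction(0)

    # Augmented matrix for (I - Q) x = r, where Q holds transitions among
    # the unknown blocks and r holds one-step probabilities into the target.
    a = [[Fraction(0)] * (n + 1) for _ in range(n)]
    for b in unknowns:
        i = index[b]
        a[i][i] += 1
        for succ, p in transitions[b].items():
            if succ == target:
                a[i][n] += Fraction(p)
            elif succ in index:
                a[i][index[succ]] -= Fraction(p)

    # Gauss-Jordan elimination in exact arithmetic. A missing pivot means
    # the stated assumption (every block can leave) does not hold.
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        inv = 1 / a[col][col]
        a[col] = [v * inv for v in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]

    return a[index[spawn]][n]


# Hypothetical profile: edge probabilities between basic blocks.
EXAMPLE_CFG = {
    "A": {"B": Fraction(1, 2), "C": Fraction(1, 2)},
    "B": {"T": Fraction(3, 4), "exit": Fraction(1, 4)},
    "C": {"A": Fraction(1, 2), "exit": Fraction(1, 2)},
    "T": {"exit": Fraction(1)},
}

print(reaching_probability(EXAMPLE_CFG, "A", "T"))  # 1/2
```

In this toy graph, a helper thread spawned at block A and prefetching from block T would do useful work on half of the executions. A selection algorithm along the lines the abstract describes would weigh such a probability against the expected path length and footprint between spawn and target.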


