Optimization of data prefetch helper threads with path-expression based statistical modeling
Source
International Conference on Supercomputing
Proceedings of the 21st annual international conference on Supercomputing
Seattle, Washington
SESSION: Architecture -- memory hierarchy
Pages: 210-221
Year of Publication: 2007
ISBN: 978-1-59593-768-1
Authors
Tor M. Aamodt  University of British Columbia, British Columbia, Canada
Paul Chow  University of Toronto, Toronto, Ontario, Canada
Sponsor
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1274971.1275001

ABSTRACT

This paper investigates helper threads that improve performance by prefetching data on behalf of an application's main thread. The focus is on data prefetch helper threads that lack branch instructions and that generate prefetches for one dynamic instance of a delinquent load instruction per spawned helper thread. This form of helper thread, sometimes called a simple p-thread, has been studied previously by Roth et al. [29, 26], who proposed a framework for optimizing their impact. A key step in that framework is predicting the performance impact of a helper thread. In this paper we propose and evaluate a novel performance prediction technique that achieves comparable results yet requires less detailed information about dynamic program behavior. This technique extends a path-expression-based statistical modeling framework [2] by incorporating information about branch correlation (which we show is important) and by considering data flow information in a statistical manner. Significantly, the profile information we use is similar to that provided within current optimizing compilers. This paper also provides the first comprehensive assessment of the sources of modeling error relevant to predicting the performance impact of simple p-threads.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
[1] T. M. Aamodt. Modeling and Optimization of Speculative Threads. PhD thesis, Department of Electrical and Computer Engineering, University of Toronto, 2006.
[6] D. Burger and T. M. Austin. The SimpleScalar Tool Set, Version 2.0. http://www.simplescalar.com, 1997.
[12] M. Dubois and Y. Song. Assisted execution. Technical Report CENG 98-25, Department of EE-Systems, University of Southern California, October 1998.
[13] J. A. Fisher. Trace Scheduling: A Technique for Global Microcode Compaction. IEEE Trans. Computers, 30(7):478-490, 1981.
[14] GrammaTech. CodeSurfer. http://www.grammatech.com, 2007.
[16] Intel Corporation. Intel® VTune™ Performance Analyzer. http://www.intel.com, 2007.
[33] Standard Performance Evaluation Corporation. SPEC 2000 CPU benchmarks. http://www.spec.org/.
