|
ABSTRACT
This paper investigates helper threads that improve performance by prefetching data on behalf of an application's main thread. The focus is data prefetch helper threads that lack branch instructions and which generate prefetches for one dynamic instance of a delinquent load instruction per spawned helper thread. This form of helper thread, some-times called a simple p-thread, has been studied previously by Roth et al. [29, 26] who proposed a framework for optimizing their impact. A key step in that framework is predicting the performance impact of a helper thread. In this paper we propose and evaluate a novel performance prediction technique that achieves comparable results yet requires less detailed information about dynamic program behavior. This technique extends a path expression based statistical modeling framework [2] by incorporating information about branch correlation (which we show is important) and by considering data flow information in a statistical manner. Significantly, the profile information we use is similar to that provided within current optimizing compilers. This paper also provides the first comprehensive assessment of the sources of modeling error relevant to predicting the performance impact of simple p-threads.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
T. M. Aamodt. Modeling and Optimization of Speculative Threads. PhD thesis, Department of Electrical and Computer Engineering, University of Toronto, 2006.
|
 |
2
|
Tor M. Aamodt , Pedro Marcuello , Paul Chow , Antonio González , Per Hammarlund , Hong Wang , John P. Shen, A framework for modeling and optimization of prescient instruction prefetch, Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, June 11-14, 2003, San Diego, CA, USA
|
 |
3
|
|
| |
4
|
|
 |
5
|
Rastislav Bodík , Rajiv Gupta , Mary Lou Soffa, Interprocedural conditional branch elimination, Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation, p.146-158, June 16-18, 1997, Las Vegas, Nevada, United States
|
| |
6
|
D. Burger and T. M. Austin. The SimpleScalar Tool Set, Version 2.0. http://www.simplescalar.com, 1997.
|
| |
7
|
|
| |
8
|
|
 |
9
|
|
| |
10
|
|
 |
11
|
Jamison D. Collins , Hong Wang , Dean M. Tullsen , Christopher Hughes , Yong-Fong Lee , Dan Lavery , John P. Shen, Speculative precomputation: long-range prefetching of delinquent loads, Proceedings of the 28th annual international symposium on Computer architecture, p.14-25, June 30-July 04, 2001, Göteborg, Sweden
|
| |
12
|
M. Dubois and Y. Song. Assisted execution. Technical Report CENG 98--25, Department of EE-Systems, University of Southern California, October 1998.
|
| |
13
|
J. A. Fisher. Trace Scheduling: A Technique for Global Microcode Compaction. IEEE Trans. Computers, 30(7):478--490, 1981.
|
| |
14
|
Gramma Tech. Codesurfer. http://www.grammatech.com, 2007.
|
| |
15
|
Mary W. Hall , Jennifer M. Anderson , Saman P. Amarasinghe , Brian R. Murphy , Shih-Wei Liao , Edouard Bugnion , Monica S. Lam, Maximizing Multiprocessor Performance with the SUIF Compiler, Computer, v.29 n.12, p.84-89, December 1996
[doi> 10.1109/2.546613]
|
| |
16
|
Intel Corporation. Intel®VTune#8482; Performance Analyzer. http://www.intel.com, 2007.
|
| |
17
|
Dongkeun Kim , Steve Shih-wei Liao , Perry H. Wang , Juan del Cuvillo , Xinmin Tian , Xiang Zou , Hong Wang , Donald Yeung , Milind Girkar , John P. Shen, Physical Experimentation with Prefetching Helper Threads on Intel's Hyper-Threaded Processors, Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, p.27, March 20-24, 2004, Palo Alto, California
|
 |
18
|
|
 |
19
|
|
 |
20
|
Steve S.W. Liao , Perry H. Wang , Hong Wang , Gerolf Hoflehner , Daniel Lavery , John P. Shen, Post-pass binary adaptation for software-based speculative precomputation, Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation, June 17-19, 2002, Berlin, Germany
|
| |
21
|
|
 |
22
|
|
 |
23
|
|
| |
24
|
|
| |
25
|
|
 |
26
|
|
 |
27
|
|
| |
28
|
|
| |
29
|
|
| |
30
|
|
 |
31
|
|
| |
32
|
|
| |
33
|
Standard Performance Evaluation Corporation. SPEC 2000 CPU benchmarks. http://www.spec.org/.
|
 |
34
|
|
 |
35
|
|
 |
36
|
Perry H. Wang , Jamison D. Collins , Hong Wang , Dongkeun Kim , Bill Greene , Kai-Ming Chan , Aamir B. Yunus , Terry Sych , Stephen F. Moore , John P. Shen, Helper threads via virtual multithreading on an experimental itanium® 2 processor-based platform, Proceedings of the 11th international conference on Architectural support for programming languages and operating systems, October 07-13, 2004, Boston, MA, USA
|
 |
37
|
|
 |
38
|
|
|