|
ABSTRACT
Pre-execution is a promising latency tolerance technique that uses one or more helper threads running in spare hardware contexts ahead of the main computation to trigger long-latency memory operations early, hence absorbing their latency on behalf of the main computation. This paper investigates a source-to-source C compiler for extracting pre-execution thread code automatically, thus relieving the programmer or hardware from this onerous task. At the heart of our compiler are three algorithms. First, program slicing removes non-critical code for computing cache-missing memory references, reducing pre-execution overhead. Second, prefetch conversion replaces blocking memory references with non-blocking prefetch instructions to minimize pre-execution thread stalls. Finally, threading scheme selection chooses the best scheme for initiating pre-execution threads, speculatively parallelizing loops to generate thread-level parallelism when necessary for latency tolerance. We prototyped our algorithms using the Stanford University Intermediate Format (SUIF) framework and a publicly available program slicer, called Unravel [13], and we evaluated our compiler on a detailed architectural simulator of an SMT processor. Our results show compiler-based pre-execution improves the performance of 9 out of 13 applications, reducing execution time by 22.7%. Across all 13 applications, our technique delivers an average speedup of 17.0%. These performance gains are achieved fully automatically on conventional SMT hardware, with only minimal modifications to support pre-execution threads.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
D. Binkley and K. Gallagher. A Survey of Program Slicing. Academic Press, 1996.
|
| |
3
|
D. Burger and T. Austin. The SimpleScalar Tool Set, Version 2.0. CS TR 1342, University of Wisconsin-Madison, June 1997.
|
 |
4
|
Robert S. Chappell , Jared Stark , Sangwook P. Kim , Steven K. Reinhardt , Yale N. Patt, Simultaneous subordinate microthreading (SSMT), Proceedings of the 26th annual international symposium on Computer architecture, p.186-195, May 01-04, 1999, Atlanta, Georgia, United States
|
| |
5
|
|
| |
6
|
|
 |
7
|
Jamison D. Collins , Hong Wang , Dean M. Tullsen , Christopher Hughes , Yong-Fong Lee , Dan Lavery , John P. Shen, Speculative precomputation: long-range prefetching of delinquent loads, Proceedings of the 28th annual international symposium on Computer architecture, p.14-25, June 30-July 04, 2001, Göteborg, Sweden
|
| |
8
|
R. Cytron. Doacross: Beyond Vectorization for Multiprocessors. In International Conference on Parallel Processing, August 1986.
|
| |
9
|
M. Dubois and Y. Song. Assisted Execution. CENG TR 98-25, University of Southern California, October 1998.
|
 |
10
|
|
 |
11
|
Steve S.W. Liao , Perry H. Wang , Hong Wang , Gerolf Hoflehner , Daniel Lavery , John P. Shen, Post-pass binary adaptation for software-based speculative precomputation, Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation, June 17-19, 2002, Berlin, Germany
|
 |
12
|
|
| |
13
|
J. Lyle and D. Wallace. Using the unravel program slicing tool to evaluate high integrity software. In 10th International Software Quality Week, May 1997.
|
| |
14
|
J. Lyle, D. Wallace, J. Graham, K. Gallagher, J. Poole, and D. Binkley. Unravel: A CASE Tool to Assist Evaluation of High Integrity Software. NISTIR 5691, National Institute of Standards and Technology, August 1995.
|
| |
15
|
|
 |
16
|
|
| |
17
|
D. Padua, D. Kuck, and D. Lawrie. High-Speed Multiprocessors and Compilation Techniques. IEEE Transactions on Computers, C-29(9):763-776, September 1980.
|
 |
18
|
|
 |
19
|
Amir Roth , Andreas Moshovos , Gurindar S. Sohi, Dependence based prefetching for linked data structures, Proceedings of the eighth international conference on Architectural support for programming languages and operating systems, p.115-126, October 02-07, 1998, San Jose, California, United States
|
| |
20
|
|
 |
21
|
|
 |
22
|
Dean M. Tullsen , Susan J. Eggers , Joel S. Emer , Henry M. Levy , Jack L. Lo , Rebecca L. Stamm, Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor, Proceedings of the 23rd annual international symposium on Computer architecture, p.191-202, May 22-24, 1996, Philadelphia, Pennsylvania, United States
|
 |
23
|
|
 |
24
|
|
CITED BY 17
|
|
Perry H. Wang , Jamison D. Collins , Hong Wang , Dongkeun Kim , Bill Greene , Kai-Ming Chan , Aamir B. Yunus , Terry Sych , Stephen F. Moore , John P. Shen, Helper threads via virtual multithreading on an experimental itanium® 2 processor-based platform, ACM SIGPLAN Notices, v.39 n.11, November 2004
|
|
|
Perry H. Wang , Jamison D. Collins , Hong Wang , Dongkeun Kim , Bill Greene , Kai-Ming Chan , Aamir B. Yunus , Terry Sych , Stephen F. Moore , John P. Shen, Helper Threads via Virtual Multithreading, IEEE Micro, v.24 n.6, p.74-82, November 2004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Sorin Iacobovici , Lawrence Spracklen , Sudarshan Kadambi , Yuan Chou , Santosh G. Abraham, Effective stream-based and execution-based data prefetching, Proceedings of the 18th annual international conference on Supercomputing, June 26-July 01, 2004, Malo, France
|
|
|
|
|
|
Dongkeun Kim , Steve Shih-wei Liao , Perry H. Wang , Juan del Cuvillo , Xinmin Tian , Xiang Zou , Hong Wang , Donald Yeung , Milind Girkar , John P. Shen, Physical Experimentation with Prefetching Helper Threads on Intel's Hyper-Threaded Processors, Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, p.27, March 20-24, 2004, Palo Alto, California
|
|
|
|
|
|
Yao Guo , Vladimir Vlassov , Raksit Ashok , Richard Weiss , Csaba Andras Moritz, Synchronization coherence: A transparent hardware mechanism for cache coherence and fine-grained synchronization, Journal of Parallel and Distributed Computing, v.68 n.2, p.165-181, February, 2008
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|