| Dynamic parallelization of single-threaded binary programs using speculative slicing |
| Full text |
Pdf
(557 KB)
|
Source
|
International Conference on Supercomputing
archive
Proceedings of the 23rd international conference on Supercomputing
table of contents
Yorktown Heights, NY, USA
SESSION: Compilers
table of contents
Pages 158-168
Year of Publication: 2009
ISBN:978-1-60558-498-0
|
|
Authors
|
|
Cheng Wang
|
Intel Corporation, Santa Clara, CA, USA
|
|
Youfeng Wu
|
Intel Corporation, Santa Clara, CA, USA
|
|
Edson Borin
|
Intel Corporation, Santa Clara, CA, USA
|
|
Shiliang Hu
|
Intel Corporation, Santa Clara, CA, USA
|
|
Wei Liu
|
Intel Corporation, Santa Clara, CA, USA
|
|
Dave Sager
|
Intel Corporation, Hillsboro, OR, USA
|
|
Tin-fook Ngai
|
Intel Corporation, Santa Clara, CA, USA
|
|
Jesse Fang
|
Intel Corporation, Santa Clara, CA, USA
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 30, Downloads (12 Months): 120, Citation Count: 0
|
|
|
ABSTRACT
The performance of single-threaded programs and legacy binary code is of critical importance in many everyday applications. However, neither can hardware multi-core processors directly speed up single-threaded programs, nor can software automatic parallelizing compilers effectively parallelize legacy binary code and irregular applications. In this paper, we propose a framework and a set of algorithms to dynamically parallelize single-threaded binary programs. Our parallelization is based on program slicing and explores both instruction-level parallelism (ILP) and thread-level parallelism (TLP). To significantly reduce the critical path of the parallel slices, our slicing algorithms exploit speculation to cut rare dependences, and use well-designed program transformations to expose parallelism. Furthermore, because we transparently parallelize binary code at runtime, we perform slicing only on program hot regions. Our experiments demonstrate that the proposed speculative slicing approach extracts more parallelism than any known slicing based parallelization schemes. For the SPEC2000 benchmarks, we can achieve 3x parallelism with infinite number of threads, and 1.8x parallelism with 4 threads.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Jennifer M. Anderson , Saman P. Amarasinghe , Monica S. Lam, Data and computation transformations for multiprocessors, Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming, p.166-178, July 19-21, 1995, Santa Barbara, California, United States
|
| |
2
|
|
| |
3
|
W. Blume, R. Eigenmann, K. Faigin, J. Grout, J. Hoeflinger, D. Padua, P. Petersen, B. Pottenger, L. Rauchwerger, P. Tu and S. Weatherford, "Polaris: The Next Generation in Parallelizing Compilers", LCPC 1994.
|
 |
4
|
|
| |
5
|
L. Chen and Y. Wu, "Aggressive Compiler Optimization and Parallelism with Thread--Level Speculation", ICCP 2003.
|
 |
6
|
Zhao-Hui Du , Chu-Cheow Lim , Xiao-Feng Li , Chen Yang , Qingyu Zhao , Tin-Fook Ngai, A cost-driven compilation framework for speculative parallelization of sequential programs, Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation, June 09-11, 2004, Washington DC, USA
|
| |
7
|
|
 |
8
|
|
| |
9
|
Wen-Mei W. Hwu , Scott A. Mahlke , William Y. Chen , Pohua P. Chang , Nancy J. Warter , Roger A. Bringmann , Roland G. Ouellette , Richard E. Hank , Tokuzo Kiyohara , Grant E. Haab , John G. Holm , Daniel M. Lavery, The superblock: an effective technique for VLIW and superscalar compilation, The Journal of Supercomputing, v.7 n.1-2, p.229-248, May 1993
[doi> 10.1007/BF01205185]
|
 |
10
|
Engin Ipek , Meyrem Kirman , Nevin Kirman , Jose F. Martinez, Core fusion: accommodating software diversity in chip multiprocessors, Proceedings of the 34th annual international symposium on Computer architecture, June 09-13, 2007, San Diego, California, USA
|
 |
11
|
|
 |
12
|
Karthikeyan Sankaralingam , Ramadass Nagarajan , Haiming Liu , Changkyu Kim , Jaehyuk Huh , Doug Burger , Stephen W. Keckler , Charles R. Moore, Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture, Proceedings of the 30th annual international symposium on Computer architecture, June 09-11, 2003, San Diego, California
|
 |
13
|
Wei Liu , James Tuck , Luis Ceze , Wonsun Ahn , Karin Strauss , Jose Renau , Josep Torrellas, POSH: a TLS compiler that exploits program structure, Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming, March 29-31, 2006, New York, New York, USA
[doi> 10.1145/1122971.1122997]
|
| |
14
|
A. J. Martin, M. Nystrom and P. Penzes, "ET2: A metric for time and energy efficiency of computation", Tech. Rep. CaltechCSTR:2001.007, Caltech, Computer Science, 2001.
|
| |
15
|
|
 |
16
|
Milos Prvulovic , María Jesús Garzarán , Lawrence Rauchwerger , Josep Torrellas, Removing architectural bottlenecks to the scalability of speculative parallelization, Proceedings of the 28th annual international symposium on Computer architecture, p.204-215, June 30-July 04, 2001, Göteborg, Sweden
|
| |
17
|
V. Ranganth, T. Amtoft, A. Banerjee, M. Dwyer and J. Hatcli, "A new foundation for control dependence and slicing for modern program structures", ESOP 2005.
|
 |
18
|
Jose Renau , James Tuck , Wei Liu , Luis Ceze , Karin Strauss , Josep Torrellas, Tasking with out-of-order spawn in TLS chip multiprocessors: microarchitecture and compilation, Proceedings of the 19th annual international conference on Supercomputing, June 20-22, 2005, Cambridge, Massachusetts
[doi> 10.1145/1088149.1088173]
|
| |
19
|
|
| |
20
|
F. Tip, "A survey of program slicing techniques." Journal of programming languages, 3:121--189, 1995.
|
| |
21
|
M. D. Weiser, "Reconstructing sequential behavior from parallel behavior projections", Information Processing Letters, 17(3):129--135, 1983.
|
| |
22
|
|
| |
23
|
Omitted for blind submission Y. Wu, M. Breternitz, J. Quek, O. Etzion and J. Fang, "The Accuracy of Initial Prediction in Two-Phase Dynamic Binary Translators", CGO 2004.
|
| |
24
|
|
| |
25
|
|
| |
26
|
|
| |
27
|
Carlos Madriles , Carlos García-Quiñones , Jesús Sánchez , Pedro Marcuello , Antonio González , Dean M. Tullsen , Hong Wang , John P. Shen, Mitosis: A Speculative Multithreaded Processor Based on Precomputation Slices, IEEE Transactions on Parallel and Distributed Systems, v.19 n.7, p.914-925, July 2008
[doi> 10.1109/TPDS.2007.70797]
|
 |
28
|
|
| |
29
|
Guilherme Ottoni , Ram Rangan , Adam Stoler , David I. August, Automatic Thread Extraction with Decoupled Software Pipelining, Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture, p.105-118, November 12-16, 2005, Barcelona, Spain
[doi> 10.1109/MICRO.2005.13]
|
| |
30
|
Neil Vachharajani , Ram Rangan , Easwaran Raman , Matthew J. Bridges , Guilherme Ottoni , David I. August, Speculative Decoupled Software Pipelining, Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques, p.49-59, September 15-19, 2007
[doi> 10.1109/PACT.2007.66]
|
| |
31
|
|
 |
32
|
Naveen Neelakantam , Ravi Rajwar , Suresh Srinivas , Uma Srinivasan , Craig Zilles, Hardware atomicity for reliable software speculation, Proceedings of the 34th annual international symposium on Computer architecture, June 09-13, 2007, San Diego, California, USA
|
| |
33
|
|
 |
34
|
|
|