ABSTRACT
Master/Slave Speculative Parallelization (MSSP) is an execution paradigm for improving the execution rate of sequential programs by parallelizing them speculatively for execution on a multiprocessor. In MSSP, one processor (the master) executes an approximate version of the program to compute selected values that the full program's execution is expected to compute. The master's results are checked by slave processors that execute the original program. This validation is parallelized by cutting the program's execution into tasks. Each slave uses its predicted inputs (as computed by the master) to validate the input predictions of the next task, inductively validating the entire execution.

The performance of MSSP is largely determined by the execution rate of the approximate program. Since approximate code has no correctness requirements (in essence it is a software value predictor), it can be optimized more effectively than traditionally generated code. It is free to sacrifice correctness in the uncommon case to maximize performance in the common case.

A simulation-based evaluation of an initial MSSP implementation achieves speedups of up to 1.7 (harmonic mean 1.25) on the SPEC2000 integer benchmarks. Performance is currently limited by the effectiveness with which our current automated infrastructure approximates programs, which can likely be improved significantly.
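
The inductive validation described above can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions, not the paper's implementation: the toy program (`original_step`), its approximation (`approx_step`), the helper names, and the recovery policy (on a mismatch, commit the verified task's result and restart the master from it) are all hypothetical choices made for illustration. The slaves are simulated one after another rather than run on separate processors.

```python
def original_step(x):
    # Hypothetical "full" program: one step of the true computation.
    # Uncommon case: multiples of 7 take a different path.
    return x + 3 if x % 7 else x + 1


def approx_step(x):
    # Hypothetical approximate program run by the master:
    # it ignores the uncommon case, so it is sometimes wrong.
    return x + 3


def run(step, state, n):
    # Apply `step` to `state` n times.
    for _ in range(n):
        state = step(state)
    return state


def mssp(start, total_steps, task_len):
    committed = start   # last state known to be correct
    done = 0            # number of steps committed so far
    squashes = 0        # number of detected mispredictions
    while done < total_steps:
        # Cut the remaining execution into tasks.
        lengths = []
        remaining = total_steps - done
        while remaining > 0:
            lengths.append(min(task_len, remaining))
            remaining -= lengths[-1]

        # Master: predict the input state of every task boundary.
        predictions = [committed]
        for n in lengths:
            predictions.append(run(approx_step, predictions[-1], n))

        # Slaves: each runs the original program from its predicted input.
        # (Conceptually parallel; simulated sequentially here.)
        results = [run(original_step, predictions[i], n)
                   for i, n in enumerate(lengths)]

        # Commit in order. Task i's input is already verified, so its
        # result is correct; it in turn checks the prediction for task i+1.
        for i, n in enumerate(lengths):
            committed = results[i]
            done += n
            if results[i] != predictions[i + 1]:
                squashes += 1
                break  # later tasks used bad inputs; restart the master
    return committed, squashes
```

For example, `mssp(1, 50, 5)` returns the same final state as `run(original_step, 1, 50)`, together with a count of how many times a prediction had to be discarded. The more often the approximate program is right, the more tasks commit per master pass, which mirrors the abstract's point that performance hinges on the quality and speed of the approximate code.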