ABSTRACT
Speculative thread-level parallelization is a promising way to speed up codes that compilers fail to parallelize. While several speculative parallelization schemes have been proposed for different machine sizes and types of codes, the results so far show that it is hard to deliver scalable speedups. Often, the problem is not true dependence violations but sub-optimal architectural design. Consequently, we attempt to identify and eliminate the major architectural bottlenecks that limit the scalability of speculative parallelization. We propose three solutions: a low-complexity, constant-time commit to eliminate the task commit bottleneck; a memory-based overflow area to eliminate stalls due to speculative buffer overflow; and the exploitation of high-level access patterns to minimize speculation-induced traffic. To show that the resulting system is truly scalable, we perform simulations with up to 128 processors. With our optimizations, the speedups for 128 and 64 processors reach 63 and 48, respectively. The average speedup for 64 processors is 32, nearly four times higher than without our optimizations.