|
ABSTRACT
While there have been many recent proposals for hardware that supports Thread-Level Speculation (TLS), there has been relatively little work on compiler optimizations to fully exploit this potential for parallelizing programs optimistically. In this paper, we focus on one important limitation of program performance under TLS, which is stalls due to forwarding scalar values between threads that would otherwise cause frequent data dependences. We present and evaluate dataflow algorithms for three increasingly-aggressive instruction scheduling techniques that reduce the critical forwarding path introduced by the synchronization associated with this data forwarding. In addition, we contrast our compiler techniques with related hardware-only approaches. With our most aggressive compiler and hardware techniques, we improve performance under TLS by 6.2-28.5% for 6 of 14 applications, and by at least 2.7% for half of the other applications.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
|
| |
3
|
|
| |
4
|
BROADCOM CORPORATION. The Sibyte SB-1250 Processor. http://www.sibyte.com/mercurian.
|
| |
5
|
CHANG, P. P., WARTER, N. J., MAHLKE, S. A., CHEN, W. Y., AND HWU, W. W. Three Superblock Scheduling Models for Superscalar and Superpipelined Processors. Center for Reliable and High-Performance Computing, University of Illinois, Urbana-Champaign, 1991.
|
| |
6
|
CHEN, D. K., AND YEW, P. C. Statement re-ordering for DOACROSS loops. In International Conference on Parallel Processing (Aug. 1994), pp. 24-28.
|
| |
7
|
|
| |
8
|
CYTRON, R. Doacross: Beyond vectorization for multiprocessors. In International Conference on Parallel Processing (1986).
|
| |
9
|
EMER, J. Ev8: The post-ultimate alpha.(keynote address). In International Conference on Parallel Architectures and Compilation Techniques (2001).
|
| |
10
|
FISHER, J. A. Trace scheduling: A technique for global microcode compaction. IEEE Transactions on Computers 13 (June 1981).
|
| |
11
|
|
 |
12
|
David M. Gallagher , William Y. Chen , Scott A. Mahlke , John C. Gyllenhaal , Wen-mei W. Hwu, Dynamic memory disambiguation using the memory conflict buffer, Proceedings of the sixth international conference on Architectural support for programming languages and operating systems, p.183-193, October 05-07, 1994, San Jose, California, United States
|
| |
13
|
|
| |
14
|
|
 |
15
|
Lance Hammond , Mark Willey , Kunle Olukotun, Data speculation support for a chip multiprocessor, Proceedings of the eighth international conference on Architectural support for programming languages and operating systems, p.58-69, October 02-07, 1998, San Jose, California, United States
|
| |
16
|
HOLLEY, L. H., AND K. ROSEN, B. Qualified data flow problems. IEEE Transactions on Software Engineering 7, 1 (Jan. 1981).
|
| |
17
|
KAHLE, J. Power4: A Dual-CPU Processor Chip. Microprocessor Forum '99 (October 1999).
|
 |
18
|
Jens Knoop , Oliver Rüthing , Bernhard Steffen, Lazy code motion, Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation, p.224-234, June 15-19, 1992, San Francisco, California, United States
|
| |
19
|
|
 |
20
|
|
| |
21
|
|
| |
22
|
|
 |
23
|
Andreas Moshovos , Scott E. Breach , T. N. Vijaykumar , Gurindar S. Sohi, Dynamic speculation and synchronization of data dependences, Proceedings of the 24th annual international symposium on Computer architecture, p.181-193, June 01-04, 1997, Denver, Colorado, United States
|
| |
24
|
|
| |
25
|
|
| |
26
|
PADUA, D., KUCK, D., AND LAWRIE, D. High-speed multiprocessors and compilation techniques. IEEE Transactions on Computing (September 1980).
|
 |
27
|
|
| |
28
|
STANDARD PERFORMANCE EVALUATION CORPORATION. The SPEC Benchmark Suite. http://www.specbench.org.
|
| |
29
|
STEFFAN, J. G., COLOHAN, C. B., AND MOWRY, T. C. Architectural Support for Thread-Level Data Speculation. Tech. Rep. CMU-CS-97-188, School of Computer Science, Carnegie Mellon University, November 1997.
|
 |
30
|
J. Greggory Steffan , Christopher B. Colohan , Antonia Zhai , Todd C. Mowry, A scalable approach to thread-level speculation, Proceedings of the 27th annual international symposium on Computer architecture, p.1-12, June 2000, Vancouver, British Columbia, Canada
|
| |
31
|
|
| |
32
|
TJIANG, S., WOLF, M., LAM, M., PIEPER, K., AND HENNESSY, J. Languages and Compilers for Parallel Computing. Springer-Verlag, Berlin, Germany, 1992, pp. 137-151.
|
| |
33
|
TREMBLAY, M. MAJC: Microprocessor Architecture for Java Computing. HotChips '99 (August 1999).
|
| |
34
|
|
| |
35
|
|
| |
36
|
|
| |
37
|
ZHAI, A., COLOHAN, C. B., STEFFAN, J. G., AND MOWRY, T. C. Compiler Optimizations to Accelerate Scalar Value Communication Between Speculative Threads. Tech. Rep. CMU-CS-02-162, School of Computer Science, Carnegie Mellon University, August 2002.
|
| |
38
|
|
| |
39
|
ZILLES, C. B., AND SOHI, G. S. Master/Slave Speculative Parallelization with Distilled Programs. Tech. Rep. TR-1438, Computer Sciences Department, University of Wisconsin-Madison, April 2002.
|
CITED BY 24
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Jose Renau , James Tuck , Wei Liu , Luis Ceze , Karin Strauss , Josep Torrellas, Tasking with out-of-order spawn in TLS chip multiprocessors: microarchitecture and compilation, Proceedings of the 19th annual international conference on Supercomputing, June 20-22, 2005, Cambridge, Massachusetts
|
|
|
Wei Liu , James Tuck , Luis Ceze , Wonsun Ahn , Karin Strauss , Jose Renau , Josep Torrellas, POSH: a TLS compiler that exploits program structure, Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming, March 29-31, 2006, New York, New York, USA
|
|
|
|
|
|
Antonia Zhai , Christopher B. Colohan , J. Gregory Steffan , Todd C. Mowry, Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads, Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, p.39, March 20-24, 2004, Palo Alto, California
|
|
|
|
|
|
|
|
|
Antonia Zhai , Shengyue Wang , Pen-Chung Yew , Guojin He, Compiler optimizations for parallelizing general-purpose applications under thread-level speculation, Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, February 20-23, 2008, Salt Lake City, UT, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Haitham Akkary , Komal Jothi , Renjith Retnamma , Satyanarayana Nekkalapu , Doug Hall , Shahrokh Shahidzadeh, On the potential of latency tolerant execution in speculative multithreading, Proceedings of the 1st international forum on Next-generation multicore/manycore technologies, November 24-25, 2008, Cairo, Egypt
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|