ABSTRACT
Thread-level speculation (TLS) allows us to automatically parallelize general-purpose programs by supporting parallel execution of threads that might not actually be independent. In this article, we focus on one important limitation of program performance under TLS: stalls caused by synchronizing and forwarding scalar values between speculative threads. Without this synchronization, those values would cause frequent data dependences and, hence, failed speculation. Using SPECint benchmarks that our compiler has automatically transformed to exploit TLS, we present, evaluate in detail, and compare both compiler and hardware techniques for improving the communication of scalar values. We find that, using our dataflow algorithms for three increasingly aggressive instruction scheduling techniques, the compiler can drastically reduce the critical forwarding path introduced by the synchronization and forwarding of scalar values. We also show that hardware techniques for reducing synchronization can complement compiler scheduling, but that their additional performance benefit is minimal and generally not worth the cost.
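To see why shortening the critical forwarding path matters, here is a deliberately simplified timing model. It is a sketch under stated assumptions, not the article's evaluation methodology. We assume that each speculative thread (epoch) waits for a forwarded scalar from its predecessor at offset `wait_at` and forwards its own value at offset `signal_at`. We take the critical forwarding path to be the distance from that wait to that signal. We also assume that all epochs have equal length, run round-robin on `n_procs` processors, and never incur violations or forwarding latency. The function name `simulate_tls` and its parameters are hypothetical.

```python
def simulate_tls(n_epochs, n_procs, epoch_len, wait_at, signal_at):
    """Return the completion time of n_epochs speculative epochs.

    Hypothetical toy model (an assumption, not the article's): equal-length
    epochs assigned round-robin to n_procs processors; each epoch stalls at
    offset `wait_at` until its predecessor has forwarded the scalar value,
    and forwards its own value `signal_at - wait_at` time units after its
    wait completes. Violations, commit order and latency are ignored.
    """
    if not 0 <= wait_at <= signal_at <= epoch_len:
        raise ValueError("need 0 <= wait_at <= signal_at <= epoch_len")
    critical_path = signal_at - wait_at  # wait-to-signal distance
    end = []     # completion time of each epoch
    signal = []  # time at which each epoch forwards the scalar
    for i in range(n_epochs):
        # The processor becomes free when epoch i - n_procs finishes.
        start = end[i - n_procs] if i >= n_procs else 0
        reach_wait = start + wait_at
        # Stall until the predecessor has signaled (epoch 0 never stalls).
        resume = max(reach_wait, signal[i - 1]) if i > 0 else reach_wait
        signal.append(resume + critical_path)
        end.append(resume + (epoch_len - wait_at))
    return max(end) if end else 0


# 8 epochs of 10 time units on 4 processors (80 units if run sequentially).
# Only the position of the signal changes between the two runs.
print(simulate_tls(8, 4, 10, wait_at=1, signal_at=9))  # 66: long forwarding path
print(simulate_tls(8, 4, 10, wait_at=1, signal_at=3))  # 26: signal scheduled early
```

In this model, once the pipeline fills, each additional epoch costs roughly `max(signal_at - wait_at, epoch_len / n_procs)`. Moving the computation of the forwarded value, and its signal, earlier in the thread therefore lets the processors overlap. A long wait-to-signal distance serializes the epochs almost as badly as sequential execution.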