Dynamic performance tuning for speculative threads
Source
International Symposium on Computer Architecture
Proceedings of the 36th annual international symposium on Computer architecture
Austin, TX, USA
SESSION: Speculative threading and parallelization
Pages 462-473
Year of Publication: 2009
ISBN: 978-1-60558-526-0
Authors
Yangchun Luo, University of Minnesota - Twin Cities, Minneapolis, MN, USA
Venkatesan Packirisamy, University of Minnesota - Twin Cities, Minneapolis, MN, USA
Wei-Chung Hsu, University of Minnesota - Twin Cities, Minneapolis, MN, USA
Antonia Zhai, University of Minnesota - Twin Cities, Minneapolis, MN, USA
Nikhil Mungre, University of Minnesota - Twin Cities, Minneapolis, MN, USA
Ankit Tarkas, University of Minnesota - Twin Cities, Minneapolis, MN, USA
ABSTRACT
In response to the emergence of multicore processors, various novel and sophisticated execution models have been introduced to fully utilize these processors. One such execution model is Thread-Level Speculation (TLS), which allows potentially dependent threads to execute speculatively in parallel. While TLS offers significant performance potential for applications that are otherwise non-parallel, extracting efficient speculative threads in the presence of complex control flow and ambiguous data dependences is a real challenge. This task is further complicated by the fact that the performance of speculative threads is often architecture-dependent and input-sensitive, and exhibits phase behavior. Thus we propose dynamic performance tuning mechanisms that determine where and how to create speculative threads at runtime. This paper describes the design, implementation, and evaluation of hardware and software support that takes advantage of runtime performance profiles to extract efficient speculative threads. In our proposed framework, speculative threads are monitored by hardware-based performance counters and their performance impact is estimated. The creation of speculative threads is then adjusted based on this estimate. This paper proposes performance estimation techniques for speculative threads that correctly determine whether speculation can improve performance for loops that correspond to 83.8% of total loop execution time across all benchmarks. This paper also examines several dynamic performance tuning policies and finds that the best tuning policy achieves an overall speedup of 36.8% on a set of benchmarks from the SPEC2000 suite, which outperforms static thread management by 9.5%.
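The monitor, estimate, and adjust loop described in the abstract can be illustrated with a minimal sketch. It is not the paper's mechanism: the `LoopProfile` fields, the `Tuner` class, and the simple threshold policy are all hypothetical, and the sketch assumes that a sequential-time estimate has already been derived from counter readings by some means the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass
class LoopProfile:
    """Hypothetical counter-derived figures for one speculative loop invocation."""
    parallel_cycles: int        # measured cycles of the speculative run
    est_sequential_cycles: int  # estimated cycles had the loop run sequentially


def estimated_speedup(profile: LoopProfile) -> float:
    """Ratio of estimated sequential time to measured speculative time."""
    if profile.parallel_cycles <= 0:
        raise ValueError("parallel_cycles must be positive")
    return profile.est_sequential_cycles / profile.parallel_cycles


class Tuner:
    """Illustrative policy: keep speculating on a loop only while it appears to pay off."""

    def __init__(self, threshold: float = 1.0):
        self.threshold = threshold
        self.enabled = {}  # loop id -> whether to create speculative threads

    def should_speculate(self, loop_id: str) -> bool:
        # Loops with no profile yet are tried speculatively first (an assumption).
        return self.enabled.get(loop_id, True)

    def record(self, loop_id: str, profile: LoopProfile) -> None:
        # Adjust thread creation for this loop based on the latest estimate.
        self.enabled[loop_id] = estimated_speedup(profile) > self.threshold


tuner = Tuner()
tuner.record("loop_a", LoopProfile(parallel_cycles=1000, est_sequential_cycles=800))
tuner.record("loop_b", LoopProfile(parallel_cycles=1000, est_sequential_cycles=1500))
print(tuner.should_speculate("loop_a"), tuner.should_speculate("loop_b"))
```

In this sketch a loop that has been disabled is never profiled again; a policy meant to follow the phase behavior mentioned in the abstract would presumably need to re-sample such loops from time to time.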