| Thread criticality predictors for dynamic performance, power, and resource management in chip multiprocessors |
| Full text |
Pdf
(1.44 MB)
|
Source
|
International Symposium on Computer Architecture
archive
Proceedings of the 36th annual international symposium on Computer architecture
table of contents
Austin, TX, USA
SESSION: Power in chip multiprocessors
table of contents
Pages 290-301
Year of Publication: 2009
ISBN:978-1-60558-526-0
Also published in ...
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 96, Downloads (12 Months): 230, Citation Count: 0
|
|
|
ABSTRACT
With the shift towards chip multiprocessors (CMPs), exploiting and managing parallelism has become a central problem in computing systems. Many issues of parallelism management boil down to discerning which running threads or processes are critical, or slowest, versus which are non-critical. If one can accurately predict critical threads in a parallel program, then one can respond in a variety of ways. Possibilities include running the critical thread at a faster clock rate, performing load balancing techniques to offload work onto currently non-critical threads, or giving the critical thread more on-chip resources to execute faster. This paper proposes and evaluates simple but effective thread criticality predictors for parallel applications. We show that accurate predictors can be built using counters that are typically already available on-chip. Our predictor, based on memory hierarchy statistics, identifies thread criticality with an average accuracy of 93% across a range of architectures. We also demonstrate two applications of our predictor. First, we show how Intel's Threading Building Blocks (TBB) parallel runtime system can benefit from task stealing techniques that use our criticality predictor to reduce load imbalance. Using criticality prediction to guide TBB's task-stealing decisions improves performance by 13-32% for TBB-based PARSEC benchmarks running on a 32-core CMP. As a second application, criticality prediction guides dynamic energy optimizations in barrier-based applications. By running the predicted critical thread at the full clock rate and frequency-scaling non-critical threads, this approach achieves average energy savings of 15% while negligibly degrading performance for SPLASH-2 and PARSEC benchmarks.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Intel Threading Building Blocks 2.0, 2008
|
 |
2
|
Umut A. Acar , Guy E. Blelloch , Robert D. Blumofe, The data locality of work stealing, Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures, p.1-12, July 09-13, 2000, Bar Harbor, Maine, United States
[doi> 10.1145/341800.341801]
|
 |
3
|
|
 |
4
|
Christian Bienia , Sanjeev Kumar , Jaswinder Pal Singh , Kai Li, The PARSEC benchmark suite: characterization and architectural implications, Proceedings of the 17th international conference on Parallel architectures and compilation techniques, October 25-29, 2008, Toronto, Ontario, Canada
[doi> 10.1145/1454115.1454128]
|
| |
5
|
C. Bienia, S. Kumar, and K. Li. PARSEC vs SPLASH-2: A Quantitative Comparison of Two Multithreaded Benchmark Suites on Chip Multiprocessors, IEEE Intl. Symp. on Workload Characterization, 2008
|
 |
6
|
|
| |
7
|
Robert D. Blumofe , Christopher F. Joerg , Bradley C. Kuszmaul , Charles E. Leiserson , Keith H. Randall , Yuli Zhou, Cilk: an efficient multithreaded runtime system, Journal of Parallel and Distributed Computing, v.37 n.1, p.55-69, Aug. 25, 1996
[doi> 10.1006/jpdc.1996.0107]
|
 |
8
|
Shekhar Borkar , Tanay Karnik , Siva Narendra , Jim Tschanz , Ali Keshavarzi , Vivek De, Parameter variations and impact on circuits and microarchitecture, Proceedings of the 40th annual Design Automation Conference, June 02-06, 2003, Anaheim, CA, USA
[doi> 10.1145/775832.775920]
|
 |
9
|
Qiong Cai , José González , Ryan Rakvic , Grigorios Magklis , Pedro Chaparro , Antonio González, Meeting points: using thread criticality to adapt multicore hardware to parallel regions, Proceedings of the 17th international conference on Parallel architectures and compilation techniques, October 25-29, 2008, Toronto, Ontario, Canada
[doi> 10.1145/1454115.1454149]
|
 |
10
|
|
 |
11
|
|
| |
12
|
G. Contreras and M. Martonosi. Characterizing and Improving the Performance of Intel Threading Building Blocks. IEEE Intl. Symp. on Workload Characterization, 2008
|
 |
13
|
|
 |
14
|
|
 |
15
|
|
| |
16
|
|
 |
17
|
Aamer Jaleel , William Hasenplaugh , Moinuddin Qureshi , Julien Sebot , Simon Steely, Jr. , Joel Emer, Adaptive insertion policies for managing shared caches, Proceedings of the 17th international conference on Parallel architectures and compilation techniques, October 25-29, 2008, Toronto, Ontario, Canada
[doi> 10.1145/1454115.1454145]
|
| |
18
|
|
| |
19
|
W. Kim, M. S. Gupta, G. Y. Wei, and D. Brooks. System Level Analysis of Fast, Per-Core DVFS Using On-Chip Switching Regulators. Intl. Symp. on High Performance Computer Architecture, 2008
|
 |
20
|
|
| |
21
|
|
| |
22
|
|
| |
23
|
|
 |
24
|
|
| |
25
|
K. Luo, J. Gummaraju, and M. Franklin. Balancing Throughput and Fairness in SMT Processors. Intl. Symp. on Performance Analysis of Systems and Software, 2001
|
 |
26
|
Milo M. K. Martin , Daniel J. Sorin , Bradford M. Beckmann , Michael R. Marty , Min Xu , Alaa R. Alameldeen , Kevin E. Moore , Mark D. Hill , David A. Wood, Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset, ACM SIGARCH Computer Architecture News, v.33 n.4, November 2005
[doi> 10.1145/1105734.1105747]
|
 |
27
|
|
| |
28
|
|
| |
29
|
|
 |
30
|
|
 |
31
|
Dean M. Tullsen , Susan J. Eggers , Joel S. Emer , Henry M. Levy , Jack L. Lo , Rebecca L. Stamm, Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor, Proceedings of the 23rd annual international symposium on Computer architecture, p.191-202, May 22-24, 1996, Philadelphia, Pennsylvania, United States
|
| |
32
|
|
 |
33
|
Steven Cameron Woo , Moriyoshi Ohara , Evan Torrie , Jaswinder Pal Singh , Anoop Gupta, The SPLASH-2 programs: characterization and methodological considerations, Proceedings of the 22nd annual international symposium on Computer architecture, p.24-36, June 22-24, 1995, S. Margherita Ligure, Italy
|
|