ABSTRACT
We present a novel mechanism, called meeting point thread characterization, to dynamically detect critical threads in a parallel region. We define the critical thread as the one with the longest completion time in the parallel region. Knowing the criticality of each thread has many potential applications. In this work, we propose two: thread delaying for multi-core systems and thread balancing for simultaneous multi-threaded (SMT) cores. Thread delaying saves energy by running the core that hosts the critical thread at maximum frequency while scaling down the frequency and voltage of the cores that host non-critical threads. Thread balancing improves overall performance by giving higher priority to the critical thread in the issue queue of an SMT core. Our experiments on a detailed microprocessor simulator with the Recognition, Mining, and Synthesis applications from Intel's research laboratory show that thread delaying can achieve energy savings of more than 40% with negligible performance loss, and that thread balancing can improve performance by 1% to 20%.
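The idea behind thread delaying can be illustrated with a minimal sketch. It is not the meeting point mechanism itself, which the abstract does not detail. It assumes that per-thread completion times at maximum frequency are already known, that run time scales inversely with frequency (a simplification), and that each core can pick from a small set of discrete frequency levels. The names `critical_thread`, `delay_frequencies`, `f_max`, and `levels` are hypothetical.

```python
def critical_thread(completion_times):
    """Return the index of the thread with the longest completion time."""
    return max(range(len(completion_times)), key=lambda i: completion_times[i])


def delay_frequencies(completion_times, f_max, levels):
    """Pick, for each thread, the lowest frequency level that still lets it
    finish no later than the critical thread.

    Assumption (not from the paper): run time is inversely proportional to
    frequency, and `levels` contains `f_max`.
    """
    t_crit = max(completion_times)
    chosen = []
    for t in completion_times:
        # Frequency at which this thread would finish exactly with the critical one.
        needed = min(f_max, f_max * t / t_crit)
        # Lowest available level that is at least the needed frequency.
        chosen.append(min(f for f in levels if f >= needed))
    return chosen


# Hypothetical completion times (ms) of four threads measured at f_max.
times = [10.0, 6.0, 8.0, 4.0]
levels = [1.0, 2.0, 3.0, 4.0]  # available frequencies in GHz (assumed)

crit = critical_thread(times)
freqs = delay_frequencies(times, f_max=4.0, levels=levels)
```

In this example the critical thread keeps the maximum frequency, while the threads that would otherwise idle at the end of the parallel region are slowed down as far as the available levels allow.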