Meeting points: using thread criticality to adapt multicore hardware to parallel regions
Source
PACT archive
Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques
Toronto, Ontario, Canada
SESSION: Multithreading improvements
Pages 240-249  
Year of Publication: 2008
ISBN:978-1-60558-282-5
Authors
Qiong Cai  Intel Barcelona Research Center, Barcelona, Spain
José González  Intel Barcelona Research Center, Barcelona, Spain
Ryan Rakvic  United States Naval Academy, Annapolis, Maryland, USA
Grigorios Magklis  Intel Barcelona Research Center, Barcelona, Spain
Pedro Chaparro  Intel Barcelona Research Center, Barcelona, Spain
Antonio González  Intel Barcelona Research Center, Barcelona, Spain
Sponsors
ACM: Association for Computing Machinery
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1454115.1454149

ABSTRACT

We present a novel mechanism, called meeting point thread characterization, to dynamically detect critical threads in a parallel region. We define the critical thread as the one with the longest completion time in the parallel region. Knowing the criticality of each thread has many potential applications. In this work, we propose two applications: thread delaying for multi-core systems and thread balancing for simultaneous multi-threaded (SMT) cores. Thread delaying saves energy by running the core containing the critical thread at maximum frequency while scaling down the frequency and voltage of the cores containing non-critical threads. Thread balancing improves overall performance by giving higher priority to the critical thread in the issue queue of an SMT core. Our experiments on a detailed microprocessor simulator with the Recognition, Mining, and Synthesis applications from Intel research laboratories reveal that thread delaying can achieve energy savings of more than 40% with negligible performance loss. Thread balancing can improve performance by 1% to 20%.
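The thread delaying idea can be illustrated with a minimal sketch. This is not the mechanism from the paper: it assumes the per-thread completion times at maximum frequency are already known (the abstract does not say how they are measured), that execution time scales inversely with clock frequency, and that each core can be set to one of a small list of frequency levels. The function name `pick_frequencies` and its parameters are hypothetical.

```python
from bisect import bisect_left


def pick_frequencies(completion_times, levels):
    """Choose one frequency per thread (hypothetical helper, not from the paper).

    completion_times: estimated time of each thread in the parallel region
                      when its core runs at the maximum frequency.
    levels:           frequency levels a core can run at (any consistent unit).

    Returns (index of the critical thread, list of chosen frequencies).
    Assumption: time scales as 1/frequency, which ignores memory-bound phases.
    """
    if not completion_times or not levels:
        raise ValueError("need at least one thread and one frequency level")

    levels = sorted(levels)
    f_max = levels[-1]

    # The critical thread is the one with the longest completion time.
    t_crit = max(completion_times)
    critical = completion_times.index(t_crit)

    chosen = []
    for t in completion_times:
        # Slowest frequency at which this thread still finishes by t_crit.
        ideal = f_max * t / t_crit
        # Round up to an available level so the thread does not become
        # the new critical thread; clamp to the highest level.
        idx = min(bisect_left(levels, ideal), len(levels) - 1)
        chosen.append(levels[idx])
    return critical, chosen


if __name__ == "__main__":
    times = [10.0, 5.0, 8.0, 2.0]          # illustrative values only
    ghz = [1.0, 1.5, 2.0, 2.5, 3.0]        # illustrative frequency levels
    print(pick_frequencies(times, ghz))
```

Under these assumptions the critical thread keeps the maximum frequency, and every other core is slowed only as far as it can be while still reaching the end of the parallel region no later than the critical thread.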


