Tasking with out-of-order spawn in TLS chip multiprocessors: microarchitecture and compilation
Source: International Conference on Supercomputing
Proceedings of the 19th Annual International Conference on Supercomputing
Cambridge, Massachusetts
Session 5: Compilers II
Pages: 179-188
Year of Publication: 2005
ISBN: 1-59593-167-8
Authors
Jose Renau  University of California, Santa Cruz
James Tuck  University of Illinois at Urbana-Champaign
Wei Liu  University of Illinois at Urbana-Champaign
Luis Ceze  University of Illinois at Urbana-Champaign
Karin Strauss  University of Illinois at Urbana-Champaign
Josep Torrellas  University of Illinois at Urbana-Champaign
Sponsor
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1088149.1088173

ABSTRACT

Chip Multiprocessors (CMPs) are flexible, high-frequency platforms on which to support Thread-Level Speculation (TLS). However, for TLS to deliver on its promise, CMPs must exploit multiple sources of speculative task-level parallelism, including any nesting levels of both subroutines and loop iterations. Unfortunately, these environments are hard to support in decentralized CMP hardware: since tasks are spawned out-of-order and unpredictably, maintaining key TLS basics such as task ordering and efficient resource allocation is challenging.

While the concept of out-of-order spawning is not new, this paper is the first to propose a set of microarchitectural mechanisms that, altogether, fundamentally enable fast TLS with out-of-order spawn in a CMP. Moreover, we develop a fully-automated TLS compiler for aggressive out-of-order spawn. With our mechanisms, a TLS CMP with four 4-issue cores achieves an average speedup of 1.30 for full SPECint 2000 applications; the corresponding speedup for in-order only spawn is 1.04. Overall, our mechanisms unlock the potential of TLS for the toughest applications.
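
As an illustration of why out-of-order spawn complicates task ordering, the following is a minimal toy model in Python. It is not the paper's hardware mechanism or compiler. The class `TaskOrder`, the task names, and the scenario (a task that calls `f`, which in turn calls `g`, spawning a continuation at each call site) are all hypothetical. The one modeling assumption is that a newly spawned continuation is the immediate sequential successor of its parent, so it precedes anything the same parent spawned earlier.

```python
class TaskOrder:
    """Toy bookkeeping of sequential (commit) order among speculative tasks."""

    def __init__(self, root):
        # Least speculative task first.
        self.order = [root]

    def spawn(self, parent, child):
        # Assumption: the child immediately follows the parent in sequential
        # order, so it goes right after the parent and ahead of any task the
        # parent spawned earlier.
        self.order.insert(self.order.index(parent) + 1, child)


def demo():
    tasks = TaskOrder("main")
    spawn_log = []
    # "main" reaches the call to f and spawns the code after f returns.
    # It then runs into f, reaches the call to g, and spawns the code
    # after g returns (the rest of f).
    for parent, child in [("main", "after_f"), ("main", "after_g")]:
        tasks.spawn(parent, child)
        spawn_log.append(child)
    return spawn_log, tasks.order


if __name__ == "__main__":
    spawn_log, sequential_order = demo()
    print("spawn order:     ", spawn_log)
    print("sequential order:", sequential_order)
```

In this sketch the task spawned second (`after_g`) must commit before the task spawned first (`after_f`), which is the ordering problem the abstract refers to. With in-order spawn, by contrast, each new task is more speculative than every existing one, so spawn order and sequential order coincide.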

