Exploiting fine-grain thread level parallelism on the MIT multi-ALU processor
Source International Symposium on Computer Architecture archive
Proceedings of the 25th annual international symposium on Computer architecture table of contents
Barcelona, Spain
Pages: 306 - 317  
Year of Publication: 1998
ISBN:0-8186-8491-7
Authors
Stephen W. Keckler  Computer Systems Laboratory, Stanford University, Gates CS Building Stanford, CA
William J. Dally  Computer Systems Laboratory, Stanford University, Gates CS Building Stanford, CA
Daniel Maskit  Computer Systems Laboratory, Stanford University, Gates CS Building Stanford, CA
Nicholas P. Carter  Computer Systems Laboratory, Stanford University, Gates CS Building Stanford, CA
Andrew Chang  Computer Systems Laboratory, Stanford University, Gates CS Building Stanford, CA
Whay S. Lee  Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, MA
Sponsors
IEEE-CS\TCCA: TC on Computer Architecture
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
IEEE Computer Society  Washington, DC, USA
Bibliometrics
Downloads (6 Weeks): 23,   Downloads (12 Months): 49,   Citation Count: 13
DOI: http://doi.acm.org/10.1145/279358.279399

ABSTRACT

Much of the improvement in computer performance over the last twenty years has come from faster transistors and architectural advances that increase parallelism. Historically, parallelism has been exploited either at the instruction level with a grain-size of a single instruction or by partitioning applications into coarse threads with grain-sizes of thousands of instructions. Fine-grain threads fill the parallelism gap between these extremes by enabling tasks with run lengths as small as 20 cycles. As this fine-grain parallelism is orthogonal to ILP and coarse threads, it complements both methods and provides an opportunity for greater speedup. This paper describes the efficient communication and synchronization mechanisms implemented in the Multi-ALU Processor (MAP) chip, including a thread creation instruction, register communication, and a hardware barrier. These register-based mechanisms provide 10 times faster communication and 60 times faster synchronization than mechanisms that operate via a shared on-chip cache. With a three-processor implementation of the MAP, fine-grain speedups of 1.2-2.1 are demonstrated on a suite of applications.
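
As a rough illustration of why synchronization cost matters at this grain size, the toy model below estimates the speedup when equal-length tasks run in parallel and then pay for one synchronization. It is only a sketch under stated assumptions and is not taken from the paper: the function name and the 2-cycle synchronization cost are hypothetical. Only the 20-cycle run length, the three processors, and the 60x ratio come from the abstract.

```python
def fine_grain_speedup(task_cycles: int, n_procs: int, sync_cycles: int) -> float:
    """Toy model: n_procs equal tasks run in parallel, then pay one synchronization.

    Serial time is the sum of all tasks; parallel time is one task plus the
    synchronization cost. This ignores thread creation, communication, and
    load imbalance.
    """
    serial_time = task_cycles * n_procs
    parallel_time = task_cycles + sync_cycles
    return serial_time / parallel_time


TASK_CYCLES = 20            # smallest run length mentioned in the abstract
N_PROCS = 3                 # three-processor MAP implementation
FAST_SYNC = 2               # HYPOTHETICAL register-based barrier cost, in cycles
SLOW_SYNC = 60 * FAST_SYNC  # 60x slower, the ratio quoted for cache-based sync

fast = fine_grain_speedup(TASK_CYCLES, N_PROCS, FAST_SYNC)
slow = fine_grain_speedup(TASK_CYCLES, N_PROCS, SLOW_SYNC)
print(f"assumed fast sync: {fast:.2f}x, assumed slow sync: {slow:.2f}x")
```

Under these assumed costs the fast case stays above 1x while the slow case drops below 1x, meaning a 20-cycle task would not be worth spawning in the slow case. The absolute numbers depend entirely on the assumed synchronization cost.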


