Reducing control overhead in dataflow architectures
Source
PACT
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Seattle, Washington, USA
SESSION: Instruction fetch and control flow
Pages: 182 - 191
Year of Publication: 2006
ISBN: 1-59593-264-X
Authors
Andrew Petersen, University of Washington, Seattle, WA
Andrew Putnam, University of Washington, Seattle, WA
Martha Mercaldi, University of Washington, Seattle, WA
Andrew Schwerin, University of Washington, Seattle, WA
Susan Eggers, University of Washington, Seattle, WA
Steve Swanson, University of Washington, Seattle, WA
Mark Oskin, University of Washington, Seattle, WA
ABSTRACT
In recent years, computer architects have proposed tiled architectures in response to several emerging problems in processor design, such as design complexity, wire delay, and fabrication reliability. One of these architectures, WaveScalar, uses a dynamic, tagged-token dataflow execution model to simplify the design of the processor tiles and their interconnection network and to achieve good parallel performance. However, using a dataflow execution model reawakens old problems, including the instruction overhead required for control flow. Previous work compiling the functional language Id to the Monsoon Dataflow System found this overhead to be 2–3× that of programs written in C and targeted to a MIPS R3000.
In this paper, we present and analyze three compiler optimizations that significantly reduce control overhead with minimal additional hardware. We begin by describing how to translate imperative code into dataflow assembly and analyze the resulting control overhead. We report a similar 2–4× instruction overhead, which suggests that the execution model, rather than a specific source language or target architecture, is responsible. Then, we present the compiler optimizations, each of which is designed to eliminate a particular type of control overhead, and analyze the extent to which they do so. Finally, we evaluate the effect that applying all the optimizations together has on program performance. Together, the optimizations reduce control overhead by 80% on average, increasing application performance by 21–37%.
CITED BY
Steven Swanson, Andrew Schwerin, Martha Mercaldi, Andrew Petersen, Andrew Putnam, Ken Michelson, Mark Oskin, Susan J. Eggers, The WaveScalar architecture, ACM Transactions on Computer Systems (TOCS), v.25 n.2, p.4-es, May 2007