|
ABSTRACT
Increasing demand for both greater parallelism and faster clocks dictate that future generation architectures will need to decentralize their resources and eliminate primitives that require single cycle global communication. A Raw microprocessor distributes all of its resources, including instruction streams, register files, memory ports, and ALUs, over a pipelined two-dimensional mesh interconnect, and exposes them fully to the compiler. Because communication in Raw machines is distributed, compiling for instruction-level parallelism (ILP) requires both spatial instruction partitioning as well as traditional temporal instruction scheduling. In addition, the compiler must explicitly manage all communication through the interconnect, including the global synchronization required at branch points. This paper describes RAWCC, the compiler we have developed for compiling general-purpose sequential programs to the distributed Raw architecture. We present performance results that demonstrate that although Raw machines provide no mechanisms for global communication the Raw compiler can schedule to achieve speedups that scale with the number of available functional units.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
Saman P. Amarasinghe , Jennifer M. Anderson , Christopher S. Wilson , Shih-Wei Liao , Brian R. Murphy , Robert S. French , Monica S. Lam , Mary W. Hall, Multiprocessors from a Software Perspective, IEEE Micro, v.16 n.3, p.52-61, June 1996
[doi> 10.1109/40.502406]
|
| |
3
|
David I. August , Wen-mei W. Hwu , Scott A. Mahlke, A framework for balancing control flow and predication, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.92-103, December 01-03, 1997, Research Triangle Park, North Carolina, United States
|
| |
4
|
J. Babb , M. Frank , V. Lee , E. Waingold , R. Barua , M. Taylor , J. Kim , S. Devabhaktuni , A. Agarwal, The RAW benchmark suite: computation structures for general purpose computing, Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines, p.134, April 16-18, 1997
|
| |
5
|
J. Babb, R. Tessier, M. Dahl, S. Hanono, D. Hoki, and A. Agarwal. Logic emulation with virtual wires. IEEE Transactions on Computer Aided Design, 16(6):609-626, June 1997.
|
| |
6
|
|
| |
7
|
|
| |
8
|
|
 |
9
|
Andrea Capitanio , Nikil Dutt , Alexandru Nicolau, Partitioned register files for VLIWs: a preliminary analysis of tradeoffs, Proceedings of the 25th annual international symposium on Microarchitecture, p.292-300, December 01-04, 1992, Portland, Oregon, United States
|
| |
10
|
G. Deso}i. Instruction Assignment for Clustered VLIW DSP Compilers: A New Approach. Technical report, HP HPL- 98-13, Feb. 1998.
|
| |
11
|
M. Dikaiakos, A. Rogers, and K. Steiglitz. A Comparison of Techniques Used for Mapping Parallel Algorithms to Message-Passing Multiprocessors. In Proceedings of the 6th IEEE Symposium on Parallel and Distributed Processing, Oct. 1994.
|
| |
12
|
|
| |
13
|
J. A. Fisher. Trace Scheduling: A Technique for Global Microcode Compaction. IEEE Transactions on Computers, 7(C-30):478-490, July 1981.
|
| |
14
|
L. Gwennap. Digital 21264 Sets New Standard. Microprocessor Report, pages 11-16, Oct. 1996.
|
| |
15
|
Wen-Mei W. Hwu , Scott A. Mahlke , William Y. Chen , Pohua P. Chang , Nancy J. Warter , Roger A. Bringmann , Roland G. Ouellette , Richard E. Hank , Tokuzo Kiyohara , Grant E. Haab , John G. Holm , Daniel M. Lavery, The superblock: an effective technique for VLIW and superscalar compilation, The Journal of Supercomputing, v.7 n.1-2, p.229-248, May 1993
[doi> 10.1007/BF01205185]
|
| |
16
|
V. Kathail, M. Schlansker, and B. R. Rau. HPL PlayDoh Architecture Specification: Version 1.0. Technical report, HP HPL-93-80, Feb. 1994.
|
 |
17
|
|
 |
18
|
|
| |
19
|
P. Geoffrey Lowney , Stefan M. Freudenberger , Thomas J. Karzes , W. D. Lichtenstein , Robert P. Nix , John S. O'Donnell , John Ruttenberg, The multiflow trace scheduling compiler, The Journal of Supercomputing, v.7 n.1-2, p.51-142, May 1993
[doi> 10.1007/BF01205182]
|
| |
20
|
|
| |
21
|
M. D. Smith. Extending SUIF for Machine-dependent Optimizations. In Proceedings of the First SUiF Compiler Workshop, pages 14-25, Stanford, CA, Jan. 1996.
|
 |
22
|
|
| |
23
|
Elliot Waingold , Michael Taylor , Devabhaktuni Srikrishna , Vivek Sarkar , Walter Lee , Victor Lee , Jang Kim , Matthew Frank , Peter Finch , Rajeev Barua , Jonathan Babb , Saman Amarasinghe , Anant Agarwal, Baring It All to Software: Raw Machines, Computer, v.30 n.9, p.86-93, September 1997
[doi> 10.1109/2.612254]
|
 |
24
|
Robert P. Wilson , Robert S. French , Christopher S. Wilson , Saman P. Amarasinghe , Jennifer M. Anderson , Steve W. K. Tjiang , Shih-Wei Liao , Chau-Wen Tseng , Mary W. Hall , Monica S. Lam , John L. Hennessy, SUIF: an infrastructure for research on parallelizing and optimizing compilers, ACM SIGPLAN Notices, v.29 n.12, p.31-37, Dec. 1994
[doi> 10.1145/193209.193217]
|
| |
25
|
|
| |
26
|
|
CITED BY 48
|
|
|
|
|
Zhi Alex Ye , Nagaraj Shenoy , Prithviraj Baneijee, A C compiler for a processor with a reconfigurable functional unit, Proceedings of the 2000 ACM/SIGDA eighth international symposium on Field programmable gate arrays, p.95-100, February 10-11, 2000, Monterey, California, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Mark Oskin , Justin Hensley , Diana Keen , Frederic T. Chong , Matthew Farrens , Aneet Chopra, Exploiting ILP in page-based intelligent memory, Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture, p.208-218, November 16-18, 1999, Haifa, Israel
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Michael I. Gordon , William Thies , Michal Karczmarek , Jasper Lin , Ali S. Meli , Andrew A. Lamb , Chris Leger , Jeremy Wong , Henry Hoffmann , David Maze , Saman Amarasinghe, A stream compiler for communication-exposed architectures, ACM SIGPLAN Notices, v.37 n.10, October 2002
|
|
|
Andrei Terechko , Erwan Le Thénaff , Henk Corporaal, Cluster assignment of global values for clustered VLIW processors, Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems, October 30-November 01, 2003, San Jose, California, USA
|
|
|
|
|
|
|
|
|
Rafael Maestre , Fadi J. Kurdahi , Milagros Fernández , Roman Hermida , Nader Bagherzadeh , Hartej Singh, A framework for reconfigurable computing: task scheduling and context management, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, v.9 n.6, p.858-873, 12/1/2001
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Michael Bedford Taylor , Jason Kim , Jason Miller , David Wentzlaff , Fae Ghodrat , Ben Greenwald , Henry Hoffman , Paul Johnson , Jae-Wook Lee , Walter Lee , Albert Ma , Arvind Saraf , Mark Seneski , Nathan Shnidman , Volker Strumpen , Matt Frank , Saman Amarasinghe , Anant Agarwal, The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs, IEEE Micro, v.22 n.2, p.25-35, March 2002
|
|
|
|
|
|
Vassos Soteriou , Noel Eisley , Li-Shiuan Peh, Software-directed power-aware interconnection networks, Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems, September 24-27, 2005, San Francisco, California, USA
|
|
|
Martha Mercaldi , Steven Swanson , Andrew Petersen , Andrew Putnam , Andrew Schwerin , Mark Oskin , Susan J. Eggers, Modeling instruction placement on a spatial architecture, Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures, July 30-August 02, 2006, Cambridge, Massachusetts, USA
|
|
|
Martha Mercaldi , Steven Swanson , Andrew Petersen , Andrew Putnam , Andrew Schwerin , Mark Oskin , Susan J. Eggers, Instruction scheduling for a tiled dataflow architecture, ACM SIGOPS Operating Systems Review, v.40 n.5, December 2006
|
|
|
Martha Mercaldi , Steven Swanson , Andrew Petersen , Andrew Putnam , Andrew Schwerin , Mark Oskin , Susan J. Eggers, Instruction scheduling for a tiled dataflow architecture, ACM SIGOPS Operating Systems Review, v.40 n.5, December 2006
|
|
|
Hyunchul Park , Kevin Fan , Manjunath Kudlur , Scott Mahlke, Modulo graph embedding: mapping applications onto coarse-grained reconfigurable architectures, Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems, October 22-25, 2006, Seoul, Korea
|
|
|
Steven Swanson , Andrew Schwerin , Martha Mercaldi , Andrew Petersen , Andrew Putnam , Ken Michelson , Mark Oskin , Susan J. Eggers, The WaveScalar architecture, ACM Transactions on Computer Systems (TOCS), v.25 n.2, p.4-es, May 2007
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Guilherme Ottoni , Ram Rangan , Adam Stoler , David I. August, Automatic Thread Extraction with Decoupled Software Pipelining, Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture, p.105-118, November 12-16, 2005, Barcelona, Spain
|
|
|
Michael Bedford Taylor , Walter Lee , Jason Miller , David Wentzlaff , Ian Bratt , Ben Greenwald , Henry Hoffmann , Paul Johnson , Jason Kim , James Psota , Arvind Saraf , Nathan Shnidman , Volker Strumpen , Matt Frank , Saman Amarasinghe , Anant Agarwal, Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams, ACM SIGARCH Computer Architecture News, v.32 n.2, p.2, March 2004
|
|
|
|
|
|
Steven Swanson , Andrew Putnam , Martha Mercaldi , Ken Michelson , Andrew Petersen , Andrew Schwerin , Mark Oskin , Susan J. Eggers, Area-Performance Trade-offs in Tiled Dataflow Architectures, ACM SIGARCH Computer Architecture News, v.34 n.2, p.314-326, May 2006
|
|
|
Bingfeng Mei , Serge Vernalde , Diederik Verkest , Hugo De Man , Rudy Lauwereins, Exploiting Loop-Level Parallelism on Coarse-Grained Reconfigurable Architectures Using Modulo Scheduling, Proceedings of the conference on Design, Automation and Test in Europe, p.10296, March 03-07, 2003
|
|
|
|
|
|
Nikhil Bansal , Sumit Gupta , Nikil Dutt , Alex Nicolau , Rajesh Gupta, Network Topology Exploration of Mesh-Based Coarse-Grain Reconfigurable Architectures, Proceedings of the conference on Design, automation and test in Europe, p.10474, February 16-20, 2004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Hyunchul Park , Kevin Fan , Scott A. Mahlke , Taewook Oh , Heeseok Kim , Hong-seok Kim, Edge-centric modulo scheduling for coarse-grained reconfigurable architectures, Proceedings of the 17th international conference on Parallel architectures and compilation techniques, October 25-29, 2008, Toronto, Ontario, Canada
|
|
|
|
|
|
|
|