|
ABSTRACT
Convergent scheduling is a general framework for cluster assignment and instruction scheduling on spatial architectures. A convergent scheduler is composed of independent passes, each implementing a heuristic that addresses a particular problem or constraint. The passes share a simple, common interface that provides spatial and temporal preference for each instruction. Preferences are not absolute; instead, the interface allows a pass to express the confidence of its preferences, as well as preferences for multiple space and time slots. A pass operates by modifying these preferences. By applying a series of passes that address all the relevant constraints, the convergent scheduler can produce a schedule that satisfies all the important constraints. Because all passes are independent and need to understand only one interface to interact with each other, convergent scheduling simplifies the problem of handling multiple constraints and codeveloping different heuristics. We have applied convergent scheduling to two spatial architectures: the Raw processor and a clustered VLIW machine. It is able to successfully handle traditional constraints such as parallelism, load balancing, and communication minimization, as well as constraints due to preplaced instructions, which are instructions with predetermined cluster assignment. Convergent scheduling is able to obtain an average performance improvement of 21% over the existing space-time scheduler of the Raw processor, and an improvement of 14% over state-of-the-art assignment and scheduling techniques on a clustered VLIW architecture.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
J. Babb , M. Frank , V. Lee , E. Waingold , R. Barua , M. Taylor , J. Kim , S. Devabhaktuni , A. Agarwal, The RAW benchmark suite: computation structures for general purpose computing, Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines, p.134, April 16-18, 1997
|
 |
2
|
Rajeev Barua , Walter Lee , Saman Amarasinghe , Anant Agarwal, Maps: a compiler-managed memory system for raw machines, Proceedings of the 26th annual international symposium on Computer architecture, p.4-15, May 01-04, 1999, Atlanta, Georgia, United States
|
 |
3
|
Andrea Capitanio , Nikil Dutt , Alexandru Nicolau, Partitioned register files for VLIWs: a preliminary analysis of tradeoffs, Proceedings of the 25th annual international symposium on Microarchitecture, p.292-300, December 01-04, 1992, Portland, Oregon, United States
|
 |
4
|
Keith D. Cooper , Philip J. Schielke , Devika Subramanian, Optimizing for reduced code space using genetic algorithms, Proceedings of the ACM SIGPLAN 1999 workshop on Languages, compilers, and tools for embedded systems, p.1-9, May 05-05, 1999, Atlanta, Georgia, United States
|
| |
5
|
G. Desoli. Instruction Assignment for Clustered VLIW DSP Compilers: a New Approach. Technical Report HPL-98-13, Hewlett Packard Laboratories, 1998.
|
| |
6
|
|
| |
7
|
J. A. Fisher. Trace Scheduling: A Technique for Global Microcode Compaction. IEEE Transactions on Computers, C-30(7):478--490, July 1981.
|
 |
8
|
|
| |
9
|
|
| |
10
|
Wen-Mei W. Hwu , Scott A. Mahlke , William Y. Chen , Pohua P. Chang , Nancy J. Warter , Roger A. Bringmann , Roland G. Ouellette , Richard E. Hank , Tokuzo Kiyohara , Grant E. Haab , John G. Holm , Daniel M. Lavery, The superblock: an effective technique for VLIW and superscalar compilation, The Journal of Supercomputing, v.7 n.1-2, p.229-248, May 1993
[doi> 10.1007/BF01205185]
|
| |
11
|
|
 |
12
|
|
| |
13
|
|
 |
14
|
Walter Lee , Rajeev Barua , Matthew Frank , Devabhaktuni Srikrishna , Jonathan Babb , Vivek Sarkar , Saman Amarasinghe, Space-time scheduling of instruction-level parallelism on a raw machine, Proceedings of the eighth international conference on Architectural support for programming languages and operating systems, p.46-57, October 02-07, 1998, San Jose, California, United States
|
 |
15
|
|
| |
16
|
|
| |
17
|
P. Geoffrey Lowney , Stefan M. Freudenberger , Thomas J. Karzes , W. D. Lichtenstein , Robert P. Nix , John S. O'Donnell , John Ruttenberg, The multiflow trace scheduling compiler, The Journal of Supercomputing, v.7 n.1-2, p.51-142, May 1993
[doi> 10.1007/BF01205182]
|
| |
18
|
Machsuif. http://www.eecs.harvard.edu/hube.
|
 |
19
|
Scott A. Mahlke , David C. Lin , William Y. Chen , Richard E. Hank , Roger A. Bringmann, Effective compiler support for predicated execution using the hyperblock, Proceedings of the 25th annual international symposium on Microarchitecture, p.45-54, December 01-04, 1992, Portland, Oregon, United States
|
| |
20
|
D. Maze. Compilation Infrastructure for VLIW Machines. Master's thesis, Massachusetts Institute of Technology, September 2001.
|
| |
21
|
|
| |
22
|
|
| |
23
|
|
| |
24
|
Michael Bedford Taylor , Jason Kim , Jason Miller , David Wentzlaff , Fae Ghodrat , Ben Greenwald , Henry Hoffman , Paul Johnson , Jae-Wook Lee , Walter Lee , Albert Ma , Arvind Saraf , Mark Seneski , Nathan Shnidman , Volker Strumpen , Matt Frank , Saman Amarasinghe , Anant Agarwal, The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs, IEEE Micro, v.22 n.2, p.25-35, March 2002
[doi> 10.1109/MM.2002.997877]
|
CITED BY 15
|
|
|
|
|
|
|
|
Ramadass Nagarajan , Sundeep K. Kushwaha , Doug Burger , Kathryn S. McKinley , Calvin Lin , Stephen W. Keckler, Static Placement, Dynamic Issue (SPDI) Scheduling for EDGE Architectures, Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, p.74-84, September 29-October 03, 2004
|
|
|
|
|
|
|
|
|
Martha Mercaldi , Steven Swanson , Andrew Petersen , Andrew Putnam , Andrew Schwerin , Mark Oskin , Susan J. Eggers, Instruction scheduling for a tiled dataflow architecture, ACM SIGOPS Operating Systems Review, v.40 n.5, December 2006
|
|
|
Hyunchul Park , Kevin Fan , Manjunath Kudlur , Scott Mahlke, Modulo graph embedding: mapping applications onto coarse-grained reconfigurable architectures, Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems, October 22-25, 2006, Seoul, Korea
|
|
|
|
|
|
|
|
|
Michael Bedford Taylor , Walter Lee , Jason Miller , David Wentzlaff , Ian Bratt , Ben Greenwald , Henry Hoffmann , Paul Johnson , Jason Kim , James Psota , Arvind Saraf , Nathan Shnidman , Volker Strumpen , Matt Frank , Saman Amarasinghe , Anant Agarwal, Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams, ACM SIGARCH Computer Architecture News, v.32 n.2, p.2, March 2004
|
|
|
Kevin Fan , Hyun hul Park , Manjunath Kudlur , S ott Mahlke, Modulo scheduling for highly customized datapaths to increase hardware reusability, Proceedings of the sixth annual IEEE/ACM international symposium on Code generation and optimization, April 05-09, 2008, Boston, MA, USA
|
|
|
|
|
|
|
|
|
|
|
|
Hyunchul Park , Kevin Fan , Scott A. Mahlke , Taewook Oh , Heeseok Kim , Hong-seok Kim, Edge-centric modulo scheduling for coarse-grained reconfigurable architectures, Proceedings of the 17th international conference on Parallel architectures and compilation techniques, October 25-29, 2008, Toronto, Ontario, Canada
|
|