ABSTRACT
Modulo scheduling is a framework within which a wide variety of algorithms and heuristics may be defined for software pipelining innermost loops. This paper presents a practical algorithm, iterative modulo scheduling, that is capable of dealing with realistic machine models. This paper also characterizes the algorithm in terms of the quality of the generated schedules as well as the computational expense incurred.