|
ABSTRACT
Utilizing parallelism at the instruction level is an important way to improve performance. Because the time spent in loop execution dominates total execution time, a large body of optimizations focuses on decreasing the time to execute each iteration. Software pipelining is a technique that reforms the loop so that a faster execution rate is realized. Iterations are executed in overlapped fashion to increase parallelism.Let {ABC}n represent a loop containing operations A, B, C that is executed n times. Although the operations of a single iteration can be parallelized, more parallelism may be achieved if the entire loop is considered rather than a single iteration. The software pipelining transformation utilizes the fact that a loop {ABC}n is equivalent to A{BCA}n−1BC. Although the operations contained in the loop do not change, the operations are from different iterations of the original loop.Various algorithms for software pipelining exist. A comparison of the alternative methods for software pipelining is presented. The relationships between the methods are explored and possibilities for improvement highlighted.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Alfred V. Aho , Ravi Sethi , Jeffrey D. Ullman, Compilers: principles, techniques, and tools, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1986
|
| |
2
|
|
| |
3
|
|
 |
4
|
|
| |
5
|
|
| |
6
|
AIKEN, A. ANB NICOLAU, A. 1990 A realistic resource-constrained software pipehnmg algorithm Tech. Rep. RJ 7743, IBM Research Division, San Jose, CA, October
|
| |
7
|
|
| |
8
|
|
| |
9
|
BANERJEE, U., SHEN, S., KUCK, D J., AND TOWLE, R. A 1979 Time and parallel precentor bounds for Fortran-like loops. IEEE Trans Comput C-28, 9 (Sept), 660 670
|
| |
10
|
|
| |
11
|
|
| |
12
|
|
| |
13
|
|
| |
14
|
DAVIDSON, E. S. 1971. The design and control of pipelined function generators. In Proceedings of the 1971 Internatmnal IEEE Conference on Systems, Networks, and Computers' (Oaxtepec, Mexico, Jan.), 19-21.
|
 |
15
|
James C. Dehnert , Peter Y.-T. Hsu , Joseph P. Bratt, Overlapped loop support in the Cydra 5, Proceedings of the third international conference on Architectural support for programming languages and operating systems, p.26-38, April 03-06, 1989, Boston, Massachusetts, United States
|
| |
16
|
|
 |
17
|
|
| |
18
|
|
 |
19
|
|
| |
20
|
FISHER, J. 1981. Trace scheduling: A technique for global microcode compaction. {EEE Trans. Comput. C-30, 7 (July), 478 490.
|
 |
21
|
|
 |
22
|
Guang R. Gao , Yue-Bong Wong , Qi Ning, A timed Petri-net model for fine-grain loop scheduling, Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation, p.204-218, June 24-28, 1991, Toronto, Ontario, Canada
|
| |
23
|
GAO, G. R., WONO, W.-B., AND NING, Q. 1991b. A timed Petri-net model for fine-grain loop scheduling. Tech. Rep. ACAPS Technical Memo 18, School of Computer Science, McGill University, Montreal, Canada, H3A 2A7, January.
|
 |
24
|
R. Govindarajan , Erik R. Altman , Guang R. Gao, Minimizing register requirements under resource-constrained rate-optimal software pipelining, Proceedings of the 27th annual international symposium on Microarchitecture, p.85-94, November 30-December 02, 1994, San Jose, California, United States
[doi> 10.1145/192724.192733]
|
| |
25
|
|
 |
26
|
|
 |
27
|
|
| |
28
|
JONES, R.B. 1991. Constrained software pipelining. Utah State University, Dept. of Computer Science, Logan, UT, Master's Thesis, Sept.
|
| |
29
|
|
| |
30
|
KUCK, D. J. AND PADUA, D.A. 1979. High-speed muir(processors and their compilers. In Proceedings of the 1979 International Conference on Parallel Processing, 5-16.
|
 |
31
|
|
| |
32
|
|
 |
33
|
Scott A. Mahlke , David C. Lin , William Y. Chen , Richard E. Hank , Roger A. Bringmann, Effective compiler support for predicated execution using the hyperblock, Proceedings of the 25th annual international symposium on Microarchitecture, p.45-54, December 01-04, 1992, Portland, Oregon, United States
|
| |
34
|
MATETI, P. ANU DEO, N. 1976. On algorithms for enumerating all circuits of a graph. SlAM J. Comput. 5, 1, 990-999.
|
| |
35
|
|
 |
36
|
|
| |
37
|
|
| |
38
|
O'NEILL, M. R. 1994. Software pipelining with stochastic search algorithms. Utah State University, Dept. of Computer Science, Logan, UT, Master's Thesis.
|
| |
39
|
|
 |
40
|
|
| |
41
|
|
 |
42
|
B. R. Rau , M. Lee , P. P. Tirumalai , M. S. Schlansker, Register allocation for software pipelined loops, Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation, p.283-299, June 15-19, 1992, San Francisco, California, United States
|
 |
43
|
B. Ramakrishna Rau , Michael S. Schlansker , P. P. Tirumalai, Code generation schema for modulo scheduled loops, Proceedings of the 25th annual international symposium on Microarchitecture, p.158-169, December 01-04, 1992, Portland, Oregon, United States
|
| |
44
|
B. Ramakrishna Rau , David W. L. Yen , Wei Yen , Ross A. Towie, The Cydra 5 Departmental Supercomputer: Design Philosophies, Decisions, and Trade-Offs, Computer, v.22 n.1, p.12-26, 28-30, 32-35, January 1989
[doi> 10.1109/2.19820]
|
 |
45
|
|
 |
46
|
B. R. Rau , C. D. Glaeser , E. M. Greenawalt, Architectural support for the efficient generation of code for horizontal architectures, Proceedings of the first international symposium on Architectural support for programming languages and operating systems, p.96-99, March 01-03, 1982, Palo Alto, California, United States
|
| |
47
|
|
 |
48
|
Bogong Su , Shiyuan Ding , Jian Wang , Jinshi Xia, GURPR—a method for global software pipelining, Proceedings of the 20th annual workshop on Microprogramming, p.88-96, December 01-04, 1987, Colorado Springs, Colorado, United States
[doi> 10.1145/255305.255322]
|
 |
49
|
B. Su , S. Ding , J. Xia, URPR—An extension of URCR for software pipelining, Proceedings of the 19th annual workshop on Microprogramming, p.94-103, October 15-17, 1986, New York, New York, United States
|
| |
50
|
TAR JAN, R. 1972. Depth-first search and linear graph algorithms. SIAM d. Comput. 1, 2 (June), 146-160
|
 |
51
|
|
| |
52
|
|
 |
53
|
Mario Tokoro , Eiji Tamura , Kazuhiko Takase , Kiichiro Tamaru, An approach to microprogram optimization considering resource occupancy and instruction formats, Proceedings of the 10th annual workshop on Microprogramming, p.92-108, October 05-07, 1977, Niagara Falls, New York, United States
|
 |
54
|
|
| |
55
|
|
 |
56
|
Nancy J. Warter , Grant E. Haab , Krishna Subramanian , John W. Bockhaus, Enhanced modulo scheduling for loops with conditional branches, Proceedings of the 25th annual international symposium on Microarchitecture, p.170-179, December 01-04, 1992, Portland, Oregon, United States
|
 |
57
|
Nancy J. Warter , Scott A. Mahlke , Wen-Mei W. Hwu , B. Ramakrishna Rau, Reverse If-Conversion, Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation, p.290-299, June 21-25, 1993, Albuquerque, New Mexico, United States
|
| |
58
|
WOLFE, M.J. 1990. Tzny--A Loop Restructuring Tool, User Manual Oregon Graduate Institute of Science and Technology, Beaverton, OR.
|
| |
59
|
|
 |
60
|
|
| |
61
|
|
 |
62
|
|
CITED BY 63
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Alex Jones , Debabrata Bagchi , Satrajit Pal , Xiaoyong Tang , Alok Choudhary , Prith Banerjee, PACT HDL: a C compiler targeting ASICs and FPGAs with power and performance optimizations, Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems, October 08-11, 2002, Grenoble, France
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Vicki H. Allan , U. R. Shah , K. M. Reddy, Petri net versus modulo scheduling for software pipelining, Proceedings of the 28th annual international symposium on Microarchitecture, p.105-110, November 29-December 01, 1995, Ann Arbor, Michigan, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Hongbo Rong , Alban Douillet , R. Govindarajan , Guang R. Gao, Code Generation for Single-Dimension Software Pipelining of Multi-Dimensional Loops, Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, p.175, March 20-24, 2004, Palo Alto, California
|
|
|
Hongbo Rong , Zhizhong Tang , R. Govindarajan , Alban Douillet , Guang R. Gao, Single-Dimension Software Pipelining for Multi-Dimensional Loops, Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, p.163, March 20-24, 2004, Palo Alto, California
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Min Li , Bruno Bougard , Weiyu Xu , David Novo , Liesbet Van Der Perre , Francky Catthoor, Optimizing near-ML MIMO detector for SDR baseband on parallel programmable architectures, Proceedings of the conference on Design, automation and test in Europe, March 10-14, 2008, Munich, Germany
|
|
|
|
|
|
|
|
|
|
|
|
Min Li , David Novo , Bruno Bougard , Liesbet Van Der Perre , Francky Catthoor, Generic multi-phase software-pipelined Partial-FFT on instruction-level-parallel architectures and SDR baseband applications, Proceedings of the conference on Design, automation and test in Europe, March 10-14, 2008, Munich, Germany
|
|
|
|
|
|
|
|
|
|
|
|
Min Li , David Novo , Bruno Bougard , Trevor Carlson , Liesbet Van Der Perre , Francky Catthoor, Generic multiphase software pipelined partial FFT on instruction level parallel architectures, IEEE Transactions on Signal Processing, v.57 n.4, p.1604-1615, April 2009
|
|