|
ABSTRACT
We describe a compilation algorithm for efficient software pipelining of general inner loops, where the number of iterations and the time taken by each iteration may be unpredictable, due to arbitrary if-then- else statements and conditional exit statements within the loop. As our target machine, we assume a wide instruction word architecture that allows multi-way branching in the form of if-then-else trees, and that allows conditional register transfers depending on where the microinstruction branches to (a hardware implementation proposal for such a machine is briefly described in the paper). Our compilation algorithm, which we call the pipeline scheduling technique, produces a software- pipelined version of a given inner loop, which allows a new iteration of the loop to begin on every cycle whenever dependencies and resources permit. The correctness and termination properties of the algorithm are studied in the paper.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Agerwala, T. and Cocke, J. (87) "High Performance Reduced Instruction Set Computers" research report no. RC 12434, IBM Thomas J. Watson Research Center, Yorktown Heights, 1987.
|
| |
2
|
Allen, R. and Kennedy, K. (84) "Automatic Translation of Fortran Programs to Vector Form" Technical report COMP TR84-9, Dept. of Computer Science, Rice University, July 1984.
|
| |
3
|
Anderson, D.W., Sparacio, F.J., and Tomasulo, F.M. (67) "The IBM System/360 Model 91: Machine Philosophy and Instruction Handling" IBM Journal of Research and Development, Vol. 11, January 1967.
|
 |
4
|
|
| |
5
|
Banerjee, U., Gajski, D, and Kuck, D. (80) "Array Machine Control Units for Loops Containing IFs" Proc. 1980 International Conference on Parallel Processing.
|
 |
6
|
John Beetem , Monty Denneau , Don Weingarten, The GF11 supercomputer, Proceedings of the 12th annual international symposium on Computer architecture, p.108-115, June 17-19, 1985, Boston, Massachusetts, United States
|
| |
7
|
Charlesworth, A.E. (81) "An Approach to Scientific Array Processing: The Architectural Design of the AP-120B/FPS-164 Family" IEEE Computer, September 1981.
|
| |
8
|
Cytron, R.G. (84) "Compile-time Scheduling and Optimization for Asynchronous Machines" Report no. UIUCDCS-R-84-1177, Dept. of Computer Science, University of Illinois at Urbana- Champaign, October 1984.
|
| |
9
|
Davies, J.R.B. (81) "Parallel Loop Constructs For Multiprocessors" Report no. UIUDCS-R-81-1070, Dept. of Computer Science, University of Illinois at Urbana-Champaign, May 1981.
|
| |
10
|
|
| |
11
|
|
| |
12
|
|
| |
13
|
Fisher, J.A. (82) "Very Long Instruction Word Architectures and the ELI-512" Research report #253, Dept. of Computer Science, Yale University, December 1982.
|
| |
14
|
Hagiwara, H., Tomita, S., Oyanagi, S., Shibayama, K. (80) "A Dynamically Microprogrammable Computer with Low-level Parallelism" IEEE Transactions on Computers, Vol C-29, no. 7, July 1980.
|
 |
15
|
|
 |
16
|
|
| |
17
|
Multiflow Computer Inc. (87), "Technical Summary" (Trace(M) series computers), Branford, Connecticut, 1987.
|
| |
18
|
Munshi, A.A., and Simons, B. (87) "Scheduling Loops on Processors: Algorithms and Complexity" Research report no. RJ 5546, IBM Thomas J. Watson Research Center, Yorktown Heights, March 1987.
|
| |
19
|
Nanodata Computer Corporation (79) "QM-1 Hardware Level User's Manual" Buffalo, New York, 1979.
|
| |
20
|
|
| |
21
|
Padua-Haiek, D.A. (79) "Multiprocessors: Discussion of Some Theoretical and Practical Problems" Report no. UIUCDCS-R-79-990, University of Illinois at Urbana-Champaign, November 1979.
|
| |
22
|
Pfister, G.F., Brantley, W.C., George, D.A., Harvey, S.L., Kleinfelder, W.J., McAuliffe, K.P., Melton, E.A., Norton, V.A., and Weiss, J. (85) "The IBM Research Parallel Processor Prototype (RP3): Introduction and Architecture" Proceedings of the 1985 International Conference on Parallel Processing, August 1985.
|
 |
23
|
|
 |
24
|
|
 |
25
|
B. Ramakrishna Rau , Christopher D. Glaeser , Raymond L. Picard, Efficient code generation for horizontal architectures: Compiler techniques and architectural support, Proceedings of the 9th annual symposium on Computer Architecture, p.131-139, April 26-29, 1982, Austin, Texas, United States
|
| |
26
|
Smith, B.J. (81) "Architecture and Applications of the HEP Multiprocessor Computer System" Real Time Signal Processing IV, Proceedings of SPIE, 1981.
|
| |
27
|
Southard, J. (84) "MACPITTS: An Approach to Silicon Compilation" Computer Magazine, December 1984.
|
 |
28
|
B. Su , S. Ding , J. Xia, URPR—An extension of URCR for software pipelining, Proceedings of the 19th annual workshop on Microprogramming, p.94-103, October 15-17, 1986, New York, New York, United States
|
| |
29
|
Tomasulo, R.M. (67) "An Efficient Algorithm for Exploiting Multiple Arithmetic Units" IBM Journal of Research and Development, vol. 11, January 1967.
|
 |
30
|
S. Tomita , K. Shibayama , T. Nakata , S. Yuasa , H. Hagiwara, A computer with low-level parallelism QA-2: its applications to 3-D graphics and Prolog/Lisp machines, Proceedings of the 13th annual international symposium on Computer architecture, p.280-289, June 02-05, 1986, Tokyo, Japan
|
 |
31
|
|
CITED BY 47
|
|
Richard E. Hank , Wen-Mei W. Hwu , B. Ramakrishna Rau, Region-based compilation: an introduction and motivation, Proceedings of the 28th annual international symposium on Microarchitecture, p.158-168, November 29-December 01, 1995, Ann Arbor, Michigan, United States
|
|
|
|
|
|
|
|
|
|
|
|
A. Wolfe , M. Breternitz, Jr. , C. Stephens , A. L. Ting , D. B. Kirk , R. P. Bianchini, Jr. , J. P. Shen, The white dwarf: a high-performance application-specific processor, ACM SIGARCH Computer Architecture News, v.16 n.2, p.212-222, May 1988
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Scott A. Mahlke , William Y. Chen , Roger A. Bringmann , Richard E. Hank , Wen-Mei W. Hwu , B. Ramakrishna Rau , Michael S. Schlansker, Sentinel scheduling: a model for compiler-controlled speculative execution, ACM Transactions on Computer Systems (TOCS), v.11 n.4, p.376-408, Nov. 1993
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Zhihong Tang , Gang Chen , Chihong Zhang , Yingwei Zhang , Bogong Su , Stanley Habib, GPMB—software pipelining branch-intensive loops, Proceedings of the 26th annual international symposium on Microarchitecture, p.21-30, December 01-03, 1993, Austin, Texas, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|