ABSTRACT
As VLIW/EPIC processors are increasingly used in real-time, signal-processing, and embedded applications, the importance of minimizing code size and reducing power is growing. This paper describes a new architectural mechanism, called Modulo Schedule Buffers, that provides an elegant interface for the execution of modulo scheduled loops. While the performance is similar to that of kernel-only modulo scheduling, this mechanism has a number of advantages, including minimal code expansion. Rather than generating fully scheduled kernels, the compiler generates a sequential form of the modulo scheduled loop body. Using the sequential form, the hardware internally synthesizes the prologue, kernel, and epilogue. In addition, while-loops can be scheduled with fewer constraints and fewer explicit prologues/epilogues than with existing mechanisms. Because the hardware controls loop execution, the burden of modulo schedule loop control is lifted from the predicate register file, allowing for a less rigorous predication implementation. Finally, hardware control limits the interrupt latency when using the EQ explicit latency model to the execution latency of one iteration, rather than the whole loop invocation.
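To make the prologue/kernel/epilogue idea concrete, the following is a minimal software sketch of how a full execution trace can be derived from a single sequential loop body. It is not the paper's hardware design: the `(name, stage, slot)` encoding, the function name `expand_modulo_schedule`, and the example operations are all assumptions made for illustration. It relies only on the standard modulo scheduling property that an operation in stage `s` executes for iteration `k - s` during kernel pass `k`.

```python
from typing import List, Tuple

# Hypothetical encoding of one operation in the sequential loop body:
# (name, stage, slot), where for an initiation interval II
#   stage = schedule_time // II   and   slot = schedule_time % II.
Op = Tuple[str, int, int]


def expand_modulo_schedule(ops: List[Op], ii: int, trip_count: int) -> List[List[List[str]]]:
    """Expand a sequential loop body into the full execution trace.

    Returns one entry per kernel pass. Each pass holds `ii` rows (cycles),
    and each row lists the operations issued in that cycle, labelled with
    the loop iteration they belong to.
    """
    num_stages = 1 + max(stage for _, stage, _ in ops)
    passes = []
    # A loop of N iterations and S stages needs N + S - 1 kernel passes.
    for k in range(trip_count + num_stages - 1):
        rows: List[List[str]] = [[] for _ in range(ii)]
        for name, stage, slot in ops:
            iteration = k - stage
            # A stage is active only while its iteration is in range.
            # Early passes (not all stages active yet) form the prologue,
            # late passes (early stages already finished) form the epilogue.
            if 0 <= iteration < trip_count:
                rows[slot].append(f"{name}[{iteration}]")
        passes.append(rows)
    return passes


if __name__ == "__main__":
    # Three-stage loop body with II = 1 (illustrative operations only).
    body = [("load", 0, 0), ("mul", 1, 0), ("store", 2, 0)]
    for k, rows in enumerate(expand_modulo_schedule(body, ii=1, trip_count=4)):
        print(k, rows)
```

With the three-stage body and four iterations used in the demo, the first two passes act as the prologue, the next two as the steady-state kernel, and the last two as the epilogue, all generated from one copy of the loop body rather than from separately emitted code.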