ABSTRACT
Clustering is a technique for decentralizing the design of future wide-issue VLIW cores so that they can meet technology constraints on cycle time, area, and power dissipation. In a clustered design, registers and functional units are grouped into clusters, and new instructions are needed to move data between clusters. Aggressive new instruction scheduling techniques are therefore required to minimize the negative effects of resource clustering and of the delays in moving data between clusters. In this paper we present a novel software pipelining technique that performs instruction scheduling with reduced register requirements, register allocation, register spilling, and inter-cluster communication in a single step. The algorithm uses limited backtracking to reconsider previously taken decisions, which gives it additional opportunities to obtain high-throughput schedules with little spill code on clustered architectures. We show that the proposed approach outperforms previously proposed techniques and that it scales well regardless of the number of clusters, the number of communication buses, and the communication latency. The paper also explores some of the parameters in the design of future clustered VLIW cores.
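In a clustered design, every value consumed outside the cluster that produced it needs an explicit copy, and those copies compete for a limited number of inter-cluster buses. The Python sketch that follows is not the paper's algorithm. It only estimates how such copies affect the resource-constrained lower bound on the initiation interval (II) of a software-pipelined loop, often called ResMII in the modulo-scheduling literature. It assumes a hypothetical machine in which each cluster has `fus_per_cluster` identical functional units that can execute any operation, each copy occupies one bus for one cycle, and a value is copied at most once per destination cluster. Recurrence constraints, register pressure, and spill code, which the paper's scheduler handles, are ignored. The names `Op`, `count_moves`, and `res_mii` are illustrative.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class Op:
    """One operation of the loop body (hypothetical representation)."""
    name: str
    cluster: int          # cluster the operation is assigned to
    srcs: tuple = ()      # names of the operations producing its operands


def count_moves(ops):
    """Count inter-cluster copies, assuming one copy per (value, destination cluster)."""
    home = {op.name: op.cluster for op in ops}
    needed = set()
    for op in ops:
        for src in op.srcs:
            if home[src] != op.cluster:
                needed.add((src, op.cluster))
    return len(needed)


def res_mii(ops, n_clusters, fus_per_cluster, n_buses):
    """Resource-constrained lower bound on II, counting copies as bus operations."""
    per_cluster = [0] * n_clusters
    for op in ops:
        per_cluster[op.cluster] += 1
    # Each cluster's functional units must fit that cluster's operations in II cycles.
    bounds = [ceil(n / fus_per_cluster) for n in per_cluster]
    moves = count_moves(ops)
    if moves:
        # The shared buses must fit all inter-cluster copies in II cycles.
        bounds.append(ceil(moves / n_buses))
    return max(bounds + [1])


if __name__ == "__main__":
    loop = [
        Op("a", 0),
        Op("b", 1),
        Op("c", 0, ("a", "b")),   # needs b from cluster 1
        Op("d", 1, ("a", "c")),   # needs a and c from cluster 0
    ]
    # prints "1 3", then "2 2", then "4 1"
    for buses in (1, 2, 4):
        print(buses, res_mii(loop, n_clusters=2, fus_per_cluster=2, n_buses=buses))
```

In this toy loop each cluster executes only two operations, so the functional units alone would allow an II of 1. The three required copies, however, push the bound to 3 cycles with one bus and 2 cycles with two buses. This is one simple way to see why cluster assignment, communication, and the number of buses must be considered together when scheduling for clustered cores.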