Communication scheduling

Source
ACM SIGPLAN Notices, Volume 35, Issue 11 (November 2000)
Pages: 82-92
Year of Publication: 2000
ISSN: 0362-1340

Authors
Peter Mattson, Computer Systems Laboratory, Stanford University, Stanford, CA
William J. Dally, Computer Systems Laboratory, Stanford University, Stanford, CA
Scott Rixner, Computer Systems Laboratory, Stanford University, Stanford, CA
Ujval J. Kapasi, Computer Systems Laboratory, Stanford University, Stanford, CA
John D. Owens, Computer Systems Laboratory, Stanford University, Stanford, CA

ABSTRACT
The high arithmetic rates of media processing applications require architectures with tens to hundreds of functional units, multiple register files, and explicit interconnect between functional units and register files. Communication scheduling makes it possible to schedule code for these emerging architectures, including those that use shared buses and register file ports. Scheduling for such shared-interconnect architectures is difficult because it requires simultaneously allocating functional units to operations and allocating buses and register file ports to the communications between operations. Prior VLIW scheduling algorithms are limited to clustered register file architectures with no shared buses or register file ports. Communication scheduling extends the range of target architectures by making each communication explicit and decomposing it into three components: a write stub, zero or more copy operations, and a read stub. Communication scheduling allows media processing kernels on a distributed register file architecture to achieve 98% of the performance of a central register file architecture with only 9% of the area, 6% of the power consumption, and 37% of the access delay. It also allows kernels on a distributed register file architecture to achieve 120% of the performance of a clustered register file architecture with 56% of the area and 50% of the power consumption.
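The decomposition of a communication into a write stub, zero or more copy operations, and a read stub can be illustrated with a small reservation-table sketch. This is a minimal illustration under stated assumptions, not the paper's scheduler: the resource names (`bus0`, `rf1.w0`, and so on), the rule that each stub holds its resources for exactly one cycle, and the greedy earliest-free-cycle placement of copies are all hypothetical. The sketch also takes the producer and consumer cycles and the copy route as given, whereas a real communication scheduler must choose them together with the operations' functional units.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Stub:
    """One component of a communication and the resources it holds for one cycle."""
    kind: str              # "write", "copy", or "read"
    resources: frozenset   # hypothetical names such as "bus0" or "rf1.w0"


@dataclass
class Communication:
    """A value routed from a producer to a consumer: write stub, copies, read stub."""
    write: Stub
    copies: list
    read: Stub


@dataclass
class ReservationTable:
    """Records which (cycle, resource) pairs are already allocated."""
    busy: set = field(default_factory=set)

    def fits(self, cycle, stub):
        return all((cycle, r) not in self.busy for r in stub.resources)

    def take(self, cycle, stub):
        self.busy.update((cycle, r) for r in stub.resources)


def schedule_communication(table, comm, write_cycle, read_cycle):
    """Place every stub of `comm`, or return None and leave `table` untouched.

    Assumption: the write stub is pinned to write_cycle and the read stub to
    read_cycle (the cycles of the producing and consuming operations). Each copy
    goes in the earliest cycle, strictly after the previous stub, whose
    resources are all free.
    """
    trial = ReservationTable(set(table.busy))
    if not trial.fits(write_cycle, comm.write):
        return None
    trial.take(write_cycle, comm.write)
    cycles, cycle = [write_cycle], write_cycle
    for copy in comm.copies:
        cycle += 1
        while cycle < read_cycle and not trial.fits(cycle, copy):
            cycle += 1
        if cycle >= read_cycle:
            return None            # no room for this copy before the consumer reads
        trial.take(cycle, copy)
        cycles.append(cycle)
    if read_cycle <= cycle or not trial.fits(read_cycle, comm.read):
        return None
    trial.take(read_cycle, comm.read)
    cycles.append(read_cycle)
    table.busy = trial.busy        # commit only after every stub fits
    return cycles


if __name__ == "__main__":
    table = ReservationTable()
    a = Communication(
        write=Stub("write", frozenset({"bus0", "rf0.w0"})),
        copies=[Stub("copy", frozenset({"rf0.r0", "bus1", "rf1.w0"}))],
        read=Stub("read", frozenset({"rf1.r0", "bus2"})),
    )
    b = Communication(
        write=Stub("write", frozenset({"bus3", "rf2.w0"})),
        copies=[Stub("copy", frozenset({"rf2.r0", "bus1", "rf1.w1"}))],
        read=Stub("read", frozenset({"rf1.r1", "bus4"})),
    )
    print(schedule_communication(table, a, 0, 3))  # [0, 1, 3]
    print(schedule_communication(table, b, 0, 3))  # [0, 2, 3]: bus1 is taken in cycle 1
```

In the usage example, both communications need the shared bus `bus1` for their copy, so the second copy is pushed from cycle 1 to cycle 2. Committing the trial table only when every stub fits keeps a failed attempt from leaving partial bus or port reservations behind, so a caller could retry with different cycles or a different route.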