Communication scheduling
Source: ACM SIGPLAN Notices
Volume 35, Issue 11 (November 2000)
Pages: 82-92
Year of Publication: 2000
ISSN: 0362-1340
Authors
Peter Mattson, Computer Systems Laboratory, Stanford University, Stanford, CA
William J. Dally, Computer Systems Laboratory, Stanford University, Stanford, CA
Scott Rixner, Computer Systems Laboratory, Stanford University, Stanford, CA
Ujval J. Kapasi, Computer Systems Laboratory, Stanford University, Stanford, CA
John D. Owens, Computer Systems Laboratory, Stanford University, Stanford, CA
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/356989.356997

ABSTRACT

The high arithmetic rates of media processing applications require architectures with tens to hundreds of functional units, multiple register files, and explicit interconnect between functional units and register files. Communication scheduling enables scheduling to these emerging architectures, including those that use shared buses and register file ports. Scheduling to these shared interconnect architectures is difficult because it requires simultaneously allocating functional units to operations and buses and register file ports to the communications between operations. Prior VLIW scheduling algorithms are limited to clustered register file architectures with no shared buses or register file ports. Communication scheduling extends the range of target architectures by making each communication explicit and decomposing it into three components: a write stub, zero or more copy operations, and a read stub. Communication scheduling allows media processing kernels to achieve 98% of the performance of a central register file architecture on a distributed register file architecture with only 9% of the area, 6% of the power consumption, and 37% of the access delay, and 120% of the performance of a clustered register file architecture on a distributed register file architecture with 56% of the area and 50% of the power consumption.
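
The three-part decomposition described in the abstract (a write stub, zero or more copy operations, and a read stub) can be pictured with a small data-structure sketch. This is only an illustration under stated assumptions, not the authors' scheduler: the class names, field names, and the `is_connected` check are hypothetical, and the abstract does not specify how stubs are represented or allocated.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class WriteStub:
    # Hypothetical fields: route from the producing unit's output,
    # over a bus, into a register file.
    producer_unit: str
    bus: str
    register_file: str


@dataclass(frozen=True)
class CopyOp:
    # Hypothetical fields: moves a value from one register file to another.
    src_register_file: str
    dst_register_file: str


@dataclass(frozen=True)
class ReadStub:
    # Hypothetical fields: route from a register file to the consuming unit.
    register_file: str
    consumer_unit: str


@dataclass
class Communication:
    """One explicit communication between two operations (illustrative)."""
    write: WriteStub
    read: ReadStub
    copies: List[CopyOp] = field(default_factory=list)  # zero or more

    def is_connected(self) -> bool:
        """Return True if the value's path from the write stub, through
        the copies in order, ends in the register file the read stub uses."""
        location = self.write.register_file
        for copy in self.copies:
            if copy.src_register_file != location:
                return False
            location = copy.dst_register_file
        return location == self.read.register_file


# Direct communication: producer and consumer share a register file.
direct = Communication(
    write=WriteStub("add0", "bus1", "rf2"),
    read=ReadStub("rf2", "mul0"),
)

# Communication that needs one copy between register files.
via_copy = Communication(
    write=WriteStub("add0", "bus1", "rf0"),
    read=ReadStub("rf3", "mul0"),
    copies=[CopyOp("rf0", "rf3")],
)

# Ill-formed: the value is written to rf0 but read from rf3 with no copy.
broken = Communication(
    write=WriteStub("add0", "bus1", "rf0"),
    read=ReadStub("rf3", "mul0"),
)
```

Making each communication an explicit object like this is what lets a scheduler treat buses and register file ports as resources to allocate alongside functional units; how that allocation is actually performed is described in the full paper, not here.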



