ABSTRACT
Clustering is an effective way to increase the parallelism available in VLIW datapaths without incurring the severe penalties associated with register files that have a large number of ports. Efficient utilization of a clustered datapath requires careful binding (assignment) of operations to clusters. The article proposes a binding algorithm that effectively explores the trade-off between serializing operations within a cluster and the delays incurred by data transfers between clusters. Extensive experimental results show that the algorithm generates high-quality solutions for representative kernels, with up to 33% improvement over a state-of-the-art binding algorithm.
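The abstract does not describe how the proposed algorithm works. To make the trade-off it targets concrete, the following sketch shows a generic greedy heuristic. It is not the article's algorithm. It visits the operations of a dataflow graph in topological order and binds each one to the cluster that gives the earliest finish time. Whenever an operand is produced on another cluster, it charges a fixed transfer latency. The machine model is an assumption made for illustration: identical clusters, a fixed number of non-pipelined functional units per cluster, and a single `move_latency`. The names `Op` and `greedy_bind` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """One dataflow-graph operation (hypothetical, illustrative model)."""
    name: str
    latency: int
    preds: list[str] = field(default_factory=list)


def greedy_bind(ops, num_clusters, fus_per_cluster, move_latency):
    """Bind each op to the cluster where it would finish earliest.

    Assumes `ops` is in topological order (every predecessor appears
    earlier) and that each functional unit runs one op at a time.
    """
    # fu_free[c][u]: first cycle at which unit u of cluster c is idle
    fu_free = [[0] * fus_per_cluster for _ in range(num_clusters)]
    binding, start, finish = {}, {}, {}
    for op in ops:
        best = None
        for c in range(num_clusters):
            # Data-ready time: operands from another cluster pay the transfer delay.
            ready = 0
            for p in op.preds:
                delay = 0 if binding[p] == c else move_latency
                ready = max(ready, finish[p] + delay)
            # In-cluster serialization: wait for the earliest-free unit.
            u = min(range(fus_per_cluster), key=lambda i: fu_free[c][i])
            t0 = max(ready, fu_free[c][u])
            # Compare by finish time; ties go to the lower cluster index.
            cand = (t0 + op.latency, c, u, t0)
            if best is None or cand < best:
                best = cand
        end, c, u, t0 = best
        binding[op.name], start[op.name], finish[op.name] = c, t0, end
        fu_free[c][u] = end
    return binding, start, finish


if __name__ == "__main__":
    # Two independent chains (a->c, b->d) joined by e.
    ops = [Op("a", 1), Op("b", 1), Op("c", 1, ["a"]),
           Op("d", 1, ["b"]), Op("e", 1, ["c", "d"])]
    for lat in (1, 3):
        binding, _, finish = greedy_bind(ops, num_clusters=2,
                                         fus_per_cluster=1, move_latency=lat)
        print(lat, binding, max(finish.values()))
```

With `move_latency=1`, the greedy pass places the two chains on different clusters and finishes at cycle 4. With `move_latency=3`, it still splits the chains and finishes at cycle 6. Binding all five operations to cluster 0 and serializing them would finish at cycle 5. This one-operation-at-a-time myopia shows why a binding algorithm has to weigh in-cluster serialization against inter-cluster transfer delays globally rather than locally.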