A scalable wide-issue clustered VLIW with a reconfigurable interconnect
International Conference on Compilers, Architecture and Synthesis for Embedded Systems
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
San Jose, California, USA
SESSION: Microprocessor architecture
Pages: 148 - 158
Year of Publication: 2003
ISBN: 1-58113-676-5
ABSTRACT
Clustered VLIW architectures have been widely adopted in modern embedded multimedia applications for their ability to exploit high degrees of ILP with a reasonable trade-off in complexity and silicon cost. Studies have shown, however, that performance scales poorly for wide-issue machines. In this paper we describe the architecture of a clustered VLIW with a runtime-reconfigurable inter-cluster bus designed to address this scalability problem. The architecture targets the acceleration of kernel loops through a coprocessor approach and allows the interconnect between neighboring register files to be customized before each loop execution. We adopt an inter-cluster communication mechanism based on a constant-complexity interconnect: because its complexity and latency are independent of the number of clusters, scalability in issue width is preserved. To handle the limited connectivity, the interconnection resources of the inter-cluster bus are exposed to the compiler and scheduled like other resources by an adapted version of modulo scheduling. Other relevant features include the ability to define shifting queues in the register files, which supports software pipelining more effectively. Adding a limited amount of reconfigurability to the well-established VLIW programming model yields low-overhead inter-cluster communication and a scalable ILP architecture. Simulation results show that near-linear scalability can be achieved for certain classes of kernel loops.
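To make the idea of scheduling interconnect resources "like other resources" concrete, here is a minimal sketch of a modulo reservation table in which an inter-cluster link is one more resource with a per-cycle capacity. It is not the paper's scheduler: the resource names (`alu0`, `alu1`, `link01`), the capacities, and the operation counts are assumptions chosen only for illustration, and dependence constraints are ignored.

```python
from math import ceil


class ModuloReservationTable:
    """Tracks resource usage per cycle, modulo the initiation interval (II)."""

    def __init__(self, ii, capacity):
        self.ii = ii
        self.capacity = dict(capacity)  # resource name -> units available per cycle
        self.used = {r: [0] * ii for r in capacity}

    def reserve(self, resource, cycle):
        """Claim one unit of `resource` at `cycle`; return False if the slot is full."""
        slot = cycle % self.ii
        if self.used[resource][slot] >= self.capacity[resource]:
            return False
        self.used[resource][slot] += 1
        return True


def res_mii(op_counts, capacity):
    """Resource-constrained lower bound on II: the busiest resource sets the bound."""
    return max(ceil(n / capacity[r]) for r, n in op_counts.items())


def first_free_cycle(mrt, resource, earliest):
    """Reserve the first free slot in [earliest, earliest + II), or return None."""
    for cycle in range(earliest, earliest + mrt.ii):
        if mrt.reserve(resource, cycle):
            return cycle
    return None


# Hypothetical machine: two clusters with 2 ALUs each, and one link between them.
capacity = {"alu0": 2, "alu1": 2, "link01": 1}
# Hypothetical loop body: 4 ops on cluster 0, 3 on cluster 1, 3 inter-cluster copies.
op_counts = {"alu0": 4, "alu1": 3, "link01": 3}

ii = res_mii(op_counts, capacity)
mrt = ModuloReservationTable(ii, capacity)
copy_cycles = [first_free_cycle(mrt, "link01", 0) for _ in range(3)]
print(ii, copy_cycles)
```

In this made-up configuration the single link, not the ALUs, determines the lower bound on the initiation interval, which illustrates why a compiler working with limited connectivity has to account for interconnect resources explicitly.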