| Energy efficiency vs. programmability trade-off: architectures and design principles |
| Source
|
Design, Automation, and Test in Europe
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Munich, Germany
SESSION: Hot topic: architectures and NoC (4G wireless special day)
Pages: 587 - 592
Year of Publication: 2006
ISBN: 3-9810801-0-6
|
|
Authors
|
J. P. Robelly, Dresden Silicon GmbH, Helmholtzstrasse, Dresden, Germany
H. Seidel, Dresden Silicon GmbH, Helmholtzstrasse, Dresden, Germany
K. C. Chen, Dresden Silicon GmbH, Helmholtzstrasse, Dresden, Germany
G. Fettweis, Dresden Silicon GmbH, Helmholtzstrasse, Dresden, Germany
|
| Publisher |
European Design and Automation Association
3001 Leuven, Belgium
|
ABSTRACT
Performance gains on programmable architectures from process technology alone are reaching their limits, since designs are becoming wire- and power-limited rather than device-limited. Likewise, traditional exploitation of instruction-level parallelism is saturating, because the conventional approach of designing wider-issue machines leads to very expensive interconnect, a large instruction memory footprint and high register file pressure. New architectural concepts targeted at the application domain of media processing are needed to push beyond the limitations of the current state of the art. To this end, we regard media applications as a collection of tasks that consume and produce chunks of data. Exploiting task-level parallelism, alongside more traditional forms of parallelism, is key to achieving the MOPS/Watt and MOPS/mm² that media applications require. Tasks comprise data transfers and number-crunching algorithm kernels, which are computationally intensive yet highly predictable. Moreover, most of the data manipulated by a task is local. In this paper, the granularity and characteristics of these tasks lead us to conclusions about memory hierarchy, task scheduling strategies and efficient low-overhead programmable architectures for highly predictable kernel computations.