| A loop accelerator for low power embedded VLIW processors |
Source
|
International Conference on Hardware Software Codesign
Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Stockholm, Sweden
SESSION: New design techniques for application specific processors
Pages: 6 - 11
Year of Publication: 2004
ISBN: 1-58113-937-3
|
|
Authors
|
|
Binu Mathew
|
University of Utah, Salt Lake City, UT
|
|
Al Davis
|
University of Utah, Salt Lake City, UT
|
|
| Bibliometrics |
Downloads (6 Weeks): 7, Downloads (12 Months): 34, Citation Count: 7
|
|
|
ABSTRACT
The high transistor density afforded by modern VLSI processes has enabled the design of embedded processors that use clustered execution units to deliver high levels of performance. However, delivering data to the execution resources in a timely manner remains a major problem that limits ILP. This problem is particularly significant for embedded systems, where memory and power budgets are limited. A distributed address generation and loop acceleration architecture for VLIW processors is presented. This decentralized on-chip memory architecture uses multiple SRAMs to provide high intra-processor bandwidth. Each SRAM has an associated stream address generator capable of implementing a variety of addressing modes in conjunction with a shared loop accelerator. The architecture is extremely useful for generating application-specific embedded processors, particularly for processing input data that is organized as a stream. The idea is evaluated in the context of a fine-grain VLIW architecture executing complex perception algorithms such as speech and visual feature recognition. Transistor-level SPICE simulations are used to demonstrate a 159x improvement in the energy-delay product when compared to conventional architectures executing the same applications.
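As a rough software illustration of the idea of per-SRAM stream address generators driven by a shared loop unit, the sketch below computes affine addresses (base plus stride times loop index) for a small loop nest. It is an assumption-laden analogy, not the hardware described in the paper: the names `loop_counters` and `stream_addresses`, the affine addressing mode, and the example strides are all hypothetical.

```python
from itertools import product


def loop_counters(bounds):
    """Yield loop-index tuples, outermost loop first.

    Hypothetical stand-in for a shared loop unit that broadcasts
    the current loop indices to every address generator.
    """
    return product(*(range(b) for b in bounds))


def stream_addresses(base, strides, bounds):
    """Yield one address per iteration: base + sum(stride_k * index_k).

    Hypothetical affine addressing mode; the paper's actual
    addressing modes are not specified in the abstract.
    """
    for idx in loop_counters(bounds):
        yield base + sum(s * i for s, i in zip(strides, idx))


# Two generators see the same 2 x 4 loop nest but walk their
# (imagined) SRAMs with different strides.
row_major = list(stream_addresses(base=0, strides=(4, 1), bounds=(2, 4)))
col_major = list(stream_addresses(base=0, strides=(1, 2), bounds=(2, 4)))

print(row_major)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(col_major)  # [0, 2, 4, 6, 1, 3, 5, 7]
```

In this analogy, each generator holds only its own base and strides, while the loop indices come from a single shared source, so no generator needs its own loop-control logic.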
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
R. Banakar, S. Steinke, B. Lee, M. Balakrishnan, and P. Marwedel. Scratchpad memory: A design alternative for cache on-chip memory in embedded systems, 2002.
|
| |
2
|
Silviu Ciricescu, Ray Essick, Brian Lucas, Phil May, Kent Moat, Jim Norris, Michael Schuette, Ali Saidi, The Reconfigurable Streaming Vector Processor (RSVP™), Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture, p.141, December 03-05, 2003
|
| |
3
|
R. Gonzalez and M. Horowitz. Energy dissipation in general purpose microprocessors. IEEE Journal of Solid-State Circuits, 31(9):1277--1284, September 1996.
|
| |
4
|
X. Huang, F. Alleva, H.-W. Hon, M.-Y. Hwang, K.-F. Lee, and R. Rosenfeld. The SPHINX-II speech recognition system: an overview. Computer Speech and Language, 7(2):137--148, 1993.
|
| |
5
|
|
 |
6
|
|
| |
7
|
|
| |
8
|
P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Dec. 2001.
|
| |
9
|
|
CITED BY 7
|
|
Binu Mathew, Al Davis, Mike Parker, A low power architecture for embedded perception, Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems, September 22-25, 2004, Washington DC, USA
|
|
|
A. Milidonis, N. Alachiotis, V. Porpodas, H. Michail, A. P. Kakarountas, C. E. Goutis, Interactive presentation: A decoupled architecture of processors with scratch-pad memory hierarchy, Proceedings of the conference on Design, automation and test in Europe, April 16-20, 2007, Nice, France
|
|
|
Kevin Fan, Hyunchul Park, Manjunath Kudlur, Scott Mahlke, Modulo scheduling for highly customized datapaths to increase hardware reusability, Proceedings of the sixth annual IEEE/ACM international symposium on Code generation and optimization, April 05-09, 2008, Boston, MA, USA
|
|
|
Ittetsu Taniguchi, Murali Jayapala, Praveen Raghavan, Francky Catthoor, Keishi Sakanushi, Yoshinori Takeuchi, Masaharu Imai, Systematic architecture exploration based on optimistic cycle estimation for low energy embedded processors, Proceedings of the 2009 Conference on Asia and South Pacific Design Automation, January 19-22, 2009, Yokohama, Japan
|
|