ACM Home Page
Please provide us with feedback. Feedback
A loop accelerator for low power embedded VLIW processors
Full text PdfPdf (221 KB)
Source
International Conference on Hardware Software Codesign archive
Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis table of contents
Stockholm, Sweden
SESSION: New design techniques for application specific processors table of contents
Pages: 6 - 11  
Year of Publication: 2004
ISBN:1-58113- 937-3
Authors
Binu Mathew  University of Utah, Salt Late City, UT
Al Davis  University of Utah, Salt Late City, UT
Sponsors
SIGDA: ACM Special Interest Group on Design Automation
SIGBED: ACM Special Interest Group on Embedded Systems
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 7,   Downloads (12 Months): 34,   Citation Count: 7
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1016720.1016726
What is a DOI?

ABSTRACT

The high transistor density afforded by modern VLSI processes have enabled the design of embedded processors that use clustered execution units to deliver high levels of performance. However, delivering data to the execution resources in a timely manner remains a major problem that limits ILP. It is particularly significant for embedded systems where memory and power budgets are limited. A distributed address generation and loop acceleration architecture for VLIW processors is presented. This decentralized on-chip memory architecture uses multiple SRAMs to provide high intra-processor bandwidth. Each SRAM has an associated stream address generator capable of implementing a variety of addressing modes in conjunction with a shared loop accelerator.The architecture is extremely useful for generating application specific embedded processors, particularly for processing input data which is organized as a stream. The idea is evaluated in the context of a fine grain VLIW architecture executing complex perception algorithms such as speech and visual feature recognition. Transistor level Spice simulations are used to demonstrate a 159x improvement in the energy delay product when compared to conventional architectures executing the same applications.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
R. Banakar, S. Steinke, B. Lee, M. Balakrishnan, and P. Marwedel. Scratchpad memory : A design alternative for cache on-chip memory in embedded systems, 2002.
 
2
 
3
R. Gonzalez and M. Horowitz. Energy dissipation in general purpose microprocessors. IEEE Journal of Solid-State Circuits, 31(9):1277--1284, September 1996.
 
4
X. Huang, F. Alleva, H.-W. Hon, M.-Y. Hwang, K.-F. Lee, and R. Rosenfeld. The SPHINX-II speech recognition system: an overview. Computer Speech and Language, 7(2):137--148, 1993.
 
5
6
 
7
 
8
P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Dec. 2001.
 
9

CITED BY  7