A low power front-end for embedded processors using a block-aware instruction set
Source
International Conference on Compilers, Architecture and Synthesis for Embedded Systems
Proceedings of the 2007 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems
Salzburg, Austria
Session: Low power and thermal-aware architectures
Pages: 267-276
Year of Publication: 2007
ISBN: 978-1-59593-826-8
Authors
Ahmad Zmily, Stanford University
Christos Kozyrakis, Stanford University
Sponsors
ACM: Association for Computing Machinery
SIGBED: ACM Special Interest Group on Embedded Systems
SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
SIGDA: ACM Special Interest Group on Design Automation
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1289881.1289926

ABSTRACT

Energy, power, and area efficiency are critical design concerns for embedded processors. A large share of the energy of a typical embedded processor is consumed in the front-end, because instruction fetch occurs on nearly every cycle and accesses large memory arrays such as the instruction and branch target caches. Using small front-end arrays yields significant power and area savings, but typically causes a significant loss in performance. This paper evaluates and compares optimizations that improve the performance of embedded processors with small front-end caches. We examine software techniques, such as instruction re-ordering and selective caching, as well as hardware techniques, such as instruction prefetching, a tagless instruction cache, and unified caches for instructions and branch targets. We demonstrate that, when built on top of a block-aware instruction set, these optimizations can eliminate the performance degradation caused by small front-end caches. Moreover, selected combinations of these optimizations lead to an embedded processor that performs significantly better than the large-cache design while retaining the area and energy efficiency of the small-cache design.
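As a rough illustration of the trade-off described in the abstract, the following Python sketch counts demand misses in a hypothetical direct-mapped instruction cache running a synthetic loop whose code footprint exceeds the small cache. The function names `simulate_icache` and `loop_trace`, the 32-byte line size, and the 4-byte instruction width are assumptions chosen for illustration. The optional prefetcher is a generic next-line scheme, not the block-aware prefetching evaluated in the paper, and the model treats prefetches as free and instantaneous.

```python
from typing import Iterable, List


def loop_trace(code_bytes: int, iterations: int, insn_bytes: int = 4) -> List[int]:
    """Fetch addresses for a straight-line loop body executed `iterations` times."""
    body = list(range(0, code_bytes, insn_bytes))
    return body * iterations


def simulate_icache(trace: Iterable[int], num_lines: int, line_bytes: int = 32,
                    next_line_prefetch: bool = False) -> int:
    """Return the number of demand misses in a direct-mapped instruction cache.

    Hypothetical, idealized model: no timing, no bandwidth limits, and
    prefetched blocks are installed instantly at no cost.
    """
    tags: List[int] = [-1] * num_lines  # block address held by each set; -1 = invalid
    misses = 0
    for addr in trace:
        block = addr // line_bytes
        index = block % num_lines
        if tags[index] != block:
            misses += 1          # demand miss: fetch the block from memory
            tags[index] = block
        if next_line_prefetch:
            # Generic next-line prefetch: install the sequentially following block.
            nxt = block + 1
            tags[nxt % num_lines] = nxt
    return misses


if __name__ == "__main__":
    trace = loop_trace(code_bytes=1024, iterations=10)  # 1 KB loop body, 10 passes
    print(simulate_icache(trace, num_lines=64))  # 2 KB cache: 32 (compulsory misses only)
    print(simulate_icache(trace, num_lines=16))  # 512 B cache: 320 (all blocks conflict each pass)
    print(simulate_icache(trace, num_lines=16, next_line_prefetch=True))  # 10
```

With the 512-byte cache, each of the 32 blocks in the 1 KB loop shares a set with another block, so every block misses on every pass. The next-line prefetcher hides all but one miss per pass: the prefetch issued past the end of the loop evicts block 0 just before the loop restarts. Real prefetches still consume memory bandwidth and energy, which this sketch deliberately ignores.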


