ABSTRACT
In this paper, we propose an approach for designing high-performance, energy-efficient processing elements (PEs) using statically scheduled, nanocode-based architectures. The approach relies on bottom-up refinement (trimming) techniques that optimize a given datapath regardless of whether it was designed manually or generated automatically. The optimizations can also preserve parts of the netlist specified by the designer, which allows design effort to be reused and can lead to predictable convergence. We show that trimming the unused and underutilized resources of typical general-purpose datapaths yields 30-40% average energy savings without any performance loss. However, general-purpose architectures often sacrifice parallelism to keep the design implementable. With our trimming approach, we can instead start from a base architecture that offers more parallelism but is not intended for direct implementation, and then refine it until it becomes implementable. On our benchmarks, this achieves performance improvements of up to 1.8 times (25% on average) over a 4-issue VLIW and up to 2.6 times (40% on average) over a DLX. In addition, energy consumption is reduced by up to 5 times (2 times on average) compared to the trimmed general-purpose architectures.
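The trimming idea can be illustrated with a small sketch. It assumes a hypothetical netlist model (a set of components plus directed connections) and a static schedule given as a list of control words, each naming the units and wires it exercises in one cycle. Components that no control word ever enables are removed unless the designer locked them, and components active in only a small fraction of cycles are merely reported as underutilized, since actually eliminating them would require rescheduling. The names `Netlist`, `ControlWord`, `utilization`, `underutilized`, `trim`, and the 0.5 threshold are illustrative assumptions, not the authors' toolset or netlist format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Netlist:
    """Hypothetical minimal datapath model (not the authors' netlist format)."""
    components: set[str]
    connections: set[tuple[str, str]]              # directed (source, sink) wires
    locked: set[str] = field(default_factory=set)  # designer-preserved components


@dataclass(frozen=True)
class ControlWord:
    """One statically scheduled cycle: the units and wires it exercises."""
    units: frozenset[str]
    wires: frozenset[tuple[str, str]]


def utilization(netlist: Netlist, schedule: list[ControlWord]) -> dict[str, float]:
    """Fraction of scheduled cycles in which each component is enabled."""
    cycles = max(len(schedule), 1)  # avoid division by zero for an empty schedule
    return {c: sum(c in w.units for w in schedule) / cycles for c in netlist.components}


def underutilized(util: dict[str, float], threshold: float = 0.5) -> set[str]:
    """Components that are used, but in fewer than `threshold` of the cycles."""
    return {c for c, u in util.items() if 0 < u < threshold}


def trim(netlist: Netlist, schedule: list[ControlWord]) -> Netlist:
    """Drop components and wires that no control word uses, keeping locked parts."""
    util = utilization(netlist, schedule)
    keep = {c for c, u in util.items() if u > 0} | (netlist.locked & netlist.components)
    used_wires = set().union(*(w.wires for w in schedule))
    wires = {
        (s, d) for (s, d) in netlist.connections
        if s in keep and d in keep
        and ((s, d) in used_wires or {s, d} <= netlist.locked)
    }
    return Netlist(keep, wires, set(netlist.locked))


# Example: a register file (RF) wired to four hypothetical units.
units = ["RF", "ALU", "MUL", "DIV", "MEM"]
base = Netlist(
    components=set(units),
    connections={("RF", u) for u in units[1:]} | {(u, "RF") for u in units[1:]},
    locked={"RF", "MEM"},  # e.g. the designer keeps the memory interface
)
alu = ControlWord(frozenset({"RF", "ALU"}), frozenset({("RF", "ALU"), ("ALU", "RF")}))
mul = ControlWord(frozenset({"RF", "MUL"}), frozenset({("RF", "MUL"), ("MUL", "RF")}))
schedule = [alu, mul, alu, alu]

util = utilization(base, schedule)
candidates = underutilized(util)   # {'MUL'}
trimmed = trim(base, schedule)
print(sorted(trimmed.components))  # ['ALU', 'MEM', 'MUL', 'RF']
```

Only components with zero utilization are removed here: the DIV unit and its wires disappear, while the locked MEM port survives even though the schedule never touches it. Flagged units such as MUL stay in place, because folding their operations onto other units would change the schedule.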
INDEX TERMS
Primary Classification:
B. Hardware
B.5 REGISTER-TRANSFER-LEVEL IMPLEMENTATION
B.5.2 Design Aids
General Terms: Algorithms, Design, Experimentation, Performance
Keywords: ASIP, GNR, datapath, high-level synthesis, nanocoded architectures, netlist, no-instruction-set computer (NISC), power, refinement