ABSTRACT
This paper presents novel techniques to integrate the use of Single Instruction Multiple Data (SIMD) functional units in a high-level synthesis (HLS) design methodology. SIMD functional units can be configured to operate in one or more SIMD modes, in which they process multiple sets of smaller-bitwidth operands in parallel. Conceptually, the use of SIMD functional units enables HLS to (i) exploit parallelism to a higher degree without using additional resources, (ii) improve resource utilization by enabling hardware re-use at a fine-grained level, and (iii) improve energy efficiency for a given area and/or performance constraint. We illustrate the issues involved in performing high-level synthesis with SIMD functional units, and discuss how the algorithms in a typical high-level synthesis flow can be enhanced to yield maximal performance and energy improvements. These techniques are not restricted to specific high-level synthesis tools or algorithms, and can be plugged into any generic high-level synthesis system. Experimental results indicate that the use of SIMD units can improve performance by up to 1.9X (average of 1.57X) and simultaneously reduce energy consumption by up to 33.16% (average of 28.03%) compared to well-optimized conventional designs, with minimal area overheads (average of 2.18%). The performance improvements can be translated into additional energy savings, resulting in energy reductions of up to 66.26% (average of 55.88%). Further, our experiments demonstrate that the use of SIMD units in an HLS tool shifts the entire achievable area-delay-energy tradeoff envelope to include desirable parts of the design space (i.e., higher quality designs) that were hitherto unreachable.
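To make the notion of a SIMD mode concrete, the following is a minimal software sketch of a partitioned adder. It is an illustration only, not the hardware design or any algorithm from the paper. It assumes a 32-bit adder that can be configured as one 32-bit lane, two 16-bit lanes, or four 8-bit lanes; the names `simd_add` and `lane_high_bits` are hypothetical.

```python
def lane_high_bits(width: int, lanes: int) -> int:
    """Return a mask with only the most significant bit of each lane set."""
    lane_bits = width // lanes
    mask = 0
    for i in range(lanes):
        mask |= 1 << (i * lane_bits + lane_bits - 1)
    return mask


def simd_add(a: int, b: int, width: int = 32, lanes: int = 1) -> int:
    """Add a and b lane by lane, so that no carry crosses a lane boundary.

    Assumption: `width` is divisible by `lanes`. Each lane wraps around
    independently (modular addition within the lane).
    """
    if width % lanes != 0:
        raise ValueError("width must be divisible by lanes")
    full = (1 << width) - 1
    high = lane_high_bits(width, lanes)
    low = full & ~high
    # Add everything except each lane's top bit; a carry can reach the top
    # bit of its own lane but cannot spill into the neighbouring lane.
    partial = (a & low) + (b & low)
    # Fold in the top bits with XOR, which discards the lane's carry-out.
    return (partial ^ ((a ^ b) & high)) & full


if __name__ == "__main__":
    a, b = 0x0000FFFF, 0x00000001
    for lanes in (1, 2, 4):
        print(lanes, hex(simd_add(a, b, lanes=lanes)))
```

With the same operands, the one-lane mode propagates the carry across the whole word, while the two-lane and four-lane modes treat the word as independent 16-bit or 8-bit additions. This is the sense in which a single functional unit can process several smaller operations in one step.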