ABSTRACT
We describe the results of a case study intended to determine whether new techniques for hardware/software partitioning of an application's binary are competitive with partitioning at the C source code level. While such competitiveness has been shown previously for standard benchmark suites involving smaller or unoptimized applications, this case study instead focuses on a complete, 16,000-line, highly optimized, commercial-grade application, namely an H.264 video decoder. The several-month study revealed that binary partitioning was indeed competitive: relative to a standard microprocessor, it achieved a 2.5x speedup, nearly identical to that of source-level partitioning. Furthermore, the study revealed that several simple C-level coding modifications could improve application speedups to nearly 7x for both source-level and binary-level partitioning. These modifications include pass by value-return, function specialization, algorithmic specialization, hardware-targeted reimplementation, global array elimination, hoisting and sinking of error code, and conversion to explicit control flow.