ABSTRACT
High-performance microprocessors are designed with general-purpose applications in mind. In embedded System-on-Chip (SoC) designs, these architectures typically handle control-intensive tasks, but they are markedly inefficient for data-intensive tasks such as video encoding and decoding. Although configurable processors fill this gap by complementing the existing functional units with instruction extensions, their performance still lags behind the needs of real-time embedded tasks. In this paper, we evaluate the performance potential of a dataflow processor for H.264 video decoding. We first profile the H.264 application to capture the amount of data traffic among its modules, and then use this information to guide the placement of the H.264 modules in the WaveScalar dataflow architecture. A simulated-annealing-based placement algorithm produces the final placement, aiming to minimize the communication cost between modules in the dataflow architecture. In addition to outperforming contemporary embedded and customized processors, our simulated-annealing-guided design achieves a 13% speedup in execution time over the original WaveScalar architecture. With our dataflow design methodology, emerging embedded applications that require several GOPS to meet real-time constraints can be drafted within a reasonable amount of design time.
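
The traffic-guided placement step can be illustrated with a small sketch. It is not the placement algorithm used in the paper: it assumes a plain rectangular grid of slots, a cost equal to traffic volume times Manhattan distance, and a geometric cooling schedule. The module names, traffic figures, grid size, and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def comm_cost(placement, traffic):
    """Sum of traffic volume times Manhattan distance over all module pairs."""
    total = 0
    for (a, b), volume in traffic.items():
        (ra, ca), (rb, cb) = placement[a], placement[b]
        total += volume * (abs(ra - rb) + abs(ca - cb))
    return total


def anneal_placement(modules, traffic, rows, cols, seed=0,
                     t_start=10.0, t_end=0.01, alpha=0.95, moves_per_temp=200):
    """Search for a low-cost placement of modules on a rows x cols grid."""
    rng = random.Random(seed)
    n_slots = rows * cols
    if n_slots < 2 or len(modules) > n_slots:
        raise ValueError("grid needs at least 2 slots and room for every module")

    # grid[i] holds the module in slot i, or None if the slot is empty.
    grid = list(modules) + [None] * (n_slots - len(modules))
    rng.shuffle(grid)

    def to_placement(g):
        # Slot index i maps to (row, col) = divmod(i, cols).
        return {m: divmod(i, cols) for i, m in enumerate(g) if m is not None}

    cost = comm_cost(to_placement(grid), traffic)
    best_grid, best_cost = grid[:], cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            # Candidate move: swap the contents of two distinct slots.
            i, j = rng.sample(range(n_slots), 2)
            grid[i], grid[j] = grid[j], grid[i]
            new_cost = comm_cost(to_placement(grid), traffic)
            delta = new_cost - cost
            # Always accept improvements; accept worse moves with
            # probability exp(-delta / t) so the search can escape local minima.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best_cost:
                    best_grid, best_cost = grid[:], cost
            else:
                grid[i], grid[j] = grid[j], grid[i]  # reject: undo the swap
        t *= alpha  # geometric cooling (an assumed schedule)
    return to_placement(best_grid), best_cost


# Hypothetical decoder modules and made-up traffic volumes (not measured data).
MODULES = ["parser", "entropy_decode", "inverse_transform",
           "motion_comp", "deblock"]
TRAFFIC = {
    ("parser", "entropy_decode"): 40,
    ("entropy_decode", "inverse_transform"): 120,
    ("inverse_transform", "motion_comp"): 200,
    ("motion_comp", "deblock"): 180,
    ("parser", "motion_comp"): 30,
}

placement, cost = anneal_placement(MODULES, TRAFFIC, rows=3, cols=3)
```

In this sketch, `placement` maps each module name to a `(row, col)` slot and `cost` is the weighted communication distance of that layout. A real design flow would replace the made-up traffic table with profiled data and the grid distance with the target architecture's actual communication latencies.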