An FPGA-based VLIW processor with custom hardware execution
Source: International Symposium on Field Programmable Gate Arrays
Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
Monterey, California, USA
SESSION: Computation techniques for FPGAs
Pages: 107 - 117  
Year of Publication: 2005
ISBN:1-59593-029-9
Authors
Alex K. Jones  University of Pittsburgh, Pittsburgh, PA
Raymond Hoare  University of Pittsburgh, Pittsburgh, PA
Dara Kusic  University of Pittsburgh, Pittsburgh, PA
Joshua Fazekas  University of Pittsburgh, Pittsburgh, PA
John Foster  University of Pittsburgh, Pittsburgh, PA
Sponsors
ACM: Association for Computing Machinery
SIGDA: ACM Special Interest Group on Design Automation
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 12,   Downloads (12 Months): 111,   Citation Count: 11
DOI: http://doi.acm.org/10.1145/1046192.1046207

ABSTRACT

The capability and heterogeneity of new FPGA (Field Programmable Gate Array) devices continue to increase with each new line of devices, and programming these devices efficiently is becoming more difficult. Nevertheless, FPGAs continue to be used for algorithms traditionally targeted to embedded DSP microprocessors, such as signal and image processing applications.

This paper presents an architecture that combines VLIW (Very Long Instruction Word) processing with the capability to introduce application-specific customized instructions and complex hardware functions. To support this architecture, a compilation and design automation flow is described for programs written in C.

Several design tradeoffs for the architecture were examined, including the number of VLIW functional units and the register file size. The architecture was implemented on an Altera Stratix II FPGA. The Stratix II device was selected because it offers a large number of high-speed DSP (digital signal processing) blocks that execute multiply-accumulate operations.

We show that our combined VLIW with hardware functions exhibits as much as 230X speedup, and 63X on average, for computational kernels from a set of benchmarks. This allows for an overall speedup of up to 30X, and 12X on average, for signal processing benchmarks from the MediaBench suite.
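The gap between the kernel speedup and the overall speedup quoted in the abstract is what Amdahl's law predicts when only part of a program is accelerated. The sketch below is an illustration only, not something taken from the paper: the function names are hypothetical, and applying the law to the averaged figures (63X kernel, 12X overall) is a simplifying assumption, since averages over several benchmarks do not combine exactly this way.

```python
def overall_speedup(kernel_fraction, kernel_speedup):
    """Amdahl's law: whole-program speedup when only the kernel is accelerated."""
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)


def implied_kernel_fraction(overall, kernel_speedup):
    """Invert Amdahl's law: share of baseline run time spent in the kernel."""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / kernel_speedup)


# Average figures quoted in the abstract; treating them as a single
# program is an assumption made only for illustration.
fraction = implied_kernel_fraction(overall=12.0, kernel_speedup=63.0)
print(f"implied kernel fraction: {fraction:.3f}")
print(f"round-trip overall speedup: {overall_speedup(fraction, 63.0):.1f}X")
```

Under these assumptions the kernels would account for roughly 93% of the baseline run time, which is consistent with the remaining unaccelerated code limiting the overall gain.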



REVIEW

Reviewer: Vassilios A. Chouliaras

This is a very exciting piece of research in the general area of configurable, extensible processors and the software/hardware interface. The authors propose a hybrid architecture, consisting of a parameterized very long instruction word (VLIW) co...
