ABSTRACT
The capability and heterogeneity of FPGA (Field Programmable Gate Array) devices continue to increase with each new device family, and programming these devices efficiently is becoming increasingly difficult. Nevertheless, FPGAs continue to be used for algorithms traditionally targeted to embedded DSP microprocessors, such as signal and image processing applications. This paper presents an architecture that combines VLIW (Very Long Instruction Word) processing with the capability to introduce application-specific customized instructions and complex hardware functions. To support this architecture, a compilation and design automation flow is described for programs written in C. Several design tradeoffs for the architecture were examined, including the number of VLIW functional units and the register file size. The architecture was implemented on an Altera Stratix II FPGA. The Stratix II device was selected because it offers a large number of high-speed DSP (digital signal processing) blocks that execute multiply-accumulate operations. We show that our combined VLIW and hardware-function approach achieves as much as 230X speedup, and 63X on average, for computational kernels across a set of benchmarks. This yields an overall speedup of as much as 30X, and 12X on average, for signal processing benchmarks from the MediaBench suite.
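The kernel speedups reported above (230X, 63X on average) are much larger than the whole-program speedups (30X, 12X on average). Amdahl's law is one standard way to see why: any code outside the accelerated kernels still runs at its original speed and bounds the overall gain. The Python sketch below is illustrative only and is not the authors' method. The function names `amdahl_speedup` and `required_fraction` are hypothetical, the model assumes a single accelerated region, and it ignores any overhead of moving data between the VLIW core and the hardware functions.

```python
def amdahl_speedup(kernel_fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when only part of the run time is accelerated (Amdahl's law).

    kernel_fraction: share of the ORIGINAL run time spent in the accelerated
        kernel, in [0, 1] (an assumed input, not a value from the paper).
    kernel_speedup: speedup of that share alone, e.g. 63.0 for "63X".
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    # The non-kernel part keeps its original time; the kernel part shrinks
    # by kernel_speedup. Times are normalized so the original run time is 1.
    new_time = (1.0 - kernel_fraction) + kernel_fraction / kernel_speedup
    return 1.0 / new_time


def required_fraction(kernel_speedup: float, overall_speedup: float) -> float:
    """Invert Amdahl's law: the kernel fraction f that turns a kernel speedup s
    into a given overall speedup S, from 1/S = (1 - f) + f/s."""
    if kernel_speedup <= 1.0:
        raise ValueError("kernel_speedup must be greater than 1")
    if not 1.0 <= overall_speedup <= kernel_speedup:
        raise ValueError("overall_speedup must lie in [1, kernel_speedup]")
    return (1.0 - 1.0 / overall_speedup) / (1.0 - 1.0 / kernel_speedup)


if __name__ == "__main__":
    # Hypothetical case: a 63X kernel that covers 90% of the original run time.
    print(f"overall speedup: {amdahl_speedup(0.9, 63.0):.2f}")
    # Kernel fraction a hypothetical benchmark would need for 63X -> 12X.
    print(f"required kernel fraction: {required_fraction(63.0, 12.0):.3f}")
```

For example, `amdahl_speedup(0.9, 63.0)` returns 8.75: even with a 63X kernel, the remaining 10% of the run time keeps the overall gain below 10X. Conversely, `required_fraction(63.0, 12.0)` is about 0.93, so a hypothetical benchmark with exactly those two speedups would spend roughly 93% of its original run time in the accelerated kernel. Averages of per-benchmark speedups do not compose exactly this way, so treat this only as intuition for why the kernel and overall figures differ.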
CITED BY 11

Alex K. Jones, Raymond R. Hoare, Swapna R. Dontharaju, Shenchih Tung, Ralph Sprang, Josh Fazekas, James T. Cain, Marlin H. Mickle, An automated, reconfigurable, low-power RFID tag, Proceedings of the 43rd annual conference on Design automation, July 24-28, 2006, San Francisco, CA, USA

Alex K. Jones, Raymond Hoare, Dara Kusic, Gayatri Mehta, Josh Fazekas, John Foster, Reducing power while increasing performance with SuperCISC, ACM Transactions on Embedded Computing Systems (TECS), v.5 n.3, p.658-686, August 2006

Alex K. Jones, Raymond Hoare, Swapna Dontharaju, Shenchih Tung, Ralph Sprang, Joshua Fazekas, James T. Cain, Marlin H. Mickle, An automated, FPGA-based reconfigurable, low-power RFID tag, Microprocessors & Microsystems, v.31 n.2, p.116-134, March 2007

Swapna Dontharaju, Shenchih Tung, James T. Cain, Leonid Mats, Marlin H. Mickle, Alex K. Jones, A design automation and power estimation flow for RFID systems, ACM Transactions on Design Automation of Electronic Systems (TODAES), v.14 n.1, p.1-31, January 2009

Alex K. Jones, Swapna Dontharaju, Shenchih Tung, Leo Mats, Peter J. Hawrylak, Raymond R. Hoare, James T. Cain, Marlin H. Mickle, Radio frequency identification prototyping, ACM Transactions on Design Automation of Electronic Systems (TODAES), v.13 n.2, p.1-22, April 2008

Peter Yiannacouras, J. Gregory Steffan, Jonathan Rose, VESPA: portable, scalable, and flexible FPGA-based vector processors, Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems, October 19-24, 2008, Atlanta, GA, USA

REVIEW
Reviewer: Vassilios A. Chouliaras
This is a very exciting piece of research in the general area of configurable, extensible processors and the software/hardware interface. The authors propose a hybrid architecture, consisting of a parameterized very long instruction word (VLIW) co...