ABSTRACT
Multi-input addition occurs in a variety of arithmetically intensive signal processing applications. The DSP blocks embedded in high-performance FPGAs perform fixed-bitwidth parallel multiplication and Multiply-ACcumulate (MAC) operations. In theory, the compressor trees contained within the multipliers could implement multi-input addition; however, they are not exposed to the programmer. To improve FPGA performance for these applications, this article introduces the Field Programmable Compressor Tree (FPCT) as an alternative to the DSP blocks. Because it provides just a compressor tree, the FPCT can perform multi-input addition directly and, in conjunction with a small amount of FPGA general logic, parallel multiplication and MAC as well. Furthermore, the user can configure the FPCT to precisely match the bitwidths of the operands being summed. Although an FPCT cannot beat the performance of a well-designed ASIC compressor tree of fixed bitwidth, such as those in the 9×9 and 18×18-bit multipliers/MACs of DSP blocks, its configurable bitwidth and ability to perform multi-input addition make it well suited to reconfigurable devices that are used across a variety of applications.
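As a rough software illustration of what a compressor tree computes, the following Python sketch reduces several operands with layers of 3:2 compressors (carry-save adders) and finishes with a single carry-propagate addition. It is a behavioral model only, not the FPCT architecture from the article: the function names, the choice of 3:2 compressors, the grouping strategy, and the restriction to non-negative integers are all assumptions made for illustration.

```python
def compress_3_2(a, b, c):
    """Bitwise layer of full adders: three operands in, two out, same total."""
    partial_sum = a ^ b ^ c                       # sum bit of each column
    carry = ((a & b) | (a & c) | (b & c)) << 1    # carry bit, shifted to the next column
    return partial_sum, carry


def compressor_tree_sum(operands):
    """Sum non-negative integers by repeated 3:2 compression (hypothetical helper)."""
    terms = list(operands)
    while len(terms) > 2:
        next_terms = []
        # Compress each complete group of three operands into two.
        for i in range(0, len(terms) - 2, 3):
            next_terms.extend(compress_3_2(terms[i], terms[i + 1], terms[i + 2]))
        # Pass any one or two leftover operands through to the next layer.
        leftover = len(terms) % 3
        if leftover:
            next_terms.extend(terms[-leftover:])
        terms = next_terms
    # Only here does a carry propagate across the full word width.
    return sum(terms)


def multiply_via_tree(a, b):
    """Multiply by generating shifted partial products and summing them in the tree."""
    partial_products = [a << i for i in range(b.bit_length()) if (b >> i) & 1]
    return compressor_tree_sum(partial_products)
```

In this model, `multiply_via_tree` plays the role that the abstract assigns to the small amount of general logic: it only generates partial products, while the tree performs the multi-input addition.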