Computation reuse in domain-specific optimization of signal recognition
Source
International Symposium on Field Programmable Gate Arrays
Proceeding of the ACM/SIGDA international symposium on Field programmable gate arrays
Monterey, California, USA
POSTER SESSION: Applications
Page 281
Year of Publication: 2009
ISBN: 978-1-60558-410-2
Authors
Melina Demertzi  University of Southern California, Los Angeles, CA, USA
Pedro C. Diniz  IST/UTL/INESC-ID, Porto Salvo, Portugal
Mary W. Hall  University of Utah, Salt Lake City, UT, USA
Anna C. Gilbert  University of Michigan, Ann Arbor, MI, USA
Yi Wang  University of Michigan, Ann Arbor, MI, USA
Sponsors
SIGDA: ACM Special Interest Group on Design Automation
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1508128.1508190

ABSTRACT

Domain-specific optimizations that exploit specific arithmetic and representation formats have been shown to achieve significant performance and area gains in FPGA hardware designs. In this work, we describe an approach to domain-specific optimization that goes beyond the representation level: we optimize jointly from the point of view of the high-level mathematical abstraction and of the hardware implementation. We focus on a signal recognition system that distinguishes between spoken digits. We construct transform matrices from Walsh wavelet packets in conjunction with a BestBasis algorithm. The resulting transform matrices exhibit a rich algebraic structure and overlap significantly across rows, which exposes substantial computation reuse in the dot products of the transform matrix rows with the signal vector. We have developed an algorithm that identifies this reuse and schedules the row computations across various computation units to significantly reduce the overall amount of computation.
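
As a rough illustration of row overlap leading to computation reuse, the following minimal sketch caches partial dot products of fixed-width row segments and reuses them (or their negation) when another row contains the same segment at the same position. It is not the algorithm described in the abstract: the Sylvester-constructed Hadamard matrix stands in for the Walsh wavelet packet transform matrices, and the function names, the block width of 4, and the operation-counting convention are all assumptions made for this example.

```python
def hadamard(n):
    """Sylvester construction of an n x n matrix of +1/-1 entries (n a power of two).
    Used here only as a stand-in for a Walsh-type transform matrix."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def matvec_direct(matrix, signal):
    """Baseline: one multiply-accumulate per matrix entry."""
    ops = 0
    out = []
    for row in matrix:
        acc = 0
        for coeff, sample in zip(row, signal):
            acc += coeff * sample
            ops += 1
        out.append(acc)
    return out, ops


def matvec_with_reuse(matrix, signal, block=4):
    """Reuse partial sums of row segments shared across rows.

    A segment is identified by its column offset and its coefficients.
    If the same segment (or its exact negation) was already computed for
    an earlier row, the cached partial sum is reused instead of recomputed.
    """
    cache = {}
    ops = 0
    out = []
    for row in matrix:
        acc = 0
        for start in range(0, len(row), block):
            seg = tuple(row[start:start + block])
            key = (start, seg)
            neg_key = (start, tuple(-c for c in seg))
            if key in cache:
                partial = cache[key]
            elif neg_key in cache:
                partial = -cache[neg_key]
            else:
                partial = sum(c * s for c, s in zip(seg, signal[start:start + block]))
                ops += len(seg)          # multiply-accumulates actually performed
                cache[key] = partial
            acc += partial
            ops += 1                     # one addition to combine the partial sum
        out.append(acc)
    return out, ops


if __name__ == "__main__":
    H = hadamard(8)
    x = list(range(1, 9))
    direct, direct_ops = matvec_direct(H, x)
    reused, reuse_ops = matvec_with_reuse(H, x)
    print("results match:", direct == reused)
    print("operations without reuse:", direct_ops)
    print("operations with reuse:", reuse_ops)
```

The savings in this toy depend entirely on the chosen block width and on how much the rows overlap; the reduction factors reported for the hardware design come from the authors' own reuse identification and scheduling, which this sketch does not reproduce.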

We have implemented a custom-built dot-product multiplication unit targeting a Virtex-II-Pro FPGA device that exploits computation reuse. A baseline dot-product multiplication unit, without reuse, exhibits a maximum clock rate of 199.3 MHz while utilizing only 2% of the device capacity. The optimized system that exploits reuse also includes a computation scheduler and attains a clock rate of 196 MHz while using 8,183 (57%) slices of the FPGA device. For a single pipelined dot-product unit, the FPGA hardware implementation reduces the amount of computation relative to the baseline implementation by as much as 6.35× for an individual matrix and by 2× on average. Although it is larger in area than the baseline, the implementation that exploits reuse still achieves a 2× computation reduction when compared to 3 concurrently executing simpler accumulation units with the same aggregate FPGA design area.

While the results in this paper reflect the opportunities of a specific signal processing problem, this work highlights the concept of exploiting computation reuse, derived from a higher-level abstract representation, at both the mathematical and the hardware level. As such, we believe this approach can also be leveraged in other signal recognition problems with specific, well-characterized computational structures and signal dictionaries.


