Efficient FPGA implementation of QR decomposition using a systolic array architecture
Source
International Symposium on Field Programmable Gate Arrays
Proceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays
Monterey, California, USA
POSTER SESSION: Poster session 2: computing with reconfigurable technology
Pages 260-260  
Year of Publication: 2008
ISBN:978-1-59593-934-0
Authors
Xiaojun Wang  Airvana, Chelmsford, MA
Miriam Leeser  Northeastern University, Boston, MA
Sponsors
ACM: Association for Computing Machinery
SIGDA: ACM Special Interest Group on Design Automation
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1344671.1344718

ABSTRACT

QR decomposition is used in many signal processing applications. We have implemented a systolic array QR decomposition on a Xilinx Virtex-5 FPGA using the Givens rotation algorithm. It uses a truly two-dimensional systolic array architecture, so latency scales well for large matrices. To accommodate the dynamic range of the input data, we chose floating-point arithmetic, using the Northeastern University Variable Precision Floating-Point (VFloat) library. We support any general floating-point format, including IEEE single precision. Our design uses straightforward floating-point divide and square root implementations, whereas prior work uses special operations or formats such as CORDIC or the logarithmic number system (LNS). This makes our design more standard and more portable across systems, and thus easier to fit into a larger design. We support square, tall, and short matrices. The input matrix size can be configured at compile time to virtually any size, so the design can easily be scaled to future, larger FPGA devices or across multiple FPGAs. The QR module is fully pipelined, with a throughput of over 130 MHz for the IEEE single-precision floating-point format. This implementation achieves a peak throughput of 35 GFlops for a 12 by 12 matrix.
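
For readers unfamiliar with the Givens rotation approach named in the abstract, the following is a minimal sequential software sketch of how rotations reduce a matrix to upper triangular form. It is not the authors' hardware design: a systolic array would apply these rotations in parallel across cells, and the function names (givens, givens_qr) are hypothetical, chosen only for illustration. Each rotation needs one square root and two divides, which is where the plain floating-point divide and square root units mentioned above would come in.

```python
import math


def givens(a, b):
    """Return (c, s) so that the rotation [[c, s], [-s, c]] maps (a, b) to (r, 0)."""
    if b == 0.0:
        return 1.0, 0.0          # nothing to eliminate
    r = math.hypot(a, b)         # square root step
    return a / r, b / r          # two divide steps


def givens_qr(A):
    """Return the upper triangular factor R of A (a list of rows).

    Works for square, tall, and short matrices. Assumption: only R is
    computed here; accumulating Q is omitted to keep the sketch short.
    """
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]    # work on a copy
    for j in range(min(m - 1, n)):
        # Zero column j below the diagonal, working from the bottom row up.
        for i in range(m - 1, j, -1):
            c, s = givens(R[i - 1][j], R[i][j])
            for k in range(j, n):
                top, bot = R[i - 1][k], R[i][k]
                R[i - 1][k] = c * top + s * bot
                R[i][k] = -s * top + c * bot
    return R
```

Calling givens_qr([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]) returns a 3 by 2 matrix whose entries below the diagonal are zero up to rounding. Because every rotation is orthogonal, the Euclidean norm of each column is preserved, which gives a quick sanity check on the result.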
