32-bit floating-point FPGA Gaussian elimination
Source
International Symposium on Field Programmable Gate Arrays
Proceeding of the ACM/SIGDA international symposium on Field programmable gate arrays
Monterey, California, USA
POSTER SESSION: Applications
Pages 283-284  
Year of Publication: 2009
ISBN: 978-1-60558-410-2
Authors
Bowei Zhang  Harbin Engineering University, Harbin, China
Guochang Gu  Harbin Engineering University, Harbin, China
Lin Sun  Harbin Engineering University, Harbin, China
Yanxia Wu  Harbin Engineering University, Harbin, China
Sponsors
SIGDA: ACM Special Interest Group on Design Automation
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1508128.1508196

ABSTRACT

Gaussian elimination with partial pivoting is a widely used, traditional method for solving dense linear systems of equations (LSEs). This paper presents a hardware-optimized variant of Gaussian elimination and an efficient 32-bit ANSI/IEEE Std 754-1985 floating-point implementation of it on a Xilinx Virtex-5 FPGA. The logic of the traditional algorithm is changed to exploit the parallelism available in hardware. As a result, for n×n 32-bit floating-point matrices with uniformly distributed entries, the average running time of the proposed architecture is around n² clock cycles, as opposed to n³ in software. The design uses FPLibrary, an open-source library that provides parameterizable pipelined floating-point operators. The design is integrated into a prototype system that accelerates the work of a general-purpose processor, exchanging data between host and FPGA over PCI Express using DMA. Furthermore, by means of Strassen's algorithm, large LSEs can also be solved by multiple FPGAs working together. The whole implementation, placed and routed in an xc5vlx110t-3 FPGA, can solve LSEs of dimension up to 22, can be clocked at up to 200 MHz, and computes the solution in 5.39 µs on average, a speed-up of up to almost 15 times over an equivalent software implementation on a 2.6 GHz Pentium IV CPU. To the best of the authors' knowledge, there has been no previous work on floating-point LSE-solving hardware implemented as an application function unit in a reconfigurable computing system.
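
For reference, the conventional software algorithm that the abstract compares against can be sketched as follows. This is a minimal Python sketch of textbook Gaussian elimination with partial pivoting, not the authors' hardware-optimized variant, which the abstract does not describe in detail. The function name `solve_gauss_partial_pivot` is an assumption made for illustration, and the arithmetic uses Python's double-precision floats rather than the 32-bit format of the paper.

```python
def solve_gauss_partial_pivot(a, b):
    """Solve the dense system a * x = b and return x as a list of floats.

    a is a list of n rows of n numbers, b is a list of n numbers.
    Hypothetical helper for illustration only; it is not taken from the paper.
    """
    n = len(a)
    # Work on an augmented copy [a | b] so the inputs are left untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]

    for col in range(n):
        # Partial pivoting: pick the row with the largest magnitude in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]

        # Eliminate the entries below the pivot.
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]

    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x
```

The three nested loops in the elimination phase are the source of the roughly n³ sequential cost in software. A call such as `solve_gauss_partial_pivot([[2, 1, -1], [-3, -1, 2], [-2, 1, 2]], [8, -11, -3])` solves a small 3×3 system.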

