Architectures and APIs: assessing requirements for delivering FPGA performance to applications
Source
Conference on High Performance Networking and Computing
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Tampa, Florida
SESSION: Technical papers
Article No. 111
Year of Publication: 2006
ISBN: 0-7695-2700-0
ABSTRACT
Reconfigurable computing leveraging field programmable gate arrays (FPGAs) is one of many accelerator technologies that are being investigated for application to high performance computing (HPC). Like most accelerators, FPGAs are very efficient at both dense matrix multiplication and FFT computations, but two important aspects of how to deliver that performance to applications have received too little attention. First, the standard API for important compute kernels hides parallelism from the system. Second, the issue of system architecture is virtually never addressed. This paper explores both issues and their implications for applications. We find that high bandwidth, low latency connectivity can be important, but the right API can be even more important.
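The abstract's first point, that a standard kernel API can hide parallelism from the system, can be illustrated with a minimal sketch. The abstract does not show the paper's actual interfaces, so everything here is an assumption made for illustration: the function names `transform_one` and `transform_batch` are hypothetical, a naive DFT stands in for a real FFT kernel, and a thread pool is only a placeholder for whatever concurrency an accelerator might offer.

```python
import cmath
from concurrent.futures import ThreadPoolExecutor


def dft(signal):
    """Naive O(n^2) discrete Fourier transform, a stand-in for a real FFT kernel."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def transform_one(signal):
    """Hypothetical single-call API: the library sees one transform per call."""
    return dft(signal)


def transform_batch(signals, workers=4):
    """Hypothetical batched API: the library receives every independent
    transform at once and is free to schedule them concurrently.
    Threads are only a placeholder for accelerator resources."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(dft, signals))


signals = [
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 1.0, 0.0, -1.0],
]

# Caller-side loop: the independence of the transforms is invisible to the library.
one_at_a_time = [transform_one(s) for s in signals]

# One call: the same work, with the independence explicit in the interface.
batched = transform_batch(signals)
```

Both call patterns compute the same results. The difference is only in what the interface reveals: with the batched form, the implementation knows up front that the transforms are independent, which is the kind of information a single-call interface withholds.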