A high-performance FPGA architecture for restricted Boltzmann machines
Source: Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays
Monterey, California, USA
Session: High performance computing applications
Pages: 73-82
Year of Publication: 2009
ISBN: 978-1-60558-410-2
Authors
Daniel L. Ly, University of Toronto, Toronto, ON, Canada
Paul Chow, University of Toronto, Toronto, ON, Canada
ABSTRACT
Despite the popularity and success of neural networks in research, the number of resulting commercial or industrial applications has been limited. A primary cause of this lack of adoption is that neural networks are usually implemented as software running on general-purpose processors. Algorithms that implement a neural network in software are typically O(n^2) problems; as a result, neural networks are unable to provide the performance and scalability required in non-academic settings. In this paper, we investigate how FPGAs can be used to take advantage of the inherent parallelism in neural networks to provide a better implementation in terms of scalability and performance. We focus on the restricted Boltzmann machine, a popular type of neural network, because its architecture is particularly well suited to hardware designs. The proposed multi-purpose hardware framework is designed to reduce the O(n^2) problem to an O(n) implementation while requiring only O(n) resources. The framework is tested on a Xilinx Virtex II-Pro XC2VP70 FPGA running at 100 MHz. The resources support a restricted Boltzmann machine of 128x128 nodes, which results in a computational speed of 1.02 billion connection-updates-per-second and a 35-fold speed-up over an optimized C program running on a 2.8 GHz Intel processor.
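The complexity claim in the abstract can be illustrated with a small sketch. It is not the architecture from the paper: the function names, the random weights, and the idea of one multiply-accumulate (MAC) "lane" per hidden node are assumptions made here for illustration. The sketch counts the MACs a sequential software pass needs for one layer update, then estimates how many time steps remain if a number of lanes work concurrently.

```python
import math
import random


def hidden_probabilities(weights, visible, hidden_bias):
    """Sequential pass: one multiply-accumulate (MAC) per connection."""
    macs = 0
    probs = []
    for j, bias in enumerate(hidden_bias):
        acc = bias
        for i, v in enumerate(visible):
            acc += weights[i][j] * v
            macs += 1
        probs.append(1.0 / (1.0 + math.exp(-acc)))  # logistic activation
    return probs, macs


def parallel_steps(n_visible, n_hidden, lanes):
    """Time steps needed if `lanes` hypothetical MAC units work concurrently."""
    return math.ceil(n_visible * n_hidden / lanes)


def updates_per_cycle(updates_per_second, clock_hz):
    """Average connection updates completed per clock cycle."""
    return updates_per_second / clock_hz


if __name__ == "__main__":
    n = 128  # layer size quoted in the abstract
    rng = random.Random(0)  # illustrative random weights, not trained values
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(n)]
    visible = [rng.randint(0, 1) for _ in range(n)]
    probs, macs = hidden_probabilities(weights, visible, [0.0] * n)
    print(macs, parallel_steps(n, n, lanes=n))
    print(updates_per_cycle(1.02e9, 100e6))
```

For n = 128 the sequential pass performs 128 x 128 = 16,384 MACs, whereas 128 concurrent lanes would need 128 steps, which is the O(n^2) to O(n) reduction with O(n) resources that the abstract describes. Dividing the quoted 1.02 billion connection updates per second by the 100 MHz clock gives an average of about 10.2 updates per cycle. This is plain arithmetic on the published figures and says nothing about how the hardware schedules its work.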