A high-performance FPGA architecture for restricted Boltzmann machines
Source
International Symposium on Field Programmable Gate Arrays
Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays
Monterey, California, USA
SESSION: High performance computing applications
Pages: 73-82  
Year of Publication: 2009
ISBN:978-1-60558-410-2
Authors
Daniel L. Ly  University of Toronto, Toronto, ON, Canada
Paul Chow  University of Toronto, Toronto, ON, Canada
Sponsors
SIGDA: ACM Special Interest Group on Design Automation
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: 10.1145/1508128.1508140 (http://doi.acm.org/10.1145/1508128.1508140)

ABSTRACT

Despite the popularity and success of neural networks in research, the number of resulting commercial or industrial applications has been limited. A primary cause of this limited adoption is that neural networks are usually implemented as software running on general-purpose processors. Algorithms that implement a neural network in software are typically O(n²) problems; as a result, neural networks are unable to provide the performance and scalability required in non-academic settings.
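To make the O(n²) cost concrete, the following Python sketch (a generic illustration, not code from the paper) computes one fully connected layer sequentially, as a single general-purpose processor would. The names `sigmoid` and `layer_forward` are hypothetical. Because every output neuron must visit every input, the number of multiply-accumulate (MAC) operations is n_in × n_out, which is n² for a layer with n inputs and n outputs.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids math.exp overflow)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def layer_forward(inputs, weights, biases):
    """Compute one fully connected layer one operation at a time.

    inputs  : list of n_in floats
    weights : n_in x n_out nested list; weights[i][j] links input i to output j
    biases  : list of n_out floats
    Returns (outputs, macs), where macs counts multiply-accumulate steps.
    """
    n_in, n_out = len(inputs), len(biases)
    outputs = []
    macs = 0
    for j in range(n_out):            # one pass per output neuron...
        acc = biases[j]
        for i in range(n_in):         # ...over every input: n_in * n_out steps total
            acc += inputs[i] * weights[i][j]
            macs += 1
        outputs.append(sigmoid(acc))
    return outputs, macs


if __name__ == "__main__":
    for n in (32, 64, 128):
        rng = random.Random(0)
        x = [rng.random() for _ in range(n)]
        w = [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(n)]
        b = [0.0] * n
        _, macs = layer_forward(x, w, b)
        print(n, macs)  # MAC count grows as n*n: 1024, 4096, 16384
```

Doubling n quadruples the work, which is the scaling problem the abstract describes for software implementations.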

In this paper, we investigate how FPGAs can exploit the inherent parallelism of neural networks to provide implementations with better scalability and performance. We focus on the Restricted Boltzmann machine, a popular type of neural network whose architecture is particularly well suited to hardware design. The proposed multi-purpose hardware framework is designed to reduce the O(n²) problem to an O(n) implementation while requiring only O(n) resources. The framework is tested on a Xilinx Virtex-II Pro XC2VP70 FPGA running at 100 MHz. The available resources support a Restricted Boltzmann machine of 128×128 nodes, which achieves a computational speed of 1.02 billion connection updates per second and a 35-fold speed-up over an optimized C program running on a 2.8 GHz Intel processor.
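The reduction from O(n²) time to O(n) time with O(n) resources can be illustrated with a cycle-level Python model. It assumes a simple broadcast-and-accumulate scheme, a hypothetical stand-in rather than the datapath described in the paper: each visible value is broadcast to one multiply-accumulate lane per hidden node, and all lanes update in the same clock cycle. The computed quantity is the standard binary RBM conditional p(h_j = 1 | v) = σ(c_j + Σ_i v_i w_ij); the names `parallel_hidden_probs`, `weights`, and `hidden_bias` are illustrative.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def parallel_hidden_probs(visible, weights, hidden_bias):
    """Model n_hidden MAC lanes working in lockstep (assumed scheme).

    On clock cycle i, visible[i] is broadcast to every hidden lane, and lane j
    adds visible[i] * weights[i][j] to its own accumulator. Since the lanes run
    concurrently in hardware, the layer takes n_visible cycles and uses
    n_hidden MAC units: O(n) time with O(n) resources.

    Returns (p_hidden, cycles), with p_hidden[j] = sigmoid(c_j + sum_i v_i * w_ij).
    """
    n_vis, n_hid = len(visible), len(hidden_bias)
    acc = list(hidden_bias)            # one accumulator register per lane
    cycles = 0
    for i in range(n_vis):             # one clock cycle per visible unit
        v_i = visible[i]               # value broadcast to all lanes
        for j in range(n_hid):         # serial here, concurrent in hardware
            acc[j] += v_i * weights[i][j]
        cycles += 1
    return [sigmoid(a) for a in acc], cycles


if __name__ == "__main__":
    n = 128
    v = [1.0 if i % 2 == 0 else 0.0 for i in range(n)]
    w = [[0.01] * n for _ in range(n)]
    c = [0.0] * n
    probs, cycles = parallel_hidden_probs(v, w, c)
    print("cycles with", n, "MAC lanes:", cycles)  # 128
    print("sequential MAC steps:", n * n)          # 16384
```

For a 128×128 configuration, the model reports 128 cycles per layer against 16,384 sequential multiply-accumulates; the inner loop over `j` is written serially in Python but stands in for 128 hardware units operating in the same cycle. The reverse pass p(v_i = 1 | h) has the same form with the weight matrix transposed.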

