A multi-FPGA 10x-real-time high-speed search engine for a 5000-word vocabulary speech recognizer
Source
International Symposium on Field Programmable Gate Arrays
Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays
Monterey, California, USA
SESSION: High performance computing applications
Pages 83-92
Year of Publication: 2009
ISBN: 978-1-60558-410-2
Authors
Edward C. Lin, Carnegie Mellon University, Pittsburgh, PA, USA
Rob A. Rutenbar, Carnegie Mellon University, Pittsburgh, PA, USA
Sponsors
SIGDA: ACM Special Interest Group on Design Automation
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1508128.1508141

ABSTRACT

Today's best-quality speech recognition systems are implemented in software. These systems fully occupy the resources of a high-end server to deliver results at real-time speed: each hour of audio requires a significant fraction of an hour of computation for recognition. This is profoundly limiting for applications that require extreme recognition speed, for example, high-volume tasks such as video indexing (e.g., YouTube), or high-speed tasks such as triage of homeland security intelligence. We describe the architecture and implementation of one critical component of a high-speed, large-vocabulary recognizer: the backend search stage. Implemented on a multi-FPGA Berkeley Emulation Engine 2 (BEE2) platform, our design handles the standard 5000-word Wall Street Journal speech benchmark. Running at 100 MHz, a clock rate roughly 30x slower than that of a conventional server, our backend search engine decodes on average 10 times faster than real time, with negligible degradation in accuracy. To the best of our knowledge, this is both the most complex and the fastest recognizer ever realized in hardware.
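To make the speed figures above concrete, the following Python sketch works through the arithmetic. It is an illustration under stated assumptions, not part of the described design: it uses the common definition of real-time factor (compute time divided by audio duration), and it assumes a front-end rate of 100 feature frames per second (a 10 ms frame shift), which the abstract does not state. The function names `real_time_factor` and `cycles_per_frame` are hypothetical.

```python
def real_time_factor(compute_seconds: float, audio_seconds: float) -> float:
    """Real-time factor (RTF): compute time divided by audio duration.
    RTF < 1 means faster than real time; "10x faster" corresponds to RTF = 0.1."""
    return compute_seconds / audio_seconds


def cycles_per_frame(clock_hz: float, speedup: float, frames_per_second: float) -> float:
    """Clock cycles available per audio frame when decoding `speedup` times
    faster than real time. `frames_per_second` is an ASSUMED front-end rate."""
    # Each wall-clock second, the engine consumes `speedup` seconds of audio.
    frames_per_wall_second = speedup * frames_per_second
    return clock_hz / frames_per_wall_second


CLOCK_HZ = 100e6          # 100 MHz, from the abstract
SPEEDUP = 10.0            # 10x faster than real time, from the abstract
FRAMES_PER_SECOND = 100   # ASSUMPTION: 10 ms frame shift, not stated in the abstract

one_hour = 3600.0
print(real_time_factor(one_hour / SPEEDUP, one_hour))          # 0.1
print(cycles_per_frame(CLOCK_HZ, SPEEDUP, FRAMES_PER_SECOND))   # 100000.0
print(CLOCK_HZ * 30 / 1e9)  # implied server clock in GHz, from the "~30x" figure: 3.0
```

Under the assumed 10 ms frame shift, decoding 10x faster than real time at 100 MHz leaves a budget of about 100,000 clock cycles per frame for the backend search. A clock roughly 30x faster corresponds to about 3 GHz, consistent with the abstract's comparison to a conventional server.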
