A multi-FPGA 10x-real-time high-speed search engine for a 5000-word vocabulary speech recognizer
Source
International Symposium on Field Programmable Gate Arrays
Proceeding of the ACM/SIGDA international symposium on Field programmable gate arrays
Monterey, California, USA
SESSION: High performance computing applications
Pages 83-92
Year of Publication: 2009
ISBN: 978-1-60558-410-2
ABSTRACT
Today's best-quality speech recognition systems are implemented in software. These systems fully occupy the resources of a high-end server to deliver results at real-time speed: each hour of audio requires a significant fraction of an hour of computation for recognition. This is profoundly limiting for applications that require extreme recognition speed, for example, high-volume tasks such as video indexing (e.g., YouTube), or high-speed tasks such as triage of homeland security intelligence. We describe the architecture and implementation of one critical component, the backend search stage, of a high-speed, large-vocabulary recognizer. Implemented on a multi-FPGA Berkeley Emulation Engine 2 (BEE2) platform, our design handles a standard 5000-word Wall Street Journal speech benchmark. Running at 100 MHz, a clock rate roughly 30x slower than a conventional server, our backend search engine decodes on average 10 times faster than real time, with negligible degradation in accuracy. To the best of our knowledge, this is both the most complex and the fastest recognizer ever realized in hardware form.
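The speed figures in the abstract can be turned into a rough per-frame cycle budget. The sketch below is illustrative only: the 10 ms frame length is an assumption (a common choice in speech front ends, not stated in the abstract), and the function names are hypothetical. The 100 MHz clock, the 10x speedup, and the ~30x clock ratio come from the abstract.

```python
def cycles_per_frame(clock_hz: float, speedup: float, frame_s: float = 0.010) -> float:
    """Clock cycles available to process one audio frame.

    frame_s defaults to 10 ms, which is an ASSUMPTION, not a figure
    from the abstract.
    """
    # At `speedup` times real time, each frame must finish in frame_s / speedup.
    return clock_hz * frame_s / speedup


def implied_server_clock_hz(fpga_clock_hz: float, clock_ratio: float) -> float:
    """Server clock implied by the stated FPGA clock and clock-rate ratio."""
    return fpga_clock_hz * clock_ratio


if __name__ == "__main__":
    fpga_clock = 100e6   # 100 MHz, from the abstract
    speedup = 10.0       # 10x faster than real time, from the abstract
    ratio = 30.0         # "~30x slower than a conventional server"

    print("cycles per frame:", cycles_per_frame(fpga_clock, speedup))
    print("implied server clock (Hz):", implied_server_clock_hz(fpga_clock, ratio))
```

Under these assumptions the search engine has on the order of 100,000 cycles per frame, and the comparison server would run at roughly 3 GHz. A different frame length scales the budget proportionally.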