A low-power accelerator for the SPHINX 3 speech recognition system
Source: International Conference on Compilers, Architecture and Synthesis for Embedded Systems
Proceedings of the 2003 International Conference on Compilers, Architecture and Synthesis for Embedded Systems
San Jose, California, USA
SESSION: Embedded applications
Pages: 210 - 219  
Year of Publication: 2003
ISBN:1-58113-676-5
Authors
Binu Mathew  University of Utah, Salt Lake City, UT
Al Davis  University of Utah, Salt Lake City, UT
Zhen Fang  University of Utah, Salt Lake City, UT
Sponsors
ACM: Association for Computing Machinery
SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/951710.951739

ABSTRACT

Accurate real-time speech recognition is not currently possible in the mobile embedded space, where the need for natural voice interfaces is clearly important. The continuous nature of speech recognition, coupled with an inherently large working set, creates significant cache interference with other processes. Hence, real-time recognition is problematic even on high-performance general-purpose platforms. This paper provides a detailed analysis of CMU's latest speech recognizer (Sphinx 3.2), identifies three distinct processing phases, and quantifies the architectural requirements for each phase. Several optimizations are then described which expose parallelism and drastically reduce the bandwidth and power requirements for real-time recognition. A special-purpose accelerator for the dominant Gaussian probability phase is developed for a 0.25μ CMOS process, which is then analyzed and compared with Sphinx's measured energy and performance on a 0.13μ 2.4 GHz Pentium 4 system. The results show an improvement in power consumption by a factor of 29 at equivalent processing throughput. However, after normalizing for process, the special-purpose approach has twice the throughput and consumes 104 times less energy than the general-purpose processor. The energy-delay product is a better comparison metric due to the inherent design trade-offs between energy consumption and performance. The energy-delay product of the special-purpose approach is 196 times better than that of the Pentium 4. These results provide strong evidence that real-time large vocabulary speech recognition can be done within a power budget commensurate with embedded processing using today's technology.
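
The energy-delay comparison in the abstract can be sanity-checked with a little arithmetic. The sketch below assumes the usual definition of the energy-delay product (energy multiplied by execution time for the same work) and plugs in the rounded, process-normalized ratios quoted above. The function name and the use of rounded inputs are illustrative assumptions, not the authors' calculation. With those inputs the estimate lands near, but not exactly on, the reported factor of 196, a gap that is consistent with the quoted ratios being rounded.

```python
def edp_improvement(energy_ratio: float, throughput_ratio: float) -> float:
    """Return how many times better the energy-delay product (EDP) is.

    energy_ratio: baseline energy / accelerator energy for the same work.
    throughput_ratio: accelerator throughput / baseline throughput, which
        equals baseline delay / accelerator delay for the same work.
    """
    # EDP = energy * delay, so the two improvement factors multiply.
    return energy_ratio * throughput_ratio


# Rounded ratios quoted in the abstract (assumed inputs for this sketch).
ENERGY_RATIO = 104.0
THROUGHPUT_RATIO = 2.0
REPORTED_EDP_RATIO = 196.0

if __name__ == "__main__":
    estimate = edp_improvement(ENERGY_RATIO, THROUGHPUT_RATIO)
    print(f"EDP improvement from rounded ratios: {estimate:.0f}x")
    # Throughput ratio that would reproduce the reported EDP figure exactly.
    implied = REPORTED_EDP_RATIO / ENERGY_RATIO
    print(f"Throughput ratio implied by the reported figure: {implied:.2f}")
```

Because the two factors multiply, a design that is only modestly faster can still show a large energy-delay advantage when its energy saving is large, which is why the abstract prefers this metric over power or throughput alone.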

