| On compensating the Mel-frequency cepstral coefficients for noisy speech recognition |
| Source
|
ACM International Conference Proceeding Series; Vol. 171
Proceedings of the 29th Australasian Computer Science Conference - Volume 48
Hobart, Australia
Pages: 49 - 54
Year of Publication: 2006
ISSN: 1445-1336, ISBN: 1-920682-30-9
|
|
Author
|
|
Eric H. C. Choi
|
Interfaces, Machines and Graphic Environments (IMAGEN), National ICT Australia, Alexandria, NSW, Sydney, Australia
|
|
| Publisher |
Australian Computer Society, Inc.
Darlinghurst, Australia
|
ABSTRACT
This paper describes a novel noise-robust automatic speech recognition (ASR) front-end that employs a combination of Mel-filterbank output compensation and cumulative distribution mapping of cepstral coefficients with truncated Gaussian distribution. Recognition experiments on the Aurora II connected digits database reveal that the proposed front-end achieves an average digit recognition accuracy of 84.92% for a model set trained from clean speech data. Compared with the ETSI standard Mel-cepstral front-end, the proposed front-end is found to obtain a relative error rate reduction of around 61%. Moreover, the proposed front-end can provide comparable recognition accuracy with the ETSI advanced front-end, at less than half the computation load.
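The abstract names cumulative distribution mapping of cepstral coefficients onto a truncated Gaussian but gives no algorithmic detail. The following is a minimal sketch of one common way to do such a mapping, not the paper's actual method: each coefficient value in an utterance is replaced by the truncated-Gaussian quantile that corresponds to its empirical rank. The function name, the rank-based CDF estimate, the zero-mean unit-variance target, and the truncation bound of 4.0 are all assumptions made for illustration.

```python
from statistics import NormalDist


def cdf_map_to_truncated_gaussian(values, bound=4.0):
    """Map one cepstral coefficient track onto a standard Gaussian
    truncated to [-bound, bound], using an empirical (rank-based) CDF.

    Illustrative only: the bound and the rank-based estimate are
    assumptions, not taken from the paper.
    """
    std_normal = NormalDist()          # target: mean 0, std 1
    lo = std_normal.cdf(-bound)        # probability mass below -bound
    hi = std_normal.cdf(bound)         # probability mass below +bound
    n = len(values)

    # Indices of the frames sorted by coefficient value (ties keep input order).
    order = sorted(range(n), key=lambda i: values[i])

    mapped = [0.0] * n
    for rank, idx in enumerate(order):
        p = (rank + 0.5) / n           # empirical CDF value, strictly in (0, 1)
        # Inverse CDF of the truncated Gaussian at probability p.
        mapped[idx] = std_normal.inv_cdf(lo + p * (hi - lo))
    return mapped


# Hypothetical values of a single cepstral coefficient over five frames.
frames = [3.0, -1.0, 10.0, 0.5, 2.0]
mapped = cdf_map_to_truncated_gaussian(frames)
print(mapped)
```

Because the mapping depends only on the rank of each value, it preserves the ordering of the frames while forcing their distribution toward the same target shape regardless of the noise condition, which is the general motivation for this family of feature normalisation techniques.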