Compact input coding for speech recognition by neural net (abstract)
Source: ACM Annual Computer Science Conference
Proceedings of the 1990 ACM annual conference on Cooperation
Washington, D.C., United States
Page: 444  
Year of Publication: 1990
ISBN: 0-89791-348-5
Authors
Thomas M. English  P. O. Drawer CS, Mississippi State University, Mississippi State, MS
Louis C. Boggess  P. O. Drawer CS, Mississippi State University, Mississippi State, MS
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/100348.100481

ABSTRACT

Error back-propagation has been used to train a multilayer perceptron to classify an individual's utterances of digits. The network has made no errors in classifying 300 new utterances, and has exhibited robust performance with gross variations in speaking style and signal-to-noise ratio. The novelty of the approach is the way in which temporal information is encoded for use by the perceptron. Specifically, the input is a pattern of decaying activations in a self-organized map of speech spectra. At uniform steps in time, the unit in the map that most accurately represents the current speech sound is fully activated. The activation of the unit subsequently decays exponentially, and at any time the level of activation of a unit indicates how recently its corresponding speech sound has occurred. One may imagine the map as a screen of luminescent elements and the speech signal as a point of light moving across the screen. The role of the network is to classify trajectories of digits on the screen. We have found that the rate of decay in the feature map is not an important factor in isolated digit recognition. Decay rate is expected to be a critical factor in training the perceptron to spot utterances of digits in fluent speech, however. Use of the digit-classifier in a “bootstrap” training procedure for a continuous speech recognizer is proposed.
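
The input coding described in the abstract can be illustrated with a minimal sketch. The abstract does not give the map size, the matching criterion, or the decay constant, so everything concrete here is an assumption made for illustration: the names `nearest_unit` and `encode_utterance` are hypothetical, the best-matching unit is chosen by squared Euclidean distance, and exponential decay is modelled as multiplication by a constant factor `decay` at each time step. The map is taken as already trained; the self-organizing step and the perceptron itself are not shown.

```python
import math


def nearest_unit(codebook, frame):
    """Return the index of the map unit whose weight vector is closest
    to the spectral frame (squared Euclidean distance, an assumption)."""
    best_index, best_dist = 0, math.inf
    for i, weights in enumerate(codebook):
        dist = sum((w - x) ** 2 for w, x in zip(weights, frame))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def encode_utterance(codebook, frames, decay=0.9):
    """Return one activation per map unit after all frames are processed.

    At each time step every activation is multiplied by `decay`
    (exponential decay in discrete time), then the best-matching unit
    is set to full activation (1.0). `decay` is a hypothetical parameter.
    """
    activations = [0.0] * len(codebook)
    for frame in frames:
        activations = [a * decay for a in activations]
        activations[nearest_unit(codebook, frame)] = 1.0
    return activations


# Toy example: a three-unit "map" over two-dimensional "spectra".
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
frames = [(0.1, 0.0), (0.9, 0.1), (0.1, 0.8)]
pattern = encode_utterance(codebook, frames, decay=0.5)
print(pattern)  # [0.25, 0.5, 1.0]
```

In the toy run, the unit matched first has decayed twice (0.25), the next once (0.5), and the most recent is fully active (1.0), so the fixed-length pattern records how recently each sound occurred. A pattern of this kind is what the perceptron would receive as input.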

