You're not from 'round here, are you?: naive Bayes detection of non-native utterance text
Source: North American Chapter of the Association for Computational Linguistics
Second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies 2001
Pittsburgh, Pennsylvania
Pages: 1 - 8  
Year of Publication: 2001
Authors
Laura Mayfield Tomokiyo  Carnegie Mellon University
Rosie Jones  Carnegie Mellon University
Publisher
Association for Computational Linguistics  Morristown, NJ, USA
DOI Bookmark: 10.3115/1073336.1073367

ABSTRACT

Native and non-native use of language differs, depending on the proficiency of the speaker, in clear and quantifiable ways. It has been shown that customizing the acoustic and language models of a natural language understanding system can significantly improve handling of non-native input; in order to make such a switch, however, the nativeness status of the user must be known. In this paper, we show that naive Bayes classification can be used to identify non-native utterances of English. The advantage of our method is that it relies on text, not on acoustic features, and can be used when the acoustic source is not available. We demonstrate that both read and spontaneous utterances can be classified with high accuracy, and that classification of errorful speech recognizer hypotheses is more accurate than classification of perfect transcriptions. We also characterize part-of-speech sequences that play a role in detecting non-native speech.
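
The text-only classification idea in the abstract can be illustrated with a minimal multinomial naive Bayes sketch. This is not the authors' implementation: the function names `train_naive_bayes` and `classify`, the add-one smoothing constant `alpha`, the whitespace tokenization, and the four toy training utterances are all assumptions made up for illustration. The tokens could equally be part-of-speech tags rather than words.

```python
import math
from collections import Counter


def train_naive_bayes(labeled_utterances, alpha=1.0):
    """Fit a multinomial naive Bayes model on (tokens, label) pairs.

    alpha is an additive (Laplace) smoothing constant; its value here
    is an assumption, not something taken from the paper.
    """
    class_counts = Counter()
    token_counts = {}
    vocab = set()
    for tokens, label in labeled_utterances:
        class_counts[label] += 1
        token_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)

    total = sum(class_counts.values())
    log_prior = {c: math.log(n / total) for c, n in class_counts.items()}

    log_likelihood = {}
    for c, counts in token_counts.items():
        denom = sum(counts.values()) + alpha * len(vocab)
        log_likelihood[c] = {
            t: math.log((counts[t] + alpha) / denom) for t in vocab
        }
    return log_prior, log_likelihood, vocab


def classify(tokens, model):
    """Return the label with the highest posterior log-score.

    Tokens never seen in training are skipped.
    """
    log_prior, log_likelihood, vocab = model
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][t] for t in tokens if t in vocab
        )
    return max(scores, key=scores.get)


# Toy, invented training data, purely illustrative (not from the paper).
TRAIN = [
    ("i would like to book a room".split(), "native"),
    ("could you tell me the price".split(), "native"),
    ("i want reserve room please".split(), "non-native"),
    ("please tell me how much cost".split(), "non-native"),
]
MODEL = train_naive_bayes(TRAIN)
```

Calling `classify("please reserve room".split(), MODEL)` scores the utterance under each class and returns the more probable label. Because the classifier sees only token sequences, the same code could be applied to speech recognizer hypotheses or to transcriptions without access to the audio.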


