ABSTRACT
Native and non-native use of language differs, depending on the proficiency of the speaker, in clear and quantifiable ways. It has been shown that customizing the acoustic and language models of a natural language understanding system can significantly improve handling of non-native input; in order to make such a switch, however, the nativeness status of the user must be known. In this paper, we show that naive Bayes classification can be used to identify non-native utterances of English. The advantage of our method is that it relies on text, not on acoustic features, and can be used when the acoustic source is not available. We demonstrate that both read and spontaneous utterances can be classified with high accuracy, and that classification of errorful speech recognizer hypotheses is more accurate than classification of perfect transcriptions. We also characterize part-of-speech sequences that play a role in detecting non-native speech.
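The abstract names naive Bayes classification and part-of-speech sequences but does not give the feature set or smoothing. The sketch below is therefore only an illustration under stated assumptions: utterances are already POS-tagged (Penn Treebank tags are assumed), features are tag bigrams with sentence-boundary padding, and the model is a multinomial naive Bayes with add-one smoothing. The class name `NaiveBayes`, the helper `pos_ngrams`, the method `most_indicative`, and the tiny tag sequences are all hypothetical toy data, not the paper's corpus or code.

```python
import math
from collections import Counter

def pos_ngrams(tags, n=2):
    """Return POS n-grams (as tuples), padding with sentence-boundary markers."""
    padded = ["<s>"] + list(tags) + ["</s>"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]

class NaiveBayes:
    """Multinomial naive Bayes over POS n-grams with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        # Log prior P(class) from the label frequencies
        self.prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.counts[c].update(pos_ngrams(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_likelihood(self, feature, c):
        # log P(feature | class), add-one smoothed over the training vocabulary
        return math.log((self.counts[c][feature] + 1) / (self.totals[c] + len(self.vocab)))

    def predict(self, doc):
        # N-grams never seen in training are ignored
        feats = [f for f in pos_ngrams(doc) if f in self.vocab]
        scores = {c: self.prior[c] + sum(self.log_likelihood(f, c) for f in feats)
                  for c in self.classes}
        return max(scores, key=scores.get)

    def most_indicative(self, c, other, k=3):
        # Rank n-grams by log P(f | c) - log P(f | other)
        ratio = lambda f: self.log_likelihood(f, c) - self.log_likelihood(f, other)
        return sorted(self.vocab, key=ratio, reverse=True)[:k]

# Hypothetical toy data: the non-native sequences drop determiners
native = [["PRP", "VBP", "DT", "NN"],         # "I want a ticket"
          ["PRP", "VBD", "TO", "DT", "NN"],   # "I went to the store"
          ["DT", "NN", "VBZ", "JJ"]]          # "The room is nice"
non_native = [["PRP", "VBP", "NN"],           # "I want ticket"
              ["PRP", "VBD", "TO", "NN"],     # "I went to store"
              ["NN", "VBZ", "JJ"]]            # "Room is nice"

nb = NaiveBayes().fit(native + non_native, ["native"] * 3 + ["non-native"] * 3)
print(nb.predict(["PRP", "VBD", "DT", "NN"]))  # → native
print(nb.predict(["PRP", "VBP", "NN"]))        # → non-native
print(nb.most_indicative("non-native", "native"))
```

On this toy data, `most_indicative` surfaces the three bigrams that occur only in the non-native class (`("VBP", "NN")`, `("TO", "NN")`, `("<s>", "NN")`, in no fixed order), each reflecting a missing determiner. A log-likelihood ratio like this is one simple way to characterize class-specific POS sequences; the paper's own analysis may use a different measure.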
REFERENCES
1. Shlomo Argamon-Engelson, Moshe Koppel, and Galit Avneri. 1998. Style-based text categorization: What newspaper am I reading? In AAAI Workshop on Learning for Text Categorization.
2. S. P. Corder. 1967. The significance of learners' errors. International Review of Applied Linguistics, 5(4):161-170.
3.
4. Michael Finke, Jürgen Fritsch, Petra Geutner, Klaus Ries, and Torsten Zeppenfeld. 1997. The Janus-RTk Switchboard/Callhome 1997 Evaluation System. In Proc. of the LVCSR Hub5-e Workshop.
5. Pascale Fung and Wai Kat Liu. 1999. Fast Accent Identification and Accented Speech Recognition. In Proc. ICASSP.
6. LDC. 2000. http://www.ldc.upenn.edu.
7. Kai-Fu Lee. 1990. Context-dependent phonetic hidden Markov models for speaker-independent continuous speech recognition. In Proc. ICASSP.
8.
9. Laura Mayfield Tomokiyo and Susanne Burger. 1999. Eliciting Natural Speech from Non-Native Users: Collecting Speech Data for LVCSR. In Proc. of the ACL-IALL Joint Workshop in Computer-Mediated Language Assessment and Evaluation in Natural Language Processing.
10. Andrew Kachites McCallum. 1996. Bow: A toolkit for statistical language modeling, text retrieval, classification and clustering. http://www.cs.cmu.edu/~mccallum/bow.
11. Frederick Mosteller and David L. Wallace. 1984. Applied Bayesian and classical inference: the case of the Federalist papers. Springer-Verlag.
12.
13. Adwait Ratnaparkhi. 1996. A maximum entropy part-of-speech tagger. In Proc. EMNLP.
14. 1987. Guide to SPEAK. Produced by the Test of English as a Foreign Language Program, Princeton, NJ.
15. Elaine Tarone. 1978. The phonology of interlanguage. In J. C. Richards, editor, Understanding Second and Foreign Language Learning: Issues and Approaches. Newbury House, Rowley, MA.
16. Carlos Teixeira, Isabel Trancoso, and António Serralheiro. 1996. Accent identification. In Proc. ICSLP, Philadelphia, PA.
17.
CITED BY 3
John Lee, Ming Zhou, and Xiaohua Liu. 2007. Detection of non-native sentences using machine-translated training data. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers, pp. 93-96, April 22-27, 2007, Rochester, New York.