Mutual disambiguation of recognition errors in a multimodal architecture
Source Conference on Human Factors in Computing Systems
Proceedings of the SIGCHI conference on Human factors in computing systems: the CHI is the limit
Pittsburgh, Pennsylvania, United States
Pages: 576 - 583  
Year of Publication: 1999
ISBN:0-201-48559-1
Author
Sharon Oviatt  Center for Human-Computer Communication, Oregon Graduate Institute of Science and Technology, P.O. Box 91000, Portland, OR
Sponsor
SIGCHI: ACM Special Interest Group on Computer-Human Interaction
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 6,   Downloads (12 Months): 52,   Citation Count: 78
DOI: http://doi.acm.org/10.1145/302979.303163
ABSTRACT

As a new generation of multimodal/media systems begins to define itself, researchers are attempting to learn how to combine different modes into strategically integrated whole systems. In theory, well-designed multimodal systems should be able to integrate complementary modalities in a manner that supports mutual disambiguation (MD) of errors and leads to more robust performance. In this study, over 2,000 multimodal utterances by both native and accented speakers of English were processed by a multimodal system, and then logged and analyzed. The results confirmed that multimodal systems can indeed support significant levels of MD, and also higher levels of MD for the more challenging accented users. As a result, although stand-alone speech recognition performed far more poorly for accented speakers, their multimodal recognition rates did not differ from those of native speakers. Implications are discussed for the development of future multimodal architectures that can perform in a more robust and stable manner than individual recognition technologies. Also discussed is the design of interfaces that support diversity in tangible ways, and that function well under challenging real-world usage conditions.

