ABSTRACT
As a new generation of multimodal/media systems begins to define
itself, researchers are attempting to learn how to combine
different modes into strategically integrated whole systems. In
theory, well-designed multimodal systems should be able to
integrate complementary modalities in a manner that supports mutual
disambiguation (MD) of errors and leads to more robust performance.
In this study, over 2,000 multimodal utterances by both native and
accented speakers of English were processed by a multimodal system,
and then logged and analyzed. The results confirmed that multimodal
systems can indeed support significant levels of MD, with
higher levels of MD for the more challenging accented speakers. As a
result, although stand-alone speech recognition performed far
more poorly for accented speakers, their multimodal recognition
rates did not differ from those of native speakers. Implications
are discussed for the development of future multimodal
architectures that can perform in a more robust and stable manner
than individual recognition technologies. Also discussed is the
design of interfaces that support diversity in tangible ways, and
that function well under challenging real-world usage
conditions.