ABSTRACT
Multimodal interfaces are being developed that let our highly skilled and coordinated communicative behavior control system interactions, yielding a more transparent and flexible interface experience than ever before. As applications become more complex, a single modality does not permit varied users to interact effectively across different tasks and usage environments [11]. A flexible multimodal interface, however, offers people the choice to use a combination of modalities, or to switch to a better-suited modality, depending on the specifics of their abilities, the task, and the usage conditions.

This paper will begin by summarizing some of the primary advantages of multimodal interfaces. In particular, it will discuss the inherent flexibility of multimodal interfaces, a key feature that makes them suitable for universal access and mobile computing. It will also discuss the role of multimodal architectures in improving the robustness and performance stability of recognition-based systems. Data will be reviewed from two recent studies in which a multimodal architecture suppressed errors and stabilized system performance for accented nonnative speakers and during mobile use. The paper will conclude by discussing the implications of this research for designing multimodal interfaces for the elderly, as well as the need for future work in this area.
REFERENCES
[1] Adjoudani, A. & Benoit, C. (1995) Audio-visual speech recognition compared across two architectures, Proceedings of the Eurospeech Conference, Madrid, Spain, (2) 1563-1566.
[2] Philip R. Cohen, Michael Johnston, David McGee, Sharon Oviatt, Jay Pittman, Ira Smith, Liang Chen, Josh Clow, QuickSet: multimodal interaction for distributed applications, Proceedings of the fifth ACM international conference on Multimedia, p.31-40, November 09-13, 1997, Seattle, Washington, United States. doi:10.1145/266180.266328
[3] H. J. Fell, H. Delta, R. Peterson, L. J. Ferrier, Z. Mooraj, M. Valleau, Using the Baby-Babble-Blanket for infants with motor problems: an empirical study, Proceedings of the first annual ACM conference on Assistive technologies, p.77-84, October 31-November 01, 1994, Marina Del Rey, California, United States. doi:10.1145/191028.191049
[5] Markinson, R. (1993) Personal communication, University of California at San Francisco Medical School.
[6] Massaro, D. W. (1996). Bimodal speech perception: A progress report, In D. G. Stork & M. E. Hennecke (Eds.), Speechreading by Humans and Machines: Models, Systems and Applications, (pp. 79-101). New York: Springer Verlag.
[8] Sharon Oviatt, Antonella DeAngeli, Karen Kuhn, Integration and synchronization of input modes during multimodal human-computer interaction, Proceedings of the SIGCHI conference on Human factors in computing systems, p.415-422, March 22-27, 1997, Atlanta, Georgia, United States. doi:10.1145/258549.258821
[11] Oviatt, S. L., Cohen, P. R., Wu, L., Vergo, J., Duncan, E., Suhm, B., Bers, J., Holzman, T., Winograd, T., Landay, J., Larson, J., & Ferro, D. (2000) Designing the user interface for multimodal speech and gesture applications: State-of-the-art systems and research directions, Human Computer Interaction, vol. 15, 263-322. (to be reprinted in J. Carroll (ed.) Human-Computer Interaction in the New Millennium, Addison-Wesley Press: Boston, to appear in 2001).
[12] Robert-Ribes, J., Schwartz, J-L., Lallouache, T., & Escudier, P. (1998) Complementarity and synergy in bimodal speech: Auditory, visual, and audio-visual identification of French oral vowels in noise, Journal of the Acoustical Society of America, 103 (6), 3677-3689.
[13] Tomlinson, J., Russell, M. J., & Brooke, N. M. (1996) Integrating audio and visual information to provide highly robust speech recognition, Proceedings of the IEEE ICASSP, 821-824.
[14] Wilpon, J. & Jacobsen, C., A study of speech recognition for children and the elderly, Proceedings of the International Conference on Acoustics, Speech and Signal Processing, Atlanta: IEEE Press, 1996, 349-352.