ABSTRACT
This paper presents an architecture for fusing multimodal input streams for natural interaction with a humanoid robot, together with results from a user study of the system. The fusion architecture consists of an application-independent parser of input events and a set of application-specific rules. In the user study, participants interacted with the robot in a kitchen scenario using speech and gesture input. We observed that our fusion approach is highly tolerant of falsely detected pointing gestures, because speech serves as the main modality and pointing gestures are used mainly to disambiguate object references. We also report on the temporal correlation between speech and gesture events observed in the study.
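The abstract attributes the tolerance to false pointing detections to a speech-driven design, with gestures used only for disambiguation. The following minimal sketch illustrates that idea under stated assumptions; it is not the paper's parser or rule set. The event types (`SpeechEvent`, `PointingEvent`), the `fuse` function and the 2-second time window are all hypothetical. A pointing gesture is consulted only when the utterance leaves more than one candidate object. Even then, it is used only if it occurs close in time to the utterance and points at one of those candidates, so a spurious gesture at an unrelated object is simply ignored.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class SpeechEvent:
    """Hypothetical parsed utterance: an action plus the objects it could refer to."""
    time: float                  # timestamp in seconds
    action: str
    candidates: Tuple[str, ...]  # object ids compatible with the spoken description


@dataclass(frozen=True)
class PointingEvent:
    """Hypothetical output of a pointing-gesture detector (may be a false detection)."""
    time: float                  # timestamp in seconds
    target: str                  # object id hit by the pointing direction


# Assumed temporal window for pairing speech with gestures; not taken from the paper.
MAX_GAP_S = 2.0


def fuse(speech: SpeechEvent,
         gestures: Sequence[PointingEvent],
         max_gap: float = MAX_GAP_S) -> Optional[Tuple[str, str]]:
    """Speech-driven fusion: pointing is consulted only to disambiguate objects."""
    if len(speech.candidates) == 1:
        # Speech alone is unambiguous, so any gesture (true or false) is ignored.
        return (speech.action, speech.candidates[0])
    # Keep only gestures that are close in time and point at a spoken candidate.
    usable = [g for g in gestures
              if abs(g.time - speech.time) <= max_gap
              and g.target in speech.candidates]
    if not usable:
        # Still ambiguous; a dialogue system could ask a clarification question here.
        return None
    # Prefer the gesture temporally closest to the utterance.
    best = min(usable, key=lambda g: abs(g.time - speech.time))
    return (speech.action, best.target)


if __name__ == "__main__":
    cmd = SpeechEvent(time=10.0, action="bring", candidates=("red_cup", "blue_cup"))
    # The gesture at "fridge" (e.g. a false detection) is not a candidate and is discarded.
    print(fuse(cmd, [PointingEvent(9.5, "fridge"), PointingEvent(10.4, "blue_cup")]))
```

Running the script prints `('bring', 'blue_cup')`. Making speech the controlling modality means a gesture can only narrow down choices the utterance already allows. It can never introduce a new referent, which is one plausible reading of why false pointing detections were tolerated.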