Implementation and evaluation of a constraint-based multimodal fusion system for speech and 3D pointing gestures
Source: International Conference on Multimodal Interfaces
Proceedings of the 6th International Conference on Multimodal Interfaces
State College, PA, USA
Poster session 1
Pages: 175 - 182  
Year of Publication: 2004
ISBN: 1-58113-995-0
Authors
Hartwig Holzapfel, Universität Karlsruhe (TH), Germany
Kai Nickel, Universität Karlsruhe (TH), Germany
Rainer Stiefelhagen, Universität Karlsruhe (TH), Germany
Sponsors
SIGCHI: ACM Special Interest Group on Computer-Human Interaction
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1027933.1027964

ABSTRACT

This paper presents an architecture for fusing multimodal input streams for natural interaction with a humanoid robot, together with results from a user study of the system. The fusion architecture consists of an application-independent parser of input events and application-specific rules. In the user study, participants interacted with a robot in a kitchen scenario using speech and gesture input. We observed that the fusion approach is very tolerant of falsely detected pointing gestures, because speech serves as the main modality and pointing gestures are used mainly to disambiguate objects. The paper also reports on the temporal correlation of speech and gesture events observed in the user study.
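
The idea of speech as the main modality, with pointing gestures consulted only for disambiguation, can be illustrated with a minimal Python sketch. This is not the authors' constraint-based parser: the class names, the `fuse` function, the object identifiers, and the 1.5-second time window are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpeechEvent:
    action: str        # e.g. "bring"
    object_type: str   # e.g. "cup"
    start: float       # utterance start time in seconds
    end: float         # utterance end time in seconds


@dataclass
class PointingEvent:
    target_id: str     # object the gesture recognizer believes was pointed at
    time: float        # time of the pointing gesture in seconds


@dataclass
class SceneObject:
    object_id: str
    object_type: str


def fuse(speech: SpeechEvent,
         gestures: List[PointingEvent],
         scene: List[SceneObject],
         window: float = 1.5) -> Optional[str]:
    """Resolve the object a spoken command refers to.

    Speech decides the object type. Gestures are used only when several
    objects of that type exist. The window value is an assumption, not a
    figure taken from the paper.
    """
    candidates = [o for o in scene if o.object_type == speech.object_type]
    if len(candidates) == 1:
        # Speech alone is unambiguous, so any gesture (including a falsely
        # detected one) is ignored.
        return candidates[0].object_id

    candidate_ids = {o.object_id for o in candidates}
    # Keep only gestures that are close in time to the utterance and that
    # point at an object compatible with what was said.
    nearby = [
        g for g in gestures
        if speech.start - window <= g.time <= speech.end + window
        and g.target_id in candidate_ids
    ]
    if not nearby:
        return None  # still ambiguous; a dialog system could ask back here

    midpoint = (speech.start + speech.end) / 2
    best = min(nearby, key=lambda g: abs(g.time - midpoint))
    return best.target_id


scene = [
    SceneObject("cup_1", "cup"),
    SceneObject("cup_2", "cup"),
    SceneObject("fridge_1", "fridge"),
]
gestures = [
    PointingEvent("fridge_1", 10.4),  # spurious detection
    PointingEvent("cup_2", 10.9),
]
resolved_cup = fuse(SpeechEvent("bring", "cup", 10.0, 11.0), gestures, scene)
resolved_fridge = fuse(SpeechEvent("open", "fridge", 10.0, 11.0), gestures, scene)
```

In this sketch a spurious gesture toward the fridge cannot override a spoken request for a cup, because a gesture is only accepted if its target matches the object type named in speech. This mirrors, in a very simplified form, the tolerance to false gesture detections described in the abstract.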


