A maximum entropy based approach for multimodal integration
Source: Proceedings of the 6th International Conference on Multimodal Interfaces
State College, PA, USA
Session: Demonstration session, Demo session 2
Pages: 337-338
Year of Publication: 2004
ISBN: 1-58113-995-0
Author
Péter Pál Boda  Nokia Research Center, Helsinki, Finland
Sponsors
SIGCHI: ACM Special Interest Group on Computer-Human Interaction
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1027933.1027996

ABSTRACT

Integrating the various user input channels of a multimodal interface is not just an engineering problem. To fully understand users in the context of an application and the current session, solutions are sought that process, in a uniform manner, information from intentional (i.e., user-originated) sources as well as from passively available ones. As a first step towards this goal, the work demonstrated here investigates how intentional user inputs (e.g., speech and gesture) can be seamlessly combined to provide a single semantic interpretation of the user input. For this classical multimodal integration problem, the Maximum Entropy approach is demonstrated with 76.52% integration accuracy for the top-ranked candidate and 86.77% accuracy when the correct interpretation is among the top 3 candidates. The paper also presents the process that generates multimodal data for training the statistical integrator, using transcribed speech from MIT's Voyager application. The quality of the generated data is assessed by comparing it with real inputs to the multimodal version of Voyager.
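
The abstract does not describe the integrator's features or training procedure, so the following is only a minimal sketch of how a conditional maximum entropy (multinomial logistic) classifier could rank candidate interpretations, and of how top-1 and top-3 accuracy can be computed. The feature names, the interpretation labels, and the toy samples are hypothetical and are not taken from the paper or from the Voyager data.

```python
import math
from collections import defaultdict


def predict_proba(weights, classes, feats):
    """Return P(class | active features) under a log-linear (maxent) model."""
    scores = {c: sum(weights.get((f, c), 0.0) for f in feats) for c in classes}
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


def train_maxent(samples, labels, epochs=200, lr=0.5):
    """Fit feature weights by batch gradient ascent on the log-likelihood.

    samples: list of lists of active binary feature names
    labels:  list of gold interpretation labels, one per sample
    """
    classes = sorted(set(labels))
    weights = defaultdict(float)  # keyed by (feature, class)
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, gold in zip(samples, labels):
            probs = predict_proba(weights, classes, feats)
            for f in feats:
                for c in classes:
                    # Gradient term: empirical count minus expected count.
                    grad[(f, c)] += (1.0 if c == gold else 0.0) - probs[c]
        for key, g in grad.items():
            weights[key] += lr * g / len(samples)
    return weights, classes


def top_n_accuracy(weights, classes, samples, labels, n):
    """Fraction of samples whose gold label is among the n best candidates."""
    hits = 0
    for feats, gold in zip(samples, labels):
        probs = predict_proba(weights, classes, feats)
        ranked = sorted(classes, key=lambda c: probs[c], reverse=True)
        hits += gold in ranked[:n]
    return hits / len(samples)


# Hypothetical toy data: each sample mixes speech and gesture features,
# and each label stands for one combined semantic interpretation.
samples = [
    ["speech:show", "speech:this", "gesture:point"],
    ["speech:zoom", "speech:here", "gesture:point"],
    ["speech:route", "speech:there", "gesture:line"],
    ["speech:show", "gesture:circle"],
    ["speech:zoom", "gesture:circle"],
]
labels = ["show_location", "zoom_location", "route_to", "show_area", "zoom_area"]

weights, classes = train_maxent(samples, labels)
top1 = top_n_accuracy(weights, classes, samples, labels, n=1)
top3 = top_n_accuracy(weights, classes, samples, labels, n=3)
```

Because the n-best list for n = 3 always contains the top-ranked candidate, top-3 accuracy can never be lower than top-1 accuracy, which matches the ordering of the two figures reported in the abstract. The toy scores here are measured on the training samples themselves, so they say nothing about the accuracy of the system described in the paper.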

