Linguistic theories in efficient multimodal reference resolution: an empirical investigation
Full text: PDF (273 KB)
Source: International Conference on Intelligent User Interfaces
Proceedings of the 10th International Conference on Intelligent User Interfaces
San Diego, California, USA
Session: Long papers: multimodal interaction
Pages: 43-50
Year of Publication: 2005
ISBN: 1-58113-894-6
Authors
Joyce Y. Chai  Michigan State University, East Lansing, MI
Zahar Prasov  Michigan State University, East Lansing, MI
Joseph Blaim  Michigan State University, East Lansing, MI
Rong Jin  Michigan State University, East Lansing, MI
Sponsors
SIGART: ACM Special Interest Group on Artificial Intelligence
SIGCHI: ACM Special Interest Group on Computer-Human Interaction
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 9,   Downloads (12 Months): 34,   Citation Count: 7
DOI: http://doi.acm.org/10.1145/1040830.1040850

ABSTRACT

Multimodal conversational interfaces provide a natural means for users to communicate with computer systems through multiple modalities such as speech, gesture, and gaze. To build effective multimodal interfaces, understanding user multimodal inputs is important. Previous linguistic and cognitive studies indicate that user language behavior does not occur randomly, but rather follows certain linguistic and cognitive principles. Therefore, this paper investigates the use of linguistic theories in multimodal interpretation. In particular, we present a greedy algorithm that incorporates Conversation Implicature and Givenness Hierarchy for efficient multimodal reference resolution. Empirical studies indicate that this algorithm significantly reduces the complexity in multimodal reference resolution compared to a previous graph-matching approach. One major advantage of this greedy algorithm is that the prior linguistic and cognitive knowledge can be used to guide the search and significantly prune the search space. Because of its simplicity and generality, this approach has the potential to improve the robustness of interpretation and provide a more practical solution to multimodal input interpretation.

