Linguistic theories in efficient multimodal reference resolution: an empirical investigation

Source
Proceedings of the 10th International Conference on Intelligent User Interfaces, San Diego, California, USA
Session: Long papers: multimodal interaction
Pages: 43-50
Year of publication: 2005
ISBN: 1-58113-894-6

Bibliometrics
Downloads (6 weeks): 9; downloads (12 months): 34; citation count: 7

ABSTRACT
Multimodal conversational interfaces provide a natural means for users to communicate with computer systems through multiple modalities such as speech, gesture, and gaze. To build effective multimodal interfaces, it is important to understand users' multimodal inputs. Previous linguistic and cognitive studies indicate that user language behavior does not occur randomly but follows certain linguistic and cognitive principles. This paper therefore investigates the use of linguistic theories in multimodal interpretation. In particular, we present a greedy algorithm that incorporates Conversational Implicature and the Givenness Hierarchy for efficient multimodal reference resolution. Empirical studies indicate that this algorithm significantly reduces the complexity of multimodal reference resolution compared to a previous graph-matching approach. One major advantage of this greedy algorithm is that prior linguistic and cognitive knowledge can be used to guide the search and significantly prune the search space. Because of its simplicity and generality, this approach has the potential to improve the robustness of interpretation and to provide a more practical solution to multimodal input interpretation.
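The abstract describes the resolution strategy only at a high level. The Python sketch below illustrates the general idea of greedy matching guided by the Givenness Hierarchy; it is not the authors' algorithm. The class names, the `resolve_greedy` function, the assumption that each candidate object already carries a cognitive status, and the tie-breaking rule are all hypothetical, and Conversational Implicature is not modeled at all.

```python
from dataclasses import dataclass
from typing import List, Optional

# Givenness Hierarchy (Gundel et al. 1993), ordered from most to least given.
# A form that signals status S can refer to an entity whose status is S or
# any status to its left, since higher statuses entail lower ones.
GIVENNESS = ["in_focus", "activated", "familiar",
             "uniquely_identifiable", "referential", "type_identifiable"]
RANK = {status: i for i, status in enumerate(GIVENNESS)}


@dataclass(frozen=True)
class Candidate:
    """A potential referent. Hypothetical: its status is assumed to be
    assigned elsewhere (from gesture, dialogue history, or the display)."""
    obj_id: str
    obj_type: str
    status: str


@dataclass(frozen=True)
class Expression:
    """A referring expression and the status its form signals, e.g.
    "it" -> in_focus, "this N" -> activated, "the N" -> uniquely_identifiable."""
    text: str
    obj_type: Optional[str]  # None for type-neutral forms such as "it"
    signals: str


def resolve_greedy(exprs: List[Expression],
                   candidates: List[Candidate]) -> List[Optional[str]]:
    """Resolve expressions in utterance order, committing to one referent each.

    At each step, keep only unused candidates whose status satisfies the form
    and whose type matches, then take the most given one (ties go to the
    earlier candidate in the list). There is no backtracking, so each
    expression costs one pass over the pool instead of a search over
    complete assignments.
    """
    pool = list(candidates)
    resolved: List[Optional[str]] = []
    for expr in exprs:
        admissible = [
            c for c in pool
            if RANK[c.status] <= RANK[expr.signals]
            and (expr.obj_type is None or c.obj_type == expr.obj_type)
        ]
        if not admissible:
            resolved.append(None)  # unresolved; a dialogue system might ask the user
            continue
        best = min(admissible, key=lambda c: RANK[c.status])
        pool.remove(best)  # each object is used by at most one expression
        resolved.append(best.obj_id)
    return resolved


if __name__ == "__main__":
    cands = [Candidate("H2", "house", "activated"),
             Candidate("H1", "house", "familiar"),
             Candidate("T1", "town", "familiar")]
    exprs = [Expression("this house", "house", "activated"),
             Expression("the town", "town", "uniquely_identifiable")]
    print(resolve_greedy(exprs, cands))  # ['H2', 'T1']
```

In the usage example, "this house" can only reach the activated house H2 (H1 is merely familiar), and "the town" then takes T1 from the remaining pool. Filtering by givenness before choosing is what lets the greedy step commit early; it is one plausible way prior linguistic knowledge can prune the search space, not a reconstruction of the paper's procedure.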
REFERENCES
[1] Chai, J., Hong, P., Zhou, M. X., and Prasov, Z. 2004c. Optimization in Multimodal Interpretation. In Proceedings of ACL 2004, pp. 1-8. Barcelona, Spain.

[2] Chai, J., Prasov, Z., and Hong, P. 2004b. Performance Evaluation and Error Analysis for Multimodal Reference Resolution in a Conversational System. In Proceedings of HLT-NAACL 2004 (Companion Volume).

[6] Cohen, P. R., Johnston, M., McGee, D., Oviatt, S., Pittman, J., Smith, I., Chen, L., and Clow, J. 1997. QuickSet: Multimodal Interaction for Distributed Applications. In Proceedings of the Fifth ACM International Conference on Multimedia, pp. 31-40. Seattle, Washington, USA, November 9-13, 1997. doi:10.1145/266180.266328

[7] Grice, H. P. 1975. Logic and Conversation. In Cole, P., and Morgan, J., eds., Speech Acts, pp. 41-58. New York: Academic Press.

[9] Gundel, J. K., Hedberg, N., and Zacharski, R. 1993. Cognitive Status and the Form of Referring Expressions in Discourse. Language 69(2):274-307.

[11] Johnston, M., Cohen, P. R., McGee, D., Oviatt, S. L., Pittman, J. A., and Smith, I. 1997. Unification-Based Multimodal Integration. In Proceedings of the Eighth Conference of the European Chapter of the Association for Computational Linguistics, pp. 281-288. Madrid, Spain, July 7-12, 1997.

[16] Oviatt, S., DeAngeli, A., and Kuhn, K. 1997. Integration and Synchronization of Input Modes during Multimodal Human-Computer Interaction. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 415-422. Atlanta, Georgia, USA, March 22-27, 1997. doi:10.1145/258549.258821

[20] Zancanaro, M., Stock, O., and Strapparava, C. 1997. Multimodal Interaction for Information Access: Exploiting Cohesion. Computational Intelligence 13(7):439-464.

|
CITED BY 7
Kaiser, E. C., Barthelmess, P., Erdmann, C., and Cohen, P. 2007. Multimodal Redundancy across Handwriting and Speech during Computer Mediated Human-Human Interactions. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. San Jose, California, USA, April 28-May 3, 2007.
Di Eugenio, B., Fossati, D., Haller, S., Yu, D., and Glass, M. 2008. Be Brief, And They Shall Learn: Generating Concise Language Feedback for a Computer Tutor. International Journal of Artificial Intelligence in Education 18(4):317-345, December 2008.