ABSTRACT
In a multimodal conversational interface supporting speech and deictic gesture, deictic gestures on the graphical display have traditionally been used to identify user attention, for example through reference resolution. Since the context of the identified attention can potentially constrain the associated intention, we hypothesize that deictic gestures can go beyond attention and contribute to intention recognition. Driven by this hypothesis, this paper systematically investigates the role of deictic gestures in intention recognition. We experiment with different model-based and instance-based methods for incorporating gestural information into intention recognition. We examine the effects of using gestural information at two processing stages: the speech recognition stage and the language understanding stage. Our empirical results show that using gestural information improves intention recognition. Performance improves further when gestures are incorporated in both the speech recognition and language understanding stages than in either stage alone.