ABSTRACT
Head pose and gesture offer several key conversational grounding cues and are used extensively in face-to-face interaction among people. While the machine interpretation of these cues has previously been limited to output modalities, recent advances in face-pose tracking allow for systems that are robust and accurate enough to sense natural grounding gestures. We present the design of a module that detects these cues and show examples of its integration in three different conversational agents with varying degrees of discourse model complexity. Using a scripted discourse model and off-the-shelf animation and speech-recognition components, we demonstrate the use of this module in a novel "conversational tooltip" task, where additional information is spontaneously provided by an animated character when users attend to various physical objects or characters in the environment. We further describe the integration of our module in two systems where animated and robotic characters interact with users based on rich discourse and semantic models.