ABSTRACT
Gesture and speech combine to form a rich basis for human conversational interaction. To exploit these modalities in HCI, we need to understand how they interact and how they jointly support communication. We propose a framework for organizing the gesture research done to date. We also present our work on cross-modal cues for discourse segmentation in free-form gesticulation that accompanies speech in natural conversation, as a new paradigm for such multimodal interaction. The basis for this integration is the psycholinguistic concept that gesture and speech are generated coequally from the same semantic intent. We present a detailed case study of a gesture and speech elicitation experiment in which a subject describes her living space to an interlocutor. We perform two independent sets of analyses on the video and audio data. The first is video and audio analysis to extract segmentation cues. The second is expert transcription of the speech and gesture data, produced by microanalyzing the videotape with a frame-accurate video player to correlate the speech with the gestural entities. We compare the results of both analyses to identify which cues accessible in the gestural and audio data correlate well with the expert psycholinguistic analysis. We show that "handedness" and the kind of symmetry in two-handed gestures provide effective supersegmental discourse cues.
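The abstract does not spell out how handedness and two-handed symmetry become segmentation cues. The Python sketch below shows one simple way such cues could be computed. It is an illustration, not the authors' pipeline. It assumes hypothetical per-frame velocity estimates (vx, vy) for each hand. The motion threshold `move_thresh`, the tolerance `tol`, the minimum run length `min_frames`, and the labels `LH`, `RH`, `2H-mirror`, `2H-parallel` and `2H-asym` are all illustrative choices. The sketch labels each frame by which hand or hands are moving. For two-handed motion, it also records whether the hands move in mirror symmetry about the vertical body midline or in parallel. It then proposes a candidate discourse boundary wherever the label changes.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Sequence, Tuple

# Hypothetical input: per-frame hand velocity (vx, vy) in image coordinates,
# with x measured horizontally and y vertically.
Vec = Tuple[float, float]


@dataclass
class Segment:
    label: str
    start: int  # first frame (inclusive)
    end: int    # last frame (exclusive)


def frame_label(left: Vec, right: Vec, move_thresh: float = 1.0, tol: float = 0.5) -> str:
    """Label one frame by handedness and, if both hands move, by symmetry type."""
    l_moving = (left[0] ** 2 + left[1] ** 2) ** 0.5 > move_thresh
    r_moving = (right[0] ** 2 + right[1] ** 2) ** 0.5 > move_thresh
    if not (l_moving or r_moving):
        return "rest"
    if not r_moving:
        return "LH"
    if not l_moving:
        return "RH"
    scale = max(abs(c) for c in (*left, *right))  # > 0 because both hands move
    # Mirror symmetry about the vertical body midline: x opposite, y equal.
    mirror_err = abs(left[0] + right[0]) + abs(left[1] - right[1])
    # Parallel (translational) symmetry: both components equal.
    parallel_err = abs(left[0] - right[0]) + abs(left[1] - right[1])
    if mirror_err <= tol * scale:
        return "2H-mirror"
    if parallel_err <= tol * scale:
        return "2H-parallel"
    return "2H-asym"


def segment(left_v: Sequence[Vec], right_v: Sequence[Vec], min_frames: int = 3) -> List[Segment]:
    """Split a gesture stream wherever the handedness/symmetry label changes."""
    labels = [frame_label(l, r) for l, r in zip(left_v, right_v)]
    runs: List[Segment] = []
    start = 0
    for label, group in groupby(labels):
        n = sum(1 for _ in group)
        runs.append(Segment(label, start, start + n))
        start += n
    # Smoothing: fold runs shorter than min_frames (and repeated labels)
    # into the preceding segment so brief jitter does not create boundaries.
    merged: List[Segment] = []
    for run in runs:
        if merged and (run.end - run.start < min_frames or run.label == merged[-1].label):
            merged[-1].end = run.end
        else:
            merged.append(run)
    return merged


if __name__ == "__main__":
    still, moving = (0.0, 0.0), (3.0, 0.0)
    left = [still] * 10 + [(-2.0, 1.0)] * 5 + [moving] + [(-2.0, 1.0)] * 4
    right = [still] * 5 + [moving] * 5 + [(2.0, 1.0)] * 5 + [still] + [(2.0, 1.0)] * 4
    for s in segment(left, right):
        print(s.label, s.start, s.end)
```

With the example data, the script reports three segments: rest (frames 0 to 5), right-handed (5 to 10), and mirror-symmetric two-handed (10 to 20). The one-frame left-hand blip is absorbed into the two-handed segment. The mirror test runs first, so purely vertical two-handed motion is labeled `2H-mirror` even though it also passes the parallel test. A real system would need a principled rule for such ties.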
INDEX TERMS
Primary Classification:
H. Information Systems
H.5 INFORMATION INTERFACES AND PRESENTATION (I.7)
H.5.2 User Interfaces (D.2.2, H.1.2, I.3.6)
Subjects: Theory and methods
Additional Classification:
H. Information Systems
H.5 INFORMATION INTERFACES AND PRESENTATION (I.7)
H.5.2 User Interfaces (D.2.2, H.1.2, I.3.6)
Subjects: Interaction styles (e.g., commands, menus, forms, direct manipulation)
General Terms: Human Factors, Languages, Theory
Keywords: Multimodal interaction, conversational interaction, discourse, gesture, gesture analysis, human interaction models, speech