Multimodal human discourse: gesture and speech
Source: ACM Transactions on Computer-Human Interaction (TOCHI)
Volume 9, Issue 3 (September 2002)
Pages: 171-193
Year of Publication: 2002
ISSN: 1073-0516
Authors
Francis Quek  Wright State University, Dayton, OH
David McNeill  University of Chicago
Robert Bryll  Wright State University, Dayton, OH
Susan Duncan  Wright State University, University of Chicago
Xin-Feng Ma  University of Illinois at Chicago
Cemil Kirbas  Wright State University
Karl E. McCullough  University of Chicago
Rashid Ansari  University of Illinois at Chicago
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/568513.568514

ABSTRACT

Gesture and speech combine to form a rich basis for human conversational interaction. To exploit these modalities in HCI, we need to understand the interplay between them and the way in which they support communication. We propose a framework for the gesture research done to date, and present our work on the cross-modal cues for discourse segmentation in free-form gesticulation accompanying speech in natural conversation as a new paradigm for such multimodal interaction. The basis for this integration is the psycholinguistic concept of the coequal generation of gesture and speech from the same semantic intent. We present a detailed case study of a gesture and speech elicitation experiment in which a subject describes her living space to an interlocutor. We perform two independent sets of analyses on the video and audio data: video and audio analysis to extract segmentation cues, and expert transcription of the speech and gesture data by microanalyzing the videotape using a frame-accurate video player to correlate the speech with the gestural entities. We compare the results of both analyses to identify the cues accessible in the gestural and audio data that correlate well with the expert psycholinguistic analysis. We show that "handedness" and the kind of symmetry in two-handed gestures provide effective supersegmental discourse cues.

