ABSTRACT
Although gesture recognition has been studied extensively, the communicative, affective, and biometric "utility" of natural gesticulation remains relatively unexplored. One of the main reasons is the complexity of modeling spontaneous gestures. While lexical information in speech provides additional cues for disambiguating gestures, it does not cover the rich paralinguistic domain. This paper offers initial findings, drawn from a large corpus of natural monologues, on the prosodic structuring between frequent beat-like strokes and concurrent speech. Using a set of audio-visual features in an HMM-based formulation, we are able to improve the discrimination between visually similar gestures. These types of articulatory strokes represent different communicative functions. The analysis is based on the temporal alignment of detected vocal perturbations with the concurrent hand movement. As a supplementary result, we show that recognized articulatory strokes may be used for quantifying gesturing behavior.