ABSTRACT
Past research on automatic laughter detection has focused mainly on audio-based detection. Here we present an audiovisual approach to distinguishing laughter from speech, and we show that integrating information from the audio and video channels improves performance over unimodal approaches. Each channel consists of two streams (cues): facial expressions and head movements for video, and spectral and prosodic features for audio. We used decision-level fusion to integrate the information from the two channels, experimenting with the SUM rule and a neural network as integration functions. The results indicate that even a simple linear function such as the SUM rule achieves very good performance in audiovisual fusion. We also experimented with different combinations of cues; the most informative cues were the facial expressions and the spectral features. The best combination is the integration of facial expressions, spectral features, and prosodic features when a neural network is used as the fusion method. When tested in a person-independent manner on 96 audiovisual sequences depicting spontaneously displayed (as opposed to posed) laughter and speech episodes, the proposed audiovisual approach achieves a recall rate above 90% and a precision above 80%.
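Decision-level fusion with the SUM rule can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the authors' implementation: it assumes each cue (or channel) has its own classifier that outputs one posterior probability per class, and that fusion picks the class with the largest summed posterior. The function names `sum_rule_fusion` and `precision_recall`, the label order `("laughter", "speech")`, and treating laughter as the positive class when computing precision and recall are all hypothetical choices made for illustration.

```python
"""Sketch of SUM-rule decision-level fusion (hypothetical names and label order)."""
from typing import List, Sequence, Tuple

# Assumed class order; index 0 is treated as the positive class (laughter).
LABELS = ("laughter", "speech")


def sum_rule_fusion(cue_posteriors: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the largest summed posterior.

    cue_posteriors[i][k] is the posterior of class k produced by the
    (hypothetical) classifier for cue i, e.g. facial expressions,
    spectral features, or prosodic features.
    """
    if not cue_posteriors:
        raise ValueError("need at least one cue")
    n_classes = len(cue_posteriors[0])
    if any(len(p) != n_classes for p in cue_posteriors):
        raise ValueError("all cues must score the same set of classes")
    # SUM rule: add the per-cue posteriors class by class, then take the argmax.
    totals = [sum(p[k] for p in cue_posteriors) for k in range(n_classes)]
    return max(range(n_classes), key=lambda k: totals[k])


def precision_recall(predicted: List[int], actual: List[int],
                     positive: int = 0) -> Tuple[float, float]:
    """Precision and recall for the chosen positive class."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == positive and a == positive)
    fp = sum(1 for p, a in pairs if p == positive and a != positive)
    fn = sum(1 for p, a in pairs if p != positive and a == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up posteriors for one sequence from three cues
    # (facial expressions, spectral features, prosodic features).
    example = [[0.7, 0.3], [0.4, 0.6], [0.8, 0.2]]
    print(LABELS[sum_rule_fusion(example)])  # laughter (1.9 vs 1.1)
```

The SUM rule needs no training, which is why it makes a useful baseline against a learned fusion function such as a neural network; dividing the totals by the number of cues (the mean rule) would select the same class.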