Fusion of audio and visual cues for laughter detection
Source
Conference On Image And Video Retrieval
Proceedings of the 2008 International Conference on Content-based Image and Video Retrieval
Niagara Falls, Canada
POSTER SESSION: Poster/reception
Pages 329-338  
Year of Publication: 2008
ISBN: 978-1-60558-070-8
Authors
Stavros Petridis  Imperial College, London, United Kingdom
Maja Pantic  Imperial College, London, United Kingdom
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
SIGMULTIMEDIA: ACM Special Interest Group on Multimedia
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1386352.1386396

ABSTRACT

Past research on automatic laughter detection has focused mainly on audio-based detection. Here we present an audio-visual approach to distinguishing laughter from speech, and we show that integrating the information from the audio and video channels improves performance over single-modal approaches. Each channel consists of two streams (cues): facial expressions and head movements for video, and spectral and prosodic features for audio. We used decision-level fusion to integrate the information from the two channels and experimented with the SUM rule and a neural network as the integration functions. The results indicate that even a simple linear function such as the SUM rule achieves very good performance in audiovisual fusion. We also experimented with different combinations of cues; the most informative are the facial expressions and the spectral features. The best combination of cues is the integration of facial expressions, spectral features and prosodic features when a neural network is used as the fusion method. When tested in a person-independent way on 96 audiovisual sequences depicting spontaneously displayed (as opposed to posed) laughter and speech episodes, the proposed audiovisual approach achieves over 90% recall and over 80% precision.
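
The abstract names decision-level fusion with the SUM rule and reports recall and precision. The sketch below illustrates what those terms generally mean; it is not the authors' implementation. The function names, the class order (index 0 for laughter, index 1 for speech), and the score values are all assumptions made for illustration.

```python
from typing import Sequence, Tuple

LAUGHTER, SPEECH = 0, 1  # assumed class indices, not taken from the paper


def sum_rule_fusion(audio_scores: Sequence[float],
                    video_scores: Sequence[float]) -> int:
    """Decision-level fusion with the SUM rule.

    Each argument holds one score per class, produced by a separate
    single-modal classifier. The scores are added class by class and
    the class with the largest sum is returned.
    """
    if len(audio_scores) != len(video_scores):
        raise ValueError("both classifiers must score the same classes")
    fused = [a + v for a, v in zip(audio_scores, video_scores)]
    return max(range(len(fused)), key=fused.__getitem__)


def precision_recall(predicted: Sequence[int],
                     actual: Sequence[int],
                     positive: int = LAUGHTER) -> Tuple[float, float]:
    """Precision and recall for the `positive` class."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == positive and a == positive)
    fp = sum(1 for p, a in pairs if p == positive and a != positive)
    fn = sum(1 for p, a in pairs if p != positive and a == positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up scores: the audio classifier leans towards speech, the video
# classifier leans strongly towards laughter, and the sum favours laughter.
label = sum_rule_fusion([0.45, 0.55], [0.80, 0.20])
```

In this made-up case the two classifiers disagree, and the summed scores decide in favour of the more confident one. The abstract's alternative, a neural network as the integration function, would replace the fixed sum with a learned mapping from the same per-class scores.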



