Analysis of emotion recognition using facial expressions, speech and multimodal information
Source: International Conference on Multimodal Interfaces
Proceedings of the 6th International Conference on Multimodal Interfaces
State College, PA, USA
Poster session 1
Pages: 205-211
Year of Publication: 2004
ISBN: 1-58113-995-0
Authors
Carlos Busso  University of Southern California, Los Angeles
Zhigang Deng  University of Southern California, Los Angeles
Serdar Yildirim  University of Southern California, Los Angeles
Murtaza Bulut  University of Southern California, Los Angeles
Chul Min Lee  University of Southern California, Los Angeles
Abe Kazemzadeh  University of Southern California, Los Angeles
Sungbok Lee  University of Southern California, Los Angeles
Ulrich Neumann  University of Southern California, Los Angeles
Shrikanth Narayanan  University of Southern California, Los Angeles
Sponsors
SIGCHI: ACM Special Interest Group on Computer-Human Interaction
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1027933.1027968

ABSTRACT

The interaction between human beings and computers will be more natural if computers are able to perceive and respond to human non-verbal communication such as emotions. Although several approaches have been proposed to recognize human emotions based on facial expressions or speech, relatively little work has been done to fuse these two, and other, modalities to improve the accuracy and robustness of emotion recognition systems. This paper analyzes the strengths and limitations of systems based only on facial expressions or acoustic information. It also discusses two approaches used to fuse these two modalities: decision-level and feature-level integration. Using a database recorded from an actress, four emotions were classified: sadness, anger, happiness, and neutral state. Markers on her face allowed detailed facial motions to be captured with motion capture, in conjunction with simultaneous speech recordings. The results reveal that the system based on facial expressions performed better than the system based only on acoustic information for the emotions considered. The results also show the complementarity of the two modalities: when they are fused, the performance and robustness of the emotion recognition system improve measurably.
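
The difference between the two fusion strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the combination rules ("product", "average"), and the posterior values are assumptions chosen only for illustration. Feature-level integration joins the facial and acoustic feature vectors before a single classifier sees them, while decision-level integration combines the class scores produced by two separate classifiers.

```python
# Illustrative sketch only; names, rules, and numbers are hypothetical,
# not taken from the paper.
EMOTIONS = ["sadness", "anger", "happiness", "neutral"]


def feature_level_fusion(face_features, speech_features):
    """Concatenate per-modality feature vectors into one joint vector.

    A single classifier would then be trained on the joint vector.
    """
    return list(face_features) + list(speech_features)


def decision_level_fusion(face_posteriors, speech_posteriors, rule="product"):
    """Combine class posteriors from two separate classifiers.

    Returns the winning emotion label and the normalised fused scores.
    """
    pairs = list(zip(face_posteriors, speech_posteriors))
    if rule == "product":
        fused = [f * s for f, s in pairs]
    elif rule == "average":
        fused = [(f + s) / 2 for f, s in pairs]
    else:
        raise ValueError(f"unknown rule: {rule}")
    total = sum(fused)
    if total > 0:
        # Rescale so the fused scores sum to 1.
        fused = [x / total for x in fused]
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best], fused


if __name__ == "__main__":
    # Hypothetical posteriors: the face classifier favours anger,
    # the speech classifier favours happiness.
    face = [0.10, 0.50, 0.30, 0.10]
    speech = [0.15, 0.25, 0.50, 0.10]
    label, scores = decision_level_fusion(face, speech, rule="product")
    print(label, [round(s, 3) for s in scores])
    print(feature_level_fusion([0.2, 0.7], [110.0, 0.05]))
```

In this toy case the two modalities disagree, and the fused decision depends on how confident each classifier is, which is one way the complementarity of the modalities can show up.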


