ABSTRACT
Interaction between humans and computers will be more natural if computers can perceive and respond to human non-verbal communication such as emotions. Although several approaches have been proposed to recognize human emotions from facial expressions or speech, relatively little work has been done to fuse these two, and other, modalities to improve the accuracy and robustness of emotion recognition systems. This paper analyzes the strengths and limitations of systems based only on facial expressions or only on acoustic information. It also discusses two approaches to fusing these two modalities: decision-level and feature-level integration. Using a database recorded from an actress, four emotions were classified: sadness, anger, happiness, and neutral state. Markers placed on her face allowed detailed facial motions to be recorded with motion capture, together with simultaneous speech recordings. The results reveal that the system based on facial expressions performed better than the system based on acoustic information alone for the emotions considered. The results also show the complementarity of the two modalities: when they are fused, the performance and robustness of the emotion recognition system improve measurably.
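The two fusion strategies named in the abstract can be contrasted with a minimal sketch. This is an illustration under stated assumptions, not the paper's implementation: the function names, the toy feature values, and the weighted-sum combination rule are all hypothetical, since the abstract does not say how features were built or how classifier outputs were combined.

```python
from typing import Dict, List

# The four classes named in the abstract.
EMOTIONS = ["sadness", "anger", "happiness", "neutral"]


def feature_level_fusion(face_features: List[float],
                         speech_features: List[float]) -> List[float]:
    """Feature-level fusion: join both modality vectors into one vector
    that a single classifier would then be trained on."""
    return list(face_features) + list(speech_features)


def decision_level_fusion(face_scores: Dict[str, float],
                          speech_scores: Dict[str, float],
                          face_weight: float = 0.5) -> str:
    """Decision-level fusion: each modality has its own classifier, and
    only their per-class scores are combined.

    The weighted sum used here is an assumed combination rule chosen for
    illustration; other rules (product, maximum) are equally possible.
    """
    combined = {
        emotion: face_weight * face_scores[emotion]
        + (1.0 - face_weight) * speech_scores[emotion]
        for emotion in EMOTIONS
    }
    # Return the label with the highest combined score.
    return max(combined, key=combined.get)


# Hypothetical per-class scores from two separate classifiers.
face = {"sadness": 0.1, "anger": 0.6, "happiness": 0.2, "neutral": 0.1}
speech = {"sadness": 0.5, "anger": 0.3, "happiness": 0.1, "neutral": 0.1}

fused_vector = feature_level_fusion([0.1, 0.2], [0.3])
fused_label = decision_level_fusion(face, speech)
```

In the feature-level case the classifier sees both modalities at once, so it can exploit correlations between them. In the decision-level case the modalities stay independent until the final step, which makes it easy to re-weight or drop one of them.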
CITED BY 11
Zhihong Zeng, Maja Pantic, Glenn I. Roisman, Thomas S. Huang, A survey of affect recognition methods: audio, visual and spontaneous expressions, Proceedings of the 9th international conference on Multimodal interfaces, November 12-15, 2007, Nagoya, Aichi, Japan
Zhihong Zeng, Yuxiao Hu, Yun Fu, Thomas S. Huang, Glenn I. Roisman, Zhen Wen, Audio-visual emotion recognition in adult attachment interview, Proceedings of the 8th international conference on Multimodal interfaces, November 02-04, 2006, Banff, Alberta, Canada
Vered Aharonson, Nadav Nehmadi, Hagit Messer, Automatic emotional stimulus identification from facial expressions, Proceedings of the Fourth conference on IASTED International Conference: Signal Processing, Pattern Recognition, and Applications, p.333-337, February 14-16, 2007, Innsbruck, Austria
Cheonshu Park, Joungwoo Ryu, Sangseung Kang, Jaehong Kim, Joochan Sohn, Hyunkyu Cho, The emotion expression robot through the affective interaction: KOBIE, Proceedings of the 1st international conference on Robot communication and coordination, October 15-17, 2007, Athens, Greece
Elizabeth S. Kim, Dan Leyzberg, Katherine M. Tsui, Brian Scassellati, How people talk when teaching a robot, Proceedings of the 4th ACM/IEEE international conference on Human robot interaction, March 09-13, 2009, La Jolla, California, USA
INDEX TERMS
Primary Classification:
H. Information Systems
H.5 INFORMATION INTERFACES AND PRESENTATION (I.7)
H.5.2 User Interfaces (D.2.2, H.1.2, I.3.6)
Subjects: Interaction styles (e.g., commands, menus, forms, direct manipulation)
Additional Classification:
H. Information Systems
H.5 INFORMATION INTERFACES AND PRESENTATION (I.7)
H.5.2 User Interfaces (D.2.2, H.1.2, I.3.6)
Subjects: Auditory (non-speech) feedback
General Terms: Design, Experimentation, Human Factors, Performance
Keywords: PCA, SVC, affective states, decision level fusion, emotion recognition, feature level fusion, human-computer interaction (HCI), speech, vision