ABSTRACT
Perhaps the most fundamental application of affective computing will be Human-Computer Interaction (HCI), in which the computer should be able to detect and track the user's affective states and respond with appropriate feedback. The multisensory nature of human affect perception sets the expectation for a multimodal affect analyzer. In this paper, we present our efforts toward audio-visual, HCI-related affect recognition. With HCI applications in mind, we take into account some special affective states that indicate users' cognitive and motivational states. Because a facial expression is influenced by both the affective state and the speech content, we apply a smoothing method to extract the affective-state information from the facial features. In the fusion stage, a voting method combines the audio and visual modalities, which substantially improves the final affect recognition accuracy. We test our bimodal affect recognition approach on 38 subjects with 11 HCI-related affective states. Extensive experimental results show that our bimodal fusion achieves an average person-dependent affect recognition accuracy of almost 90%.
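The abstract does not specify the smoothing filter or the voting rule, so the following Python sketch is only one plausible reading of the two steps, not the authors' implementation. It assumes a centered moving average over per-frame facial feature vectors (the window size is an assumption) to damp fast, speech-driven facial motion, and a weighted vote over frame-level labels from both modalities (the per-modality weights and the alphabetical tie-break are assumptions). The function names `smooth_features` and `vote_fusion` and the labels used in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def smooth_features(frames: Sequence[Sequence[float]], window: int = 5) -> List[List[float]]:
    """Centered moving average over time, applied to each feature dimension.

    Hypothetical stand-in for the paper's smoothing step: averaging over a
    short window attenuates rapid, speech-related facial motion while keeping
    the slower trend assumed to reflect the affective state.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    half = window // 2
    n = len(frames)
    smoothed: List[List[float]] = []
    for t in range(n):
        # Clip the window at the sequence boundaries.
        lo, hi = max(0, t - half), min(n, t + half + 1)
        span = frames[lo:hi]
        dims = len(frames[t])
        smoothed.append([sum(f[d] for f in span) / len(span) for d in range(dims)])
    return smoothed


def vote_fusion(audio_labels: Sequence[str], visual_labels: Sequence[str],
                audio_weight: float = 1.0, visual_weight: float = 1.0) -> str:
    """Fuse frame-level decisions from both modalities by weighted voting.

    Assumption: each modality's classifier emits one label per frame (or
    segment), and every label counts as a vote scaled by its modality weight.
    """
    votes: Dict[str, float] = defaultdict(float)
    for label in audio_labels:
        votes[label] += audio_weight
    for label in visual_labels:
        votes[label] += visual_weight
    if not votes:
        raise ValueError("no votes to fuse")
    # Highest vote total wins; ties go to the alphabetically first label
    # so that the result is deterministic.
    return min(votes, key=lambda lab: (-votes[lab], lab))
```

For example, `vote_fusion(["interest", "interest"], ["boredom"], visual_weight=1.5)` returns `"interest"`, since it collects 2.0 votes against 1.5. Weighting lets a more reliable modality count for more without discarding the other one.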
CITED BY 3
Zhihong Zeng, Yuxiao Hu, Ming Liu, Yun Fu, Thomas S. Huang, Training combination strategy of multi-stream fused hidden Markov model for audio-visual affect recognition, Proceedings of the 14th annual ACM international conference on Multimedia, October 23-27, 2006, Santa Barbara, CA, USA
INDEX TERMS
Primary Classification:
H. Information Systems
H.5 INFORMATION INTERFACES AND PRESENTATION (I.7)
H.5.2 User Interfaces (D.2.2, H.1.2, I.3.6)
Subjects: Interaction styles (e.g., commands, menus, forms, direct manipulation)
Additional Classification:
H. Information Systems
H.5 INFORMATION INTERFACES AND PRESENTATION (I.7)
H.5.2 User Interfaces (D.2.2, H.1.2, I.3.6)
Subjects: User-centered design
K. Computing Milieux
K.3 COMPUTERS AND EDUCATION
K.3.1 Computer Uses in Education
Subjects: Collaborative learning
General Terms: Design, Performance
Keywords: affect recognition, affective computing, emotion recognition, multimodal human-computer interaction