HMM-based synthesis of emotional facial expressions during speech in synthetic talking heads
Source: International Conference on Multimodal Interfaces
Proceedings of the 8th International Conference on Multimodal Interfaces
Banff, Alberta, Canada
Session: Oral session 6: interfaces and usability
Pages: 380-387
Year of Publication: 2006
ISBN:1-59593-541-X
Authors
Nadia Mana  ITC-irst, Povo (Trento), Italy
Fabio Pianesi  ITC-irst, Povo (Trento), Italy
Sponsors
SIGCHI: ACM Special Interest Group on Computer-Human Interaction
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 6,   Downloads (12 Months): 68,   Citation Count: 0
DOI: http://doi.acm.org/10.1145/1180995.1181065

ABSTRACT

One of the research goals in the human-computer interaction community is to build believable Embodied Conversational Agents (ECAs), that is, agents able to communicate complex information with human-like expressiveness and naturalness. Since emotions play a crucial role in human communication and most of them are expressed through the face, making ECAs more believable requires giving them the ability to display emotional facial expressions.

This paper presents a system based on Hidden Markov Models (HMMs) for the synthesis of emotional facial expressions during speech. The HMMs were trained on a set of emotion examples in which a professional actor uttered Italian nonsense words while acting various emotional facial expressions at different intensities.

The experimental results were evaluated by comparing the "synthetic examples" (generated by the system) with a reference "natural example" (one of the actor's examples) in three different ways. The evaluation shows that HMMs for emotional facial expression synthesis have some limitations but are suitable for making a synthetic Talking Head more expressive and realistic.


