Emotional Chinese talking head system
Source: International Conference on Multimodal Interfaces
Proceedings of the 6th International Conference on Multimodal Interfaces
State College, PA, USA
Poster Session: Poster session 2
Pages: 273-280
Year of Publication: 2004
ISBN: 1-58113-995-0
Authors
Jianhua Tao, Chinese Academy of Sciences, Beijing, China
Tieniu Tan, Chinese Academy of Sciences, Beijing, China
Sponsors
SIGCHI: ACM Special Interest Group on Computer-Human Interaction
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1027933.1027978

ABSTRACT

A natural human-computer interface requires realistic audio and visual information to be integrated for both perception and display. This paper proposes a lifelike talking head system that converts text to speech with synchronized mouth animation and emotional expression. The talking head is built on a generic 3D human head model, and a personalized model is also incorporated into the system; with texture mapping, the personalized model looks more natural and realistic than the generic one. To express emotion, the system integrates both emotional speech synthesis and emotional facial animation, and Chinese viseme models are also created. Finally, these components are combined into an emotional talking head system that produces natural and vivid audio-visual output.
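The abstract describes mouth animation that is synchronized with synthesized speech through Chinese viseme models, with emotion layered on top. The paper's actual viseme set, facial parameterization and blending scheme are not described here, so the following is only a minimal Python sketch under stated assumptions: a TTS front end that emits (phoneme, duration) pairs, a hypothetical three-parameter mouth model (jaw_open, lip_width, lip_round), an illustrative pinyin-to-viseme table, and additive emotion offsets. All of these names and values are hypothetical. The sketch places one keyframe at the midpoint of each phoneme and linearly interpolates between keyframes; it does not model coarticulation.

```python
from dataclasses import dataclass

# Hypothetical mouth parameters per viseme: (jaw_open, lip_width, lip_round),
# each in [0, 1]. The values are illustrative, not taken from the paper.
VISEMES = {
    "closed": (0.0, 0.5, 0.0),
    "open": (0.9, 0.6, 0.1),
    "spread": (0.3, 0.9, 0.0),
    "round": (0.4, 0.2, 0.9),
    "neutral": (0.2, 0.5, 0.2),
}

# Hypothetical mapping from a few pinyin units to viseme classes.
PHONEME_TO_VISEME = {
    "b": "closed", "p": "closed", "m": "closed",
    "a": "open", "i": "spread", "u": "round", "o": "round",
}

# Hypothetical additive emotion offsets on the same three parameters.
EMOTION_OFFSETS = {
    "neutral": (0.0, 0.0, 0.0),
    "happy": (0.05, 0.15, -0.05),
    "sad": (-0.1, -0.1, 0.0),
}


@dataclass
class Keyframe:
    time: float    # seconds from the start of the utterance
    params: tuple  # (jaw_open, lip_width, lip_round)


def clamp(x: float) -> float:
    """Keep a parameter inside [0, 1]."""
    return max(0.0, min(1.0, x))


def viseme_keyframes(phonemes, emotion="neutral", intensity=1.0):
    """Turn (phoneme, duration_s) pairs, as a TTS front end might emit them,
    into mouth keyframes placed at each phoneme's midpoint."""
    offset = EMOTION_OFFSETS[emotion]
    keyframes = []
    t = 0.0
    for phoneme, duration in phonemes:
        # Unknown phonemes fall back to a neutral mouth shape.
        base = VISEMES[PHONEME_TO_VISEME.get(phoneme, "neutral")]
        # Shift the viseme by the scaled emotion offset, then clamp to [0, 1].
        params = tuple(clamp(b + intensity * o) for b, o in zip(base, offset))
        keyframes.append(Keyframe(t + duration / 2, params))
        t += duration
    return keyframes


def sample(keyframes, time):
    """Linearly interpolate mouth parameters at a given time (e.g. a video frame)."""
    if time <= keyframes[0].time:
        return keyframes[0].params
    for k0, k1 in zip(keyframes, keyframes[1:]):
        if k0.time <= time <= k1.time:
            span = k1.time - k0.time
            w = (time - k0.time) / span if span > 0 else 1.0
            return tuple(a + w * (b - a) for a, b in zip(k0.params, k1.params))
    return keyframes[-1].params
```

For example, `viseme_keyframes([("m", 0.1), ("a", 0.2)], emotion="happy")` keys a closed mouth at 0.05 s and a wider open mouth at 0.2 s, and `sample(...)` then returns the parameters for any frame time in between. A fuller system would typically replace the plain linear interpolation with a coarticulation model.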

