ABSTRACT
A natural human-computer interface requires the integration of realistic audio and visual information for both perception and display. This paper proposes a lifelike talking head system that converts text to speech with synchronized mouth movements and emotional expressions. The talking head is based on a generic 3D human head model, and a personalized model is also incorporated into the system. With texture mapping, the personalized model looks more natural and realistic than the generic one. To express emotion, the system integrates both emotional speech synthesis and emotional facial animation, and the paper also creates Chinese viseme models. Finally, these components are combined into an emotional talking head system that generates natural and vivid audio-visual output.