Transferable videorealistic speech animation
Source: Symposium on Computer Animation
Proceedings of the 2005 ACM SIGGRAPH/Eurographics Symposium on Computer Animation
Los Angeles, California
SESSION: Faces and hair
Pages: 143 - 151  
Year of Publication: 2005
ISBN:1-7695-2270-X
Authors
Yao-Jen Chang  Computer and Communications Laboratories, ITRI, Taiwan
Tony Ezzat  Center for Biological and Computational Learning, MIT
Sponsors
Eurographics: Eurographics Association
SIGGRAPH: ACM Special Interest Group on Computer Graphics and Interactive Techniques
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1073368.1073388

ABSTRACT

Image-based videorealistic speech animation achieves significant visual realism, but at the cost of collecting a large 5- to 10-minute video corpus from the specific person to be animated. This requirement hinders its use in broad applications, since a large video corpus for a specific person under a controlled recording setup may not be easily obtained. In this paper, we propose a model transfer and adaptation algorithm which allows a novel person to be animated using only a small video corpus. The algorithm starts with a multidimensional morphable model (MMM) previously trained on a different speaker with a large corpus, and transfers it to the novel speaker using a much smaller corpus. The algorithm consists of 1) a novel matching-by-synthesis algorithm which semi-automatically selects new MMM prototype images from the new video corpus and 2) a novel gradient descent linear regression algorithm which adapts the MMM phoneme models to the data in the novel video corpus. Encouraging experimental results are presented in which a morphable model trained on a performer with a 10-minute corpus is transferred to a novel person using a 15-second movie clip of that person as the adaptation video corpus.
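
The abstract does not give the details of the adaptation step, so the following is only a rough sketch of the general idea of adapting per-phoneme model means by linear regression fitted with gradient descent. It assumes, purely for illustration, that each phoneme model is summarized by a mean vector in MMM parameter space and that a single affine map is shared by all phonemes. The function name `adapt_means`, the variable names, and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np


def adapt_means(source_means, frames, labels, lr=0.05, steps=500):
    """Fit an affine map (A, b) by gradient descent so that
    A @ source_means[p] + b approximates the frames labelled with phoneme p.

    All names here are hypothetical; the abstract does not specify them.
    """
    d = source_means.shape[1]
    A = np.eye(d)             # start from the source speaker's model unchanged
    b = np.zeros(d)
    X = source_means[labels]  # (n, d): the source mean paired with each frame
    n = len(labels)
    for _ in range(steps):
        resid = X @ A.T + b - frames         # (n, d) prediction error
        A -= lr * (2.0 / n) * resid.T @ X    # gradient of mean squared error w.r.t. A
        b -= lr * 2.0 * resid.mean(axis=0)   # gradient of mean squared error w.r.t. b
    return A, b


# Toy data (assumed): four 2-D "phoneme" means and a short labelled clip
# generated from a known affine map, standing in for the small new corpus.
source_means = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
labels = np.array([0, 1, 2, 3] * 5)
true_A = np.array([[1.2, 0.1], [-0.2, 0.9]])
true_b = np.array([0.5, -0.3])
frames = source_means[labels] @ true_A.T + true_b

A, b = adapt_means(source_means, frames, labels)
adapted_means = source_means @ A.T + b  # phoneme means moved toward the new speaker
```

Sharing one affine map across phonemes is a common way to adapt many model parameters from very little data, since only the map has to be estimated from the short clip. Whether the paper ties its regression in this way is not stated in the abstract.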



