14th International Congress of Phonetic Sciences (ICPhS-14)
San Francisco, CA, USA
This paper describes and evaluates a method for estimating facial motion during speech from the speech acoustics. The method is statistical and relies on simultaneous measurements of facial motion and speech acoustics. Experiments were carried out with one American English speaker and one Japanese speaker. Facial motion is characterized by the 3D positions of markers placed on the face and tracked at 60 frames/s. The speech acoustics are characterized by line spectrum pair (LSP) parameters. The method rests on two premises: (i) with appropriate constraints, the vocal-tract shape can be estimated from the speech acoustics; and (ii) most facial motion is a consequence of vocal-tract motion. Marker positions and LSP parameters were collected during several utterances and used to train artificial neural networks, which were then evaluated with test data. In the results obtained, approximately 85% of the facial motion variance was determined from the speech acoustics.
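To make the evaluation measure concrete, the following minimal sketch shows how a share of facial motion variance recovered from acoustic parameters can be computed on held-out frames. It is not the authors' implementation: an affine least-squares map stands in for the artificial neural networks, the data are synthetic random numbers rather than real LSP or marker measurements, and all names and sizes (`fit_linear_map`, `variance_explained`, `n_lsp`, `n_markers`) are assumptions chosen for illustration.

```python
import numpy as np


def fit_linear_map(X, Y):
    """Fit an affine least-squares map from acoustic features X to marker coordinates Y."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    W, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    return W


def predict(W, X):
    """Apply the fitted affine map to new acoustic frames."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    return X1 @ W


def variance_explained(Y_true, Y_pred):
    """Fraction of the total variance of Y_true accounted for by Y_pred."""
    resid = Y_true - Y_pred
    total = Y_true - Y_true.mean(axis=0)
    return 1.0 - (resid ** 2).sum() / (total ** 2).sum()


# Synthetic stand-in data (assumption): 10 acoustic parameters per frame,
# 12 markers with 3 coordinates each, and a noisy linear relation between them.
rng = np.random.default_rng(0)
n_frames, n_lsp, n_markers = 600, 10, 12
X = rng.normal(size=(n_frames, n_lsp))
A = rng.normal(size=(n_lsp, 3 * n_markers))
Y = X @ A + 0.5 * rng.normal(size=(n_frames, 3 * n_markers))

# Train on the first 400 frames and evaluate on the remaining 200.
train, test = slice(0, 400), slice(400, None)
W = fit_linear_map(X[train], Y[train])
score = variance_explained(Y[test], predict(W, X[test]))
print(f"Share of held-out marker variance recovered: {score:.2f}")
```

The score printed here reflects only the synthetic noise level and has no bearing on the 85% figure reported for real data. A nonlinear estimator such as a neural network would be substituted by replacing `fit_linear_map` and `predict` while keeping the same held-out evaluation.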
Bibliographic reference. Yehia, Hani / Kuratate, Takaaki / Vatikiotis-Bateson, Eric (1999): "Using speech acoustics to drive facial motion", In ICPhS-14, 631-634.