15th International Congress of Phonetic Sciences (ICPhS-15)

Barcelona, Spain
August 3-9, 2003

MASSY - A Prototypic Implementation of the Modular Audiovisual Speech SYnthesizer

Sascha Fagel

Technical University Berlin, Germany

Audiovisual speech synthesis systems are usually inflexible: dependencies among the implemented components make it hard to replace the audio synthesis, the video synthesis, or the control algorithms. To allow a newly developed system to exchange modules, to evaluate their specific advantages, and to detect their weak points, the author proposed a framework for audiovisual speech synthesis systems that divides the system into several modules and describes the information flow between them (Fagel & Sendlmeier, 2002). This paper presents MASSY, the first prototypic implementation of that framework. Besides the embedded audio synthesis, the implementation includes a phonetic articulation module, a visual articulation module, and a face module. The visual articulation module implements two alternative models: one based on a dominance model for co-articulation following Löfqvist's suggestion (1990), the other on a pattern selection algorithm. The face is a 3D model described in VRML 97 [16], with additional functionality implemented according to the H-Anim 2001 standard. The facial animation is described by a motion parameter model capable of realizing the most important visible articulation gestures (Cohen & Massaro, 1994; Benoît et al., 1995).

MASSY follows the client-server paradigm. The server is easy to set up and does not need special or high-performance hardware, the required bandwidth is low, and the client is an ordinary web browser with standard, non-proprietary plug-ins. The system is suitable for evaluating measured or predicted articulation models, as well as for enhancing human-computer interfaces in applications such as virtual tutors in e-learning environments, speech training, video conferencing, computer games, audiovisual information systems, and virtual agents.
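
The idea behind a dominance model for co-articulation can be illustrated with a minimal sketch. It is a generic dominance-function blend in the spirit of the cited literature, not MASSY's actual implementation: the exponential shape of the dominance function, the `Segment` fields, and all numeric values are assumptions chosen for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    """One speech segment's influence on a single motion parameter (hypothetical)."""
    target: float     # target value of the parameter, e.g. lip opening
    center: float     # time of peak dominance, in seconds
    magnitude: float  # peak dominance of the segment
    rate: float       # how quickly dominance decays away from the center


def dominance(seg: Segment, t: float) -> float:
    """Dominance of a segment at time t: peaks at seg.center, decays exponentially."""
    return seg.magnitude * math.exp(-seg.rate * abs(t - seg.center))


def blend(segments: list[Segment], t: float) -> float:
    """Parameter value at time t: dominance-weighted average of the segment targets."""
    weights = [dominance(s, t) for s in segments]
    total = sum(weights)
    return sum(w * s.target for w, s in zip(weights, segments)) / total


# Two neighbouring segments with opposite targets (illustrative values).
closed = Segment(target=0.0, center=0.0, magnitude=1.0, rate=10.0)
opened = Segment(target=1.0, center=0.2, magnitude=1.0, rate=10.0)
trajectory = [blend([closed, opened], step * 0.05) for step in range(5)]
```

Because both segments here have the same magnitude and decay rate, they contribute equally halfway between their centers, so the blended value there is the mean of the two targets. Nearer to either center, that segment's target dominates while the neighbour still pulls the value slightly toward its own target, which is the co-articulation effect the model is meant to capture.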

Bibliographic reference. Fagel, Sascha (2003): "MASSY - a prototypic implementation of the Modular Audiovisual Speech SYnthesizer", in ICPhS-15, 2553-2556.