14th International Congress of Phonetic Sciences (ICPhS-14)
San Francisco, CA, USA
Improving the naturalness of synthetic speech is an essential task in developing a text-to-speech (TTS) system. Naturalness depends mainly on the quality of the prosody model used in the TTS system. For our TTS system, DreSS (Dresden Speech Synthesizer), we compared three different methods for generating the F0 contour with each other and with other synthesizers. Natural speech samples were used as a reference. Results show that, on a naturalness scale from 0 to 4, the natural speech samples reach a maximum score of 3.6, while the best synthesis, the LPC-based one, reaches 1.9. The system with intonation control based on the Fujisaki model leads the group of PSOLA systems, which are closely clustered around a mean of 1.54.
Bibliographic reference. Hoffmann, Rüdiger / Hirschfeld, Diane / Jokisch, Oliver / Kordon, Ulrich / Mixdorff, Hansjörg / Mehnert, Dieter (1999): "Evaluation of a multilingual TTS system with respect to the prosodic quality", in ICPhS-14, 2307-2310.