14th International Congress of Phonetic Sciences (ICPhS-14)

San Francisco, CA, USA
August 1-7, 1999


Temporal Interpretation in ProSynth, a Prosodic Speech Synthesis System

Richard Ogden, John Local, Paul Carter

Department of Language and Linguistic Science, University of York, UK

ProSynth is an approach to speech synthesis which takes a rich linguistic structure as central to the generation of natural-sounding speech [1]. This paper outlines the model of temporal interpretation employed in ProSynth in generating polysyllabic utterances, and the phonological structures used to drive the synthesis. We start from the assumption that the speech signal is informationally rich, and that this acoustic richness reflects linguistic structural richness. The primary timing unit is the syllable, situated within a prosodic hierarchy. Two mechanisms are used for timing: (1) syllables are joined by overlaying one on another; (2) syllables are temporally compressed to produce the correct rhythmical effects.
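
As a rough illustration of how the two timing mechanisms could interact, the following minimal Python sketch places syllables on a timeline. It is not the ProSynth implementation: the `Syllable` class, its fields (`duration_ms`, `overlap_ms`, `compression`), the `schedule` function and all numeric values are hypothetical, chosen only to show overlaying and compression acting together.

```python
from dataclasses import dataclass


@dataclass
class Syllable:
    # All fields are illustrative assumptions, not ProSynth parameters.
    label: str
    duration_ms: float         # uncompressed duration of the syllable
    overlap_ms: float = 0.0    # how far its onset is overlaid on the previous syllable
    compression: float = 1.0   # 1.0 = no compression, 0.75 = 75% of full duration


def schedule(syllables):
    """Return a list of (label, start_ms, end_ms) tuples."""
    timeline = []
    prev_end = 0.0
    for syl in syllables:
        # Mechanism 2: temporal compression scales the syllable's duration.
        duration = syl.duration_ms * syl.compression
        # Mechanism 1: overlaying starts the syllable before the previous one ends.
        start = max(0.0, prev_end - syl.overlap_ms)
        end = start + duration
        timeline.append((syl.label, start, end))
        prev_end = end
    return timeline


if __name__ == "__main__":
    utterance = [
        Syllable("pro", duration_ms=200.0),
        Syllable("synth", duration_ms=320.0, overlap_ms=40.0, compression=0.75),
    ]
    for label, start, end in schedule(utterance):
        print(f"{label}: {start:.0f}-{end:.0f} ms")
```

In this sketch the second syllable begins 40 ms before the first one ends and occupies three quarters of its uncompressed duration, so the utterance is shorter than the sum of the two syllable durations.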

Reference

  1. Hawkins, S, J House, M Huckvale, J Local, R Ogden (1998): ProSynth: an integrated prosodic approach to device-independent, natural-sounding speech synthesis. Proceedings of the 5th International Conference on Spoken Language Processing, Sydney, Australia. 1707-1710 (ISCA Archive, http://www.isca-speech.org/archive)

Bibliographic reference.  Ogden, Richard / Local, John / Carter, Paul (1999): "Temporal interpretation in ProSynth, a prosodic speech synthesis system", In ICPhS-14, 1059-1062.