14th International Congress of Phonetic Sciences (ICPhS-14)

San Francisco, CA, USA
August 1-7, 1999

Multi-Level Labelling of Speech for Synthesis

Nick Campbell (1), Akemi Iida (2)

(1) ATR Interpreting Telecommunications Research Labs., Soraku-gun, Kyoto, Japan
(2) Keio University, Graduate School of Media and Governance, Fujisawa, Kanagawa, Japan

Phonetics and prosody have traditionally been treated as separate disciplines with little overlap, yet both are essential and complementary components of speech synthesis. This paper describes an approach to modelling the variation in speech based on a gestalt view, in which the prosodic and phonetic aspects of each speech segment are combined. Phonation style is proposed as the third dimension needed to characterise speech sounds for synthesis by concatenation of raw waveform segments. We show that by labelling a large speech corpus with such high-level features, much of the redundancy of information in the speech signal can be preserved, and the resulting speech output maintains variation under natural constraints.

Bibliographic reference. Campbell, Nick / Iida, Akemi (1999): "Multi-level labelling of speech for synthesis", in ICPhS-14, 499-502.