15th International Congress of Phonetic Sciences (ICPhS-15)
Corpus-based generation of fundamental frequency (F0) contours was realized for emotional speech synthesis. The method, originally developed for read speech, predicts the command values of the F0 contour generation process model from linguistic information about the sentence to be synthesized. Because the generated F0 contour is constrained by the model, the synthesized speech retains a certain level of quality even when the prediction is poor. The speech corpus used in the F0 contour generation experiments comprises three types of emotional speech (anger, joy, and sadness) and calm speech, uttered by a female narrator. The command values needed to train and evaluate the method were extracted automatically using a program developed by the authors. We also applied the method to the prediction of segmental durations. The mismatches between the predicted and target contours and durations for emotional speech were similar to those for calm speech.
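To illustrate why a model-constrained contour stays smooth even when the predicted commands are inaccurate, the following is a minimal sketch of how a contour could be rendered from command values. It assumes the generation process model is the standard command-response (Fujisaki) formulation, in which log F0 is a baseline plus the summed responses to phrase commands (impulses) and accent commands (steps). The function names, the constants `alpha = 3.0`, `beta = 20.0`, `gamma = 0.9`, and the command values in the usage lines are illustrative assumptions, not values taken from the paper.

```python
import math


def phrase_response(t, alpha=3.0):
    """Impulse response of the phrase control mechanism (zero before onset)."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent control mechanism, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times, fb, phrase_cmds, accent_cmds,
               alpha=3.0, beta=20.0, gamma=0.9):
    """Return F0 in Hz at each time in `times` (seconds).

    fb          -- baseline frequency in Hz
    phrase_cmds -- list of (onset_time, magnitude) pairs
    accent_cmds -- list of (onset_time, offset_time, amplitude) triples
    """
    contour = []
    for t in times:
        log_f0 = math.log(fb)
        for t0, ap in phrase_cmds:
            log_f0 += ap * phrase_response(t - t0, alpha)
        for t1, t2, aa in accent_cmds:
            log_f0 += aa * (accent_response(t - t1, beta, gamma)
                            - accent_response(t - t2, beta, gamma))
        contour.append(math.exp(log_f0))
    return contour


# Hypothetical command values, chosen only for illustration.
times = [i * 0.01 for i in range(200)]      # 0.00 s to 1.99 s in 10 ms steps
phrases = [(0.5, 0.5)]                      # one phrase command at 0.5 s
accents = [(0.6, 0.9, 0.4)]                 # one accent command, 0.6 s to 0.9 s
contour = f0_contour(times, 200.0, phrases, accents)
```

In this sketch, a prediction error in a command magnitude or timing shifts or scales a smooth response curve rather than producing an arbitrary jump, which is consistent with the abstract's remark that quality is retained under poor prediction.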
Bibliographic reference. Hirose, Keikichi / Katsura, Toshiya / Minematsu, Nobuaki (2003): "Corpus-based synthesis of F0 contours for emotional speech using the generation process model", In ICPhS-15, 2945-2948.