15th International Congress of Phonetic Sciences (ICPhS-15)

Barcelona, Spain
August 3-9, 2003


Bayesian Modelling of Vowel Segment Duration for Text-to-Speech Synthesis Using Distinctive Features

Olga V. Goubanova

University of Edinburgh, UK

We apply a Bayesian belief network (BN) approach to vowel duration modelling: vowel segment duration is modelled as a hybrid Bayesian network of discrete and continuous nodes, where the nodes represent linguistic factors that affect segment duration. Factor interactions are modelled concisely as causal relationships among the nodes of a directed acyclic graph (DAG). New to the present research, we model segment identity as a set of distinctive features: frontness, height, length, and roundness. In addition, the BNs were augmented with a word class feature (content vs. function). We experimented with different BNs and compared the results of the belief network model with those of Sums-of-Products (SoP) and classification and regression tree (CART) models. All three models were trained and tested on the same data. In terms of RMS error and correlation coefficient, our BN model performs better than both the CART and SoP models.
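As an illustration of the kind of model the abstract describes, the following is a minimal sketch of a single continuous duration node conditioned on discrete parents (the four distinctive features plus word class), evaluated with RMS error and the correlation coefficient. It is not the author's network: the network structure, the toy durations, and all names (`TRAIN`, `TEST`, `fit_conditional_gaussian`, `predict`) are assumptions made for illustration only.

```python
import math
from collections import defaultdict

# Hypothetical toy data, NOT from the paper. Each sample pairs a discrete
# parent configuration (frontness, height, length, roundness, word class)
# with a vowel duration in milliseconds.
A = ("front", "high", "short", "unrounded", "content")
B = ("back", "low", "long", "rounded", "content")
C = ("front", "high", "short", "unrounded", "function")
TRAIN = [(A, 60.0), (A, 70.0), (B, 140.0), (B, 160.0), (C, 40.0), (C, 50.0)]
TEST = [(A, 68.0), (B, 145.0), (C, 48.0)]


def fit_conditional_gaussian(samples):
    """Estimate (mean, variance) of duration per discrete parent configuration."""
    groups = defaultdict(list)
    for parents, duration in samples:
        groups[parents].append(duration)
    params = {}
    for parents, values in groups.items():
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        params[parents] = (mean, var)
    return params


def predict(params, parents, fallback):
    """Predict the conditional mean; use the fallback for unseen configurations."""
    mean, _ = params.get(parents, (fallback, 0.0))
    return mean


def rmse(pred, obs):
    """Root-mean-square error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


params = fit_conditional_gaussian(TRAIN)
global_mean = sum(d for _, d in TRAIN) / len(TRAIN)
predictions = [predict(params, parents, global_mean) for parents, _ in TEST]
observed = [d for _, d in TEST]
print(f"RMSE = {rmse(predictions, observed):.2f} ms, "
      f"r = {pearson(predictions, observed):.3f}")
```

A full hybrid network would factor the joint distribution over several nodes rather than condition duration on one flat table of parent configurations; the sketch only shows how discrete feature values can parameterise a continuous duration distribution and how the two reported metrics are computed.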


Bibliographic reference. Goubanova, Olga V. (2003): "Bayesian modelling of vowel segment duration for text-to-speech synthesis using distinctive features", in ICPhS-15, 2349-2352.