14th International Congress of Phonetic Sciences (ICPhS-14)

San Francisco, CA, USA
August 1-7, 1999

Transformation Parameters vs. Contextual Factors: A New Perspective on Statistical Duration Modeling

Jerome R. Bellegarda, Kim E. A. Silverman

Spoken Language Group, Apple Computer, Inc., Cupertino, CA, USA

The “sums-of-products” approach has been found useful for modeling contextual influences on phoneme duration. This approach involves multiple linear regression, which is generally applied after log-transforming the durations. This paper presents empirical and theoretical evidence suggesting that the log transformation is not optimal. An alternative transformation is proposed, based on a piecewise linear framework. Preliminary experimental results were obtained on over 50,000 phonemes in varied prosodic contexts. They show that the proposed transformation reduces the unexplained deviations in the data by approximately 15%, as measured in the original (untransformed) domain. Alternatively, at a given operating point, it reduces the number of parameters required by about 25% at usual levels of complexity.
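
As a rough illustration of the conventional baseline the abstract refers to (linear regression on log-transformed durations, with the error then measured back in the original domain), a minimal sketch follows. It does not implement the piecewise linear transformation proposed in the paper, whose details are not given here. The function names, the two contextual factors (stress and position), and the eight toy durations are all hypothetical and synthetic; they are not taken from the paper's data.

```python
import numpy as np

def one_hot(levels, n_levels):
    """Encode integer factor levels as indicator columns."""
    out = np.zeros((len(levels), n_levels))
    out[np.arange(len(levels)), levels] = 1.0
    return out

def fit_log_duration_model(factors, n_levels, durations_ms):
    """Least-squares fit of log(duration) on one-hot contextual factors.

    An additive model in the log domain corresponds to a single product
    of factor scales in the original domain.
    """
    cols = [np.ones((len(durations_ms), 1))]      # intercept
    for f, n in zip(factors, n_levels):
        cols.append(one_hot(f, n)[:, 1:])         # drop level 0 as reference
    X = np.hstack(cols)
    coef, *_ = np.linalg.lstsq(X, np.log(durations_ms), rcond=None)
    return X, coef

def rms_error_original_domain(X, coef, durations_ms):
    """Back-transform predictions and measure RMS deviation in ms."""
    predicted = np.exp(X @ coef)
    return float(np.sqrt(np.mean((predicted - durations_ms) ** 2)))

# Synthetic toy data (assumption): base 60 ms, stress scales by 1.5,
# position levels scale by 1.0, 1.2, 1.4.
stress = np.array([0, 0, 1, 1, 0, 1, 0, 1])
position = np.array([0, 1, 0, 1, 2, 2, 0, 1])
durations = np.array([60.0, 72.0, 90.0, 108.0, 84.0, 126.0, 60.0, 108.0])

X, coef = fit_log_duration_model([stress, position], [2, 3], durations)
rms = rms_error_original_domain(X, coef, durations)
```

Because the toy durations are exactly multiplicative, the log-domain fit recovers them with essentially zero residual. On real data the residual is nonzero, and the abstract's point is that evaluating it in the original domain can favor a transformation other than the logarithm.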
