14th International Congress of Phonetic Sciences (ICPhS-14)

San Francisco, CA, USA
August 1-7, 1999

A Modeling of the Objective Evaluation of Durational Rules Based on Auditory Perceptual Characteristics

Hiroaki Kato (1,3), Minoru Tsuzaki (1), Yoshinori Sagisaka (2,3)

(1) ATR Human Information Processing Research Laboratories, Kyoto, Japan
(2) ATR Interpreting Telecommunications Research Laboratories, Kyoto, Japan
(3) Kobe University, Kobe, Japan

The subjective acceptability of temporal distortions in speech segments is significantly affected by several phonetic factors, e.g., the vowel color. The current study proposes a model for evaluating the temporal errors of durational rules used in speech synthesis. The model predicts, to some extent, acceptability to humans (a subjective measure) from objective measures alone (physical properties of the speech signal), drawing on auditory perceptual characteristics recently found by the authors. To accomplish this, the loudness contour is calculated and used as the main cue for temporal change in a speech signal. An experiment testing the effectiveness of the model showed that it consistently achieved better prediction (i.e., closer to human evaluation) than the reference model, which used only the average acoustic error without any perceptual consideration.
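
The contrast between the reference measure and a perceptually motivated one can be illustrated with a minimal sketch. It is not the authors' model: the abstract does not give the formulation, so the weighting scheme, the function names, and the `loudness_change` input (one non-negative value per segment, taken here as a stand-in for how strongly the loudness contour changes around that segment) are all assumptions made for illustration.

```python
from typing import Sequence


def mean_abs_error(ref_ms: Sequence[float], syn_ms: Sequence[float]) -> float:
    """Reference-style measure: unweighted mean absolute duration error (ms)."""
    if len(ref_ms) != len(syn_ms) or len(ref_ms) == 0:
        raise ValueError("duration sequences must be non-empty and equal in length")
    return sum(abs(s - r) for r, s in zip(ref_ms, syn_ms)) / len(ref_ms)


def loudness_weighted_error(
    ref_ms: Sequence[float],
    syn_ms: Sequence[float],
    loudness_change: Sequence[float],
) -> float:
    """Hypothetical perceptual measure: weighted mean absolute duration error.

    ASSUMPTION: each segment's error is weighted by a loudness-derived value,
    so errors in segments with a salient loudness change count more. This
    weighting is illustrative only and is not taken from the paper.
    """
    if not (len(ref_ms) == len(syn_ms) == len(loudness_change)) or len(ref_ms) == 0:
        raise ValueError("all sequences must be non-empty and equal in length")
    if any(w < 0 for w in loudness_change):
        raise ValueError("weights must be non-negative")
    total_weight = sum(loudness_change)
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    weighted_sum = sum(
        w * abs(s - r) for r, s, w in zip(ref_ms, syn_ms, loudness_change)
    )
    return weighted_sum / total_weight


# Made-up durations (ms) for three segments: natural vs. rule-generated.
natural = [100.0, 80.0, 120.0]
synthetic = [110.0, 80.0, 100.0]
# Made-up loudness-change weights: the third segment is the most salient.
weights = [1.0, 1.0, 3.0]

plain = mean_abs_error(natural, synthetic)
weighted = loudness_weighted_error(natural, synthetic, weights)
```

With these made-up values the unweighted error is 10 ms, while the weighted error rises to 14 ms because the largest deviation falls on the segment with the largest weight. Whether weighting of this kind tracks human judgments is the empirical question the experiment in the paper addresses.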

