15th International Congress of Phonetic Sciences (ICPhS-15)
Durations of real speech segments do not generally exhibit exponential distributions, as modelled implicitly by the state transitions of Markov processes. Several duration models were considered for integration within a segmental-HMM recognizer: uniform, exponential, Poisson, normal, gamma and discrete. Of these, the gamma distribution gave the best fit to the measured durations of silence, by an order of magnitude. Evaluations determined an appropriate weighting of the duration models against the acoustic models. Tests showed a reduction of 2% absolute (over 6% relative) in the phone-classification error rate with gamma and discrete models; exponential models gave a reduction of approximately 1% absolute, and uniform models gave no significant improvement. These gains in performance recommend the wider application of explicit duration models.
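As a rough illustration of the contrast drawn in the abstract, the sketch below compares the duration distribution implied by a single Markov state with a self-transition (geometric, the discrete analogue of the exponential) with an explicit gamma duration model, and shows one plausible way to weight a duration log-probability against an acoustic log-likelihood. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the parameter values and the linear log-domain weighting are all hypothetical.

```python
import math


def geometric_log_pmf(d, self_loop):
    """Log-probability of occupying one Markov state for exactly d frames.

    With self-transition probability `self_loop`, P(d) = a^(d-1) * (1 - a),
    which always peaks at d = 1 and decays monotonically.
    """
    if d < 1:
        return float("-inf")
    return (d - 1) * math.log(self_loop) + math.log(1.0 - self_loop)


def gamma_log_pdf(d, shape, scale):
    """Log-density of a gamma duration model (shape and scale are assumed values)."""
    if d <= 0:
        return float("-inf")
    return ((shape - 1.0) * math.log(d) - d / scale
            - math.lgamma(shape) - shape * math.log(scale))


def combined_score(acoustic_loglik, duration_logprob, weight):
    """Hypothetical log-domain combination: acoustic score plus weighted duration score."""
    return acoustic_loglik + weight * duration_logprob


if __name__ == "__main__":
    # Illustrative parameters only; they are not taken from the paper.
    for d in (1, 4, 8, 16, 32):
        print(d,
              round(geometric_log_pmf(d, self_loop=0.9), 3),
              round(gamma_log_pdf(d, shape=5.0, scale=2.0), 3))
```

With a shape parameter above 1, the gamma density peaks away from the shortest duration (at `(shape - 1) * scale` frames), whereas the geometric distribution cannot: its most probable duration is always a single frame. The `weight` argument stands in for the duration weighting that the abstract says was determined by evaluation.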
Bibliographic reference. Jackson, Philip J. B. (2003): "Improvements in phone-classification accuracy from modelling duration", In ICPhS-15, 1349-1352.