15th International Congress of Phonetic Sciences (ICPhS-15)

Barcelona, Spain
August 3-9, 2003

A Comparison of Several Approaches to the Feature Extractor Design for ASR Tasks in Telephone Environment

A. Gallardo-Antolin (1), J. Macias-Guarasa (2), J. Ferreiros (2), R. Cordoba (2), J. M. Montero-Martinez (2), R. San-Segundo (2), J. M. Pardo (2)

(1) Universidad Carlos III de Madrid, Spain
(2) Universidad Politécnica de Madrid, Spain

Automatic speech recognition (ASR) systems are usually composed of a parameterization module (the feature extractor) and a back-end classifier. The performance of the overall system depends strongly on the choice of the feature extraction module. In this paper we investigate two different approaches to designing this module. In the first (the conventional approach), its main characteristics are chosen on the basis of psychoacoustic knowledge. In the second, a data-driven technique, Discriminative Feature Extraction (DFE), is used to optimize the feature extractor and the back-end classifier simultaneously. Both strategies have been applied to a front-end based on the Wavelet Transform (WT). Results show that DFE systematically improves recognition performance. In particular, applying the DFE strategy to the WT-based acoustic features yields a relative error reduction of around 23% with respect to conventional features based on the Short-Time Fourier Transform, on the SpeechDat database with a vocabulary of 1000 words.
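
A relative error reduction such as the one quoted in the abstract is conventionally computed as the drop in error rate divided by the baseline error rate. The following is a minimal sketch of that calculation, not taken from the paper: the function name and the error rates are hypothetical and chosen only for illustration, since the abstract does not report the absolute error rates.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Return the fractional error reduction relative to the baseline system."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical error rates (NOT the paper's figures), for illustration only.
baseline = 0.20  # assumed error rate of a conventional front-end
improved = 0.15  # assumed error rate of an alternative front-end

print(f"Relative error reduction: {relative_error_reduction(baseline, improved):.1%}")
```

Under this definition, a relative figure says nothing about the absolute error rates on its own; the same relative reduction can correspond to very different absolute gains depending on the baseline.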

Bibliographic reference. Gallardo-Antolin, A. / Macias-Guarasa, J. / Ferreiros, J. / Cordoba, R. / Montero-Martinez, J. M. / San-Segundo, R. / Pardo, J. M. (2003): "A comparison of several approaches to the feature extractor design for ASR tasks in telephone environment", In ICPhS-15, 1345-1348.