15th International Congress of Phonetic Sciences (ICPhS-15), Barcelona, Spain
Automatic speech recognition (ASR) systems are usually composed of a parameterization module and a back-end classifier. The performance of the overall system depends strongly on the choice of the feature extraction module. In this paper we investigate two different approaches to designing this module. In the first (the conventional approach), its main characteristics are chosen based on psychoacoustic knowledge. The second uses a data-driven technique, Discriminative Feature Extraction (DFE), which optimizes the feature extractor and the back-end classifier simultaneously. Both strategies have been applied to a front-end based on the Wavelet Transform (WT). Results show that DFE systematically improves performance. In particular, applying the DFE strategy to the WT-based acoustic features yields a relative error reduction of around 23% compared to conventional features based on the Short-Time Fourier Transform, using the SpeechDat database with a vocabulary of 1000 words.
Bibliographic reference. Gallardo-Antolin, A. / Macias-Guarasa, J. / Ferreiros, J. / Cordoba, R. / Montero-Martinez, J. M. / San-Segundo, R. / Pardo, J. M. (2003): "A comparison of several approaches to the feature extractor design for ASR tasks in telephone environment", In ICPhS-15, 1345-1348.