15th International Congress of Phonetic Sciences (ICPhS-15)

Barcelona, Spain
August 3-9, 2003

Improved ASR in Noise Using Harmonic Decomposition

David M. Moreno (1), Philip J. B. Jackson (2), Javier Hernando (1), Martin J. Russell (3)

(1) Universitat Politècnica de Catalunya, Spain
(2) University of Surrey, UK
(3) University of Birmingham, UK

Application of the pitch-scaled harmonic filter (PSHF) to automatic speech recognition in noise was investigated using the Aurora 2.0 database. The PSHF decomposed the original speech into periodic and aperiodic streams. Digit-recognition tests with the extended features compared the noise robustness of various parameterisations against a standard set of 39 MFCCs. Used separately, each stream reduced word accuracy by less than 1% absolute; used together, the combined streams gave substantial increases under noisy conditions. Applying PCA to the concatenated features proved better than applying it to each stream separately, and applying it to the static coefficients proved better than applying it after the deltas had been calculated. With multi-condition training, accuracy improved by 7.8% at 5 dB SNR, indicating greater resilience to corruption by noise.
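
The best-performing ordering described above (concatenate the two streams, apply PCA to the static coefficients, then compute deltas) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 13-coefficient streams, the reduction to 13 components, the regression-style delta formula, and the random stand-in data are all assumptions made for illustration, and the PSHF decomposition itself is not shown.

```python
import numpy as np

def fit_pca(features, n_components):
    """Fit PCA on the rows of `features`; return (mean, projection matrix)."""
    mean = features.mean(axis=0)
    centered = features - mean
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components].T

def deltas(static, window=2):
    """Regression-based delta coefficients (assumed formula, edge-padded)."""
    n_frames = len(static)
    padded = np.pad(static, ((window, window), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, window + 1))
    out = np.zeros_like(static, dtype=float)
    for k in range(1, window + 1):
        ahead = padded[window + k : window + k + n_frames]
        behind = padded[window - k : window - k + n_frames]
        out += k * (ahead - behind)
    return out / denom

def combine_streams(periodic, aperiodic, n_components=13):
    """Concatenate stream features, reduce with PCA, then append deltas."""
    stacked = np.hstack([periodic, aperiodic])       # (frames, 26) here
    # In practice the PCA would be fitted on training data, not per utterance.
    mean, projection = fit_pca(stacked, n_components)
    static = (stacked - mean) @ projection           # PCA on static coefficients
    d1 = deltas(static)                              # first-order deltas
    d2 = deltas(d1)                                  # second-order deltas
    return np.hstack([static, d1, d2])

# Stand-in data: 100 frames of 13 static coefficients per stream (hypothetical).
rng = np.random.default_rng(0)
periodic_feats = rng.standard_normal((100, 13))
aperiodic_feats = rng.standard_normal((100, 13))
combined = combine_streams(periodic_feats, aperiodic_feats)
```

With these assumed sizes, `combined` has 39 columns per frame, so it matches the dimensionality of the 39-MFCC baseline while drawing on both streams.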

Bibliographic reference.  Moreno, David M. / Jackson, Philip J. B. / Hernando, Javier / Russell, Martin J. (2003): "Improved ASR in noise using harmonic decomposition", In ICPhS-15, 751-754.