15th International Congress of Phonetic Sciences (ICPhS-15)
We introduce a trainable speech enhancement technique that explicitly incorporates information about the long-term time-frequency characteristics of speech signals before the enhancement process. Using the Expectation-Maximization (EM) algorithm, we approximate the noise spectral magnitude, taken from available recordings of the operational environment, and the clean speech spectral magnitude, taken from a clean-speech database, with mixtures of Gaussian pdfs. We then apply the Bayesian inference framework to the degraded spectral coefficients and, using Minimum Mean Square Error (MMSE) estimation, derive a closed-form solution for the spectral magnitude estimation task. We evaluate the technique on real, highly non-stationary noise (passing aircraft) and demonstrate its efficiency at low SNRs.
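The pipeline described in the abstract (fit Gaussian mixtures with EM, then take a closed-form MMSE estimate) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single frequency bin treated as a scalar, and it assumes the degraded magnitude is the sum of independent speech and noise terms, which is a simplification made here for illustration. The function names `fit_gmm_em` and `mmse_estimate` and the quantile-based initialization are hypothetical choices.

```python
import numpy as np

def gauss_pdf(x, mean, var):
    """Univariate Gaussian density, broadcast over its arguments."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def fit_gmm_em(data, n_components, n_iter=100):
    """Fit a 1-D Gaussian mixture with plain EM.

    Returns (weights, means, variances). Means are initialized at
    evenly spaced quantiles of the data (an assumption of this sketch).
    """
    data = np.asarray(data, dtype=float)
    weights = np.full(n_components, 1.0 / n_components)
    means = np.quantile(data, (np.arange(n_components) + 0.5) / n_components)
    variances = np.full(n_components, data.var() + 1e-6)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample
        lik = weights * gauss_pdf(data[:, None], means, variances)
        resp = lik / (lik.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: re-estimate weights, means, and variances
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp * data[:, None]).sum(axis=0) / nk
        variances = (resp * (data[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
    return weights, means, variances

def mmse_estimate(y, speech_gmm, noise_gmm):
    """Closed-form E[x | y] for y = x + n, with independent GMM priors on x and n."""
    ws, ms, vs = speech_gmm
    wn, mn, vn = noise_gmm
    y = np.atleast_1d(np.asarray(y, dtype=float))[:, None, None]
    # Pair every speech component i with every noise component j
    w = ws[:, None] * wn[None, :]
    m = ms[:, None] + mn[None, :]
    v = vs[:, None] + vn[None, :]
    # Posterior probability of each (i, j) pair given the observation
    post = w * gauss_pdf(y, m, v)
    post /= post.sum(axis=(1, 2), keepdims=True)
    # Per-pair conditional mean: prior mean plus a Wiener-like correction
    gain = vs[:, None] / v
    cond_mean = ms[:, None] + gain * (y - m)
    return (post * cond_mean).sum(axis=(1, 2))
```

In use, `fit_gmm_em` would be run once on clean-speech values and once on noise values, and the two resulting parameter tuples passed to `mmse_estimate` together with the degraded observations. With one zero-mean, unit-variance component on each side, the estimate reduces to half the observation, the familiar Wiener gain for equal speech and noise variances.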
Bibliographic reference. Potamitis, Ilyas / Fakotakis, Nikos / Kokkinakis, George (2003): "Model based speech enhancement for time-varying noises", In ICPhS-15, 2197-2200.