Dual-channel eKF-RTF framework for speech enhancement with DNN-based speech presence estimation

Authors: Juan Manuel Martín Doñas Antonio Miguel Peinado Herreros Iván López Espejo Ángel Manuel Gómez García

Date: 24.03.2021


Abstract

This paper presents a dual-channel speech enhancement framework that effectively integrates deep neural network (DNN) mask estimators. Our framework follows a beamforming-plus-postfiltering approach intended for noise reduction on dual-microphone smartphones. An extended Kalman filter is used for the estimation of the relative acoustic channel between microphones, while the noise estimation is performed using a speech presence probability estimator. We propose the use of a DNN estimator to improve the prediction of the speech presence probabilities without making any assumption about the statistics of the signals. We evaluate and compare different dual-channel features to improve the accuracy of this estimator, including the power and phase difference between the speech signals at the two microphones. The proposed integrated scheme is evaluated in different reverberant and noisy environments when the smartphone is used in both close- and far-talk positions. The experimental results show that our approach achieves significant improvements in terms of speech quality, intelligibility, and distortion when compared to other approaches based only on statistical signal processing.

BIB_text

@Article {
title = {Dual-channel eKF-RTF framework for speech enhancement with DNN-based speech presence estimation},
pages = {31-35},
keywds = {
Dual-microphone smartphone, beamforming, extended Kalman filter, speech presence probability, deep neural network
}
abstract = {

This paper presents a dual-channel speech enhancement framework that effectively integrates deep neural network (DNN) mask estimators. Our framework follows a beamforming-plus-postfiltering approach intended for noise reduction on dual-microphone smartphones. An extended Kalman filter is used for the estimation of the relative acoustic channel between microphones, while the noise estimation is performed using a speech presence probability estimator. We propose the use of a DNN estimator to improve the prediction of the speech presence probabilities without making any assumption about the statistics of the signals. We evaluate and compare different dual-channel features to improve the accuracy of this estimator, including the power and phase difference between the speech signals at the two microphones. The proposed integrated scheme is evaluated in different reverberant and noisy environments when the smartphone is used in both close- and far-talk positions. The experimental results show that our approach achieves significant improvements in terms of speech quality, intelligibility, and distortion when compared to other approaches based only on statistical signal processing.


}
date = {2021-03-24},
}
Vicomtech

Parque Científico y Tecnológico de Gipuzkoa,
Paseo Mikeletegi 57,
20009 Donostia / San Sebastián (Spain)

+(34) 943 309 230

Edificio Ensanche,
Zabalgune Plaza 11,
48009 Bilbao (Spain)

close overlay