However, learning-based algorithms are generally resource-intensive and require a long temporal context, which makes them hard to implement for real-time processing. In [16], Nakatani et al. proposed harmonicity-based dereverberation (HERB) methods, which modeled the RIR inverse filter as the ratio of the direct-path component to the received signal. The inverse filter design exploited the harmonicity of the speech signal, and the filter coefficients were estimated with two distinct methods: one estimated the average filter that transforms reverberant signals into harmonic signals, while the other used a minimum mean squared error criterion that evaluated the quasi-periodicity of the target signals. HERB algorithms take a relatively long time to converge, which also makes them difficult to use in real-time processing. The linear predictive multi-input equalization (LIME) algorithm was used in [17] to achieve multi-channel dereverberation. The whitened speech residuals from the LIME output were combined with the estimated source autoregressive polynomials to obtain clean …
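To make the "average filter" variant of HERB more concrete, the following minimal NumPy sketch shows only its final averaging step, under stated assumptions: the reverberant observation is a complex STFT `Y` in which the RIR acts approximately as a per-bin multiplication, and a harmonic-enhanced estimate `X_hat` is already available (HERB derives it from the fundamental frequency; that stage is not shown). The function names `herb_average_filter` and `apply_inverse_filter` are hypothetical, and this is an illustration rather than the exact formulation of [16].

```python
import numpy as np


def herb_average_filter(Y, X_hat, eps=1e-12):
    """Frame-averaged inverse filter W(k) = mean over frames of X_hat(k, l) / Y(k, l).

    Y:     complex STFT of the reverberant observation, shape (n_bins, n_frames)
    X_hat: complex STFT of a harmonic-enhanced estimate (ASSUMED given), same shape
    Entries with |Y| <= eps are excluded from the average to avoid division by zero.
    """
    valid = np.abs(Y) > eps
    ratio = np.divide(X_hat, Y, out=np.zeros(Y.shape, dtype=complex), where=valid)
    counts = valid.sum(axis=1)
    # Average only over the frames that contributed a valid ratio in each bin.
    return ratio.sum(axis=1) / np.maximum(counts, 1)


def apply_inverse_filter(W, Y):
    """Apply the per-bin inverse filter W to every frame of the observation Y."""
    return W[:, None] * Y


# Toy check (hypothetical data): if Y = H * X per bin and the harmonic estimate
# were perfect (X_hat = X), the averaged ratio recovers 1 / H.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 50)) + 1j * rng.standard_normal((4, 50))
H = np.array([1.0, 0.5 + 0.5j, 2.0, -0.8j])
Y = H[:, None] * X
W = herb_average_filter(Y, X)
X_rec = apply_inverse_filter(W, Y)
```

In practice `X_hat` is only an approximation of the direct-path speech, so the per-frame ratios are noisy and the estimate improves only as many frames are averaged, which is consistent with the slow convergence of HERB noted above.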
A long-term multi-step linear-prediction-based estimate of the late reverberation signal was used in SS by Kinoshita et al. in [1]. Wisdom et al. proposed a speech-coherence-based minimum mean square error (MMSE) log-spectral amplitude estimator in [25]. Another variant of the SS-based method, proposed by Cauchi et al., incorporated temporal cepstrum smoothing [26]. Wu et al. estimated the late reverberation power spectrum using an asymmetrical smoothing window based on the Rayleigh distribution [27]. Veras et al. extended Wu's method in their formulation of speech dereverberation in [28]. Kokkinakis et al. used a variable subtraction factor as a function of the a posteriori signal-to-noise ratio (SNR) and evaluated its performance in cochlear implant devices [29]. However, most spectral enhancement techniques assume that the speech signal is orthogonal to the undesired signal, be it random background noise or reverberation, and ignore any cross-term between the signal components. Yang et al., by contrast, argued that the cross-term is not necessarily zero in all scenarios and depends on the a priori SNR in practical cases involving white background noise.
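The SS pipeline with a Rayleigh-shaped late-reverberation estimate and an SNR-dependent subtraction factor can be sketched as below. This is a minimal NumPy illustration, not the implementation of [27] or [29]. It assumes a nonnegative power spectrogram `power` of shape (bins, frames); the window form and the parameter values (gamma = 0.32, rho = 7 frames, a = 5) are assumptions following the commonly cited form of Wu's estimator. The subtraction factor uses a hypothetical Berouti-style linear rule in dB, and the mapping actually used by Kokkinakis et al. may differ. All function names are illustrative.

```python
import numpy as np


def rayleigh_weights(n_lags, rho=7, a=5.0):
    """Causal smoothing weights c[m] = w(m - rho) for lags m = 0..n_lags-1.

    ASSUMED window: w(t) = (t + a) / a**2 * exp(-(t + a)**2 / (2 * a**2)) for t > -a,
    else 0. The Rayleigh shape peaks at t = 0, i.e. at a lag of rho frames.
    """
    t = np.arange(n_lags, dtype=float) - rho
    w = (t + a) / a**2 * np.exp(-((t + a) ** 2) / (2 * a**2))
    return np.where(t > -a, w, 0.0)


def estimate_late_reverb(power, gamma=0.32, rho=7, a=5.0, n_lags=30):
    """Late-reverberation PSD: gamma times a causal Rayleigh smoothing of past power."""
    c = rayleigh_weights(n_lags, rho, a)
    n_frames = power.shape[1]
    late = np.zeros_like(power)
    for m in range(n_lags):
        if c[m] == 0.0 or m >= n_frames:
            continue
        # Frame l accumulates c[m] * power at frame l - m (past frames only).
        late[:, m:] += c[m] * power[:, : n_frames - m]
    return gamma * late


def subtract_late_reverb(power, late, alpha0=4.0, slope=0.15,
                         alpha_min=1.0, alpha_max=6.0, beta=0.01, eps=1e-12):
    """Power spectral subtraction with an SNR-dependent subtraction factor.

    HYPOTHETICAL rule: alpha = alpha0 - slope * SNR_post[dB], clipped to
    [alpha_min, alpha_max], so bins with a high a posteriori SNR are
    subtracted less aggressively.
    """
    snr_db = 10.0 * np.log10((power + eps) / (late + eps))
    alpha = np.clip(alpha0 - slope * snr_db, alpha_min, alpha_max)
    # The spectral floor beta * power prevents negative power estimates.
    enhanced_power = np.maximum(power - alpha * late, beta * power)
    return np.sqrt(enhanced_power)  # enhanced magnitude spectrum


# Usage on hypothetical data: 8 frequency bins, 50 frames.
rng = np.random.default_rng(1)
P = rng.uniform(0.1, 1.0, size=(8, 50))
late = estimate_late_reverb(P)
enh = subtract_late_reverb(P, late)
```

The subtraction treats the observed power as the sum of the speech and late-reverberation powers, |Y|² ≈ |S|² + |R|², and thus drops the cross-term 2|S||R|cos θ (θ being the phase difference between the two components). This is exactly the orthogonality assumption questioned by Yang et al.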