Published by Felicity Kennedy. Modified over 9 years ago.
1
Multi-microphone noise reduction and dereverberation techniques for speech applications
Simon Doclo
Dept. of Electrical Engineering, KU Leuven, Belgium
8 July 2003
2
Overview
- Introduction
- Basic principles
- Robust broadband beamforming
- Multi-microphone optimal filtering
- Acoustic transfer function estimation and dereverberation
- Conclusion and further research
3
Overview
- Introduction
  - Motivation and applications
  - Problem statement
  - Contributions
- Basic principles
- Robust broadband beamforming
- Multi-microphone optimal filtering
- Acoustic transfer function estimation and dereverberation
- Conclusion and further research
4
Motivation
- Speech communication applications: hands-free mobile telephony, voice-controlled systems, hearing aids
- Speech acquisition in an adverse acoustic environment:
  - Background noise: fan, radio, other speakers; generally unknown
  - Reverberation: reflections of the signal against walls and objects
- Result: poor signal quality, which affects speech intelligibility and speech recognition
5
Objectives
- Signal enhancement techniques:
  - Noise reduction: reduce the amount of background noise without distorting the speech signal
  - Dereverberation: reduce the effect of signal reflections
  - Combined noise reduction and dereverberation
- Acoustic source localisation: e.g. to steer a video camera or spotlight
6
Applications
- Hands-free mobile telephony:
  - Most important application from an economic point of view
  - Hands-free car kit mandatory in many countries
  - Most current systems: 1 directional microphone
- Video-conferencing: microphone array for
  - source localisation: point the camera towards the active speaker
  - signal enhancement by steering of the microphone array
7
Applications
- Voice-controlled systems:
  - domotic systems, consumer electronics (HiFi, PC software)
  - added value only when the speech recognition system performs reliably under all circumstances
  - signal enhancement as a pre-processing step
- Hearing aids and cochlear implants:
  - most hearing-impaired people suffer from perceptual hearing loss: besides amplification, reduction of noise with respect to the useful speech signal is needed
  - multiple microphones + DSP in the hearing aid
  - current systems: simple beamforming
  - robustness is important due to the small inter-microphone distance
8
Algorithmic requirements
- ‘Blind’ techniques: unknown noise sources and acoustic environment
- Adaptive: time-variant signals and acoustic environment
- Robustness against:
  - Microphone characteristics (gain, phase, position)
  - Other deviations from the assumed signal model (look direction error, VAD)
- Integration of different enhancement techniques
- Computational complexity
9
Problem statement
- Problems of existing techniques:
  - Single-microphone techniques have very limited performance. Multi-microphone techniques exploit spatial information, and multiple microphones are required for source localisation anyway.
  - A-priori assumptions about the position of the signal sources and the microphone array lead to a large sensitivity to deviations. Goal: improve robustness (and performance).
  - Assumption of spatio-temporally white noise. Goal: extension to coloured noise.
- Aim: development of multi-microphone noise reduction and dereverberation techniques with better performance and robustness for coloured noise scenarios
10
State-of-the-art and contributions
Single-microphone techniques (only spectral information):
- Spectral subtraction [Boll 79, Ephraim 85, Xie 96]: signal-independent transformation; residual noise problem
- Subspace-based [Dendrinos 91, Ephraim 95, Jensen 95]: signal-dependent transformation; signal + noise subspace
Multi-microphone techniques (spatial information; a-priori assumptions, so robustness is an issue):
- Fixed beamforming [Dolph 46, Cox 86, Ward 95, Elko 00]: fixed directivity pattern
- Adaptive beamforming [Frost 72, Griffiths 82, Gannot 01]: adapts to different acoustic environments for better performance; ‘Generalised Sidelobe Canceller’ (GSC)
- Inverse and matched filtering [Miyoshi 88, Flanagan 93, Affes 97]
Contributions:
1. Robust broadband beamforming
2. Multi-microphone optimal filtering
3. Blind transfer function estimation and dereverberation
11
Overview
- Introduction
- Basic principles
  - Signal model
  - Signal characteristics and acoustic environment
- Robust broadband beamforming
- Multi-microphone optimal filtering
- Acoustic transfer function estimation and dereverberation
- Conclusion and further research
12
Signal model
- Signal model for the microphone signals in the time domain: filtered version of the clean speech signal + additive coloured noise
  y_n[k] = h_n[k] * s[k] + v_n[k]   (* denotes convolution)
  - h_n[k]: acoustic impulse response
  - s[k]: speech signal
  - v_n[k]: additive noise
13
Signal model
- Multi-microphone signal enhancement: the microphone signals are filtered with filters w_n[k] and summed
  - f[k] = total transfer function for the speech component
  - z_v[k] = residual noise component
- Techniques differ in how the filters are calculated:
  - Noise reduction: minimise the residual noise z_v[k] and limit speech distortion
  - Dereverberation: obtain f[k] = δ[k] by estimating the acoustic impulse responses h_n[k]
  - Combined noise reduction and dereverberation
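The decomposition of the filter-and-sum output into a speech part and a residual noise part can be checked numerically. The following is a minimal sketch, not the author's code: the toy sizes, the random impulse responses and the names `y`, `v`, `f`, `z_v` are assumptions chosen to mirror the symbols on the slide, and white noise is used only for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, L, T = 3, 8, 4, 200    # microphones, impulse-response taps, filter taps, samples (toy sizes)

s = rng.standard_normal(T)             # stand-in for the clean speech signal s[k]
h = rng.standard_normal((N, K))        # hypothetical acoustic impulse responses h_n[k]
v = 0.1 * rng.standard_normal((N, T))  # additive noise v_n[k] (white here for brevity)
w = rng.standard_normal((N, L))        # arbitrary filters w_n[k]

# Microphone signals: y_n[k] = h_n[k] * s[k] + v_n[k]
y = np.stack([np.convolve(h[n], s)[:T] for n in range(N)]) + v

# Filter-and-sum output: z[k] = sum_n w_n[k] * y_n[k]
z = sum(np.convolve(w[n], y[n])[:T] for n in range(N))

# Total transfer function for the speech component: f[k] = sum_n w_n[k] * h_n[k]
f = sum(np.convolve(w[n], h[n]) for n in range(N))

# Residual noise component: z_v[k] = sum_n w_n[k] * v_n[k]
z_v = sum(np.convolve(w[n], v[n])[:T] for n in range(N))

# By linearity the output splits into f[k] * s[k] + z_v[k]
z_check = np.convolve(f, s)[:T] + z_v
```

Dereverberation in this picture means choosing the filters so that `f` approaches a unit impulse, while noise reduction means keeping `z_v` small.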
14
Signal characteristics
- Speech:
  - Broadband (… Hz)
  - Non-stationary
  - On/off characteristic: speech detection algorithm (VAD)
  - Linear low-rank model: linear combination of basis functions (R = 12…20)
- Noise:
  - unknown signals (no reference available)
  - slowly time-varying (fan) or non-stationary (radio, speech)
  - localised or diffuse noise
[Figure: speech waveform, amplitude versus time (sec)]
15
Acoustic environment
- Reverberation time T60: global characterisation

  Environment  T60
  Car          70 ms
  Room         250 ms
  Church       1500 ms

- Acoustic impulse responses:
  - Acoustic filtering between 2 points in a room
  - FIR filter (K = 1000…2000 taps)
  - Non-minimum-phase system: no stable inverse
- Microphone array:
  - Assumption: point sensors with ideal characteristics
  - Deviations: gain, phase, position
- Distance speaker to microphone array: far-field or near-field
[Figure: impulse response PSK row 9, amplitude versus time (sec)]
16
Overview
- Introduction
- Basic principles
- Robust broadband beamforming
  - Novel design procedures for broadband beamformers
  - Robust beamforming for gain and phase errors
- Multi-microphone optimal filtering
- Acoustic transfer function estimation and dereverberation
- Conclusion and further research
17
Fixed beamforming
- Speech and noise sources with overlapping spectrum are at different positions: exploit spatial diversity by using multiple microphones
- Suppress noise and reverberation from certain directions
- Signal-independent
- Technique originally developed for radar applications; speech applications require an extension:
  - from narrowband (delay compensation) to broadband
  - from far-field (planar waves) to near-field (spherical waves)
  - from known sensor characteristics to deviations
- Properties: low complexity; robustness at low signal-to-noise ratio (SNR); a-priori knowledge of the microphone array characteristics is required
- FIR filter-and-sum structure: arbitrary spatial directivity pattern for an arbitrary microphone array configuration
18
Filter-and-sum configuration
- Objective: calculate the filters w_n[k] such that the beamformer performs the desired (fixed) spatial and spectral filtering
- Far-field: planar waves, equal attenuation
- The spatial directivity pattern has to approximate the desired spatial directivity pattern: 2D filter design in angle and frequency
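The spatial directivity pattern of a filter-and-sum beamformer can be evaluated on an angle-frequency grid as sketched below. This is an illustrative sketch under stated assumptions, not the slide's formula: it assumes a uniform linear array with spacing `d`, plane waves, a speed of sound of 340 m/s, and the hypothetical function name `directivity_pattern`. The parameter values N = 5, d = 4 cm, L = 20 and fs = 8 kHz are taken from the simulation slides.

```python
import numpy as np

def directivity_pattern(w, d, fs, freqs, angles_deg, c=340.0):
    """Far-field pattern H(f, theta) of a filter-and-sum beamformer.

    w: (N, L) real FIR coefficients, one row per microphone.
    Assumes a uniform linear array with spacing d (metres) and plane waves.
    """
    N, L = w.shape
    omega = 2 * np.pi * np.asarray(freqs, dtype=float)          # (F,)
    theta = np.deg2rad(np.asarray(angles_deg, dtype=float))     # (A,)
    # Propagation delay of microphone n for direction theta
    tau = np.outer(np.cos(theta), np.arange(N)) * d / c         # (A, N)
    steering = np.exp(-1j * omega[:, None, None] * tau[None])   # (F, A, N)
    # Frequency response of each FIR filter: W_n(f) = sum_l w_n[l] exp(-j omega l / fs)
    W = np.exp(-1j * np.outer(omega, np.arange(L)) / fs) @ w.T  # (F, N)
    return np.einsum("fn,fan->fa", W, steering)                 # (F, A)

# Delay-and-sum beamformer steered to broadside (90 degrees) as a reference design
N, L, d, fs = 5, 20, 0.04, 8000.0
w = np.zeros((N, L))
w[:, 0] = 1.0 / N
freqs = np.linspace(300.0, 3500.0, 33)
angles = np.arange(0, 181, 5)
H = directivity_pattern(w, d, fs, freqs, angles)
```

Any designed filter set `w` can be passed to the same function to inspect how closely its pattern matches a desired pattern over the whole frequency-angle region.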
19
Design procedures
- Design the filter w such that the spatial directivity pattern optimally fits the desired pattern: minimisation of a cost function
- Broadband problem: no design for separate frequencies ω_i, but design over the complete frequency-angle region
- No approximation of the integrals by a finite Riemann sum
- Microphone configuration not included in the optimisation
- Cost functions:
  - Least-squares: quadratic function (amplitude and phase)
  - Non-linear cost function: iterative optimisation = complex! [Kajala 99]
- Double integrals only need to be calculated once
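Why the least-squares cost is quadratic, and why its double integrals only need to be computed once, can be seen from the worked form below. The slide's own equations did not survive, so this is the standard weighted least-squares formulation written under assumptions: w is the stacked real filter vector, g(ω,θ) the corresponding steering vector, D the desired pattern and F a non-negative weighting function; these symbol names are not confirmed by the source.

```latex
\begin{align}
H(\omega,\theta) &= \mathbf{w}^{T}\mathbf{g}(\omega,\theta) \\
J_{LS}(\mathbf{w}) &= \int_{\Theta}\!\int_{\Omega} F(\omega,\theta)\,
   \bigl|H(\omega,\theta)-D(\omega,\theta)\bigr|^{2}\,d\omega\,d\theta
   \;=\; \mathbf{w}^{T}\mathbf{Q}\,\mathbf{w} - 2\,\mathbf{w}^{T}\mathbf{a} + d \\
\mathbf{Q} &= \int_{\Theta}\!\int_{\Omega} F\,\mathrm{Re}\{\mathbf{g}\mathbf{g}^{H}\}\,d\omega\,d\theta, \qquad
\mathbf{a} = \int_{\Theta}\!\int_{\Omega} F\,\mathrm{Re}\{\mathbf{g}\,D^{*}\}\,d\omega\,d\theta, \qquad
d = \int_{\Theta}\!\int_{\Omega} F\,|D|^{2}\,d\omega\,d\theta \\
\mathbf{w}_{LS} &= \mathbf{Q}^{-1}\mathbf{a}
\end{align}
```

Q, a and d depend only on the array geometry and the desired pattern, not on w, which is why they are computed once and then reused.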
20
Design procedures
- 2 non-iterative cost functions, based on eigenfilters:
  - Eigenfilters: 1D and 2D FIR filter design [Vaidyanathan 87, Pei 01]
  - Extension to the design of broadband beamformers
- Novel cost functions:
  - Conventional eigenfilter technique: (G)EVD, reference point required
  - Eigenfilter based on TLS criterion: GEVD
- Conclusion: TLS-eigenfilter is the preferred non-iterative design procedure
21
Simulations
- Parameters: N = 5, d = 4 cm, L = 20, fs = 8 kHz
- Passband: 40°-80°; stopband: 0°-30° and …°-180°
[Figure: directivity patterns in dB versus angle (deg) and frequency (Hz) for three designs: non-linear procedure, TLS-eigenfilter, delay-and-sum]
22
Near-field configuration
- Near-field: spherical waves + attenuation; take into account the distance r between the speaker and the microphones
- One specific distance: very similar to far-field design (different calculation of the double integrals), but deviations occur for other distances
- Several distances: finite number (R) of distances; trivial extension for most cost functions, for the TLS-eigenfilter = sum of generalised Rayleigh quotients; trade-off between the performance for different distances
- Ultimate goal: design for all distances
23
Simulations
- Parameters: N = 5, d = 4 cm, L = 20, fs = 8 kHz
- Passband: 70°-110°; stopband: 0°-60° and …°-180°
[Figure: far-field pattern and near-field pattern (r = 0.2 m) in dB versus angle (deg) and frequency (Hz), for a far-field design and a mixed near-field far-field design]
24
Robust broadband beamforming
- Small deviations from the assumed microphone characteristics (gain, phase, position) lead to large deviations from the desired directivity pattern, especially for small-size microphone arrays
- In practice, microphone characteristics are never exactly known
- Measurement or calibration procedure
- Incorporate specific (random) deviations in the design: consider all feasible microphone characteristics and optimise
  - average performance, using probability as weight: requires knowledge about the probability density functions
  - worst-case performance: minimax optimisation problem on a finite grid of microphone characteristics, with high complexity
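The two robustness criteria, average and worst-case cost over the feasible microphone characteristics, can be illustrated by a Monte Carlo evaluation. This sketch only evaluates a given design, it is not one of the robust design procedures. It assumes a uniform linear array with d = 4 cm, a grid approximation of the cost, a delay-and-sum placeholder design, and a hypothetical gain range of 0.85 to 1.15; only the phase range of -5° to 10° comes from the simulation slide.

```python
import numpy as np

def pattern(w, gains, phases, d, fs, freqs, thetas, c=340.0):
    """Far-field pattern with microphone characteristics a_n = gain_n * exp(-j * phase_n)."""
    N, L = w.shape
    a = gains * np.exp(-1j * phases)
    om = 2 * np.pi * freqs
    W = np.exp(-1j * np.outer(om, np.arange(L)) / fs) @ w.T     # (F, N) filter responses
    tau = np.outer(np.cos(thetas), np.arange(N)) * d / c       # (A, N) propagation delays
    steer = np.exp(-1j * om[:, None, None] * tau[None, :, :])  # (F, A, N)
    return np.einsum("fn,fan->fa", W * a, steer)

def cost(w, gains, phases, desired, **geom):
    """Grid approximation of a least-squares cost (for evaluation only)."""
    return float(np.mean(np.abs(pattern(w, gains, phases, **geom) - desired) ** 2))

rng = np.random.default_rng(1)
N, L = 3, 20
geom = dict(d=0.04, fs=8000.0,
            freqs=np.linspace(300.0, 3500.0, 17),
            thetas=np.deg2rad(np.arange(0, 181, 10)))
w = np.zeros((N, L))
w[:, 0] = 1.0 / N                                       # placeholder design: delay-and-sum
desired = pattern(w, np.ones(N), np.zeros(N), **geom)   # nominal pattern taken as the target

# Assumed uniform pdfs: gain in [0.85, 1.15] (hypothetical), phase in [-5 deg, 10 deg]
samples = [cost(w, rng.uniform(0.85, 1.15, N),
                np.deg2rad(rng.uniform(-5.0, 10.0, N)), desired, **geom)
           for _ in range(500)]
J_nominal = cost(w, np.ones(N), np.zeros(N), desired, **geom)
J_mean, J_max = float(np.mean(samples)), float(np.max(samples))
```

A robust design would minimise a quantity like `J_mean` or `J_max` over `w` instead of the nominal cost; here `J_max` is only the largest value among the drawn samples, not a true worst case.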
25
Simulations
- Non-linear design procedure
- N = 3, positions: [ ] m, L = 20, fs = 8 kHz
- Passband = 0°-60°, … Hz (endfire); stopband = 80°-180°, … Hz
- Robust design, average performance: uniform pdf = gain ( ) and phase (-5° to 10°)
- Deviation = [ ] and [5° -2° 5°]

  Design        J       Jdev    Jmean   Jmax
  Non-robust    0.1585  87.131  275.40  3623.6
  Average cost  0.2196  0.2219  0.3371  0.4990
  Maximum cost  0.1707  0.1990  0.4114  0.4167
26
Simulations
[Figure: directivity patterns in dB versus angle (deg) and frequency (Hz) for the non-robust design and the robust design, without deviations and with gain/phase deviations]
27
Simulations
[Figure: non-robust design versus robust design]
28
Overview
- Introduction
- Basic principles
- Robust broadband beamforming
- Multi-microphone optimal filtering
  - GSVD-based optimal filtering technique
  - Reduction of computational complexity
  - Simulations
- Acoustic transfer function estimation and dereverberation
- Conclusion and further research
29
Multi-microphone optimal filtering
- Objective: optimal estimate of the speech components in the microphone signals
- Multi-channel Wiener filter: minimise the MSE
- Properties: multi-microphone, signal-dependent, robust, no a-priori assumptions
- Signal assumptions:
  - speech and noise are independent
  - the second-order statistics of the noise are stationary, so they can be estimated during noise periods (VAD)
30
Multi-microphone optimal filtering
- Implementation procedure, based on the Generalised Eigenvalue Decomposition (GEVD):
  - takes into account the low-rank model of speech
  - trade-off between noise reduction and speech distortion
  - lower complexity: QRD [Rombouts 2002], subband [Spriet 2001]
- Result: signal-dependent FIR filterbank
- The speech detection mechanism is the only a-priori assumption: it is required for the estimation of the correlation matrices, which also handles coloured noise
31
General class of estimators
- Multi-channel Wiener filter: always a combination of noise reduction and (linear) speech distortion; the estimation error consists of a speech distortion term and a residual noise term
- General class of estimators [Ephraim 95]: a trade-off parameter weights noise reduction against speech distortion
  - parameter = 1: MMSE (equal importance)
  - parameter < 1: less speech distortion, less noise reduction
  - parameter > 1: more speech distortion, more noise reduction
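One common way to write this class of estimators is W = (Rx + mu·Rv)^-1 Rx with Rx = Ry - Rv, which can be computed through a joint diagonalisation (GEVD) of the noisy-speech and noise correlation matrices. The sketch below follows that formulation under assumptions: the name `mu` for the trade-off parameter, the function name `mwf_gevd`, the Cholesky-based GEVD and the synthetic correlation matrices are all illustrative choices, not taken from the slides, and `mu` is assumed positive.

```python
import numpy as np

def mwf_gevd(Ry, Rv, mu=1.0, rank=None):
    """W = (Rx + mu*Rv)^-1 Rx with Rx = Ry - Rv, computed via a GEVD of (Ry, Rv)."""
    C = np.linalg.cholesky(Rv)                # Rv = C C^T
    Ci = np.linalg.inv(C)
    lam, U = np.linalg.eigh(Ci @ Ry @ Ci.T)   # generalised eigenvalues, ascending order
    Q = C @ U                                 # Ry = Q diag(lam) Q^T and Rv = Q Q^T
    speech = np.maximum(lam - 1.0, 0.0)       # clip so the estimated Rx stays positive semidefinite
    if rank is not None:                      # low-rank speech model: keep the largest components
        speech[: len(speech) - rank] = 0.0
    gain = speech / (speech + mu)
    return np.linalg.inv(Q).T @ np.diag(gain) @ Q.T

# Synthetic check with a rank-2 speech correlation matrix and coloured noise
rng = np.random.default_rng(0)
M, R = 6, 2
A = rng.standard_normal((M, R))
B = rng.standard_normal((M, M))
Rx = A @ A.T
Rv = B @ B.T + 0.1 * np.eye(M)
Ry = Rx + Rv

W = mwf_gevd(Ry, Rv, mu=1.0, rank=R)
W_direct = np.linalg.solve(Rx + 1.0 * Rv, Rx)   # same filter without the GEVD
W_less_noise = mwf_gevd(Ry, Rv, mu=2.0)          # more noise reduction, more distortion
```

In practice `Ry` would be estimated during speech-and-noise periods and `Rv` during noise-only periods, which is where the VAD enters.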
32
Frequency-domain analysis
- Decomposition into a spectral and a spatial filtering term:
  - spectral filtering: power spectral density (PSD) of speech and noise
  - spatial filtering: coherence of speech and noise
- Desired beamforming behaviour for simple scenarios
33
Complexity reduction
- Recursive version: at each time step, calculation of GSVD + filter, hence high computational complexity
- Complexity reduction using:
  - Recursive techniques for recomputing the GSVD [Moonen 90]
  - Sub-sampling (stationary acoustic environments)

            Batch        Recursive   QRD [Rombouts]
  sub = 1   7504 Gflops  2.1 Gflops  358 Mflops
  sub = 20  375 Gflops   105 Mflops  18 Mflops

  (N = 4, L = 20, M = 80, fs = 16 kHz, P = 4000, Q = 20000)
- Real-time implementation possible
34
Complexity reduction
- Incorporation in the ‘Generalised Sidelobe Canceller’ (GSC) structure (adaptive beamforming):
  - Creation of speech reference and noise reference signals
  - Standard multi-channel adaptive filter (LMS, APA)
- Increases the noise reduction performance
- Complexity reduction by using shorter filters
[Diagram: the optimal filter produces a speech reference and noise reference(s); the adaptive-filter output of the noise reference(s) is subtracted from the delayed speech reference]
35
Simulations
- N = 4, SNR = 0 dB, 3 noise sources (white, speech, music), fs = 16 kHz
- Performance: improvement of the signal-to-noise ratio (SNR)
[Figure: unbiased SNR (dB) versus reverberation time (msec) for: delay-and-sum beamformer; GSC (L_ANC = 400, noise ref = Griffiths-Jim); recursive GSVD (L = 20, no ANC); recursive GSVD (L = 20, L_ANC = 400, all noise ref)]
36
Simulations
- N = 4, SNR = 0 dB, 3 noise sources, fs = 16 kHz, T60 = 300 msec
- ‘Power Transfer Functions’ (PTF) for the speech and noise component
[Figure: spectrum (dB) versus frequency (Hz) of the speech and noise PTF for: recursive GSVD (L = 20, no ANC); recursive GSVD (L = 20, L_ANC = 400, all noise ref)]
37
Conclusions
- GSVD-based optimal filtering technique:
  - Multi-microphone extension of single-microphone subspace-based enhancement techniques
  - Signal-dependent; low-rank model of speech
  - No a-priori assumptions about the position of the speaker and the microphones
- SNR improvement higher than GSC for all reverberation times and all considered acoustic scenarios
- More robust to deviations from the signal model:
  - Microphone characteristics
  - Position of the speaker
  - VAD (the only a-priori information): no effect on SNR improvement, limited effect on speech distortion
38
Advantages - Disadvantages

                    Fixed beamforming  Adaptive beamforming  Optimal filtering
  Signal-dependent  no                 yes                   yes
  Noise reduction   +                  ++                    +++
  Complexity        low                average               high

  Dereverberation:
  VAD:
  Robustness: - (+) -- (+)
39
Overview
- Introduction
- Basic principles
- Robust broadband beamforming
- Multi-microphone optimal filtering
- Acoustic transfer function estimation and dereverberation
  - Time-domain technique
  - Frequency-domain technique
  - Combined noise reduction and dereverberation
- Conclusion and further research
40
Objective
- Blind estimation of the acoustic impulse responses: time-domain and frequency-domain techniques
- Usage: dereverberation, noise reduction and dereverberation, source localisation
41
Time-domain techniques
- Signal model for N = 2 and no background noise
- Subspace-based technique: the impulse responses can be computed from the null-space of the speech correlation matrix
  - Eigenvector corresponding to the smallest eigenvalue
  - Coloured noise: GEVD
- Problems occurring in the time-domain technique:
  - sensitivity to underestimation of the impulse response length
  - low-rank model in combination with background noise
[Diagram: S(z) is filtered by H0(z) and H1(z) to give the signals Y0(z) and Y1(z); the null-space filters ±α·(-H1(z)) and ±α·H0(z) are applied to Y0(z) and Y1(z) respectively and summed to give E(z)]
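For two noise-free channels, the null-space idea rests on the relation h0 * y1 - h1 * y0 = 0, so the stacked impulse responses lie in the null-space of a data correlation matrix. The sketch below illustrates this under assumptions: synthetic white excitation, short random impulse responses with the exact length K known, and the hypothetical function name `blind_identify_two_channel`. It ignores the coloured-noise case, which would need a GEVD instead of an EVD.

```python
import numpy as np

def blind_identify_two_channel(y0, y1, K):
    """Estimate (h0, h1), up to a common scale factor, from two noise-free channels."""
    T = len(y0)
    # Each row enforces sum_l h0[l] y1[k-l] - sum_l h1[l] y0[k-l] = 0 for one time index k
    rows = [np.concatenate([y1[k - K + 1:k + 1][::-1], -y0[k - K + 1:k + 1][::-1]])
            for k in range(K - 1, T)]
    Y = np.array(rows)
    R = Y.T @ Y / len(rows)               # (2K, 2K) data correlation matrix
    eigval, eigvec = np.linalg.eigh(R)    # eigenvalues in ascending order
    g = eigvec[:, 0]                      # eigenvector of the smallest eigenvalue
    return g[:K], g[K:]

rng = np.random.default_rng(0)
K, T = 5, 400
h0, h1 = rng.standard_normal(K), rng.standard_normal(K)
s = rng.standard_normal(T)
y0, y1 = np.convolve(h0, s)[:T], np.convolve(h1, s)[:T]

h0_est, h1_est = blind_identify_two_channel(y0, y1, K)

# Compare directions only: the estimate is defined up to the scale factor +/- alpha
true = np.concatenate([h0, h1])
est = np.concatenate([h0_est, h1_est])
cosine = abs(true @ est) / (np.linalg.norm(true) * np.linalg.norm(est))
```

Choosing `K` smaller than the true impulse response length breaks the null-space relation, which is the sensitivity to length underestimation named on the slide.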
42
Stochastic gradient algorithm
- Batch estimation techniques form the basis for deriving an adaptive stochastic gradient algorithm
- Usage: estimation of partial impulse responses, i.e. time-delay estimation for acoustic source localisation
- For source localisation, the adaptive GEVD algorithm is more robust than the adaptive EVD algorithm (and prewhitening) in reverberant environments with a large amount of noise
43
Frequency-domain techniques
- The problems of the time-domain technique motivate a frequency-domain technique
- Signal model: rank-1 model
- Estimation of the acoustic transfer function vector H(ω) from the GEVD of the speech-plus-noise and noise correlation matrices: generalised eigenvector corresponding to the largest generalised eigenvalue
- No stochastic gradient algorithm available (yet)
- Unknown scaling factor in each frequency bin: it can be determined only if the norm is known, so the algorithm is only useful when the position of the source is fixed (e.g. desktop, car)
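For a single frequency bin, the rank-1 model gives Ry = Φ·H·H^H + Rv, and the direction of H can be read from the principal generalised eigenvector of the pair (Ry, Rv). The following sketch shows this with exact synthetic correlation matrices; the function name `estimate_atf_direction`, the Cholesky-based GEVD and the test data are assumptions made for illustration. The result is only a direction, which reflects the unknown scaling factor per bin.

```python
import numpy as np

def estimate_atf_direction(Ry, Rv):
    """Direction of the transfer function vector H from the GEVD of (Ry, Rv) in one bin."""
    C = np.linalg.cholesky(Rv)        # Rv = C C^H
    Ci = np.linalg.inv(C)
    A = Ci @ Ry @ Ci.conj().T         # whitened matrix: I + Phi (C^-1 H)(C^-1 H)^H
    A = (A + A.conj().T) / 2          # enforce Hermitian symmetry against rounding
    lam, U = np.linalg.eigh(A)        # eigenvalues in ascending order
    return C @ U[:, -1]               # proportional to H (complex scaling unknown)

rng = np.random.default_rng(0)
N = 4
H = rng.standard_normal(N) + 1j * rng.standard_normal(N)         # hypothetical transfer functions
B = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
Rv = B @ B.conj().T + 0.1 * np.eye(N)                            # coloured noise correlation matrix
Ry = 2.0 * np.outer(H, H.conj()) + Rv                            # speech PSD of 2.0 in this bin

H_dir = estimate_atf_direction(Ry, Rv)
cosine = abs(np.vdot(H, H_dir)) / (np.linalg.norm(H) * np.linalg.norm(H_dir))
```

With estimated rather than exact correlation matrices the alignment is only approximate, and fixing the scale still requires knowledge of the norm in every bin.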
44
Combined noise reduction and dereverberation
- Filtering operation in the frequency domain
- Dereverberation: normalised matched filter; the output contains residual noise
- Combined noise reduction and dereverberation:
  - Z(ω) is the optimal (MMSE) estimate of S(ω), giving an optimal estimate of s[k]
  - integration of the multi-channel Wiener filter with the normalised matched filter
  - trade-off between both objectives
- Implementation: overlap-save
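The effect of the normalised matched filter can be written out in a few lines. The slide's own equations were not preserved, so the notation below is an assumption: Y(ω) is the stacked vector of microphone signals, V(ω) the stacked noise components and W(ω) the filter vector.

```latex
\begin{align}
\mathbf{Y}(\omega) &= \mathbf{H}(\omega)\,S(\omega) + \mathbf{V}(\omega) \\
\mathbf{W}(\omega) &= \frac{\mathbf{H}(\omega)}{\|\mathbf{H}(\omega)\|^{2}} \\
Z(\omega) &= \mathbf{W}^{H}(\omega)\,\mathbf{Y}(\omega)
  = S(\omega) + \frac{\mathbf{H}^{H}(\omega)\,\mathbf{V}(\omega)}{\|\mathbf{H}(\omega)\|^{2}}
\end{align}
```

The speech component is recovered without reverberation, while the second term is the residual noise; reducing it is what the combination with the multi-channel Wiener filter is for.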
45
Simulations
- N = 4, d = 2 cm, fs = 16 kHz, SNR = 0 dB, T60 = 400 msec
- FFT size L = 1024, overlap R = 16
- Performance criteria: signal-to-noise ratio (SNR) and dereverberation index (DI)

                                                SNR (dB)  DI (dB)
  Original microphone signal                    2.88      4.74
  Noise reduction                               16.82     4.73
  Dereverberation                               2.30      0.86
  Combined noise reduction and dereverberation  10.12     1.35
46
Simulations
47
Conclusion
- Low signal quality due to background noise and reverberation: signal enhancement is needed to improve speech intelligibility and ASR performance
- Single-microphone techniques: spectral information
- Standard beamforming: a-priori assumptions
- Techniques developed:
  - Robust broadband beamforming
  - Multi-microphone optimal filtering: multi-microphone, signal-dependent, no a-priori assumptions
  - Blind transfer function estimation and dereverberation
48
Contributions
- Robust broadband beamforming:
  - novel cost functions for broadband far-field design (non-linear, eigenfilter-based)
  - extension to near-field and mixed near-field far-field
  - 2 procedures for robust design against gain and phase deviations
- GSVD-based optimal filter technique for multi-microphone noise reduction:
  - extension of single-microphone subspace-based techniques to multiple microphones
  - integration in GSC structure
  - better performance and robustness than beamforming
- Acoustic transfer function estimation and dereverberation:
  - stochastic gradient algorithm for estimation of time-delay and acoustic source localisation (coloured noise)
  - combined noise reduction and dereverberation in the frequency domain
49
Further research
- Combination of multi-channel Wiener filter and fixed beamforming:
  - Low SNR: the VAD fails, leading to poor performance of the Wiener filter
  - Combined technique: more robust when the VAD fails, better performance than fixed beamformers in other scenarios
- Acoustic transfer function estimation and dereverberation:
  - Time-domain: underlying reason for the high sensitivity
  - Frequency-domain: unknown scaling factor; BSS?
  - other blind identification techniques (LP, NL Kalman-filtering)
- Further complexity reduction of the multi-channel optimal filtering technique:
  - Stochastic gradient algorithms
  - Subband/frequency-domain
50
Relevant publications
- S. Doclo and M. Moonen, “GSVD-based optimal filtering for single and multimicrophone speech enhancement,” IEEE Trans. Signal Processing, vol. 50, no. 9, pp , Sep
- S. Doclo and M. Moonen, “Multi-Microphone Noise Reduction Using Recursive GSVD-Based Optimal Filtering with ANC Postprocessing Stage,” accepted for publication in IEEE Trans. Speech and Audio Processing, 2003.
- S. Doclo and M. Moonen, “Robust adaptive time delay estimation for speaker localisation in noisy and reverberant acoustic environments,” EURASIP Journal on Applied Signal Processing, Sep
- S. Doclo and M. Moonen, “Combined frequency-domain dereverberation and noise reduction technique for multi-microphone speech enhancement,” in Proc. Int. Workshop on Acoustic Echo and Noise Control (IWAENC), Darmstadt, Germany, Sep. 2001, pp
- S. Doclo and M. Moonen, “Design of far-field and near-field broadband beamformers using eigenfilters,” accepted for publication in Signal Processing, 2003.
- S. Doclo and M. Moonen, “Design of robust broadband beamformers for gain and phase errors in the microphone array characteristics,” IEEE Trans. Signal Processing, Oct
Available at