Published by Augustus Alexander. Modified over 9 years ago.
Spatial vs. Blind Approaches for Speaker Separation: Structural Differences and Beyond
Julien Bourgeois, RIC/AD

Problem Context
Several simultaneous speakers (sources), spatially located: s_1(t), s_2(t).
Road noise: spatially diffuse.
Microphone array (signals labelled x_1(t) and x_4(t) in the diagram): gets mixtures of the sources and noise.
Array processor: recovers clean individual speech flows by separating and denoising the sources.

"Spatial" vs. "Statistical" Techniques
[Diagram, labelled "Cocooning". Spatial branch: a filter followed by a +/- combiner, adapted for minimum power. Statistical branch: filters adapted for minimum dependence.]

Spatial technique (Beamforming)
[Diagram labels: sources s_1, s_2; cross-talk paths h_1, h_2, annotated "unknown" and "weak"; x_1 (signal ref), x_2 (noise ref); filter w_2; output y_1.]
High cross-talk levels: cancellation of the target signal (leakage).
Solution: Voice Activity Detector.
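The leakage effect can be illustrated with a small closed-form calculation. This is a sketch that is not taken from the slides: it assumes an instantaneous (scalar) two-microphone model x_1 = s_1 + h_2 s_2, x_2 = h_1 s_1 + s_2 with independent zero-mean sources of powers p1 and p2, and an output y_1 = x_1 + w_2 x_2 whose power is minimized over w_2. The function name `min_power_filter` and all numerical values are hypothetical.

```python
def min_power_filter(h1, h2, p1, p2):
    """Min-power filter for the ASSUMED scalar model (not from the slides):
        x1 = s1 + h2*s2   (signal reference)
        x2 = h1*s1 + s2   (noise reference)
        y1 = x1 + w2*x2
    Returns (w2, target_gain, interferer_gain), where the gains are the
    coefficients of s1 and s2 in y1 at the power-minimizing w2.
    """
    cross = h1 * p1 + h2 * p2          # E[x1 * x2]
    ref_power = h1 ** 2 * p1 + p2      # E[x2 ** 2]
    w2 = -cross / ref_power            # zero of d E[y1**2] / d w2
    target_gain = 1.0 + w2 * h1        # coefficient of s1 in y1
    interferer_gain = h2 + w2          # coefficient of s2 in y1
    return w2, target_gain, interferer_gain


# No leakage (h1 = 0): the interferer is removed and the target is untouched.
no_leak = min_power_filter(h1=0.0, h2=0.5, p1=1.0, p2=1.0)

# Leakage (h1 = 0.5), adapting while the target is active: the target is
# attenuated and the interferer is no longer fully cancelled.
leak = min_power_filter(h1=0.5, h2=0.5, p1=1.0, p2=1.0)

# Same leakage, but adapting only while the target is silent (p1 = 0),
# which is what a Voice Activity Detector would make possible.
vad = min_power_filter(h1=0.5, h2=0.5, p1=0.0, p2=1.0)

for name, result in [("no leakage", no_leak), ("leakage", leak), ("VAD", vad)]:
    print(name, result)
```

Under these assumptions, adapting only during target silence restores full cancellation of the interferer, which is consistent with the slide's suggestion of a Voice Activity Detector as the remedy for leakage.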

Blind Source Separation (BSS)
[Diagram labels: sources s_1, s_2; mixing paths h_1, h_2 (unknown); microphone signals x_1, x_2; filters w_1, w_2; outputs y_1, y_2; a dependence measure computed on the outputs.]
Sources are assumed to be independent.
w_1 and w_2 are jointly optimized such that the outputs are independent.

BSS - Second Order Criteria
There are plenty of independence measures. We choose a decorrelation criterion.
Other separation criteria rely on higher-order statistics, which are difficult to estimate. Second-order statistics are easier to estimate.

BSS - Second Order Criteria
Specifically, decorrelation alone yields a whole set (a hyperbola) of decorrelators, and not all of them are separators. We need more information.
Non-stationary sources: "non-stationary hyperbolas". They intersect at the solution, but they do not determine w_1 and w_2 uniquely.

BSS - Graphically
[Plot: the criteria D2(t1), D2(t2), and their sum D2(t1) + D2(t2).]
Non-stationary sources generate hyperbolas that intersect at the separation point -(h_1, h_2) and at -(1/h_2, 1/h_1).
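The two intersection points can be checked numerically. The sketch below is an illustration under assumptions that the slides do not spell out: an instantaneous (scalar) model x_1 = s_1 + h_2 s_2, x_2 = h_1 s_1 + s_2, outputs y_1 = x_1 + w_2 x_2 and y_2 = x_2 + w_1 x_1, and independent zero-mean sources whose powers (p1, p2) change from one time block to the next. The function names and numerical values are hypothetical.

```python
def output_cross_correlation(w1, w2, h1, h2, p1, p2):
    """E[y1 * y2] for the ASSUMED scalar model (not from the slides):
        y1 = (1 + w2*h1) * s1 + (h2 + w2) * s2
        y2 = (h1 + w1) * s1 + (1 + w1*h2) * s2
    with independent zero-mean sources of powers p1 and p2.
    """
    return (p1 * (1.0 + w2 * h1) * (h1 + w1)
            + p2 * (h2 + w2) * (1.0 + w1 * h2))


def decorrelating_w2(w1, h1, h2, p1, p2):
    """For a given w1, the w2 on the decorrelation hyperbola of one block."""
    a = p1 * (h1 + w1)
    b = p2 * (1.0 + w1 * h2)
    return -(a + b * h2) / (a * h1 + b)


h1, h2 = 0.3, 0.5
block_1 = (1.0, 2.0)   # source powers (p1, p2) at time t1
block_2 = (3.0, 1.0)   # source powers (p1, p2) at time t2

separator = (-h1, -h2)                # separation point
permuted = (-1.0 / h2, -1.0 / h1)     # second intersection, outputs swapped

# Both points decorrelate the outputs whatever the source powers are.
for w1, w2 in (separator, permuted):
    for p1, p2 in (block_1, block_2):
        print(w1, w2, output_cross_correlation(w1, w2, h1, h2, p1, p2))

# A point on the block-1 hyperbola that is NOT a separator:
# it decorrelates block 1 but leaves block 2 correlated.
w1 = 0.0
w2 = decorrelating_w2(w1, h1, h2, *block_1)
print(output_cross_correlation(w1, w2, h1, h2, *block_1))
print(output_cross_correlation(w1, w2, h1, h2, *block_2))
```

In this model the cross-correlation is bilinear in (w_1, w_2), so each block gives one hyperbola. Only the two points above lie on the hyperbolas of both blocks, and the second point corresponds to the permutation ambiguity.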

Beamforming vs. BSS
Beamforming: weak cross-talk levels or Voice Activity Detector. Leakage problem. 1D search.
BSS: independence prior on (s_1, s_2). Permutation ambiguity. 2D search.
The asymptotic performance of BSS is more "robust" than that of beamforming.

Adaptive Behavior: Comparison Framework
[Diagram labels: s_1 = 0, s_2, h_2, w_2, x_1, x_2, y_1, y_2.]
Comparison framework: only one source; s_2 is stationary Gaussian; s_1 = h_1 = 0 (no leakage). This avoids the structural differences between the two criteria.
Both criteria are minimized with a STOCHASTIC gradient descent.
Q: How well is this gradient estimated with finite-length signals?

Estimation Error on the Gradient
At the starting point w_2 = 0: numerical evaluation of the variance of the estimation error. [Plot: curves labelled BSS and Beamforming.]
BSS converges more slowly because its gradient is more "random".
In noisy conditions, BSS does not bring any gain if the cross-talk is below a certain threshold. This threshold is smaller for MV (beamforming).
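A small Monte Carlo experiment gives a feel for why the BSS gradient is noisier. This is only a sketch: the slides do not give the exact criteria, so the code assumes a power criterion mean(y_1^2) for beamforming and a squared cross-correlation criterion mean(y_1 y_2)^2 for BSS, in the single-source framework (s_1 = h_1 = 0, unit-variance Gaussian s_2, x_1 = h_2 s_2, x_2 = s_2, y_2 = x_2). The function name `gradient_estimates` and the parameter values are hypothetical.

```python
import numpy as np


def gradient_estimates(h2, n_samples, n_trials, seed=0):
    """Finite-sample gradient estimates at w2 = 0 for two ASSUMED criteria.

    Returns two arrays of length n_trials: the estimated gradients of
    the power criterion (beamforming) and of the squared
    cross-correlation criterion (BSS), each from n_samples samples.
    """
    rng = np.random.default_rng(seed)
    s2 = rng.standard_normal((n_trials, n_samples))  # stationary Gaussian source
    x1 = h2 * s2          # signal reference: only cross-talk, since s1 = 0
    x2 = s2               # noise reference: h1 = 0, so no leakage
    w2 = 0.0              # starting point of the adaptation
    y1 = x1 + w2 * x2
    y2 = x2

    # Beamforming: J = mean(y1**2), dJ/dw2 = 2 * mean(y1 * x2)
    grad_mv = 2.0 * np.mean(y1 * x2, axis=1)
    # BSS: J = mean(y1 * y2)**2, dJ/dw2 = 2 * mean(y1 * y2) * mean(x2 * y2)
    grad_bss = 2.0 * np.mean(y1 * y2, axis=1) * np.mean(x2 * y2, axis=1)
    return grad_mv, grad_bss


grad_mv, grad_bss = gradient_estimates(h2=0.5, n_samples=200, n_trials=2000)
print("variance of beamforming gradient estimate:", np.var(grad_mv))
print("variance of BSS gradient estimate:        ", np.var(grad_bss))
```

With these assumptions the BSS estimate is a product of two sample averages, so its variance is larger than that of the beamforming estimate, which involves a single average. The actual criteria used in the talk may differ, so the size of the gap should not be read as a result from the slides.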

Conclusion
Beamforming is based on the power minimization principle. In practice it requires weak cross-talk levels or a Voice Activity Detector (VAD). Its asymptotic performance depends on the quality of the VAD. Robust stochastic behavior.
Blind Source Separation is based on the independence of the sources. Asymptotic performance: exact separation. Stochastic behavior: needs longer signals to estimate the gradient. Moreover, sources on a finite (short) time scale are not exactly independent.
Neither method can reduce diffuse background noise.