1 Blind Separation of Audio Mixtures Using Direct Estimation of Delays Arie Yeredor Dept. of Elect. Eng. – Systems School of Electrical Engineering Tel-Aviv University
2 Problem Formulation The mixture model: each observed mixture is a sum of scaled, delayed copies of the sources. Available samples: sampled observations of the mixtures. Assumptions: Sources: bandlimited, WSS (not really necessary!), persistently uncorrelated (necessary!). Blindness: the mixing coefficients, delays and sources' spectra are unknown. Goal: estimate the unknown parameters; reconstruct the sources.
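A form of the mixture model consistent with the unknowns listed above (mixing coefficients and delays) can be written as follows. The symbols a_ij, tau_ij, M and N are notation assumed for this sketch, not taken from the slides.

```latex
x_i(t) = \sum_{j=1}^{N} a_{ij}\, s_j(t - \tau_{ij}), \qquad i = 1, \dots, M
```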
3 Falling between the chairs Static BSS is obviously under-parameterized; Convolutive BSS is not only over-parameterized, but also inappropriate for accommodating fractional delays, especially with FIR models.
4 Inherent ambiguities Sources' scaling: assume normalized power; sources' time-origin: resolved by fixing a reference delay for each source; sources' permutation: assume we don't care.
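5 Mixtures' correlations Because the sources are mutually uncorrelated, each correlation between two mixtures is a sum, over the sources, of scaled and delayed source autocorrelations.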
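Under an assumed model x_i(t) = sum_j a_ij s_j(t - tau_ij) with mutually uncorrelated WSS sources, the correlation between mixtures i and k would take the form below. The symbols a_ij, tau_ij and R_{s_j} are notation chosen for this sketch, not taken from the slides.

```latex
R_{x_i x_k}(\tau) = E\left[ x_i(t+\tau)\, x_k(t) \right]
                  = \sum_{j=1}^{N} a_{ij}\, a_{kj}\, R_{s_j}(\tau - \tau_{ij} + \tau_{kj})
```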
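6 Mixtures' spectra Fourier-transforming the correlations gives, at each frequency, the mixtures' spectral matrix as the product of a frequency-dependent mixing matrix, the diagonal matrix of the sources' spectra, and the conjugate transpose of the frequency-dependent mixing matrix. The frequency-dependent mixing matrix is the Hadamard (element-wise) product of the mixing matrix and a matrix that contains the frequency-domain delays.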
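A minimal sketch of this spectral model in NumPy may help fix ideas. The function names, the sign convention exp(-1j * omega * tau) for a delay, and the numeric values are assumptions made for illustration only.

```python
import numpy as np

def delay_matrix(tau, omega):
    """Frequency-domain delays: element (i, j) is exp(-1j * omega * tau[i, j])."""
    return np.exp(-1j * omega * tau)

def mixture_spectrum(A, tau, source_psd, omega):
    """Model spectral matrix of the mixtures at a single frequency.

    A          : (M, N) real mixing coefficients
    tau        : (M, N) delays, in samples (normalized frequency axis assumed)
    source_psd : (N,) non-negative source spectra at this frequency
    """
    B = A * delay_matrix(tau, omega)          # Hadamard (element-wise) product
    return B @ np.diag(source_psd) @ B.conj().T

# Hypothetical 2x2 example values
A = np.array([[1.0, 0.8], [0.7, 1.0]])
tau = np.array([[0.0, 0.3], [-0.4, 0.0]])
S = mixture_spectrum(A, tau, np.array([2.0, 0.5]), omega=0.9)
```

The resulting matrix is Hermitian by construction, which is what makes a joint-diagonalization formulation natural.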
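7 Estimate mixtures' spectra Use, e.g., Blackman-Tukey estimates: Fourier-transform the estimated correlations, truncated to a maximum lag, where the maximum lag is some rough upper bound on the correlation length of all sources. The frequency axis is rescaled to a normalized range, so the delays are correspondingly normalized (to units of the sampling interval).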
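The Blackman-Tukey idea can be sketched as follows: estimate the lagged correlation matrices up to a maximum lag, apply a lag window, and Fourier-transform. The Bartlett (triangular) window, the biased correlation estimate and the function names are assumptions of this sketch; the slides do not specify them.

```python
import numpy as np

def cross_correlation(x, max_lag):
    """Biased estimates R[l] = (1/T) * sum_t x[:, t + l] x[:, t]^T, l = -max_lag..max_lag.

    x : (M, T) real observations. Returns an array of shape (2*max_lag + 1, M, M).
    """
    M, T = x.shape
    R = np.empty((2 * max_lag + 1, M, M))
    for k, lag in enumerate(range(-max_lag, max_lag + 1)):
        if lag >= 0:
            R[k] = x[:, lag:] @ x[:, :T - lag].T / T
        else:
            R[k] = x[:, :T + lag] @ x[:, -lag:].T / T
    return R

def blackman_tukey(x, max_lag, omegas):
    """Spectral matrices S(omega) = sum_l w[l] R[l] exp(-1j * omega * l)."""
    R = cross_correlation(x, max_lag)
    lags = np.arange(-max_lag, max_lag + 1)
    window = 1.0 - np.abs(lags) / (max_lag + 1)      # Bartlett window (assumption)
    phases = np.exp(-1j * np.outer(omegas, lags))    # shape (K, 2*max_lag + 1)
    return np.einsum('kl,lij->kij', phases * window, R)
```

Because R[-l] equals the transpose of R[l] and the window is symmetric, each estimated spectral matrix is Hermitian.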
8 Obtain a frequency-dependent joint-diagonalization problem Use a selected set of frequencies, and attempt to jointly diagonalize the estimated spectral matrices by minimizing a least-squares (LS) criterion w.r.t.: the mixing parameters; the delays; the respective sources' spectra.
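One plausible form of such a criterion is the sum, over the selected frequencies, of squared Frobenius norms of the model mismatch. The sketch below assumes that form; the function name, argument layout and delay sign convention are hypothetical.

```python
import numpy as np

def ls_criterion(S_hat, omegas, A, tau, Lam):
    """Sum over frequencies of || S_hat[k] - B_k diag(Lam[k]) B_k^H ||_F^2.

    S_hat  : (K, M, M) estimated spectral matrices
    omegas : (K,) selected (normalized) frequencies
    A, tau : (M, N) mixing coefficients and delays
    Lam    : (K, N) source spectra at the selected frequencies
    """
    cost = 0.0
    for S_k, w, lam in zip(S_hat, omegas, Lam):
        B = A * np.exp(-1j * w * tau)          # frequency-dependent mixing matrix
        model = (B * lam) @ B.conj().T         # B diag(lam) B^H
        cost += np.linalg.norm(S_k - model, 'fro') ** 2
    return cost
```

Note that adding the same constant to every delay of one source leaves the model unchanged, which reflects the time-origin ambiguity.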
9 Extended AC-DC Use an extended version of the "Alternating Columns - Diagonal Centers" (AC-DC, Yeredor, '02) algorithm for the joint diagonalization: Alternate between minimizations w.r.t.: the sources' spectra (in the DC phase); each column of the mixing matrix (in the AC-1 phase); each column of the delays matrix (in the AC-2 phase). In each phase all other parameters are assumed fixed.
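10 The DC phase Fortunately, the LS criterion is quadratic w.r.t. the sources' spectra, with the term for each frequency depending only on the sources' spectra at that frequency. Thus the minimizing spectra at each frequency are obtained in closed form, by applying a pseudo-inverse (of a matrix built from the columns of the frequency-dependent mixing matrix, using Kronecker products with an all-ones vector) to the vectorized estimated spectral matrix.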
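A hedged sketch of a per-frequency DC step: with the frequency-dependent mixing matrix B held fixed, the spectral matrix is linear in the source spectra, so a real-valued least-squares solve recovers them. The function name and the choice to stack real and imaginary parts (to keep the solution real) are assumptions of this sketch rather than the slides' exact pseudo-inverse expression.

```python
import numpy as np

def dc_phase(S_k, B):
    """Least-squares source spectra at one frequency, for fixed B (M x N).

    Solves min over real lam of || S_k - sum_n lam[n] * b_n b_n^H ||_F^2.
    """
    M, N = B.shape
    # Column n of H is the flattened rank-one matrix b_n b_n^H
    H = np.stack([np.outer(B[:, n], B[:, n].conj()).ravel() for n in range(N)], axis=1)
    # Stack real and imaginary parts so the unknowns are constrained to be real
    H_ri = np.vstack([H.real, H.imag])
    s = S_k.ravel()
    s_ri = np.concatenate([s.real, s.imag])
    lam, *_ = np.linalg.lstsq(H_ri, s_ri, rcond=None)
    return lam
```

Since each frequency is handled independently, the full DC sweep would simply loop this solve over the selected frequencies.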
11 The AC-1 phase Minimization w.r.t. a single column of the mixing matrix can be attained, using some manipulations, from the largest eigenvalue and associated eigenvector of a specially constructed matrix.
12 The AC-2 phase Minimization w.r.t. a single column of the delays matrix generally requires maximization of a criterion with respect to all the delays in that column.
13 The AC-2 phase (cont'd) In one special case this maximization translates into a simple line search for each delay. In addition, in this case the maximization depends only on the signs of certain elements, which means that effectively the AC-2 phase is almost always an integral part of the AC-1 phase.
14 Reconstruction of the sources Convenient reconstruction in the frequency domain: compute the observations' DFTs.
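15 Sources reconst. (cont'd) Using the estimated mixing parameters and delays, compute the estimated frequency-dependent mixing matrix at each DFT frequency; apply its inverse to the observations' DFTs; inverse-transform to get the estimated sources (up to negligible end-effects).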
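The reconstruction steps can be sketched as below, assuming a square (as many mixtures as sources), invertible frequency-dependent mixing matrix and real-valued signals. The function name and the delay sign convention are assumptions; delays are in samples.

```python
import numpy as np

def reconstruct_sources(x, A, tau):
    """Frequency-domain demixing with known (or estimated) A and tau.

    x      : (M, T) real observations
    A, tau : (M, M) mixing coefficients and delays in samples (square case assumed)
    """
    M, T = x.shape
    X = np.fft.rfft(x, axis=1)                         # observations' DFTs
    omegas = 2 * np.pi * np.arange(X.shape[1]) / T     # DFT frequencies
    S = np.empty((A.shape[1], X.shape[1]), dtype=complex)
    for k, w in enumerate(omegas):
        B = A * np.exp(-1j * w * tau)                  # mixing matrix at this bin
        S[:, k] = np.linalg.solve(B, X[:, k])          # demix this bin
    return np.fft.irfft(S, n=T, axis=1)                # back to the time domain
```

The DFT treats delays as circular shifts, which is one way to see where the end-effects come from when the true delays are linear.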
16 Simulation results We demonstrate the performance of the proposed "Pure Delays Demixing" ("PUDDING") scheme in two sets of experiments: Experiment 1: Synthetic mixture with TIMIT sources; Experiment 2: True recordings*. * by Jörn Anemüller and Birger Kollmeier (Oldenburg University): [1] Adaptive separation of acoustic sources for anechoic conditions: A constrained frequency domain approach, Speech Communication 39 (2003) pp. 79-95
17 Synthetic Mixture TIMIT source signals sampled at 8 kHz, upsampled by 10, mixed (with integer delays at the high rate), then downsampled by 1:10, resulting in effective delays that are multiples of one tenth of a sample.
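The upsample, delay, downsample construction can be sketched in NumPy as follows. FFT-based band-limited interpolation, circular shifts and all function names are assumptions of this sketch; the slides do not say how the resampling was done or which parameter values were used.

```python
import numpy as np

def upsample_fft(s, factor):
    """Band-limited interpolation of a real signal by zero-padding its spectrum."""
    T = len(s)
    S = np.fft.rfft(s)
    S_up = np.zeros(T * factor // 2 + 1, dtype=complex)
    S_up[:len(S)] = S
    if T % 2 == 0:
        S_up[T // 2] *= 0.5      # old Nyquist bin is shared by +/- frequencies
    return np.fft.irfft(S_up, n=T * factor) * factor

def delayed_mixture(sources, A, delays_hi, factor=10):
    """Mix with integer delays at the high rate, then keep every factor-th sample.

    delays_hi[i][j] is an integer delay in high-rate samples, so the effective
    delay at the original rate is delays_hi[i][j] / factor samples.
    """
    M, N = A.shape
    up = np.stack([upsample_fft(s, factor) for s in sources])
    x_hi = np.stack([
        sum(A[i, j] * np.roll(up[j], delays_hi[i][j]) for j in range(N))
        for i in range(M)
    ])
    return x_hi[:, ::factor]
```

For instance, a high-rate delay of 3 samples with factor 10 corresponds to an effective delay of 0.3 samples at the original rate.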
18 Algorithm setup 40 spectral matrices, at 40 equi-spaced frequencies; Initial guess for the mixing parameters was an all-ones matrix; Initial guesses for the non-zero delays were randomly chosen integers (with the correct signs); Single AC-1/AC-2 sweep between DC sweeps.
19 Estimated correlations
20 Estimated Spectra
21 LS Convergence and delays estimation
22 Audio: Demixing synthetic mixtures PUDDING
23 Audio: "Demixing" synthetic mixtures ignoring delays SEMI-GENIE We demonstrate the importance of estimating the delays by showing the separation obtained when the static mixing coefficients are known and the delays are ignored.
24 Audio: Robustness to additive white noise (3 dB SNR) PUDDING
25 True recordings: anechoic chamber setup (not to scale; figure labels: 35 cm, 3 m, 2 m, 60°) PUDDING Compare to [1]
26 Conclusions PUDDING – PUre Delays DemixING: An iterative algorithm for BSS of anechoic mixtures involving unknown delays; Works by optimizing a frequency-dependent joint diagonalization criterion; Based on the extension of a static joint diagonalization algorithm (AC-DC), iterates between minimization w.r.t. the unknown spectra, coefficients and delays; Typical convergence – within 10-20 iterations; Although the derivation assumed stationarity for simplicity, the only essential assumption is persistent decorrelation between sources – good performance with speech sources. Some frequency-dependent regularization is required when the static mixing coefficients form a nearly-singular matrix (not discussed here, due to timing constraints).