Low Complexity Blind Separation Technique to Solve the Permutation Ambiguity of Convolutive Speech Mixtures
Pedro F. C. Lima, Ricardo Kehrle Miranda, João Paulo C. L. da Costa, Ricardo Zelenovsky, Yizheng Yuan and Giovanni Del Galdo
Department of Electrical Engineering, University of Brasília (UnB), Brasília, Brazil
Institute for Information Technology, Ilmenau University of Technology, Ilmenau, Germany
Fraunhofer Institute for Integrated Circuits IIS, Erlangen, Germany
Freie Universität, Berlin, Germany
Gold Coast, Fachgebiet Hochfrequenz- und Mikrowellentechnik

Outline
Motivation
Problem formulation
State of the art
Proposed method
Simulation results
Conclusions

Motivation (1)
Applications of microphone arrays: hearing aids, teleconference devices, bioacoustics, and speaker recognition in forensic or security applications
Goals: attenuate interference by spatially separating the sounds, increase intelligibility, and reduce the computational complexity for real-time applications
Block diagram: array of mics → captured signals → BSS algorithm → estimated source signals → speaker recognition algorithm (feature extraction → feature matching) → speaker ID

Motivation (2)
State-of-the-art Blind Source Separation (BSS) for microphone arrays
Approach based on approximate joint diagonalization (AJD) and correction of the permutation ambiguity
Two-stage approach proposed in [1] to solve the permutation ambiguity
Simplified block diagram for estimation of the BSS matrix: preliminary estimation via AJD, $\mathbf{H}_{ajd}(f)$ → scale ambiguity correction → solution for the permutation ambiguity (first stage, $\mathbf{W}_{stg1}(f)$, followed by the second stage) → estimated separation matrix $\mathbf{W}(f)$
[1] D.-T. Pham and Ch. Servière, "Permutation correction in the frequency domain in blind separation of speech mixtures", EURASIP Journal on Applied Signal Processing, 2006.

Motivation (3)
Proposed method: reduction of the computational effort
Computation of only one set of dispersions
With respect to the non-stationarity assumption of the approximate joint diagonalization (AJD), the method is more tolerant to stationary scenarios

Problem formulation (1)
Convolutive mixture
Considering $I$ sound sources and $J$ microphones, the $j$-th observed signal is the sum over all sources of each source signal convolved with the room impulse response (RIR) between mic $j$ and source $i$.
Goal of the separation system
Denoting the output (recovered) signals by $\mathbf{y}$, the transmitted signals can be obtained via deconvolution. Thus, the goal of the separation system is to obtain an estimate of the inverse of the mixing system or, in practice, a separation matrix computed from the estimated mixing matrix.
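A sketch of the standard convolutive mixing model that matches the labels on this slide. The symbols $s_i(n)$, $x_j(n)$, $h_{ji}(l)$, $y_i(n)$ and $w_{ij}(l)$ are assumed notation and may differ from the authors' own; $L$ is the channel length and $\hat{\mathbf{H}}(f)$ the estimated mixing matrix.

```latex
\begin{align}
\mathbf{s}(n) &= [s_1(n), \dots, s_I(n)]^T, \qquad
\mathbf{x}(n) = [x_1(n), \dots, x_J(n)]^T, \\
x_j(n) &= \sum_{i=1}^{I} \sum_{l=0}^{L-1} h_{ji}(l)\, s_i(n-l), \\
\mathbf{y}(n) &= [y_1(n), \dots, y_I(n)]^T, \qquad
y_i(n) = \sum_{j=1}^{J} \sum_{l} w_{ij}(l)\, x_j(n-l), \\
\mathbf{X}(f) &\approx \mathbf{H}(f)\, \mathbf{S}(f), \qquad
\mathbf{Y}(f) = \mathbf{W}(f)\, \mathbf{X}(f), \qquad
\mathbf{W}(f) = \hat{\mathbf{H}}^{-1}(f).
\end{align}
```

The last line is the frequency-domain approximation in which the convolution becomes a per-bin matrix product; it holds for $I = J$, as in the simulated 2 × 2 scenario.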

Simplified block diagram:
Mapping to the frequency domain and estimation via AJD of $\mathbf{H}_{ajd}(f)$
Solution to the permutation and scaling ambiguities, obtaining $\mathbf{W}(f)$
Computation of the estimated signals in the frequency domain and mapping back to the time domain
Reconstruction of the signals by superposition of the segments

Obtaining $\mathbf{H}_{ajd}(f)$: preliminary estimation of the mixing matrix
1st step: estimation of the spectral matrices of the observed signals, $\mathbf{S}_x(f,b) \in \mathbb{C}^{J \times J}$, in preparation for the approximate joint diagonalization (AJD) process
Segment the data into $B$ blocks with overlap, each block with $N$ samples
Windowing per block (Hann)
DFT computation for each block (FFT if $N$ is a power of 2)
Modified periodogram computation
Averaging of the periodograms between $m$ consecutive blocks
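The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the periodic Hann window, the normalization by the window energy, and the naive DFT (used only to keep the example free of external libraries) are all choices made here. The defaults `overlap=0.75` and `m=1` follow the simulation parameters listed later in the deck.

```python
import cmath
import math


def hann(n_samples):
    """Periodic Hann window of length n_samples."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_samples) for n in range(n_samples)]


def dft(frame):
    """Naive O(N^2) DFT, kept for self-containment; an FFT would be used in practice."""
    n_samples = len(frame)
    return [
        sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_samples) for n in range(n_samples))
        for k in range(n_samples)
    ]


def spectral_matrices(x, n_samples, overlap=0.75, m=1):
    """Return S[b][f] as J x J nested lists from J microphone signals x[j][n]."""
    n_mics = len(x)
    hop = max(1, int(round(n_samples * (1 - overlap))))
    window = hann(n_samples)
    norm = sum(w * w for w in window)  # assumed modified-periodogram normalization
    # One cross-periodogram matrix per windowed segment and frequency bin.
    periodograms = []
    for start in range(0, len(x[0]) - n_samples + 1, hop):
        spectra = [
            dft([window[n] * x[j][start + n] for n in range(n_samples)])
            for j in range(n_mics)
        ]
        periodograms.append([
            [[spectra[p][f] * spectra[q][f].conjugate() / norm for q in range(n_mics)]
             for p in range(n_mics)]
            for f in range(n_samples)
        ])
    # Average m consecutive periodograms to form each block b.
    blocks = []
    for b in range(0, len(periodograms) - m + 1, m):
        group = periodograms[b:b + m]
        blocks.append([
            [[sum(g[f][p][q] for g in group) / m for q in range(n_mics)]
             for p in range(n_mics)]
            for f in range(n_samples)
        ])
    return blocks
```

Each `S[b][f]` is Hermitian by construction, which is the property the AJD step relies on.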

Obtaining $\mathbf{H}_{ajd}(f)$: preliminary estimation of the mixing matrix
Note that, except for the normalization factor, $\mathbf{P}_x(f,b) \in \mathbb{C}^{J \times J}$ provides the sample covariance matrix.
If the source signals are mutually uncorrelated, the spectral matrix of the source signals (unknown) is diagonal.
For signals that are non-stationary among the time blocks, the $B$ time blocks give $B$ matrices in an approximate joint diagonalization (AJD) problem.
The AJD problem is solved independently for each frequency component.
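A sketch of the model behind this slide, written in the usual AJD form. The symbol $\mathbf{S}_s(f,b)$ for the (unknown, diagonal) spectral matrix of the sources is assumed notation.

```latex
\begin{align}
\mathbf{S}_x(f,b) &\approx \mathbf{H}(f)\, \mathbf{S}_s(f,b)\, \mathbf{H}^{H}(f),
  \qquad b = 1, \dots, B, \\
\mathbf{W}_{ajd}(f)\, \mathbf{S}_x(f,b)\, \mathbf{W}_{ajd}^{H}(f) &\approx \text{diagonal},
  \qquad b = 1, \dots, B.
\end{align}
```

The non-stationarity matters because the diagonal entries of $\mathbf{S}_s(f,b)$ must change from block to block; otherwise the $B$ equations carry no more information than a single one.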

Simplified block diagram: after obtaining $\mathbf{H}_{ajd}(f)$, the permutation and scaling ambiguities should be solved in order to obtain $\mathbf{W}(f)$. The focus of this work is on the permutation ambiguity.

Scaling and permutation ambiguities
Because the $F$ frequency components are solved separately, $\mathbf{H}_{ajd}(f) \in \mathbb{C}^{J \times I}$ is estimated up to random scale changes and column permutations. That is, $\mathbf{W}_{ajd}(f) = \mathbf{H}_{ajd}^{-1}(f)$ differs from the desired $\mathbf{W}(f)$ by:
$\mathbf{\Pi}(f)$: an unknown ($I \times I$) permutation matrix, permuting the rows of $\mathbf{W}(f)$
$\mathbf{D}(f)$: an unknown ($I \times I$) diagonal matrix that changes the scale of the rows of $\mathbf{W}(f)$
Given $\mathbf{H}_{ajd}(f) \in \mathbb{C}^{J \times I}$, we need to find $\mathbf{\Pi}(f)$ and $\mathbf{D}(f)$.
Figure: illustration of a permutation between adjacent frequency components $f$ and $f+1$.
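One common way to write the ambiguity described above, given as a sketch consistent with the definitions of $\mathbf{\Pi}(f)$ and $\mathbf{D}(f)$ on this slide; $\mathbf{\Lambda}(f,b)$ is an assumed symbol for the diagonal matrix obtained after separation.

```latex
\begin{align}
\mathbf{W}_{ajd}(f) &= \mathbf{\Pi}(f)\, \mathbf{D}(f)\, \mathbf{W}(f), \\
\mathbf{W}(f)\, \mathbf{S}_x(f,b)\, \mathbf{W}^{H}(f) = \mathbf{\Lambda}(f,b)
&\;\Rightarrow\;
\mathbf{W}_{ajd}(f)\, \mathbf{S}_x(f,b)\, \mathbf{W}_{ajd}^{H}(f)
 = \mathbf{\Pi}(f)\, \mathbf{D}(f)\, \mathbf{\Lambda}(f,b)\, \mathbf{D}^{H}(f)\, \mathbf{\Pi}^{T}(f).
\end{align}
```

The right-hand side is still diagonal, so the AJD criterion alone cannot distinguish $\mathbf{W}_{ajd}(f)$ from $\mathbf{W}(f)$; this is why the ambiguities must be resolved with additional information across frequencies.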

State of the art (1)
State-of-the-art method: two-stage approach for the correction of the permutation ambiguity
Comparing the channels between neighboring frequencies for all sources
Simplified block diagram of the process for estimation of the separation matrix: preliminary estimation via AJD, $\mathbf{H}_{ajd}(f)$ → scale ambiguity correction → solution for the permutation ambiguity (first stage, $\mathbf{W}_{stg1}(f)$, followed by the second stage) → estimated separation matrix $\mathbf{W}(f)$

State of the art (2) 1st Stage
Assumption: continuous frequency responses of the mixing channels
Ideally, the product of the separation matrix and the mixing matrix would be the identity matrix.
Considering discrete frequency components in $\mathbf{H}_{ajd}(f)$, we can search for $\mathbf{\Pi}_i(f)$ among all $I!$ possible permutation matrices, keeping the one whose solution is closest to the identity matrix.
Then, we update $\mathbf{W}_{ajd}(f)$, obtaining $\mathbf{W}_{stg1}(f) \in \mathbb{C}^{I \times J}$.
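A minimal sketch of such an exhaustive search over the $I!$ row orderings. The selection criterion is not given on the slide, so the one used here is an assumption: the product of the permuted separation matrix at the current bin with the mixing-matrix estimate at the previous bin is row-normalized (to cancel the scale ambiguity) and compared with the identity. All function names are hypothetical.

```python
from itertools import permutations


def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def identity_distance(g):
    """Squared distance between the row-normalized magnitudes of g and the identity."""
    total = 0.0
    for i, row in enumerate(g):
        scale = max(abs(v) for v in row) or 1.0  # avoid dividing by zero
        for j, v in enumerate(row):
            total += (abs(v) / scale - (1.0 if i == j else 0.0)) ** 2
    return total


def align_to_previous(w_f, h_prev):
    """Reorder the rows of w_f so that w_f @ h_prev is as close to identity as possible.

    w_f: I x J separation matrix at the current bin (rows in arbitrary order).
    h_prev: J x I mixing-matrix estimate at the previous, already aligned bin.
    Returns (best_order, permuted_w). The criterion is an assumption, see above.
    """
    n_src = len(w_f)
    best_order, best_cost = None, float("inf")
    for order in permutations(range(n_src)):  # all I! candidates
        candidate = [w_f[r] for r in order]
        cost = identity_distance(matmul(candidate, h_prev))
        if cost < best_cost:
            best_order, best_cost = order, cost
    return best_order, [w_f[r] for r in best_order]
```

The factorial number of candidates is harmless for two sources but grows quickly with $I$, which is one reason the first stage is a natural target for optimization.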

State of the art (3)
State-of-the-art method: two-stage approach for the correction of the permutation ambiguity
Comparing the dispersion between the estimated source covariance matrices
Simplified block diagram of the process for estimation of the separation matrix: preliminary estimation via AJD, $\mathbf{H}_{ajd}(f)$ → scale ambiguity correction → solution for the permutation ambiguity (first stage, $\mathbf{W}_{stg1}(f)$, followed by the second stage) → estimated separation matrix $\mathbf{W}(f)$

State of the art (4) 2nd Stage
Computation of the spectrum $S_y(f,b,i)$ of the $i$-th reconstructed signal
Calculation of the source profiles
Calculation of the smoothed profiles (low pass) using the zero-mean source profiles $E'$
Assuming only two sources, calculation of the dispersion

After applying the log, centering (zero mean) and smoothing (low-pass filter) the spectrograms, we can compute their difference between consecutive frequencies, $D_1(f,b)$, and calculate the dispersions $\sigma^2_{D_1}(f)$ of the differences. Permutations appear as points of minimum dispersion.

Calculation of the differences $D_2(f,b)$ from the profiles $H_y(f,b,i)$.
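The profile and dispersion computations can be sketched for two sources as below. The defining equations are not given in the text, so the definitions here are assumptions chosen to reproduce the behavior the slides describe (a minimum of $\sigma^2_{D_1}$ and a maximum of $\sigma^2_{D_2}$ where a permutation occurs): with $d(f,b) = H_y(f,b,1) - H_y(f,b,2)$, the sketch takes $D_1(f,b) = d(f,b) + d(f-1,b)$ and $D_2(f,b) = d(f,b) - d(f-1,b)$. The moving average over blocks is likewise only a stand-in for the low-pass filter, and all function names are hypothetical.

```python
import math


def smooth(row, half_width=1):
    """Moving average along the block axis (assumed stand-in for the low-pass filter)."""
    out = []
    for b in range(len(row)):
        window = row[max(0, b - half_width):b + half_width + 1]
        out.append(sum(window) / len(window))
    return out


def smoothed_profile(spectrum_row, half_width=1):
    """Log, remove the mean over blocks, then smooth: one bin of one source."""
    logs = [math.log(v) for v in spectrum_row]
    mean = sum(logs) / len(logs)
    return smooth([v - mean for v in logs], half_width)


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def dispersions(s_y, half_width=1):
    """Return (var_d1, var_d2); entry t describes the transition from bin t to bin t + 1.

    s_y[i][f][b] is the (positive) spectrum of reconstructed source i, bin f, block b.
    """
    h1 = [smoothed_profile(row, half_width) for row in s_y[0]]
    h2 = [smoothed_profile(row, half_width) for row in s_y[1]]
    d = [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(h1, h2)]
    var_d1, var_d2 = [], []
    for f in range(1, len(d)):
        var_d1.append(variance([c + p for c, p in zip(d[f], d[f - 1])]))  # assumed D1
        var_d2.append(variance([c - p for c, p in zip(d[f], d[f - 1])]))  # assumed D2
    return var_d1, var_d2
```

The intuition is that a speech source has a similar energy envelope over time in neighboring bins, so swapping the two outputs at some bin flips the sign of $d(f,b)$ there, which shrinks the sum and inflates the difference.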

State of the art (7) 2nd Stage
Iteratively search for the global minimum of $\sigma^2_{D_1}(f)$ to find the frequencies $f_l$ where permutations occur.
Computation of the second set of dispersions $\sigma^2_{D_2}(f)$: in case of a permutation, $\sigma^2_{D_2}(f)$ shows a maximum, so that $\sigma^2_{D_1}(f_l) < \sigma^2_{D_2}(f_l)$.
When permutations are corrected, the dispersion minima disappear.
Figures: dispersions of $D_1(f,b)$ and $D_2(f,b)$ before correction; variance of $D_1(f,b)$ after correction

State of the art (8) 2nd Stage
When there are no remaining permutations to be fixed: $\sigma^2_{D_1}(f) > \sigma^2_{D_2}(f) \;\; \forall f$
Figure: dispersions of $D_1(f,b)$ and $D_2(f,b)$ after correction

Proposed method (1)
Improve the second stage: reduce the computational complexity of the second stage from [1]
Simplified block diagram of the process for estimation of the separation matrix: preliminary estimation via AJD, $\mathbf{H}_{ajd}(f)$ → scale ambiguity correction → solution for the permutation ambiguity (first stage, $\mathbf{W}_{stg1}(f)$, followed by the second stage) → estimated separation matrix $\mathbf{W}(f)$
[1] D.-T. Pham and Ch. Servière, "Permutation correction in the frequency domain in blind separation of speech mixtures", EURASIP Journal on Applied Signal Processing, 2006.

Proposed method (2)
Complete workflow of the permutation solution: first stage [1], followed by the proposed second stage

Proposed method (3)
Computation of a threshold value $T_h$ for $\sigma^2_{D_2}(f)$
Peak values that exceed the threshold $T_h$ are recognized as peaks related to permutations
Search zones
Correction of several permutations in a single iteration

Proposed method (4)
Calculation of the threshold value $T_h$:
Ascending sorting of $\sigma^2_{D_2}(f)$, giving $\sigma^2_{D_2,\mathrm{sorted}}(n_f)$
Cumulative sum of $\sigma^2_{D_2,\mathrm{sorted}}(n_f)$
Figure labels: proposed threshold; median of $\sigma^2_{D_2}(f)$
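The thresholding idea can be sketched as follows for two sources. The exact rule that turns the sorted dispersions and their cumulative sum into $T_h$ is not stated in the text, so the threshold is left as an input; `placeholder_threshold` (a multiple of the median) is a hypothetical stand-in and not the authors' formula. Treating each detected peak as a swap that affects all later bins is also an assumption, and all function names are illustrative.

```python
from itertools import accumulate
from statistics import median


def sorted_cumulative(var_d2):
    """Ascending sort of the dispersions and the cumulative sum of the sorted values."""
    ordered = sorted(var_d2)
    return ordered, list(accumulate(ordered))


def placeholder_threshold(var_d2, factor=5.0):
    """Hypothetical threshold: a multiple of the median (NOT the rule from the slides)."""
    return factor * median(var_d2)


def detect_permutations(var_d2, threshold):
    """Indices of local maxima of var_d2 that exceed the threshold."""
    peaks = []
    for f, value in enumerate(var_d2):
        left = var_d2[f - 1] if f > 0 else float("-inf")
        right = var_d2[f + 1] if f + 1 < len(var_d2) else float("-inf")
        if value > threshold and value >= left and value > right:
            peaks.append(f)
    return peaks


def swap_flags(n_bins, peaks):
    """Per-bin swap decision: each detected peak toggles the state of all later bins."""
    peak_set = set(peaks)
    flags, swapped = [], False
    for f in range(n_bins):
        if f in peak_set:
            swapped = not swapped
        flags.append(swapped)
    return flags
```

Because every peak above the threshold is handled in one pass, several permutations are corrected without recomputing the dispersions after each correction, which is where the saving over an iterative search would come from.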

Simulation results (1)
Performance measures:
Calculation time for the 2nd stage
Percentage of success: number of correctly aligned frequency bins / total number of bins
Assumptions: perfect estimation of $\mathbf{H}_{ajd}(f)$; randomly generated permutations
Simulation parameters:
Number of sources ($I$) and microphones ($J$): 2 and 2, respectively
Source 1 signal: 'poem male 30s.wav' (Nion et al., 2010)
Source 2 signal: 'sentence female 28s.wav' (Nion et al., 2010)
Sample rate: 11.025 kHz
Mixing channels: 'h256' (Serviere & Pham, 2006)
Channel length ($L$): 256
Number of blocks $m$: 1
Overlap ($\alpha$): 0.75
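The percentage of success defined above can be computed as in this small sketch. Representing the permutation of each frequency bin as a tuple of source indices is an assumption made for illustration, as is the function name.

```python
def success_percentage(estimated, reference):
    """Percentage of frequency bins whose permutation matches the reference."""
    if len(estimated) != len(reference):
        raise ValueError("both sequences must cover the same frequency bins")
    aligned = sum(1 for e, r in zip(estimated, reference) if tuple(e) == tuple(r))
    return 100.0 * aligned / len(reference)
```

For example, with four bins of which one is left swapped, `success_percentage([(0, 1), (1, 0), (0, 1), (0, 1)], [(0, 1)] * 4)` gives 75.0.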

Simulation results (2) Calculation time: 2nd stage N=2048

Simulation results (3) Percentage of success N=2048

Conclusions
Proposed BSS with lower computational complexity
Accuracy: close to the state-of-the-art approach, approx. 99 %
Complexity: 3 to 10 times faster for the simulated scenario
Future work: reduce the computational complexity of the first stage

Thank you for your attention!
Pedro Lima, Ricardo Miranda, João Paulo C. L. da Costa, Ricardo Zelenovsky, Giovanni Del Galdo, Yizheng Yuan

Simulation results (3)
Calculation time: 2nd stage only
In general, the calculation time is reduced compared to the original method.
Figure panels: N=2048, N=4096

Simulation results (4)
Calculation time: complete permutation solution
Here the 1st stage dominates the calculation time due to the matrix update process used; nonetheless, it can be optimized. Even considering the complete solution, the method shows improvements in calculation time.
Figure panels: for N=2048, for N=4096

Simulation results (5)
Percentage of success: number of correctly aligned frequency bins / total number of bins
Despite the reduction in calculation time, the results remain qualitatively similar.
Figure panels: for N=2048, for N=4096
