Blind Separation of Speech Mixtures
Vaninirappuputhenpurayil Gopalan REJU
School of Electrical and Electronic Engineering, Nanyang Technological University
Introduction: Blind Source Separation
- Mixing process (convolutive): [equation not recovered]
- Unmixing process: [equation not recovered]
[Figure: sources s1 and s2]
Introduction
- Convolutive blind source separation
- Instantaneous blind source separation
Introduction
- Convolutive blind source separation: difficult to separate
- Instantaneous blind source separation: easy to separate
- In frequency domain: [equation not recovered]
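The move to the frequency domain can be checked numerically. The sketch below assumes the standard convolutive model, where each mixture is a sum of sources filtered by short impulse responses. The names `s`, `h`, `x` and all sizes are illustrative and not taken from the slides. It shows that, after an FFT long enough to hold the full linear convolution, every frequency bin obeys an instantaneous relation X(f) = H(f) S(f).

```python
import numpy as np

rng = np.random.default_rng(0)
n_src, n_mic, n_samples, n_taps = 2, 2, 64, 8

s = rng.standard_normal((n_src, n_samples))        # source signals
h = rng.standard_normal((n_mic, n_src, n_taps))    # h[p, q]: filter from source q to sensor p

# Time domain: x_p = sum_q (h_pq convolved with s_q)
n_out = n_samples + n_taps - 1
x = np.zeros((n_mic, n_out))
for p in range(n_mic):
    for q in range(n_src):
        x[p] += np.convolve(h[p, q], s[q])

# Frequency domain: zero-pad everything to n_out points
S = np.fft.fft(s, n=n_out, axis=-1)                # (n_src, n_out)
H = np.fft.fft(h, n=n_out, axis=-1)                # (n_mic, n_src, n_out)
X = np.fft.fft(x, axis=-1)                         # (n_mic, n_out)

# Per-bin instantaneous mixing: X[:, f] = H[:, :, f] @ S[:, f]
X_from_bins = np.einsum('pqf,qf->pf', H, S)
max_err = np.max(np.abs(X - X_from_bins))
```

With a short-time Fourier transform the relation holds only approximately, and only when the analysis window is long compared with the mixing filters.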
Introduction
- No. of sources < no. of sensors: overdetermined mixing
- No. of sources = no. of sensors: determined mixing
- No. of sources > no. of sensors: underdetermined mixing
- Ease of separation: overdetermined and determined mixing are easy to separate; underdetermined mixing is difficult to separate
Approaches for BSS of Speech Signals
Types of mixing:
- Instantaneous mixing
- Convolutive mixing
Approaches for BSS of Speech Signals: Instantaneous Mixing
Step 1: Selection of cost function
Step 2: Minimization or maximization of the cost function
[Figure: sources S1, S2 → mixing system H → mixtures X1, X2 → unmixing system W → outputs Y1, Y2. Separated?]
Approaches for BSS of Speech Signals: Instantaneous Mixing
Selection of cost function:
- Statistical independence (signals from two different sources are independent)
- Information theoretic
- Non-Gaussianity (central limit theorem: a mixture of two or more sources is more Gaussian than its individual components); non-Gaussianity measures: kurtosis, negentropy
- Nonlinear cross moments
- Temporal structure of speech
- Non-stationarity of speech
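The central limit theorem argument can be illustrated with kurtosis as the non-Gaussianity measure. This is a minimal sketch. It assumes Laplacian samples as a rough stand-in for speech amplitudes, which is an assumption and not something the slides state. The helper `excess_kurtosis` is a hypothetical name. Excess kurtosis is zero for a Gaussian, so a value closer to zero means "more Gaussian".

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis: E[x^4] / E[x^2]^2 - 3 (zero for a Gaussian)."""
    x = x - np.mean(x)
    return np.mean(x ** 4) / np.mean(x ** 2) ** 2 - 3.0

rng = np.random.default_rng(1)
n = 200_000
s1 = rng.laplace(size=n)            # assumed super-Gaussian source
s2 = rng.laplace(size=n)            # second, independent source
mix = (s1 + s2) / np.sqrt(2)        # equal-weight instantaneous mixture

k_s1 = excess_kurtosis(s1)
k_s2 = excess_kurtosis(s2)
k_mix = excess_kurtosis(mix)        # closer to 0 than either source
```

Maximizing the magnitude of the kurtosis of an output therefore pushes it away from being a mixture and towards a single source.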
Approaches for BSS of Speech Signals: Instantaneous Mixing
Minimization or maximization of the cost function:
- Simple gradient method
- Natural gradient method (e.g. Infomax ICA algorithm)
- Newton's method (e.g. FastICA)
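As an illustration of the Newton-type route, here is a minimal sketch of a kurtosis-based, deflationary FastICA-style iteration on a determined instantaneous mixture. It is a simplified textbook variant and not the reference FastICA implementation. The function names, the mixing matrix `A` and the Laplacian sources are assumptions made for the example.

```python
import numpy as np

def whiten(x):
    """Centre the mixtures and decorrelate them to unit variance."""
    x = x - x.mean(axis=1, keepdims=True)
    cov = x @ x.T / x.shape[1]
    eigval, eigvec = np.linalg.eigh(cov)
    v = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
    return v @ x

def fastica_kurtosis(z, n_iter=200, tol=1e-10, seed=0):
    """Estimate an orthogonal unmixing matrix for whitened data z, one row at a time."""
    rng = np.random.default_rng(seed)
    n = z.shape[0]
    w_all = np.zeros((n, n))
    for i in range(n):
        w = rng.standard_normal(n)
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            y = w @ z
            # Fixed-point update for the kurtosis cost: E[z y^3] - 3 w
            w_new = (z * y ** 3).mean(axis=1) - 3.0 * w
            # Deflation: remove the directions already found
            w_new -= w_all[:i].T @ (w_all[:i] @ w_new)
            w_new /= np.linalg.norm(w_new)
            converged = 1.0 - abs(w_new @ w) < tol
            w = w_new
            if converged:
                break
        w_all[i] = w
    return w_all

rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 20000))              # assumed independent sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])        # hypothetical mixing matrix
x = A @ s
z = whiten(x)
W = fastica_kurtosis(z)
y = W @ z                                     # separated up to order, sign and scale
```

The outputs match the sources only up to permutation, sign and scaling, which is the same ambiguity that later causes the permutation problem when this kind of algorithm is run independently in every frequency bin.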
Approaches for BSS of Speech Signals: Convolutive Mixing
Time domain:
- Advantage: no permutation problem
- Disadvantages: slow convergence; high computational cost for long filters
Frequency domain:
- Advantages: low computational cost; fast convergence
- Disadvantage: permutation problem
[Figure: sources S1, S2 → mixing system H → mixtures X1, X2 → unmixing system W → outputs (Y1, Y2) or (Y2, Y1)]
Permutation Problem in Frequency Domain BSS
[Figure: mixed signals x1, x2, x3 → K-point FFT → instantaneous ICA algorithm applied in each frequency bin f1, f2, ..., fk → K-point IFFT → y1, y2, y3. Due to the permutation problem, the components assigned to one output (e.g. y3) in different bins correspond to different sources, so the signals are still mixed. After solving the permutation problem, y1, y2, y3 are the separated signals.]
Motivation
[Figure: taxonomy of BSS problems]
- By number of mixtures: determined/overdetermined (# mixtures ≥ # sources); underdetermined (# mixtures < # sources)
- By type of mixing: instantaneous; convolutive (time domain or frequency domain)
- Sub-problems shown: frequency bin-wise separation, permutation problem, mixing matrix estimation, source estimation, automatic detection of no. of sources
My Contribution - I
[Figure: BSS taxonomy diagram, repeated from the Motivation slide]
Algorithm for Solving the Permutation Problem
[Figure: mixed signals x1, x2, x3 → K-point FFT → instantaneous ICA algorithm in each frequency bin f1, f2, ..., fk (permutation problem) → solving the permutation problem (permutation problem solved) → K-point IFFT → separated signals y1, y2, y3]
Existing Method for Solving the Permutation Problem
Direction of arrival (DOA) method: [equation not recovered; its symbols include the position of the p-th sensor and the velocity of sound]
Example: direction of y1 = -30°, direction of y2 = 20°
Existing Method for Solving the Permutation Problem
Direction of arrival (DOA) method, disadvantages:
- Fails at lower frequencies.
- Fails when sources are near.
- Affected by room reverberation.
- Sensor positions must be known.
Reasons for failure at lower frequencies:
- Small spacing causes errors in the phase difference measurement.
- The relation is approximated for a plane wavefront under anechoic conditions.
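The low-frequency failure can be illustrated with the textbook far-field relation for two sensors, where the phase difference is 2πf·d·sin(θ)/c. This is not necessarily the exact expression used on the DOA slide. The spacing, sound velocity, source direction and phase error below are all assumed values, and the function names are hypothetical. The sketch applies the same small phase error at a low and a high frequency and compares the resulting direction errors.

```python
import numpy as np

C_SOUND = 343.0      # m/s, assumed velocity of sound
SPACING = 0.04       # m, assumed distance between the two sensors

def phase_difference(theta_deg, freq_hz):
    """Far-field phase difference between two sensors for a plane wave from theta."""
    return 2 * np.pi * freq_hz * SPACING * np.sin(np.deg2rad(theta_deg)) / C_SOUND

def doa_from_phase(delta_phi, freq_hz):
    """Invert the relation above; the argument is clipped to the valid arcsin range."""
    arg = C_SOUND * delta_phi / (2 * np.pi * freq_hz * SPACING)
    return np.rad2deg(np.arcsin(np.clip(arg, -1.0, 1.0)))

true_theta = 20.0    # degrees, assumed source direction
phase_error = 0.05   # rad, same measurement error at both frequencies

errors = {}
for f in (200.0, 2000.0):
    measured = phase_difference(true_theta, f) + phase_error
    errors[f] = abs(doa_from_phase(measured, f) - true_theta)
```

Under these assumptions the direction error at 200 Hz is many times larger than at 2000 Hz, because the true phase difference shrinks with frequency while the measurement error does not.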
Existing Method for Solving the Permutation Problem
Adjacent bands correlation method:
[Figure: mixed signals x1, x2, x3 → K-point FFT → BSS in each frequency bin f1, f2, ..., fk → solving the permutation problem → K-point IFFT → separated signals y1, y2, y3. Outputs in adjacent bins are marked with high correlation or low correlation.]
Existing Method for Solving the Permutation Problem
Adjacent bands correlation method, example:
[Figure: components of sources s1 and s2 in frequency bins K-1, K, K+1, K+2, K+3, ...; for each pair of adjacent bins a correlation matrix with entries r11, r12, r21, r22 decides between "no change" and "change permutation"; shown with and without confidence check]
Existing Method for Solving the Permutation Problem
Adjacent bands correlation method:
[Figure: components of sources s1 and s2 in frequency bins K-1, K, K+1, K+2, K+3, ...; correlation matrix with entries r11, r12, r21, r22 for each pair of adjacent bins]
Disadvantage: the method is not robust.
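The adjacent bands idea can be sketched as follows, assuming the quantity being correlated is the amplitude envelope of each separated output in each bin. The slides do not say what is correlated, and `align_adjacent_bins` and the array layout are hypothetical. Each bin is compared with its already aligned neighbour, and the permutation with the largest total correlation is kept.

```python
import itertools
import numpy as np

def align_adjacent_bins(env):
    """env: (K, N, T) envelopes of N outputs in K frequency bins.

    Returns a (K, N) integer array; row k lists which output of bin k
    should be assigned to channel 0, 1, ..., N-1.
    """
    n_bins, n_out, _ = env.shape
    perms = np.zeros((n_bins, n_out), dtype=int)
    perms[0] = np.arange(n_out)                      # bin 0 defines the reference order
    for k in range(1, n_bins):
        ref = env[k - 1, perms[k - 1]]               # neighbour, already aligned
        # corr[i, j]: correlation of reference channel i with output j of bin k
        corr = np.corrcoef(np.vstack([ref, env[k]]))[:n_out, n_out:]
        best = max(itertools.permutations(range(n_out)),
                   key=lambda p: sum(corr[i, p[i]] for i in range(n_out)))
        perms[k] = best
    return perms
```

Because every bin is aligned only against its neighbour, a single wrong decision is carried into all later bins. This is one way to read the remark that the method is not robust, and the "confidence check" variants mentioned on the slides are presumably meant to limit this.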
Existing Method for Solving the Permutation Problem
Combination of DOA and correlation methods: DOA + harmonic correlation + adjacent bands correlation
Advantage: increased robustness
Proposed Algorithm: Partial Separation Method (Parallel Configuration)
[Figure: block diagram with a time domain stage and a frequency domain stage]
Reference: V. G. Reju, S. N. Koh and I. Y. Soon, “Partial separation method for solving permutation problem in frequency domain blind source separation of speech signals,” Neurocomputing, Vol. 71, No. 10–12, June 2008, pp. 2098–
Partial Separation Method (Parallel Configuration)
[Figure: time domain stage and frequency domain stage]
Partial Separation Method (Cascade Configuration)
[Figure: time domain stage and frequency domain stage; the figure also carries the label "Parallel configuration"]
Advantages of Partial Separation Method
- Robustness
Comparison with Adjacent Bands Correlation Method
Comparison with DOA Method
Legend:
- PS: partial separation method with confidence check
- PS1: partial separation method alone, without confidence check
- C1: correlation between adjacent bins without confidence check
- C2: correlation between adjacent bins with confidence check
- Ha: correlation between the harmonic components with confidence check
My Contribution - II
[Figure: BSS taxonomy diagram, repeated from the Motivation slide]
Underdetermined Blind Source Separation of Instantaneous Mixtures
Mixture in time domain → time to TF domain → detection of single source points (SSPs) → mixing matrix estimation → estimation of sources
Mathematical Representation of Instantaneous Mixing
- Time domain: [equation not recovered]
- Time-frequency domain: [equation not recovered]
- P: no. of mixtures; Q: no. of sources
Reference: V. G. Reju, S. N. Koh and I. Y. Soon, “An algorithm for mixing matrix estimation in instantaneous blind source separation,” Signal Processing, Vol. 89, Issue 9, September 2009, pp. 1762–1773.
Single Source Points in Time-Frequency Domain
[Figure: single source point 1 and single source point 2]
Single Source Points in Time-Frequency Domain
[Figure: single source point 1 and single source point 2, with a factor labelled "scalar"]
∴ At single source point 1: [equation not recovered]
∴ At single source point 2: [equation not recovered]
Scatter Diagram of the Mixtures When Sources Are Perfectly Sparse
Example: [figure not recovered]
Scatter Diagram of the Mixtures When Sources Are Not Perfectly Sparse
Example: [figure not recovered]
Scatter Diagram of the Mixtures When Sources Are Sparse
No. of sources = 6, no. of mixtures = 2
Scatter Diagram of the Mixtures When Sources Are Sparse, After Clustering
No. of sources = 6, no. of mixtures = 2
Scatter Diagram of the Mixtures When Sources Are Not Perfectly Sparse
No. of sources = 6, no. of mixtures = 2
Objective: estimation of the single source points.
Principle of the Proposed Algorithm for the Detection of Single Source Points
[Figure: single source point 1 and single source point 2 (with a factor labelled "scalar"), and a multi source point]
Principle of the Proposed Algorithm for the Detection of Single Source Points
[Figure: single source points (SSP) versus multi source points (MSP); average of 15 pairs of speech utterances of length 10 s each]
Proposed Algorithm for the Detection of Single Source Points
[Figure: SSP and MSP]
Elimination of Outliers
SSP detection → clustering → outlier elimination
Experimental Results
No. of mixtures = 2, no. of sources = 6
Detected Single Source Points, Three Mixtures
No. of mixtures = 3, no. of sources = 6
Comparison with Classical Algorithms for Determined Case
No. of mixtures = 2, no. of sources = 2
Average of 500 experimental results
Comparison with Method Proposed in [1], Underdetermined Case
[Figure: normalized mean square error (NMSE) in mixing matrix estimation (dB) versus order of the mixing matrices (P×Q); P: no. of mixtures, Q: no. of sources]
[1] Y. Li, S. Amari, A. Cichocki, D. W. C. Ho, and S. Xie, “Underdetermined blind source separation based on sparse representation,” IEEE Transactions on Signal Processing, vol. 54, pp. 423–437, Feb.
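For reference, the NMSE of a mixing matrix estimate in dB is commonly computed as in the sketch below. The exact definition used for the comparison is not given in this transcript, so this is an assumed, commonly used form. `nmse_db` is a hypothetical helper, and it assumes the estimated columns have already been matched to the true ones in order, sign and scale.

```python
import numpy as np

def nmse_db(a_true, a_est):
    """10*log10 of squared estimation error relative to the energy of the true matrix."""
    err = np.sum((a_est - a_true) ** 2)
    return 10.0 * np.log10(err / np.sum(a_true ** 2))

a_true = np.array([[1.0, 0.0], [0.0, 1.0]])
a_est = 1.1 * a_true                 # every entry off by 10 %
value = nmse_db(a_true, a_est)       # a 10 % amplitude error corresponds to -20 dB
```

More negative values mean a better estimate.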
Advantages of the Proposed Algorithm
Algorithm:
Step 1: Convert x in the time domain to the TF domain to get X.
Step 2: Check the condition [equation not recovered].
Step 3: If the condition is satisfied, then X(k, t) is a sample at an SSP, and this sample is kept for mixing matrix estimation; otherwise, discard the point.
Step 4: Repeat Steps 2 to 3 for all the points in the TF plane or until a sufficient number of SSPs are obtained.
Advantages:
1) Much simpler constraint: the algorithm does not require a “single source zone”.
2) Separation performance is better.
3) The algorithm is extremely simple but effective.
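Steps 2 and 3 can be sketched as follows. The condition itself did not survive in this transcript. The sketch assumes the test described in the cited Signal Processing paper, which compares the absolute directions of the real and imaginary parts of X(k, t): with a real mixing matrix, both parts are scalar multiples of the same mixing column at an SSP. The threshold `delta_theta_deg`, the function name `detect_ssp` and the synthetic data are illustrative assumptions.

```python
import numpy as np

def detect_ssp(X, delta_theta_deg=0.5):
    """X: (P, n_points) complex TF samples. Returns a boolean mask that is True
    where Re{X} and Im{X} point along the same line within delta_theta_deg."""
    re, im = X.real, X.imag
    num = np.abs(np.sum(re * im, axis=0))
    den = np.linalg.norm(re, axis=0) * np.linalg.norm(im, axis=0)
    cos_angle = num / np.maximum(den, 1e-12)
    return cos_angle > np.cos(np.deg2rad(delta_theta_deg))

# Synthetic TF samples: P = 2 mixtures, Q = 3 sources (values are made up)
rng = np.random.default_rng(3)
angles = np.deg2rad([20.0, 80.0, 140.0])
A = np.vstack([np.cos(angles), np.sin(angles)])      # real 2 x 3 mixing matrix
n = 1000
S = np.zeros((3, n), dtype=complex)

# Exactly one active source at every point ...
active = rng.integers(0, 3, size=n)
S[active, np.arange(n)] = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# ... plus a second active source on the last half (multi source points)
half = n // 2
second = (active[half:] + 1) % 3
S[second, np.arange(half, n)] = (rng.standard_normal(n - half)
                                 + 1j * rng.standard_normal(n - half))

X = A @ S
is_ssp = detect_ssp(X)        # first half: SSPs, second half: mostly rejected
```

The points kept by the mask lie along the directions of the mixing matrix columns, so clustering them (the next stage in the pipeline) gives the mixing matrix estimate. A small fraction of multi source points can still pass the test by chance, which is one motivation for a later outlier elimination step.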
My Contributions – III, IV and V
[Figure: BSS taxonomy diagram, repeated from the Motivation slide]
Underdetermined Convolutive Blind Source Separation via Time-Frequency Masking
[Figure: Mic 1 ... Mic P → STFT → mixture in TF domain → mask estimation → apply mask → separated signals in TF domain]
Reference: V. G. Reju, S. N. Koh and I. Y. Soon, “Underdetermined Convolutive Blind Source Separation via Time-Frequency Masking,” IEEE Transactions on Audio, Speech and Language Processing, Vol. 18, No. 1, Jan. 2010, pp. 101–116.
Mathematical Representation
- Time domain: [equation not recovered]
- Frequency domain: [equation not recovered]
- P: no. of mixtures; Q: no. of sources
Single Source Points
- Instantaneous mixing, single source points 1 and 2: [equations not recovered; terms annotated "real scalar" and "real"]
- Convolutive mixing, single source points 1 and 2: [equations not recovered; terms annotated "complex scalar" and "complex"]
Basic Principle of Single Source Points Detection
- Convolutive mixing, single source points 1 and 2: [equations not recovered; terms annotated "complex scalar" and "complex"]
- The Hermitian angle between the complex vectors u1 and u2 remains the same even if the vectors are multiplied by any complex scalars, whereas the pseudo angle changes.
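The invariance can be verified directly. The sketch uses the usual definitions for complex vectors: with the normalized inner product written as ρ·exp(jφ), the Hermitian angle is arccos(ρ) and the pseudo angle is φ. The function names and the test vectors are illustrative.

```python
import numpy as np

def hermitian_angle(u1, u2):
    """arccos of |u1^H u2| / (||u1|| ||u2||), in [0, pi/2]."""
    inner = np.vdot(u1, u2)                       # vdot conjugates its first argument
    c = np.abs(inner) / (np.linalg.norm(u1) * np.linalg.norm(u2))
    return np.arccos(np.clip(c, 0.0, 1.0))

def pseudo_angle(u1, u2):
    """Phase of the inner product u1^H u2, in (-pi, pi]."""
    return np.angle(np.vdot(u1, u2))

rng = np.random.default_rng(5)
u1 = rng.standard_normal(3) + 1j * rng.standard_normal(3)
u2 = rng.standard_normal(3) + 1j * rng.standard_normal(3)

c1 = 1.5 * np.exp(0.7j)                           # arbitrary complex scalars
c2 = 0.4 * np.exp(-2.1j)

theta_h_before = hermitian_angle(u1, u2)
theta_h_after = hermitian_angle(c1 * u1, c2 * u2)   # unchanged
theta_p_before = pseudo_angle(u1, u2)
theta_p_after = pseudo_angle(c1 * u1, c2 * u2)      # shifted by the scalars' phases
```

This is why the Hermitian angle suits the convolutive case: at a single source point the mixture vector is a fixed complex vector times an unknown complex scalar, and the Hermitian angle to any reference vector does not depend on that scalar.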
Algorithm for Single Source Points Detection
[Figure: two alternative cases, joined by "or", showing the angles θH1 and θH2]
Mask Estimation by k-means (KM)
[Figure: clean and estimated masks]
Mask Estimation by Fuzzy c-means (FCM)
[Figure: clean and estimated masks]
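The practical difference between the two clustering choices is that k-means yields a binary mask while fuzzy c-means yields a soft mask. The sketch below shows only that last step, using the standard FCM membership formula. The features being clustered and the distance measure are not stated in this transcript, so the hypothetical function `masks_from_distances` simply takes precomputed distances from each TF point to each cluster centre.

```python
import numpy as np

def masks_from_distances(d, m=2.0):
    """d: (Q, n_points) distances of each TF point to each of Q cluster centres.

    Returns (hard, soft): a one-hot k-means style mask and fuzzy c-means
    memberships u = d^(-2/(m-1)) / sum_j d_j^(-2/(m-1)), with fuzzifier m > 1.
    """
    d = np.maximum(d, 1e-12)                      # avoid division by zero
    n_points = d.shape[1]

    hard = np.zeros_like(d)
    hard[d.argmin(axis=0), np.arange(n_points)] = 1.0

    inv = d ** (-2.0 / (m - 1.0))
    soft = inv / inv.sum(axis=0, keepdims=True)
    return hard, soft
```

Multiplying the TF-domain mixture by row q of either mask gives an estimate of source q. The soft mask splits points that lie between clusters instead of assigning them entirely to one source.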
Automatic Detection of Number of Sources
Cluster validation technique:
For c = 2 to c_max
    Cluster the data into c clusters.
    Calculate the cluster validation index.
End
Take the c corresponding to the best clustering as the number of sources.
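The loop above can be sketched as follows. The slides name neither the clustering algorithm nor the validation index, so plain k-means and the Calinski-Harabasz index are used here purely as stand-ins. All function names and the synthetic data are assumptions.

```python
import numpy as np

def kmeans(data, c, n_iter=100):
    """Plain k-means with farthest-point initialisation (deterministic)."""
    centres = [data[0]]
    for _ in range(1, c):
        d2 = np.min([np.sum((data - ctr) ** 2, axis=1) for ctr in centres], axis=0)
        centres.append(data[np.argmax(d2)])
    centres = np.array(centres, dtype=float)
    for _ in range(n_iter):
        d2 = ((data[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new = np.array([data[labels == j].mean(axis=0) if np.any(labels == j)
                        else centres[j] for j in range(c)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels, centres

def calinski_harabasz(data, labels, centres):
    """Between-cluster over within-cluster dispersion; larger is better."""
    n, c = len(data), len(centres)
    overall = data.mean(axis=0)
    within = sum(((data[labels == j] - centres[j]) ** 2).sum() for j in range(c))
    between = sum(np.sum(labels == j) * ((centres[j] - overall) ** 2).sum()
                  for j in range(c))
    return (between / (c - 1)) / (within / (n - c))

def estimate_num_sources(data, c_max):
    """For c = 2..c_max: cluster, score, and keep the best-scoring c."""
    scores = {}
    for c in range(2, c_max + 1):
        labels, centres = kmeans(data, c)
        scores[c] = calinski_harabasz(data, labels, centres)
    return max(scores, key=scores.get), scores

# Synthetic feature points from three well separated clusters
rng = np.random.default_rng(4)
centres_true = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
data = np.vstack([ctr + 0.5 * rng.standard_normal((100, 2)) for ctr in centres_true])

n_est, scores = estimate_num_sources(data, c_max=5)
```

Since the loop starts at c = 2, this form cannot report a single source. Any index with a clear optimum at the true number of clusters could replace the one used here.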
Elimination of Low Energy Points