Blind Separation of Speech Mixtures Vaninirappuputhenpurayil Gopalan REJU School of Electrical and Electronic Engineering Nanyang Technological University.


1 Blind Separation of Speech Mixtures. Vaninirappuputhenpurayil Gopalan REJU, School of Electrical and Electronic Engineering, Nanyang Technological University

2 Introduction: Blind Source Separation. Mixing process and unmixing process (convolutive case), with sources s1 and s2.

3 Introduction: Convolutive Blind Source Separation; Instantaneous Blind Source Separation.

4 Introduction: Convolutive Blind Source Separation (difficult to separate); Instantaneous Blind Source Separation (easy to separate). In frequency domain: the convolutive mixture is treated bin by bin.
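The reason the frequency domain helps with convolutive mixtures can be illustrated with a small numerical check. This sketch is not from the presentation: it assumes a single source frame, a hypothetical 4-tap filter `h`, and circular convolution (with real STFT frames the equivalence holds only approximately).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64                                  # frame length (hypothetical)
s = rng.standard_normal(n)              # one source frame
h = np.zeros(n)
h[:4] = [1.0, 0.5, 0.25, 0.125]         # hypothetical short filter, zero-padded to n

# Circular convolution computed directly in the time domain
x_time = np.array([sum(h[m] * s[(t - m) % n] for m in range(n)) for t in range(n)])

# The same result obtained as a per-bin multiplication in the frequency domain
x_freq = np.fft.ifft(np.fft.fft(h) * np.fft.fft(s)).real

max_err = np.max(np.abs(x_time - x_freq))
print(max_err)
```

Because each bin only sees a multiplication by one complex gain, an instantaneous separation algorithm can be applied in every bin separately.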

5 Introduction. No. of sources < no. of sensors: overdetermined mixing. No. of sources = no. of sensors: determined mixing. No. of sources > no. of sensors: underdetermined mixing. Underdetermined mixing is difficult to separate; overdetermined and determined mixing are easy to separate.

6 Approaches for BSS of Speech Signals. Types of mixing: instantaneous mixing, convolutive mixing.

7 Approaches for BSS of Speech Signals: Instantaneous mixing. Step 1: Selection of cost function. Step 2: Minimization or maximization of the cost function. Block diagram: sources S1, S2 → mixing H → mixtures X1, X2 → unmixing W → outputs Y1, Y2 (separated?).

8 Approaches for BSS of Speech Signals: Instantaneous mixing. Selection of cost function: statistical independence (signals from two different sources are independent); information theoretic; non-Gaussianity; nonlinear cross moments; temporal structure of speech; non-stationarity of speech. Non-Gaussianity measures: kurtosis, negentropy. Central limit theorem: a mixture of two or more sources will be more Gaussian than its individual components.
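The central limit theorem argument can be checked numerically with kurtosis as the non-Gaussianity measure. This is a minimal sketch, assuming Laplacian-distributed samples as a stand-in for speech (an assumption, not stated on the slide); `excess_kurtosis` is a helper defined here for illustration.

```python
import numpy as np

def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    x = x - np.mean(x)
    return np.mean(x**4) / np.mean(x**2) ** 2 - 3.0

rng = np.random.default_rng(0)
s1 = rng.laplace(size=200_000)      # super-Gaussian stand-in for a speech source
s2 = rng.laplace(size=200_000)
mix = (s1 + s2) / np.sqrt(2)        # equal-weight mixture of the two sources

k_source = excess_kurtosis(s1)
k_mix = excess_kurtosis(mix)
print(k_source, k_mix)
```

The mixture has a smaller excess kurtosis than either source, so maximizing non-Gaussianity of the output pushes it back toward a single source.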

9 Approaches for BSS of Speech Signals: Instantaneous mixing. Minimization or maximization of the cost function: simple gradient method; natural gradient method (e.g. Infomax ICA algorithm); Newton's method (e.g. FastICA).
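As an illustration of a Newton-type optimizer, the following sketch runs the one-unit FastICA fixed-point update with the kurtosis nonlinearity on whitened data. It is a toy example under stated assumptions (two Laplacian sources, a hypothetical mixing matrix `H`), not the implementation used in the presentation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
S = rng.laplace(size=(2, n))                 # two independent super-Gaussian sources
H = np.array([[1.0, 0.6], [0.4, 1.0]])       # hypothetical mixing matrix
X = H @ S

# Whitening: Z has (approximately) identity covariance
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(X))
Z = (E / np.sqrt(d)) @ E.T @ X

# One-unit fixed-point update with kurtosis: w <- E[z (w^T z)^3] - 3 w
w = np.array([1.0, 0.0])
for _ in range(100):
    y = w @ Z
    w_new = (Z * y**3).mean(axis=1) - 3.0 * w
    w_new /= np.linalg.norm(w_new)
    converged = abs(abs(w_new @ w) - 1.0) < 1e-10
    w = w_new
    if converged:
        break

y = w @ Z
corr = [abs(np.corrcoef(y, S[i])[0, 1]) for i in range(2)]
best = max(corr)
print(best)
```

The extracted component matches one source up to sign and scale, which is the usual ICA ambiguity.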

10 Approaches for BSS of Speech Signals: Convolutive Mixing. Time domain: Advantage: no permutation problem. Disadvantages: slow convergence; high computational cost for long filter taps. Frequency domain: Advantages: low computational cost; fast convergence. Disadvantage: permutation problem (the outputs may appear as Y1, Y2 or as Y2, Y1).

11 Permutation Problem in Frequency Domain BSS. Block diagram: mixed signals x1, x2, x3 → K-point FFT → instantaneous ICA algorithm in each frequency bin (f1, f2, ..., fk) → K-point IFFT → y1, y2, y3. Without further processing the signals are still mixed, because the bins of one output correspond to different sources due to the permutation problem. With a stage that solves the permutation problem before the IFFT, y1, y2, y3 are the separated signals.

12 Motivation. BSS taxonomy: determined/overdetermined (# mixtures ≥ # sources) and underdetermined (# mixtures < # sources); instantaneous and convolutive mixing; time domain and frequency domain; frequency bin-wise separation; permutation problem; mixing matrix estimation; source estimation; automatic detection of no. of sources.

13 My Contribution - I. (BSS taxonomy diagram repeated from slide 12.)

14 Algorithm for Solving the Permutation Problem. Block diagram: mixed signals x1, x2, x3 → K-point FFT → instantaneous ICA algorithm in each frequency bin (f1, f2, ..., fk), which leaves the permutation problem → solving the permutation problem → K-point IFFT → separated signals y1, y2, y3.

15 Existing Method for Solving the Permutation Problem. Direction of Arrival (DOA) method. Symbols used: position of the p-th sensor; velocity of sound. Example: direction of y1 = -30°, direction of y2 = 20°.

16 Existing Method for Solving the Permutation Problem. Direction of Arrival (DOA) method. Disadvantages: fails at lower frequencies; fails when sources are near; room reverberation; sensor positions must be known. Reasons for failure at lower frequencies: lower spacing causes error in phase difference measurement; the relation is approximated for a plane wave front under anechoic conditions.

17 Existing Method for Solving the Permutation Problem. Adjacent bands correlation method. Block diagram: mixed signals x1, x2, x3 → K-point FFT → BSS in each frequency bin (f1, f2, ..., fk) → solving the permutation problem → K-point IFFT → separated signals y1, y2, y3. Diagram labels between adjacent bins: low correlation, high correlation, low correlation.

18 Existing Method for Solving the Permutation Problem. Adjacent bands correlation method, example: for sources s1 and s2 in bins K-1, K, K+1, K+2, K+3, ..., the correlation matrix (r11, r12; r21, r22) between adjacent bins decides between "no change" and "change permutation". Variants: with confidence check, without confidence check.

19 Existing Method for Solving the Permutation Problem. Adjacent bands correlation method: correlation matrix (r11, r12; r21, r22) between adjacent bins K-1, K, K+1, K+2, K+3, ... for sources s1 and s2. Disadvantage: the method is not robust.
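The adjacent bands decision rule can be sketched as follows for two sources. The envelopes are synthetic and the function name `needs_swap` is hypothetical; the confidence check mentioned on the previous slide is not modelled here.

```python
import numpy as np

def needs_swap(env_prev, env_cur):
    """Compare envelope correlations of two adjacent bins.

    env_prev, env_cur: arrays of shape (2, frames) holding the amplitude
    envelopes of the two separated outputs in bin K and bin K+1.
    """
    # r[i, j] = correlation between output i of bin K and output j of bin K+1
    r = np.corrcoef(np.vstack([env_prev, env_cur]))[:2, 2:]
    return bool(r[0, 1] + r[1, 0] > r[0, 0] + r[1, 1])

rng = np.random.default_rng(0)
base = np.abs(rng.standard_normal((2, 500)))             # shared source envelopes
env_k = base + 0.1 * np.abs(rng.standard_normal((2, 500)))
env_k1 = base + 0.1 * np.abs(rng.standard_normal((2, 500)))
env_k1_permuted = env_k1[::-1]                            # outputs swapped in bin K+1

aligned = needs_swap(env_k, env_k1)
permuted = needs_swap(env_k, env_k1_permuted)
print(aligned, permuted)
```

Since each bin is aligned only to its neighbour, one wrong decision propagates to all later bins, which is one way to read the robustness concern on the slide.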

20 Existing Method for Solving the Permutation Problem. Combination of DOA and correlation methods: DOA + harmonic correlation + adjacent bands correlation. Advantage: increased robustness.

21 Proposed algorithm: Partial separation method (Parallel configuration). Stages: time domain stage, frequency domain stage. Reference: V. G. Reju, S. N. Koh and I. Y. Soon, “Partial separation method for solving permutation problem in frequency domain blind source separation of speech signals,” Neurocomputing, vol. 71, no. 10–12, June 2008, pp. 2098–2112.

22 Partial separation method (Parallel configuration). Stages: time domain stage, frequency domain stage.

23 Partial separation method (Cascade configuration). Diagram labels: parallel configuration, time domain stage, frequency domain stage.

24 Advantages of Partial Separation method: robustness.

25 Comparison with Adjacent Bands Correlation Method

26 Comparison with DOA method. Legend: PS - partial separation method with confidence check; PS1 - partial separation method alone without confidence check; C1 - correlation between the adjacent bins without confidence check; C2 - correlation between adjacent bins with confidence check; Ha - correlation between the harmonic components with confidence check.

27 My Contribution - II. (BSS taxonomy diagram repeated from slide 12.)

28 Underdetermined Blind Source Separation of Instantaneous Mixtures. Pipeline: mixture in time domain → time to time-frequency (TF) domain → detection of single source points (SSPs) → mixing matrix estimation → estimation of sources.

29 Mathematical Representation of Instantaneous Mixing. Time domain and time-frequency domain representations, where P is the no. of mixtures and Q is the no. of sources. Reference: V. G. Reju, S. N. Koh and I. Y. Soon, “An algorithm for mixing matrix estimation in instantaneous blind source separation,” Signal Processing, vol. 89, no. 9, September 2009, pp. 1762–1773.

30 Single Source Points in Time-Frequency domain. Single source point 1; single source point 2.

31 Single Source Points in Time-Frequency domain. Single source point 1; single source point 2.

32 Single Source Points in Time-Frequency domain. Single source point 1; single source point 2; scalar. Therefore, relations are given at single source point 1 and at single source point 2 (equations not preserved in the transcript).

33 Scatter Diagram of the Mixtures When Sources are Perfectly Sparse. Example.

34 Scatter Diagram of the Mixtures When Sources are Not Perfectly Sparse. Example.

35 Scatter Diagram of the Mixtures when Sources are Sparse. No. of sources = 6, no. of mixtures = 2.

36 Scatter Diagram of the Mixtures when Sources are Sparse, After Clustering. No. of sources = 6, no. of mixtures = 2.

37 Scatter Diagram of the Mixtures when Sources are Not Perfectly Sparse. No. of sources = 6, no. of mixtures = 2. Objective: estimation of the single source points.

38 Principle of the Proposed Algorithm for the Detection of Single Source Points. Single source point 1; single source point 2; scalar; multi source point.

39 Principle of the Proposed Algorithm for the Detection of Single Source Points. Single source point 1; single source point 2; scalar; multi source point.

40 Principle of the Proposed Algorithm for the Detection of Single Source Points. Plot comparing single source points (SSP) and multi source points (MSP): average of 15 pairs of speech utterances of length 10 s each.

41 Proposed Algorithm for the Detection of Single Source Points. Labels: SSP, MSP.

42 Elimination of Outliers. Pipeline: SSPs detection → clustering → outlier elimination.

43 Experimental Results. No. of mixtures = 2, no. of sources = 6.

44 Detected Single Source Points, Three Mixtures. No. of mixtures = 3, no. of sources = 6.

45 Comparison with Classical Algorithms for Determined Case. No. of mixtures = 2, no. of sources = 2. Average of 500 experimental results.

46 Comparison with Method Proposed in [1], Underdetermined case. Plot: normalized mean square error (NMSE) in mixing matrix estimation (dB) versus order of the mixing matrices (P×Q), where P is the no. of mixtures and Q is the no. of sources. [1] Y. Li, S. Amari, A. Cichocki, D. W. C. Ho, and S. Xie, “Underdetermined blind source separation based on sparse representation,” IEEE Transactions on Signal Processing, vol. 54, pp. 423–437, Feb. 2006.
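The NMSE figure of merit can be computed as in the sketch below. The exact definition used for the plot is not preserved in the transcript, so this assumes the common form 10·log10(Σ(â − a)² / Σa²) and that the columns of the estimate have already been normalized and matched to the true columns; `nmse_db` is a name chosen here for illustration.

```python
import numpy as np

def nmse_db(A_true, A_est):
    """Normalized mean square error of a mixing matrix estimate, in dB.

    Assumes A_est has the same column order and scaling as A_true.
    """
    err = np.sum((A_est - A_true) ** 2)
    return 10.0 * np.log10(err / np.sum(A_true ** 2))

A_true = np.eye(2)
A_est = A_true + np.array([[0.1, 0.0], [0.0, 0.0]])   # one entry off by 0.1
value = nmse_db(A_true, A_est)
print(round(value, 2))
```

More negative values indicate a better estimate of the mixing matrix.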

47 Advantages of the Proposed algorithm. Step 1: Convert x in the time domain to the TF domain to get X. Step 2: Check the condition. Step 3: If the condition is satisfied, then X(k, t) is a sample at the SSP, and this sample is kept for mixing matrix estimation; otherwise, discard the point. Step 4: Repeat Steps 2 to 3 for all the points in the TF plane or until a sufficient number of SSPs are obtained. Advantages: 1) Much simpler constraint: the algorithm does not require a “single source zone”. 2) Separation performance is better. 3) The algorithm is extremely simple but effective.
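Steps 2 to 4 can be sketched as below. The condition itself is not preserved in the transcript; this sketch assumes it tests whether the real and imaginary parts of X(k, t) point in the same direction within a small angle, which holds exactly when a single source with a real mixing vector is active. The function name `detect_ssp`, the tolerance `delta_theta_deg`, and the mixing matrix `A` are hypothetical.

```python
import numpy as np

def detect_ssp(X, delta_theta_deg=0.8):
    """Return a boolean mask of assumed single source points.

    X: complex array of shape (P, N), one column per TF point.
    """
    R, I = X.real, X.imag
    num = np.abs(np.sum(R * I, axis=0))
    den = np.linalg.norm(R, axis=0) * np.linalg.norm(I, axis=0)
    cos_angle = num / np.maximum(den, 1e-12)
    return cos_angle > np.cos(np.deg2rad(delta_theta_deg))

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.5, 0.1],
              [0.3, 0.8, 1.0]])               # hypothetical real mixing matrix, P = 2, Q = 3
n_ssp, n_msp = 300, 300
S = np.zeros((3, n_ssp + n_msp), dtype=complex)
active = rng.integers(0, 3, size=n_ssp)        # the only active source at each SSP
S[active, np.arange(n_ssp)] = (rng.standard_normal(n_ssp)
                               + 1j * rng.standard_normal(n_ssp))
S[:, n_ssp:] = (rng.standard_normal((3, n_msp))
                + 1j * rng.standard_normal((3, n_msp)))   # all sources active
X = A @ S

is_ssp = detect_ssp(X)
kept = X[:, is_ssp]                            # samples kept for mixing matrix estimation
print(is_ssp[:n_ssp].mean(), is_ssp[n_ssp:].mean())
```

The kept samples line up along the columns of the mixing matrix, so clustering their directions gives the mixing matrix estimate.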

48 My Contributions - III, IV and V. (BSS taxonomy diagram repeated from slide 12.)

49 Underdetermined Convolutive Blind Source Separation via Time-Frequency Masking. Block diagram: Mic 1 ... Mic P → STFT → mixture in TF domain → mask estimation → apply mask → separated signals in TF domain. Reference: V. G. Reju, S. N. Koh and I. Y. Soon, “Underdetermined Convolutive Blind Source Separation via Time-Frequency Masking,” IEEE Transactions on Audio, Speech and Language Processing, vol. 18, no. 1, Jan. 2010, pp. 101–116.
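The "apply mask" stage can be illustrated with a toy example. This sketch assumes two sources that never overlap in the TF plane and an oracle mask (`owner`); in the pipeline above the mask would instead come from the mask estimation stage. All sizes and gains are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
K, T = 8, 10                                   # frequency bins, time frames (toy sizes)
S = rng.standard_normal((2, K, T)) + 1j * rng.standard_normal((2, K, T))
owner = rng.integers(0, 2, size=(K, T))        # which source is active at each TF point
S[0][owner != 0] = 0                           # make the sources disjoint in the TF plane
S[1][owner != 1] = 0
X1 = 0.9 * S[0] + 0.7 * S[1]                   # mixture at mic 1 (hypothetical gains)

masks = np.stack([owner == q for q in range(2)])   # binary masks, shape (2, K, T)
Y = masks * X1                                 # separated signals in the TF domain
print(np.allclose(Y[0], 0.9 * S[0]), np.allclose(Y[1], 0.7 * S[1]))
```

With real speech the sources overlap at some TF points, which is why the quality of the estimated mask determines the separation quality.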

50 Mathematical Representation. Time domain and frequency domain representations, where P is the no. of mixtures and Q is the no. of sources.

51 Single source points. Instantaneous mixing: at single source point 1 and single source point 2 the quantities are real (real scalar). Convolutive mixing: at single source point 1 and single source point 2 the quantities are complex (complex scalar).

52 Basic Principle of Single Source Points Detection. Convolutive mixing: single source point 1; single source point 2; complex scalar. The Hermitian angle between the complex vectors u1 and u2 will remain the same even if the vectors are multiplied by any complex scalars, whereas the pseudo angle will change.
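The invariance claimed for the Hermitian angle can be verified numerically. The sketch below uses the standard definitions: with c = u1ᴴu2 / (‖u1‖‖u2‖), the Hermitian angle is arccos|c| and the pseudo angle is arg(c). The vectors and the scalar are arbitrary illustrative values.

```python
import numpy as np

def hermitian_angle(u1, u2):
    # np.vdot conjugates its first argument, giving u1^H u2
    c = np.vdot(u1, u2) / (np.linalg.norm(u1) * np.linalg.norm(u2))
    return np.arccos(np.clip(np.abs(c), 0.0, 1.0))

def pseudo_angle(u1, u2):
    return np.angle(np.vdot(u1, u2))

u1 = np.array([1 + 1j, 2 - 1j])
u2 = np.array([0.5 - 2j, 1 + 0.3j])
scalar = 2.0 * np.exp(1j * 0.7)                # arbitrary complex scalar

h_before = hermitian_angle(u1, u2)
h_after = hermitian_angle(u1, scalar * u2)
p_before = pseudo_angle(u1, u2)
p_after = pseudo_angle(u1, scalar * u2)
print(h_before, h_after, p_before, p_after)
```

The Hermitian angle is unchanged, while the pseudo angle shifts by the phase of the scalar (0.7 rad here).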

53 Algorithm for Single Source Points Detection. Diagram labels: Hermitian angles θH1 or θH2.

54 Mask Estimation by k-means (KM). Panels: clean, estimated.

55 Mask Estimation by Fuzzy c-means (FCM). Panels: clean, estimated.

56 Automatic Detection of Number of Sources. Cluster validation technique: For c = 2 to c_max: cluster the data into c clusters; calculate the cluster validation index. End. Take the c corresponding to the best clustering as the number of sources.
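The loop above can be sketched as follows. The slide names neither the clustering algorithm nor the validation index, so this assumes plain k-means with a deterministic farthest-point initialisation and the mean silhouette coefficient as the index; both are stand-ins, and the 2-D data are synthetic.

```python
import numpy as np

def kmeans(data, c, iters=50):
    # Deterministic farthest-point initialisation (an assumption for reproducibility)
    centers = [data[0]]
    for _ in range(c - 1):
        d = np.min([np.linalg.norm(data - m, axis=1) for m in centers], axis=0)
        centers.append(data[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dist, axis=1)
        for j in range(c):
            if np.any(labels == j):
                centers[j] = data[labels == j].mean(axis=0)
    return labels

def mean_silhouette(data, labels):
    dist = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    ids = np.unique(labels)
    scores = np.zeros(len(data))
    for i in range(len(data)):
        own = labels == labels[i]
        if own.sum() < 2:
            continue                      # singleton clusters keep a score of 0
        a = dist[i, own].sum() / (own.sum() - 1)
        b = min(dist[i, labels == j].mean() for j in ids if j != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])   # three synthetic sources
data = np.vstack([m + 0.3 * rng.standard_normal((40, 2)) for m in blob_centers])

c_max = 5
index = {c: mean_silhouette(data, kmeans(data, c)) for c in range(2, c_max + 1)}
n_sources = max(index, key=index.get)     # c with the best validation index
print(n_sources)
```

Any other validation index could be substituted in `index` without changing the loop structure.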

57 Elimination of Low Energy Points

