Published by Annabel Blake. Modified over 9 years ago.
1
Blind Separation of Speech Mixtures. Vaninirappuputhenpurayil Gopalan REJU, School of Electrical and Electronic Engineering, Nanyang Technological University
2
Introduction: Blind Source Separation (convolutive). Sources: s1, s2. Mixing process: [equation not in transcript]. Unmixing process: [equation not in transcript].
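The equations on this slide are not in the transcript. A common way to write the convolutive mixing and unmixing processes is sketched below. The symbols are assumptions chosen here, not taken from the slide: P sensors, Q sources, mixing filters h_pq of length L, and unmixing filters w_qp of length M.

```latex
x_p(t) = \sum_{q=1}^{Q} \sum_{l=0}^{L-1} h_{pq}(l)\, s_q(t-l), \qquad p = 1, \dots, P

y_q(t) = \sum_{p=1}^{P} \sum_{l=0}^{M-1} w_{qp}(l)\, x_p(t-l), \qquad q = 1, \dots, Q
```

Instantaneous mixing is the special case L = 1, where each h_pq reduces to a single gain.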
3
Introduction: Convolutive Blind Source Separation; Instantaneous Blind Source Separation
4
Introduction: Convolutive Blind Source Separation (difficult to separate); Instantaneous Blind Source Separation (easy to separate). In frequency domain: [equation not in transcript].
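The frequency-domain equation is missing from the transcript. Under the usual assumption that the STFT frame is long compared with the mixing filters, the convolutive model is commonly approximated per frequency bin as follows (f is the bin index, t the frame index; the symbols X, H, and S are assumptions, not read from the slide):

```latex
X_p(f, t) \approx \sum_{q=1}^{Q} H_{pq}(f)\, S_q(f, t)
\quad\Longleftrightarrow\quad
\mathbf{X}(f, t) \approx \mathbf{H}(f)\, \mathbf{S}(f, t)
```

Each bin then looks like an instantaneous mixture with a complex mixing matrix H(f), which is why the frequency-domain form is the easier one to separate.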
5
Introduction: No. of sources < no. of sensors: overdetermined mixing. No. of sources = no. of sensors: determined mixing. No. of sources > no. of sensors: underdetermined mixing. Overdetermined and determined mixing are easy to separate; underdetermined mixing is difficult to separate.
6
Approaches for BSS of Speech Signals. Types of mixing: instantaneous mixing; convolutive mixing
7
Approaches for BSS of Speech Signals: Instantaneous mixing. Step 1: Selection of cost function. Step 2: Minimization or maximization of the cost function. [Diagram: sources S1, S2 → H → mixtures X1, X2 → W → outputs Y1, Y2. Separated?]
8
Approaches for BSS of Speech Signals: Instantaneous mixing. Selection of cost function:
Statistical independence (signals from two different sources are independent): information theoretic; non-Gaussianity; nonlinear cross moments.
Non-Gaussianity measures: kurtosis; negentropy. Central limit theorem: a mixture of two or more sources will be more Gaussian than its individual components.
Temporal structure of speech.
Non-stationarity of speech.
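The central limit theorem argument on this slide can be checked numerically with kurtosis as the non-Gaussianity measure. The sketch below is only an illustration: it assumes Laplacian-distributed samples as a stand-in for speech, and the mixing weights 0.6 and 0.8 are arbitrary.

```python
import random

def excess_kurtosis(samples):
    """Excess kurtosis: 0 for a Gaussian, positive for peaky (super-Gaussian) data."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((v - mean) ** 2 for v in samples) / n
    m4 = sum((v - mean) ** 4 for v in samples) / n
    return m4 / (m2 * m2) - 3.0

def laplacian(rng):
    # The difference of two unit exponentials is Laplacian distributed.
    return rng.expovariate(1.0) - rng.expovariate(1.0)

rng = random.Random(0)
n = 100_000
s1 = [laplacian(rng) for _ in range(n)]   # hypothetical source 1
s2 = [laplacian(rng) for _ in range(n)]   # hypothetical source 2
mixture = [0.6 * a + 0.8 * b for a, b in zip(s1, s2)]

k_s1 = excess_kurtosis(s1)
k_s2 = excess_kurtosis(s2)
k_mix = excess_kurtosis(mixture)
print(f"source 1: {k_s1:.2f}, source 2: {k_s2:.2f}, mixture: {k_mix:.2f}")
```

The mixture's excess kurtosis lands closer to zero than that of either source, so a cost function that pushes the outputs away from Gaussianity pushes them toward the individual sources.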
9
Approaches for BSS of Speech Signals: Instantaneous mixing. Minimization or maximization of the cost function: simple gradient method; natural gradient method (e.g. Infomax ICA algorithm); Newton’s method (e.g. FastICA).
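As an illustration of the Newton-type approach, here is a minimal sketch of the kurtosis-based FastICA fixed-point update with deflation on whitened data. It is not the presenter's code: the Laplacian test sources, the mixing matrix `h`, and all function names are assumptions made for this example.

```python
import numpy as np

def whiten(x):
    """Center and whiten mixtures x of shape (channels, samples)."""
    x = x - x.mean(axis=1, keepdims=True)
    cov = x @ x.T / x.shape[1]
    eigval, eigvec = np.linalg.eigh(cov)
    v = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
    return v @ x, v

def fastica_kurtosis(z, n_iter=100, seed=0):
    """Deflation FastICA with the cubic (kurtosis) nonlinearity on whitened data z."""
    rng = np.random.default_rng(seed)
    n = z.shape[0]
    w_all = np.zeros((n, n))
    for i in range(n):
        w = rng.standard_normal(n)
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            y = w @ z
            # Fixed-point update: w <- E[z (w^T z)^3] - 3 w
            w_new = (z * y ** 3).mean(axis=1) - 3.0 * w
            # Gram-Schmidt step against the rows already found
            w_new -= w_all[:i].T @ (w_all[:i] @ w_new)
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < 1e-10
            w = w_new
            if converged:
                break
        w_all[i] = w
    return w_all

rng = np.random.default_rng(1)
s = rng.laplace(size=(2, 20000))          # hypothetical super-Gaussian sources
h = np.array([[1.0, 0.6], [0.5, 1.0]])    # hypothetical mixing matrix H
x = h @ s                                 # instantaneous mixtures
z, v = whiten(x)
w_total = fastica_kurtosis(z) @ v         # overall unmixing matrix W
g = w_total @ h                           # global matrix W H
print(np.round(g, 2))
```

If separation succeeds, the global matrix `g` is close to a scaled permutation matrix: one dominant entry per row. The order and sign of the outputs remain arbitrary.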
10
Approaches for BSS of Speech Signals: Convolutive mixing.
Time domain. Advantage: no permutation problem. Disadvantages: slow convergence; high computational cost for long filter taps.
Frequency domain. Advantages: low computational cost; fast convergence. Disadvantage: permutation problem.
[Diagram: sources S1, S2 → H → mixtures X1, X2 → W → outputs (Y1, Y2) or (Y2, Y1)]
11
Permutation Problem in Frequency Domain BSS. [Diagram: mixed signals x1, x2, x3 → K-point FFT → instantaneous ICA algorithm in each frequency bin f1, f2, ..., fk → K-point IFFT → y1, y2, y3. Due to the permutation problem, the components of one output (e.g. y3) in different bins correspond to different sources, so the signals are still mixed. After solving the permutation problem: separated signals y1, y2, y3.]
12
Motivation. [Taxonomy diagram] BSS: determined/overdetermined (# mixtures ≥ # sources) or underdetermined (# mixtures < # sources); instantaneous or convolutive; time domain or frequency domain. Sub-problems shown: frequency bin-wise separation, permutation problem, mixing matrix estimation, source estimation, automatic detection of no. of sources.
13
My Contribution - I. [Same taxonomy diagram as the Motivation slide.]
14
Algorithm for Solving the Permutation Problem. [Diagram: mixed signals x1, x2, x3 → K-point FFT → instantaneous ICA algorithm in each frequency bin f1, f2, ..., fk (permutation problem) → solving the permutation problem (permutation problem solved) → K-point IFFT → separated signals y1, y2, y3]
15
Existing Method for Solving the Permutation Problem: Direction of Arrival (DOA) method. [Equation not in transcript; its annotated terms are the position of the p-th sensor and the velocity of sound.] Example: direction of y1 = -30°, direction of y2 = 20°.
16
Existing Method for Solving the Permutation Problem: Direction of Arrival (DOA) method.
Disadvantages: fails at lower frequencies; fails when sources are near; room reverberation; sensor positions must be known.
Reasons for failure at lower frequencies: lower spacing causes error in phase difference measurement; the relation is approximated for a plane wavefront under anechoic conditions.
17
Existing Method for Solving the Permutation Problem: Adjacent bands correlation method. [Diagram: mixed signals x1, x2, x3 → K-point FFT → BSS in each frequency bin f1, f2, ..., fk → solving the permutation problem → K-point IFFT → separated signals y1, y2, y3. Pairings between adjacent bins are labelled high correlation or low correlation.]
18
Existing Method for Solving the Permutation Problem: Adjacent bands correlation method. Example. [Diagram: components of s1 and s2 in bins ..., K-1, K, K+1, K+2, K+3, ... A correlation matrix with entries r11, r12, r21, r22 is computed between adjacent bins and leads to one of two decisions: no change, or change permutation. Decisions are marked as made with confidence or without confidence.]
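A minimal sketch of the adjacent bands correlation idea for two outputs is given below, without the confidence check. The decision rule (swap when r12 + r21 exceeds r11 + r22), the synthetic envelopes, and all names are assumptions made for illustration, not the presenter's implementation.

```python
import numpy as np

def corr(a, b):
    """Correlation coefficient between two envelope sequences."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def align_adjacent(env):
    """env has shape (bins, 2, frames): envelopes of the two outputs in every bin.

    Each bin is compared with the already aligned previous bin and swapped
    when the cross pairing correlates better than the direct pairing.
    """
    env = env.copy()
    swapped = np.zeros(env.shape[0], dtype=bool)
    for k in range(1, env.shape[0]):
        r11 = corr(env[k - 1, 0], env[k, 0])
        r22 = corr(env[k - 1, 1], env[k, 1])
        r12 = corr(env[k - 1, 0], env[k, 1])
        r21 = corr(env[k - 1, 1], env[k, 0])
        if r12 + r21 > r11 + r22:          # change permutation
            env[k] = env[k, ::-1].copy()
            swapped[k] = True
    return env, swapped

# Synthetic test data: two hypothetical source envelopes shared by all bins.
rng = np.random.default_rng(0)
frames, bins = 400, 16
src_env = np.abs(rng.standard_normal((2, frames)))
true_swap = rng.random(bins) < 0.5
true_swap[0] = False                       # bin 0 serves as the reference
env = np.empty((bins, 2, frames))
for k in range(bins):
    e = src_env * rng.uniform(0.5, 2.0) + 0.1 * np.abs(rng.standard_normal((2, frames)))
    env[k] = e[::-1] if true_swap[k] else e

aligned, swapped = align_adjacent(env)
print(np.array_equal(swapped, true_swap))
```

Because every decision depends on the previous bin, one wrong decision carries over to all the bins that follow it.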
19
Existing Method for Solving the Permutation Problem: Adjacent bands correlation method. [Same correlation diagram as the previous slide.] Disadvantage: the method is not robust.
20
Existing Method for Solving the Permutation Problem: Combination of DOA and correlation methods: DOA + harmonic correlation + adjacent bands correlation. Advantage: increased robustness.
21
Proposed algorithm: Partial separation method (parallel configuration). [Block diagram: time domain stage and frequency domain stage.] Reference: V. G. Reju, S. N. Koh and I. Y. Soon, “Partial separation method for solving permutation problem in frequency domain blind source separation of speech signals,” Neurocomputing, vol. 71, no. 10–12, pp. 2098–2112, June 2008.
22
Partial separation method (parallel configuration). [Block diagram: time domain stage and frequency domain stage.]
23
Partial separation method (cascade configuration). [Block diagram: time domain stage and frequency domain stage; a panel labelled “Parallel configuration” is also shown.]
24
Advantages of the Partial Separation Method: robustness.
25
Comparison with Adjacent Bands Correlation Method
26
Comparison with DOA method. Legend: PS: partial separation method with confidence check; C1: correlation between adjacent bins without confidence check; C2: correlation between adjacent bins with confidence check; Ha: correlation between the harmonic components with confidence check; PS1: partial separation method alone, without confidence check.
27
My Contribution - II. [Same taxonomy diagram as the Motivation slide.]
28
Underdetermined Blind Source Separation of Instantaneous Mixtures. Processing chain: mixture in time domain → time to time-frequency (TF) domain → detection of single source points (SSPs) → mixing matrix estimation → estimation of sources.
29
Mathematical Representation of Instantaneous Mixing. Time domain: [equation not in transcript]. Time-frequency domain: [equation not in transcript]. P: no. of mixtures; Q: no. of sources. Reference: V. G. Reju, S. N. Koh and I. Y. Soon, “An algorithm for mixing matrix estimation in instantaneous blind source separation,” Signal Processing, vol. 89, no. 9, pp. 1762–1773, Sept. 2009.
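The two equations are not in the transcript. The standard instantaneous model, written with the slide's P and Q, is sketched below. The mixing matrix symbol A and its columns a_q are assumptions; x and X(k, t) follow the notation used later in the algorithm steps.

```latex
\mathbf{x}(t) = \mathbf{A}\,\mathbf{s}(t), \qquad
\mathbf{A} \in \mathbb{R}^{P \times Q}

\mathbf{X}(k, t) = \mathbf{A}\,\mathbf{S}(k, t)
                 = \sum_{q=1}^{Q} \mathbf{a}_q\, S_q(k, t)
```

Because A is real and constant, the same matrix mixes the real parts and the imaginary parts of the TF coefficients.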
30
Single Source Points in Time-Frequency Domain. Panels: single source point 1; single source point 2. [Equations not in transcript; two terms are marked 0.]
31
Single Source Points in Time-Frequency Domain. Panels: single source point 1; single source point 2.
32
Single Source Points in Time-Frequency Domain. Panels: single source point 1; single source point 2. Scalar. ∴ At single source point 1: [equation not in transcript]. ∴ At single source point 2: [equation not in transcript].
33
Scatter Diagram of the Mixtures When Sources Are Perfectly Sparse. Example: [scatter plot not in transcript].
34
Scatter Diagram of the Mixtures When Sources Are Not Perfectly Sparse. Example: [scatter plot not in transcript].
35
Scatter Diagram of the Mixtures when Sources are Sparse. No. of sources = 6, no. of mixtures = 2.
36
Scatter Diagram of the Mixtures when Sources are Sparse, After Clustering. No. of sources = 6, no. of mixtures = 2.
37
Scatter Diagram of the Mixtures when Sources are Not Perfectly Sparse. No. of sources = 6, no. of mixtures = 2. Objective: estimation of the single source points.
38
Principle of the Proposed Algorithm for the Detection of Single Source Points. Labels: single source point 1; single source point 2; scalar; multi source point.
39
Principle of the Proposed Algorithm for the Detection of Single Source Points. Labels: single source point 1; single source point 2; scalar; multi source point.
40
Principle of the Proposed Algorithm for the Detection of Single Source Points. [Plot comparing single source points (SSP) and multi source points (MSP); average of 15 pairs of speech utterances of length 10 s each.]
41
Proposed Algorithm for the Detection of Single Source Points. [Plot labels: SSP, MSP.]
42
Elimination of Outliers. Steps: SSP detection → clustering → outlier elimination.
43
Experimental Results. No. of mixtures = 2, no. of sources = 6.
44
Detected Single Source Points, Three Mixtures. No. of mixtures = 3, no. of sources = 6.
45
Comparison with Classical Algorithms for Determined Case. No. of mixtures = 2, no. of sources = 2. Average of 500 experimental results.
46
Comparison with Method Proposed in [1], Underdetermined Case. [Plot: normalized mean square error (NMSE) in mixing matrix estimation (dB) versus order of the mixing matrices (P×Q). P: no. of mixtures; Q: no. of sources.]
[1] Y. Li, S. Amari, A. Cichocki, D. W. C. Ho, and S. Xie, “Underdetermined blind source separation based on sparse representation,” IEEE Transactions on Signal Processing, vol. 54, pp. 423–437, Feb. 2006.
47
Advantages of the Proposed Algorithm
Step 1: Convert x in the time domain to the TF domain to get X.
Step 2: Check the condition [condition not in transcript].
Step 3: If the condition is satisfied, then X(k, t) is a sample at an SSP, and this sample is kept for mixing matrix estimation; otherwise, discard the point.
Step 4: Repeat Steps 2 and 3 for all the points in the TF plane or until a sufficient number of SSPs is obtained.
1) Much simpler constraint: the algorithm does not require a “single source zone”.
2) Separation performance is better.
3) The algorithm is extremely simple but effective.
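The steps above can be sketched as follows. The Step 2 condition is missing from the transcript, so this sketch assumes the direction test described in the cited Signal Processing paper: a TF sample is treated as an SSP when the real and imaginary parts of X(k, t) point in the same absolute direction within a small tolerance. The tolerance `delta_theta_deg`, the mixing matrix, and the synthetic data are hypothetical.

```python
import numpy as np

def detect_ssp(x_tf, delta_theta_deg=0.8):
    """Flag TF samples whose real and imaginary parts are (nearly) collinear.

    x_tf: complex array of shape (P, N), one column per TF point.
    Returns a boolean mask of length N (True means the sample is kept).
    """
    re, im = x_tf.real, x_tf.imag
    num = np.abs(np.sum(re * im, axis=0))
    den = np.linalg.norm(re, axis=0) * np.linalg.norm(im, axis=0)
    cos_angle = num / np.maximum(den, 1e-12)
    return cos_angle > np.cos(np.deg2rad(delta_theta_deg))

rng = np.random.default_rng(0)
a = np.array([[1.0, 0.6, -0.3],
              [0.2, 0.8, 0.95]])          # hypothetical 2x3 real mixing matrix
n_ssp, n_msp = 300, 300
s = np.zeros((3, n_ssp + n_msp), dtype=complex)

# First n_ssp points: exactly one active source each (single source points).
active = rng.integers(0, 3, size=n_ssp)
s[active, np.arange(n_ssp)] = (rng.standard_normal(n_ssp)
                               + 1j * rng.standard_normal(n_ssp))
# Remaining points: all three sources active (multi source points).
s[:, n_ssp:] = (rng.standard_normal((3, n_msp))
                + 1j * rng.standard_normal((3, n_msp)))

x_tf = a @ s                               # Step 1 output, here synthetic
mask = detect_ssp(x_tf)                    # Steps 2 to 4 over all points
print(mask[:n_ssp].mean(), mask[n_ssp:].mean())
```

The kept samples line up with the columns of the mixing matrix, which is what makes the later clustering and mixing matrix estimation possible.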
48
My Contributions - III, IV and V. [Same taxonomy diagram as the Motivation slide.]
49
Underdetermined Convolutive Blind Source Separation via Time-Frequency Masking. [Block diagram: Mic 1, ..., Mic P → STFT → mixture in TF domain → mask estimation → apply mask → separated signals in TF domain.] Reference: V. G. Reju, S. N. Koh and I. Y. Soon, “Underdetermined Convolutive Blind Source Separation via Time-Frequency Masking,” IEEE Transactions on Audio, Speech and Language Processing, vol. 18, no. 1, pp. 101–116, Jan. 2010.
50
Mathematical Representation. Time domain: [equation not in transcript]. Frequency domain: [equation not in transcript]. P: no. of mixtures; Q: no. of sources.
51
Single Source Points. Instantaneous mixing: single source point 1, single source point 2; labels: real scalar, real, real scalar. Convolutive mixing: single source point 1, single source point 2; labels: complex scalar, complex, complex scalar.
52
Basic Principle of Single Source Points Detection. Convolutive mixing: single source point 1, single source point 2; labels: complex scalar, complex, complex scalar. The Hermitian angle between the complex vectors u1 and u2 remains the same even if the vectors are multiplied by any complex scalars, whereas the pseudo angle changes.
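The invariance stated on this slide can be checked directly. The sketch below uses the usual definitions: with u1^H u2 / (‖u1‖ ‖u2‖) = ρ e^{jφ}, the Hermitian angle is arccos(ρ) and the pseudo angle is φ. The vectors and the scalar `c` are arbitrary example values.

```python
import cmath
import math

def inner(u, v):
    """Hermitian inner product u^H v."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(sum(abs(a) ** 2 for a in u))

def hermitian_angle(u, v):
    """arccos of the magnitude of the normalized inner product."""
    rho = abs(inner(u, v)) / (norm(u) * norm(v))
    return math.acos(min(1.0, rho))

def pseudo_angle(u, v):
    """Phase of the inner product."""
    return cmath.phase(inner(u, v))

u1 = [1 + 2j, 0.5 - 1j]
u2 = [0.3 + 0.1j, -1 + 0.7j]
c = 2.0 * cmath.exp(1j * 1.0)              # arbitrary complex scalar
u1_scaled = [c * a for a in u1]

theta_h = hermitian_angle(u1, u2)
theta_h_scaled = hermitian_angle(u1_scaled, u2)
phi = pseudo_angle(u1, u2)
phi_scaled = pseudo_angle(u1_scaled, u2)
print(theta_h, theta_h_scaled)              # equal up to rounding
print(phi, phi_scaled)                      # differ by the phase of c
```

Scaling u1 by `c` multiplies the inner product by the conjugate of `c`, which changes its phase but not its normalized magnitude. That is why the Hermitian angle is the usable quantity when each source contributes an unknown complex scalar.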
53
Algorithm for Single Source Points Detection. [Diagram labels: Hermitian angles θH1 and θH2, combined with OR.]
54
Mask Estimation by k-means (KM). [Panels: clean; estimated.]
55
Mask Estimation by Fuzzy c-means (FCM). [Panels: clean; estimated.]
56
Automatic Detection of Number of Sources
Cluster validation technique:
For c = 2 to c_max
  Cluster the data into c clusters.
  Calculate the cluster validation index.
End
Take the c corresponding to the best clustering as the number of sources.
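The loop above can be sketched as follows. The slide does not say which clustering algorithm or validation index is used, so this example assumes plain k-means with a farthest-first initialization and the silhouette score as the index. The two-dimensional toy data with four clusters is hypothetical.

```python
import numpy as np

def farthest_first_init(data, c, rng):
    """Pick c initial centers: a random point, then repeatedly the farthest point."""
    centers = [data[rng.integers(len(data))]]
    for _ in range(c - 1):
        d = np.min([np.linalg.norm(data - ctr, axis=1) for ctr in centers], axis=0)
        centers.append(data[d.argmax()])
    return np.array(centers, dtype=float)

def kmeans(data, c, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = farthest_first_init(data, c, rng)
    for _ in range(n_iter):
        dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(c):
            if np.any(labels == j):        # keep the old center if a cluster is empty
                centers[j] = data[labels == j].mean(axis=0)
    dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1)

def silhouette(data, labels):
    """Mean silhouette score; higher means tighter, better separated clusters."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return -1.0
    dist = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    scores = np.zeros(len(data))
    for i in range(len(data)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue                       # a singleton scores 0 by convention
        a = dist[i, own].sum() / (n_own - 1)
        b = min(dist[i, labels == k].mean() for k in clusters if k != labels[i])
        scores[i] = (b - a) / max(a, b)
    return float(scores.mean())

def estimate_num_sources(data, c_max):
    """For c = 2..c_max: cluster, score, and return the c with the best index."""
    scores = {c: silhouette(data, kmeans(data, c)) for c in range(2, c_max + 1)}
    return max(scores, key=scores.get), scores

rng = np.random.default_rng(1)
blob_centers = np.array([[0.0, 0.0], [6.0, 1.0], [1.0, 7.0], [8.0, 8.0]])
data = np.vstack([ctr + 0.3 * rng.standard_normal((60, 2)) for ctr in blob_centers])
n_sources, scores = estimate_num_sources(data, c_max=6)
print(n_sources, {c: round(v, 2) for c, v in scores.items()})
```

Any index that rewards compact, well separated clusters could replace the silhouette score; only the line inside `estimate_num_sources` would change.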
57
Elimination of Low Energy Points