SPECTRO-TEMPORAL POST-SMOOTHING IN NMF BASED SINGLE-CHANNEL SOURCE SEPARATION
Emad M. Grais and Hakan Erdogan
Sabanci University, Istanbul, Turkey


INTRODUCTION

• Single-channel source separation aims to estimate the source signals when only a single mixture of them is available.

PROBLEM FORMULATION

• The observed mixed signal x(t) is a mixture of multiple source signals s_z(t).
• This can be formulated in the short-time Fourier transform (STFT) domain as $X(t,f) = \sum_{z} S_z(t,f)$.
• This can be approximated as a sum of magnitude spectrograms: $|X(t,f)| \approx \sum_{z} |S_z(t,f)|$.
• The magnitude spectrograms can be written as nonnegative matrices: $X \approx \sum_{z} S_z$.

NONNEGATIVE MATRIX FACTORIZATION

• NMF is used to decompose a nonnegative matrix V into a low-rank nonnegative basis-vector matrix B and a nonnegative weights matrix G: $V \approx BG$.
• B and G can be found by minimizing the generalized Kullback-Leibler divergence
  $D(V \,\|\, BG) = \sum_{m,n} \left( V_{m,n} \log \frac{V_{m,n}}{(BG)_{m,n}} - V_{m,n} + (BG)_{m,n} \right)$
  subject to the elements of B and G being nonnegative.
• The update rules for B and G are
  $B \leftarrow B \otimes \frac{\frac{V}{BG} G^{T}}{\mathbf{1} G^{T}}, \qquad G \leftarrow G \otimes \frac{B^{T} \frac{V}{BG}}{B^{T} \mathbf{1}}$
  where $\mathbf{1}$ is a matrix of ones with the same size as V, $\otimes$ is element-wise multiplication, and the divisions are element-wise.

NMF FOR SOURCE SEPARATION

In the training stage:
• The magnitude spectrogram of each source's training data is used to build a dictionary B_z for that source using NMF.

In the testing stage:
• NMF is used to decompose the magnitude spectrogram of the mixed signal X into a nonnegative weighted linear combination of the trained dictionaries: $X \approx \sum_{z} B_z G_z$.
• The initial estimate for each source is found as $\tilde{S}_z = B_z G_z$.

SIGNALS RECONSTRUCTION AND SMOOTHED MASKS

• The initial estimates are used to build a spectral mask as $H_z = \frac{\tilde{S}_z^{\,p}}{\sum_{j} \tilde{S}_j^{\,p}}$, where the power and the division are element-wise.
• Changing p leads to a different type of mask.
• The spectral mask can be used to estimate each source by element-wise multiplication with the spectrogram of the mixed signal: $\hat{S}_z = H_z \otimes X$.
• To add temporal smoothness to the estimated source spectrograms, the spectral mask is smoothed by a 2-D smoothing filter with dimensions (a,b).
• The smoothing filter can be:
  1. The median filter.
  2. The moving average low-pass filter.
  3. The Hamming-windowed moving average filter (Hamming filter).
• The smoothing direction is the horizontal (time) direction of the spectrograms.
• The final estimate for each source is found by element-wise multiplication of the smoothed mask with the spectrogram of the mixed signal.

EXPERIMENTS AND RESULTS

• The proposed algorithm is used to separate a speech signal from a background piano music signal.
• The STFT uses a 512-point FFT, of which only the first 257 points are used; the sampling rate is 16 kHz.
• We train 128 basis vectors for each source dictionary, so each dictionary matrix B has size 257 x 128.
• We compare enforcing temporal smoothness with post-smoothed spectral masks against enforcing smoothness with regularized NMF.
• The regularized NMF minimizes $D(X \,\|\, B_d G) + \alpha R(G)$, where B_d = [B_speech, B_music], α is the regularization parameter, and R(G) is the continuity prior penalty term. (The defining equations for R(G) are missing from the transcript.)
• In this work, we choose different regularization values: α_s for speech and α_m for music.
• Table 1 shows the separation results using the regularized NMF to enforce smoothness on the estimated source signals.
• Tables 2 and 3 show the separation results where the smoothness is enforced using smoothed spectral masks.
• The tables show that enforcing smoothness using smoothed masks gives better separation results than enforcing smoothness using regularized NMF.

Table 1: Signal to Noise Ratio (SNR) in dB for the speech signal using regularized NMF. A dash (--) marks a value that is missing from the transcript.

SMR    Just NMF               Regularized NMF              Regularized NMF
(dB)   (no mask, no prior)    (α_s = 10^-5, α_m = 10^-5)   (α_s = 10^-5, α_m = 10^-3)
  -5    6.17                   6.13                         3.53
   0    9.15                   9.16                         7.37
   5   10.81                     --                        10.18
  10   12.81                     --                        14.58
  15   14.02                  14.03                        17.60
  20   14.67                  14.66                        20.37

Table 2: Signal to Noise Ratio (SNR) in dB for the speech signal using the smoothed mask. Column headings give the filter dimensions (a,b); "Mask only" is the mask without smoothing. A dash (--) marks a value that is missing from the transcript.

SMR   Mask    Median filter                       Moving average filter               Hamming filter
(dB)  only    (1,3)  (1,5)  (1,7)  (1,9)  (2,3)   (1,2)  (1,3)  (1,5)  (1,7)  (2,3)   (1,3)  (1,5)  (1,7)  (1,9)  (2,3)
  -5   7.05    7.26   7.44   7.45   7.30   7.04    7.18   7.34   7.38   7.32   6.84    7.17   7.39   7.43   7.42   6.72
   0  10.37   10.69  10.86  10.82  10.71  10.47   10.56  10.72  10.74  10.57  10.13   10.51  10.76  10.80  10.75  10.01
   5  12.46   12.80  12.95  12.92  12.73  12.31   12.60  12.77  12.72  12.44  11.87   12.59  12.81     --  12.70  11.78
  10  15.23   15.83  16.03  15.97  15.78  15.40   15.44  15.65  15.53  15.13  14.67   15.40  15.68  15.66  15.50  14.54
  15  17.05   17.81  17.98  17.91  17.72  17.54   17.34  17.52  17.32  16.81  16.56   17.24  17.55  17.50  17.28  16.43
  20  18.40   19.37  19.56  19.58  19.41  19.11   18.74  18.87  18.63  18.07  17.87   18.60  18.91  18.84  18.60  17.75

Table 3: SNR in dB for the speech signal using the smoothed mask. Column headings give the filter dimensions (a,b). A dash (--) marks a value that is missing from the transcript.

SMR   Median filter         Moving average filter               Hamming filter
(dB)  (1,3)  (1,5)  (1,7)   (1,3)  (1,5)  (1,7)  (1,9)  (1,11)  (1,3)  (1,5)  (1,7)  (1,9)  (1,11) (1,13)
  -5   7.16   7.17   7.15    7.56   7.79   7.85   7.82   7.74    7.21   7.60   7.76   7.85   7.88   7.89
   0  10.46  10.48  10.41   10.95  11.16  11.18  11.12  10.99   10.56  10.97  11.13  11.20  11.22  11.20
   5  12.57  12.69  12.57   13.12  13.40  13.48  13.44  13.31   12.67  13.15  13.35  13.46  13.51     --
  10  15.57  15.59  15.54   16.19  16.49  16.58  16.56  16.48   15.53  16.20  16.43  16.55  16.60  16.61
  15  17.57  17.54  17.29   18.25  18.60  18.73  18.75  18.70   17.44  18.26  18.53  18.68  18.76  18.79
  20  19.00  19.06  18.89   19.85  20.33  20.56  20.67  20.59   18.86  19.87  20.24  20.46  20.68  20.67
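The factorization described under NONNEGATIVE MATRIX FACTORIZATION can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it uses the standard multiplicative updates for the generalized Kullback-Leibler divergence, and the function names (`kl_divergence`, `nmf_kl`), the random initialization, the iteration count, the small constant `eps`, and the option to keep a supplied dictionary fixed are all assumptions made for the example.

```python
import numpy as np


def kl_divergence(V, V_hat, eps=1e-12):
    """Generalized Kullback-Leibler divergence D(V || V_hat)."""
    return float(np.sum(V * np.log((V + eps) / (V_hat + eps)) - V + V_hat))


def nmf_kl(V, rank, n_iter=200, B=None, seed=0, eps=1e-12):
    """Factor V ~= B @ G with multiplicative updates for the KL divergence.

    If a dictionary B is supplied, it is kept fixed and only the weights G
    are updated (an assumption about how a testing stage could be run);
    in that case `rank` is ignored and taken from B.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    update_B = B is None
    if update_B:
        B = rng.random((n_freq, rank)) + eps
    G = rng.random((B.shape[1], n_frames)) + eps
    ones = np.ones_like(V)
    for _ in range(n_iter):
        # G <- G * (B^T (V / BG)) / (B^T 1), all operations element-wise
        G *= (B.T @ (V / (B @ G + eps))) / (B.T @ ones + eps)
        if update_B:
            # B <- B * ((V / BG) G^T) / (1 G^T)
            B *= ((V / (B @ G + eps)) @ G.T) / (ones @ G.T + eps)
    return B, G


# Toy data: a synthetic nonnegative matrix standing in for a magnitude spectrogram.
rng = np.random.default_rng(1)
V = rng.random((20, 6)) @ rng.random((6, 50))
B, G = nmf_kl(V, rank=6)
print("KL divergence after training:", kl_divergence(V, B @ G))
```

For the setup in the experiments, `V` would be a 257 x T magnitude spectrogram and `rank` would be 128; passing a trained dictionary through the `B` argument updates only the weights.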

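The mask construction and time-direction smoothing described under SIGNALS RECONSTRUCTION AND SMOOTHED MASKS can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names, the default p = 2, edge padding at the spectrogram borders, normalizing the Hamming weights to sum to one, and restricting the filter to a = 1 (time direction only) are all choices made for the example, and the random matrices merely stand in for the initial NMF estimates.

```python
import numpy as np


def spectral_masks(estimates, p=2.0, eps=1e-12):
    """Masks H_z = S_z**p / sum_j S_j**p built from initial magnitude estimates."""
    powered = [np.power(S, p) for S in estimates]
    total = sum(powered) + eps
    return [P / total for P in powered]


def smooth_time(H, b, kind="median"):
    """Smooth every frequency row of mask H along time with window length b (a = 1)."""
    # Pad the time axis so the output has the same number of frames as H.
    padded = np.pad(H, ((0, 0), ((b - 1) // 2, b // 2)), mode="edge")
    # windows has shape (n_freq, n_frames, b): one length-b window per bin.
    windows = np.lib.stride_tricks.sliding_window_view(padded, b, axis=1)
    if kind == "median":
        return np.median(windows, axis=-1)
    if kind == "average":
        weights = np.ones(b)
    elif kind == "hamming":
        weights = np.hamming(b)
    else:
        raise ValueError(f"unknown filter kind: {kind}")
    return windows @ (weights / weights.sum())


def separate(X, estimates, b=5, kind="median", p=2.0):
    """Apply smoothed masks to the mixture spectrogram X by element-wise product."""
    masks = [smooth_time(H, b, kind) for H in spectral_masks(estimates, p)]
    return [H * X for H in masks]


# Toy stand-ins for the initial estimates B_speech @ G_speech and B_music @ G_music.
rng = np.random.default_rng(0)
est_speech = rng.random((257, 40)) + 0.1
est_music = rng.random((257, 40)) + 0.1
X = est_speech + est_music  # toy mixture magnitude spectrogram

masks = spectral_masks([est_speech, est_music], p=2.0)
speech_hat, music_hat = separate(X, [est_speech, est_music], b=5, kind="hamming")
print(speech_hat.shape, music_hat.shape)
```

Because the moving average and Hamming filters are linear with weights that sum to one, the smoothed masks still sum to approximately one in every time-frequency bin; the median filter does not guarantee this.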
