Published by Scot Hopkins. Modified over 9 years ago.
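SPECTRO-TEMPORAL POST-SMOOTHING IN NMF BASED SINGLE-CHANNEL SOURCE SEPARATION
Emad M. Grais and Hakan Erdogan
Sabanci University, Istanbul, Turkey

INTRODUCTION
Single-channel source separation aims to estimate the source signals that make up a mixture when only a single recording of the mixture is available.

PROBLEM FORMULATION
The observed mixed signal $x(t)$ is a mixture of multiple source signals $s_z(t)$. In the short-time Fourier transform (STFT) domain this can be written as
$$X(t,f) = \sum_z S_z(t,f).$$
This can be approximated as a sum of magnitude spectrograms as
$$|X(t,f)| \approx \sum_z |S_z(t,f)|.$$
The magnitude spectrograms can be written as nonnegative matrices as
$$X \approx \sum_z S_z.$$

NONNEGATIVE MATRIX FACTORIZATION
NMF decomposes a nonnegative matrix $V$ into a low-rank nonnegative basis-vector matrix $B$ and a nonnegative weights matrix $G$:
$$V \approx BG.$$
$B$ and $G$ can be found by minimizing the generalized Kullback-Leibler divergence
$$D(V \,\|\, BG) = \sum_{m,n} \left( V_{m,n} \log \frac{V_{m,n}}{(BG)_{m,n}} - V_{m,n} + (BG)_{m,n} \right)$$
subject to the elements of $B$ and $G$ being nonnegative. The multiplicative update rules for $B$ and $G$ are
$$B \leftarrow B \otimes \frac{\frac{V}{BG}\, G^T}{\mathbf{1}\, G^T}, \qquad G \leftarrow G \otimes \frac{B^T \frac{V}{BG}}{B^T \mathbf{1}},$$
where $\mathbf{1}$ is a matrix of ones with the same size as $V$, $\otimes$ is element-wise multiplication, and the divisions are element-wise.

NMF FOR SOURCE SEPARATION
In the training stage, the magnitude spectrogram of each source's training data is used to build a dictionary $B_z$ for that source using NMF.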
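As a rough illustration of the training stage, the sketch below runs the multiplicative updates for the generalized Kullback-Leibler divergence on a nonnegative matrix. The function names `nmf_kl` and `kl_divergence`, the random initialization, the iteration count, and the small constant `eps` that guards against division by zero are all assumptions made for this example; they are not taken from the poster.

```python
import numpy as np


def kl_divergence(V, W, eps=1e-12):
    """Generalized KL divergence D(V || W) summed over all entries."""
    return float(np.sum(V * np.log((V + eps) / (W + eps)) - V + W))


def nmf_kl(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Factor a nonnegative matrix V (F x T) as B @ G.

    B (F x rank) holds the basis vectors and G (rank x T) the weights.
    Initialization and stopping rule are assumptions for illustration.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    B = rng.random((n_freq, rank)) + eps
    G = rng.random((rank, n_frames)) + eps
    ones = np.ones_like(V)
    for _ in range(n_iter):
        # Update the basis vectors with the weights held fixed.
        B *= ((V / (B @ G + eps)) @ G.T) / (ones @ G.T + eps)
        # Update the weights with the basis vectors held fixed.
        G *= (B.T @ (V / (B @ G + eps))) / (B.T @ ones + eps)
    return B, G


if __name__ == "__main__":
    # Synthetic stand-in for a 257 x T magnitude spectrogram.
    rng = np.random.default_rng(1)
    V = rng.random((257, 100)) + 0.1
    B, G = nmf_kl(V, rank=8, n_iter=50)
    print(B.shape, G.shape, kl_divergence(V, B @ G))
```

In a speech and music setup, such a routine would be run once per source on that source's training spectrogram, and only the resulting `B` would be kept as the dictionary.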
In the testing stage, NMF is used to decompose the magnitude spectrogram of the mixed signal $X$ into a nonnegative weighted linear combination of the trained dictionaries as
$$X \approx \sum_z B_z G_z,$$
where $G_z$ holds the weights associated with dictionary $B_z$. The initial estimate for each source is found as
$$\tilde{S}_z = B_z G_z.$$

We compare enforcing temporal smoothness with post-smoothed spectral masks against enforcing smoothness with regularized NMF. The regularized NMF minimizes
$$D(X \,\|\, B_d G) + \alpha R(G),$$
where $B_d = [B_{speech}, B_{music}]$, $\alpha$ is the regularization parameter, and $R(G)$ is the continuity prior penalty term. In this work, we use different regularization values: $\alpha_s$ for speech and $\alpha_m$ for music. Table 1 shows the separation results using the regularized NMF to enforce smoothness on the estimated source signals. Tables 2 and 3 show the separation results where smoothness is enforced using smoothed spectral masks. The tables show that enforcing smoothness using smoothed masks gives better separation results than enforcing smoothness using regularized NMF.

| SMR (dB) | Just using mask | Median a=1 b=3 | Median a=1 b=5 | Median a=1 b=7 | Median a=1 b=9 | Median a=2 b=3 | Moving avg. a=1 b=2 | Moving avg. a=1 b=3 | Moving avg. a=1 b=5 | Moving avg. a=1 b=7 | Moving avg. a=2 b=3 | Hamming a=1 b=3 | Hamming a=1 b=5 | Hamming a=1 b=7 | Hamming a=1 b=9 | Hamming a=2 b=3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| -5 | 7.05 | 7.26 | 7.44 | 7.45 | 7.30 | 7.04 | 7.18 | 7.34 | 7.38 | 7.32 | 6.84 | 7.17 | 7.39 | 7.43 | 7.42 | 6.72 |
| 0 | 10.37 | 10.69 | 10.86 | 10.82 | 10.71 | 10.47 | 10.56 | 10.72 | 10.74 | 10.57 | 10.13 | 10.51 | 10.76 | 10.80 | 10.75 | 10.01 |
| 5 | 12.46 | 12.80 | 12.95 | 12.92 | 12.73 | 12.31 | 12.60 | 12.77 | 12.72 | 12.44 | 11.87 | 12.59 | 12.81 | | 12.70 | 11.78 |
| 10 | 15.23 | 15.83 | 16.03 | 15.97 | 15.78 | 15.40 | 15.44 | 15.65 | 15.53 | 15.13 | 14.67 | 15.40 | 15.68 | 15.66 | 15.50 | 14.54 |
| 15 | 17.05 | 17.81 | 17.98 | 17.91 | 17.72 | 17.54 | 17.34 | 17.52 | 17.32 | 16.81 | 16.56 | 17.24 | 17.55 | 17.50 | 17.28 | 16.43 |
| 20 | 18.40 | 19.37 | 19.56 | 19.58 | 19.41 | 19.11 | 18.74 | 18.87 | 18.63 | 18.07 | 17.87 | 18.60 | 18.91 | 18.84 | 18.60 | 17.75 |

Table 2: Signal to Noise Ratio (SNR) in dB for the speech signal using the smoothed mask

SIGNALS RECONSTRUCTION AND SMOOTHED MASKS
The initial estimates are used to build a spectral mask as
$$H_z = \frac{\tilde{S}_z^{\,p}}{\sum_j \tilde{S}_j^{\,p}},$$
where $p > 0$ and the power and the division are element-wise. Changing $p$ leads to different types of mask.
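The testing stage and the mask construction can be sketched as follows. This is a minimal example under stated assumptions: the dictionaries are held fixed and only the weights are updated with the KL multiplicative rule, the mask is the ratio of the p-th power of one initial estimate to the sum over all sources, and the names `estimate_gains` and `spectral_masks`, the default `p=2.0`, and the random test data are hypothetical choices for illustration.

```python
import numpy as np


def estimate_gains(X, B_d, n_iter=200, eps=1e-12, seed=0):
    """Estimate weights G so that X is approximately B_d @ G.

    B_d is the fixed concatenation of the trained dictionaries
    (an assumption of this sketch); only G is updated.
    """
    rng = np.random.default_rng(seed)
    G = rng.random((B_d.shape[1], X.shape[1])) + eps
    denom = B_d.T @ np.ones_like(X) + eps
    for _ in range(n_iter):
        G *= (B_d.T @ (X / (B_d @ G + eps))) / denom
    return G


def spectral_masks(initial_estimates, p=2.0, eps=1e-12):
    """Build one mask per source from the initial estimates."""
    powered = [np.power(S, p) for S in initial_estimates]
    total = sum(powered) + eps
    return [P / total for P in powered]


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    # Hypothetical small dictionaries standing in for trained ones.
    B_speech = rng.random((257, 4))
    B_music = rng.random((257, 4))
    X = B_speech @ rng.random((4, 50)) + B_music @ rng.random((4, 50))

    G = estimate_gains(X, np.hstack([B_speech, B_music]))
    S_init = B_speech @ G[:4]   # initial speech estimate
    M_init = B_music @ G[4:]    # initial music estimate
    H_speech, H_music = spectral_masks([S_init, M_init], p=2.0)
    speech_est = H_speech * X   # element-wise masking of the mixture
    print(speech_est.shape)
```

Because the masks sum to one at every time-frequency point, the masked estimates add up to the mixture spectrogram, which the initial estimates generally do not.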
The spectral mask is used to estimate each source by element-wise multiplication with the spectrogram of the mixed signal as
$$\hat{S}_z = H_z \otimes X.$$
To add temporal smoothness to the estimated source spectrograms, the spectral mask is smoothed by a 2-D smoothing filter with dimensions $(a, b)$ as
$$\bar{H}_z = \mathcal{F}_{(a,b)}(H_z),$$
where $\mathcal{F}_{(a,b)}$ is a smoothing filter, which can be:
1. The median filter.
2. The moving average low-pass filter.
3. The Hamming-windowed moving average filter (Hamming filter).
The smoothing direction is the horizontal (time) direction of the spectrograms. The final estimate for each source is found as
$$\bar{S}_z = \bar{H}_z \otimes X.$$

EXPERIMENTS AND RESULTS
The proposed algorithm is used to separate a speech signal from a background piano music signal. The STFT uses a 512-point FFT, of which only the first 257 points are used; the sampling rate is 16 kHz. We train 128 basis vectors for each source dictionary, so each dictionary matrix $B$ has size 257x128.

| SMR (dB) | Just NMF (no mask, no prior) | Regularized NMF, α_s = 10^-5, α_m = 10^-5 | Regularized NMF, α_s = 10^-5, α_m = 10^-3 |
|---|---|---|---|
| -5 | 6.17 | 6.13 | 3.53 |
| 0 | 9.15 | 9.16 | 7.37 |
| 5 | 10.81 | | 10.18 |
| 10 | 12.81 | | 14.58 |
| 15 | 14.02 | 14.03 | 17.60 |
| 20 | 14.67 | 14.66 | 20.37 |

Table 1: Signal to Noise Ratio (SNR) in dB for the speech signal using regularized NMF

| SMR (dB) | Median a=1 b=3 | Median a=1 b=5 | Median a=1 b=7 | Moving avg. a=1 b=3 | Moving avg. a=1 b=5 | Moving avg. a=1 b=7 | Moving avg. a=1 b=9 | Moving avg. a=1 b=11 | Hamming a=1 b=3 | Hamming a=1 b=5 | Hamming a=1 b=7 | Hamming a=1 b=9 | Hamming a=1 b=11 | Hamming a=1 b=13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| -5 | 7.16 | 7.17 | 7.15 | 7.56 | 7.79 | 7.85 | 7.82 | 7.74 | 7.21 | 7.60 | 7.76 | 7.85 | 7.88 | 7.89 |
| 0 | 10.46 | 10.48 | 10.41 | 10.95 | 11.16 | 11.18 | 11.12 | 10.99 | 10.56 | 10.97 | 11.13 | 11.20 | 11.22 | 11.20 |
| 5 | 12.57 | 12.69 | 12.57 | 13.12 | 13.40 | 13.48 | 13.44 | 13.31 | 12.67 | 13.15 | 13.35 | 13.46 | 13.51 | |
| 10 | 15.57 | 15.59 | 15.54 | 16.19 | 16.49 | 16.58 | 16.56 | 16.48 | 15.53 | 16.20 | 16.43 | 16.55 | 16.60 | 16.61 |
| 15 | 17.57 | 17.54 | 17.29 | 18.25 | 18.60 | 18.73 | 18.75 | 18.70 | 17.44 | 18.26 | 18.53 | 18.68 | 18.76 | 18.79 |
| 20 | 19.00 | 19.06 | 18.89 | 19.85 | 20.33 | 20.56 | 20.67 | 20.59 | 18.86 | 19.87 | 20.24 | 20.46 | 20.68 | 20.67 |

Table 3: SNR in dB for the speech signal using the smoothed mask
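As an illustration of the mask post-smoothing step, the sketch below applies a 1 x b median, moving average, or Hamming-weighted average filter along the time axis of a mask. It covers only the a = 1 case. The function name `smooth_mask_time`, the edge padding at the spectrogram borders, the normalization of the Hamming weights to sum to one, and the convention that columns are time frames are assumptions of this example rather than details given on the poster.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def smooth_mask_time(H, b, kind="average"):
    """Smooth a mask H (frequency x time) with a 1 x b filter along time.

    kind is one of "median", "average", or "hamming".
    """
    # Edge-pad along time so the output has as many frames as the input.
    left = (b - 1) // 2
    right = b - 1 - left
    padded = np.pad(H, ((0, 0), (left, right)), mode="edge")
    # windows has shape (frequency, time, b): one length-b window per cell.
    windows = sliding_window_view(padded, b, axis=1)

    if kind == "median":
        return np.median(windows, axis=-1)
    if kind == "average":
        weights = np.ones(b)
    elif kind == "hamming":
        weights = np.hamming(b)
    else:
        raise ValueError(f"unknown filter kind: {kind}")
    weights = weights / weights.sum()
    return windows @ weights


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    H = rng.random((257, 60))          # stand-in for a spectral mask
    X = rng.random((257, 60)) + 0.1    # stand-in for the mixture magnitude
    H_smooth = smooth_mask_time(H, b=5, kind="hamming")
    S_final = H_smooth * X             # final estimate by element-wise masking
    print(S_final.shape)
```

Since all three filters return a median or a convex combination of neighbouring mask values, a mask with entries in [0, 1] stays in [0, 1] after smoothing.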