
1 Data-Adaptive Source Separation for Audio Spatialization
M.Tech. project presentation by Pradeep Gaddipati (08307029)
Supervisors: Prof. Preeti Rao and Prof. V. Rajbabu
Digital Audio Processing Lab, Dept. of EE, Thursday, June 17th

2 Outline
- Problem statement
- Audio spatialization
- Source separation
- Data-adaptive TFR
- Concentration measure (sparsity)
- Reconstruction of the signal from the TFR
- Performance evaluation
- Data-adaptive TFR for sinusoid detection
- Conclusions and future work

3 Problem statement
- Spatial audio (surround sound)
  - commonly used in movies, gaming, etc.
  - creates a sense of immersion (suspension of disbelief)
  - effective when the playback device is located at a considerable distance from the listener
- Mobile phones
  - headphones are used for playback
  - spatial audio is ineffective over headphones: body-reflection cues are missing, causing in-the-head localization
  - the content cannot be re-recorded, hence the need for audio spatialization

4 Audio spatialization
- Audio spatialization: a spatial rendering technique for converting the available audio into the desired listening configuration
- Analysis: separating the individual sources
- Re-synthesis: re-creating the desired listener-end configuration
[Block diagram: Available spatial audio (speakers) → Analysis (source separation) → separated sources → Re-synthesis (convolving with HRIRs) → Desired listener-end configuration (headphones)]

5 Source separation
- Source separation: obtaining estimates of the underlying sources from a set of observations at the sensors
- Processing chain:
  - Time-frequency transform
  - Source analysis: estimation of the mixing parameters
  - Source synthesis: estimation of the sources
  - Inverse time-frequency transform
[Block diagram: Mixtures (stereo) → Time-frequency transform → Source analysis → Source synthesis → Inverse time-frequency transform → Separated sources (>= 2); illustrated with stereo mixtures and Sources 1-3]

6 Mixing model
- Anechoic mixing model
  - mixtures x_i
  - sources s_j
- Under-determined case (M < N)
  - M = number of mixtures
  - N = number of sources
- Mixing parameters
  - attenuation parameters a_ij
  - delay parameters
Figure: Anechoic mixing model. Audio is observed at the microphones with differing intensities and arrival times (because of propagation delays) but with no reverberation.
Source: P. O'Grady, B. Pearlmutter and S. Rickard, "Survey of sparse and non-sparse methods in source separation," International Journal of Imaging Systems and Technology, 2005.
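For reference, the anechoic mixing model can be written out as below. This is the standard form from the source-separation literature; the delay symbol delta_ij is assumed here, since the original slide's notation was not captured in the transcript.

```latex
% Anechoic mixing model: each mixture is a sum of attenuated and delayed sources.
% x_i : i-th mixture, s_j : j-th source, a_ij : attenuation, \delta_ij : delay (assumed symbol)
x_i(t) = \sum_{j=1}^{N} a_{ij}\, s_j\bigl(t - \delta_{ij}\bigr), \qquad i = 1, \dots, M, \quad M < N
```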

7 [Block diagram: Mixtures (stereo) → Time-frequency transform → Source analysis → Source synthesis → Inverse time-frequency transform → Separated sources (>= 2)]

8 Time-frequency transform

9 Source analysis (estimation of mixing parameters)
- Time-frequency representation of the mixtures
- Requirement for source separation [1]: W-disjoint orthogonality (WDO), i.e. at most one source dominates any given time-frequency bin
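In its ideal form, the W-disjoint orthogonality requirement of [1] states that the sources do not overlap in the time-frequency plane:

```latex
% Ideal W-disjoint orthogonality: at most one source is non-zero in any T-F bin
S_i(\tau, \omega)\, S_j(\tau, \omega) = 0 \qquad \forall\, \tau, \omega, \; i \neq j
```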

10 Source analysis (estimation of mixing parameters)
- For every time-frequency bin, estimate the mixing parameters [1]
- Build a 2-dimensional histogram of the (attenuation, delay) estimates; its peaks indicate the mixing parameters (a sketch of this estimation follows below)
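A minimal sketch of this step, assuming stereo inputs x1 and x2 and SciPy's STFT. The per-bin ratio of the two channels yields attenuation and delay estimates that are accumulated in an energy-weighted 2-D histogram; the function name, bin ranges and weighting are illustrative choices rather than the exact settings used in this work (DUET itself also uses a symmetric attenuation variable, omitted here for brevity).

```python
import numpy as np
from scipy.signal import stft

def estimate_mixing_params(x1, x2, fs, n_bins=50):
    """DUET-style mixing-parameter histogram (illustrative sketch)."""
    f, _, X1 = stft(x1, fs=fs, nperseg=1024)
    _, _, X2 = stft(x2, fs=fs, nperseg=1024)

    eps = 1e-12
    ratio = (X2 + eps) / (X1 + eps)

    # Relative attenuation per T-F bin
    alpha = np.abs(ratio)

    # Relative delay per T-F bin (in samples), from the phase of the ratio;
    # the DC row is excluded by mapping omega = 0 to infinity
    omega = 2 * np.pi * f[:, None] / fs
    omega[omega == 0] = np.inf
    delta = -np.angle(ratio) / omega

    # Weight bins by mixture energy so low-energy bins contribute little
    weights = (np.abs(X1) * np.abs(X2)).ravel()
    hist, a_edges, d_edges = np.histogram2d(
        alpha.ravel(), delta.ravel(),
        bins=n_bins, range=[[0.0, 2.0], [-5.0, 5.0]],
        weights=weights)
    return hist, a_edges, d_edges
```

The peaks of `hist` then give one (attenuation, delay) pair per source.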

11 Source analysis (estimation of mixing parameters)

12 Source synthesis (estimation of sources)
[Figure: spectrogram of the mixture alongside Sources 1-3, shown as source spectrograms and their corresponding masks]

13 Source synthesis (estimation of sources)
[Figure: spectrograms of the mixture and of Sources 1-3]

14 Source synthesis (estimation of sources)
- Source estimation techniques:
  - degenerate unmixing estimation technique (DUET) [1]
  - lq-basis pursuit (LQBP) [2]
  - delay and scale subtraction scoring (DASSS) [3]

15 Source synthesis (DUET)
- Every time-frequency bin of the mixture is assigned to one of the sources, based on a distance measure in the (attenuation, delay) parameter space (sketched below)
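A minimal sketch of the DUET-style assignment, assuming the histogram peaks have already been collected into a list of (attenuation, delay) pairs. The distance measure below compares each bin's observed channel relationship with the one each candidate source would predict; it is a simplification of the full DUET likelihood, and the names are illustrative.

```python
import numpy as np

def duet_masks(X1, X2, f, fs, params):
    """Assign each T-F bin to the nearest source and return binary masks.

    X1, X2 : complex STFTs of the two mixture channels (same shape).
    f      : frequency axis in Hz, fs : sampling rate.
    params : list of (attenuation, delay-in-samples) pairs, one per source.
    """
    eps = 1e-12
    omega = 2 * np.pi * f[:, None] / fs          # rad/sample, per frequency row
    dists = []
    for a, d in params:
        # Model: X2 ~ a * exp(-j*omega*d) * X1 for a bin owned by this source
        residual = a * np.exp(-1j * omega * d) * X1 - X2
        dists.append(np.abs(residual) ** 2 / (np.abs(X1) ** 2 + eps))
    dists = np.stack(dists)                       # (n_sources, F, T)
    owner = np.argmin(dists, axis=0)              # winning source per bin
    return [(owner == k) for k in range(len(params))]

# Usage: S_k = mask_k * X1 (or a weighted combination of both channels),
# followed by the inverse STFT to recover the time-domain source.
```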

16 Source synthesis (LQBP)
- Relaxes the WDO assumption: assumes at most M sources are present in each T-F bin
  - M = number of mixtures, N = number of sources, M < N
- An lq measure (q < 1) decides which M sources are present

17 Source synthesis (DASSS)
- Identifies which bins have only one dominant source
  - uses DUET for those bins
  - assumes at most M sources are present in the remaining bins
  - an error threshold decides which M sources are present

18 Inverse time-frequency transform
[Figure: the stereo mixtures, the estimated sources 1-3 and the original sources 1-3]

19 Scope for improvement
- Requirement for source separation: W-disjoint orthogonality (WDO) amongst the sources
- The sparser the TFR of the mixtures [4],
  - the smaller the overlap amongst the sources (i.e. the higher the WDO)
  - and the easier their separation

20 Data-adaptive TFR
- For music/speech signals
  - different components (harmonics, transients, modulations) occur at different time instants
  - the best analysis window differs for different components
  - this suggests using a data-dependent, time-varying window function to achieve high sparsity [6]
- To obtain a sparser TFR of the mixture, use different analysis window lengths at different time instants: at each instant, the length that gives maximum sparsity (a sketch of this selection follows below)
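A minimal sketch of the window-length adaptation, assuming a mono input signal and kurtosis as the concentration measure. The invertibility (COLA) constraint discussed later is omitted here, and the function and parameter names are illustrative.

```python
import numpy as np

def kurtosis(frame_mag):
    """Kurtosis of a spectrum: E[|X|^4] / (E[|X|^2])^2."""
    p = frame_mag ** 2
    return np.mean(p ** 2) / (np.mean(p) ** 2 + 1e-12)

def adaptive_window_lengths(x, fs, hop_ms=10, win_ms=(30, 60, 90)):
    """Pick, at each analysis instant, the window length (in ms) whose
    spectrum is the sparsest, i.e. has the highest kurtosis."""
    hop = int(fs * hop_ms / 1000)
    max_n = max(int(fs * w / 1000) for w in win_ms)
    n_frames = max((len(x) - max_n) // hop, 0)
    choices = []
    for m in range(n_frames):
        start = m * hop
        scores = []
        for w_ms in win_ms:
            n = int(fs * w_ms / 1000)
            seg = x[start:start + n]
            spec = np.abs(np.fft.rfft(seg * np.hamming(n)))
            scores.append(kurtosis(spec))
        choices.append(win_ms[int(np.argmax(scores))])
    return choices
```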

21 Data-adaptive TFR
[Figure: data-adaptive time-frequency representation of a singing voice; window function = Hamming, window sizes = 30, 60 and 90 ms, hop size = 10 ms, concentration measure = kurtosis]

22 Sparsity measure (concentration measure)
- What is sparsity? A small number of coefficients contain a large proportion of the energy
- Common sparsity measures [5]: kurtosis and the Gini index (a sketch of the Gini index follows below)
- Which sparsity measure to use for adaptation? The one whose trend as a function of the analysis window size matches that of WDO
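Of the two measures, the Gini index is the less familiar; a small sketch following the definition used by Hurley and Rickard [5]:

```python
import numpy as np

def gini_index(coeffs):
    """Gini index of a coefficient vector (0 = flat, close to 1 = sparse).

    Coefficients are sorted in ascending order of magnitude and weighted
    by their rank, as in Hurley & Rickard [5].
    """
    c = np.sort(np.abs(np.asarray(coeffs, dtype=float)))
    n = c.size
    l1 = c.sum()
    if l1 == 0:
        return 0.0
    k = np.arange(1, n + 1)
    return 1.0 - 2.0 * np.sum((c / l1) * ((n - k + 0.5) / n))
```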

23 WDO and sparsity (some formulae)
- W-disjoint orthogonality [4]
- Kurtosis
- Gini index
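The formulae themselves did not survive the transcript; the standard definitions from [4] and [5] are reproduced below as a best-effort reconstruction, so the notation may differ from the original slide.

```latex
% WDO of source j under mask M_j (Rickard [4]); Y_j is the sum of the interfering sources
\mathrm{PSR}_j = \frac{\lVert M_j S_j \rVert^2}{\lVert S_j \rVert^2}, \quad
\mathrm{SIR}_j = \frac{\lVert M_j S_j \rVert^2}{\lVert M_j Y_j \rVert^2}, \quad
\mathrm{WDO}_j = \mathrm{PSR}_j - \frac{\mathrm{PSR}_j}{\mathrm{SIR}_j}

% Kurtosis and Gini index of a coefficient vector c (Hurley and Rickard [5]);
% for the Gini index the |c_(k)| are sorted in ascending order
\mathrm{kurtosis}(c) = \frac{\tfrac{1}{N}\sum_{k=1}^{N} |c_k|^4}
                            {\bigl(\tfrac{1}{N}\sum_{k=1}^{N} |c_k|^2\bigr)^2}, \qquad
\mathrm{Gini}(c) = 1 - 2\sum_{k=1}^{N} \frac{|c_{(k)}|}{\lVert c \rVert_1}
                   \left(\frac{N - k + \tfrac{1}{2}}{N}\right)
```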

24 Dataset description
- Dataset: BSS oracle
- Sampling frequency: 22050 Hz
- 10 sets each of music and speech signals
- One set: 3 signals
- Duration: 11 seconds

25 WDO and sparsity
- WDO vs. window size (a sketch of this mask-and-WDO computation follows below)
  - obtain the TFR of the sources in a set
  - obtain source masks based on the magnitude of the TFRs in each T-F bin
  - using the source masks and the TFRs of the sources, obtain the WDO measure
  - NOTE: in the case of the data-adaptive TFR, obtain the TFR of the sources using the window sequence obtained from the adaptation on the mixture
- Sparsity vs. window size
  - obtain the TFR of one of the channels of the mixture
  - calculate the frame-wise sparsity of the TFR of the mixture
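A minimal sketch of the oracle-mask and WDO computation described above, assuming the sources' TFRs are available as equally sized complex arrays; the names are illustrative and the WDO definition follows Rickard [4].

```python
import numpy as np

def oracle_masks_and_wdo(source_tfrs):
    """Magnitude-winner (oracle) masks and per-source WDO.

    source_tfrs : list of complex TFR arrays, one per source, same shape.
    Each T-F bin is assigned to the source with the largest magnitude;
    WDO_j = PSR_j - PSR_j / SIR_j.
    """
    mags = np.stack([np.abs(S) for S in source_tfrs])   # (N, F, T)
    owner = np.argmax(mags, axis=0)
    wdo = []
    for j, S in enumerate(source_tfrs):
        mask = (owner == j)
        interference = sum(Sk for k, Sk in enumerate(source_tfrs) if k != j)
        e_src = np.sum(np.abs(S) ** 2) + 1e-12
        e_kept = np.sum(np.abs(mask * S) ** 2)
        e_leak = np.sum(np.abs(mask * interference) ** 2) + 1e-12
        psr = e_kept / e_src
        sir = e_kept / e_leak
        wdo.append(psr - psr / sir)
    return owner, wdo
```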

26 WDO vs. window size

27 Kurtosis vs. window size

28 Gini index vs. window size

29 WDO and sparsity (observations)
- The highest sparsity (kurtosis / Gini index) is obtained when the data-adaptive TFR is used
- The highest WDO is obtained using the data-adaptive TFR (with kurtosis as the adaptation criterion)
- Kurtosis is observed to follow a trend similar to that of WDO

30 Inverse data-adaptive TFR
- Constraint (introduced by source separation): the TFR must be invertible
- Solution: select the analysis windows so that they satisfy the constant overlap-add (COLA) criterion [7]
- Techniques
  - transition window
  - modified (extended) window
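For a fixed window w and hop size H, the COLA criterion requires the shifted windows to sum to a constant; with a time-varying window, each frame's window must be shaped (via a transition or extended window) so that the sum still holds. The fixed-window form is:

```latex
% Constant overlap-add (COLA): shifted analysis windows sum to a constant,
% so overlap-add re-synthesis reconstructs the signal up to a known scale factor
\sum_{m=-\infty}^{\infty} w(n - mH) = C \qquad \text{for all } n
```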

31 Transition window technique

32 Modified window technique

33 Problems with re-construction
- Transition window technique
  - adaptation is carried out only on alternate frames
  - the WDO obtained amongst the underlying sources is lower
- Modified window technique
  - the extended window has larger side-lobes than a normal Hamming window
  - signal energy spreads into neighbouring bins
  - the WDO measure decreases

34 Dataset description
- Dataset: BSS oracle
- Mixtures per set: 72 (= 24 x 3)
  - attenuation parameters: 24 permutations (4P3) of {10°, 30°, 60°, 80°}
  - delay parameters: {(0, 0, 0), (0, 1, 2), (0, 2, 1)}
- A total of 720 (= 72 x 10) mixtures (test cases) for each of the music and speech groups

35 Performance (mixing parameters)
TFR (window size), hop size = 10 ms | Cases with correct estimation of sources (%) | Attenuation-parameter error (degrees) | Delay-parameter error (samples)
STFT (30 ms)          | 67.99 | 2.48 | 0.39
STFT (60 ms)          | 74.79 | 1.67 | 0.31
STFT (90 ms)          | 74.93 | 1.48 | 0.30
ATFR (30, 60, 90 ms)  | 79.51 | 0.79 | 0.25

36 Performance (source estimation)
- Evaluate the source masks using one of the source estimation techniques (DUET or LQBP)
- Using the set of estimated source masks and the TFRs of the original sources, calculate the WDO measure of each source mask
- The WDO measure indicates how well the mask
  - preserves the source of interest
  - suppresses the interfering sources

37 Performance (source estimation)
TFR (window size), hop size = 10 ms | WDO measure (DUET) | WDO measure (LQBP)
STFT (30 ms)          | 0.8161 | 0.6218
STFT (60 ms)          | 0.8558 | 0.6350
STFT (90 ms)          | 0.8582 | 0.6356
ATFR (30, 60, 90 ms)  | 0.8612 | 0.6362

38 Data-adaptive TFR (for sinusoid detection)
[Figure: data-adaptive time-frequency representation of a singing voice; window function = Hamming, window sizes = 20, 40 and 60 ms, hop size = 10 ms, concentration measure = kurtosis, frequency range = 1000 to 3000 Hz]

39 Data-adaptive TFR (for sinusoid detection)
TFR (window size), hop size = 10 ms | True hits (%), 0-1500 Hz | True hits (%), 1000-3000 Hz | True hits (%), 2500-5000 Hz
STFT (20 ms)          | 91.29 | 85.33 | 76.98
STFT (40 ms)          | 95.67 | 82.16 | 68.16
STFT (60 ms)          | 86.78 | 68.24 | 64.95
ATFR (20, 40, 60 ms)  | 96.09 | 86.09 | 82.53

40 Conclusions
- Mixing model: anechoic
- Kurtosis can be used as the adaptation criterion for the data-adaptive TFR
- The data-adaptive TFR provides a higher WDO measure amongst the underlying sources than a fixed-window STFT
- Better estimates of the mixing parameters and of the sources are obtained using the data-adaptive TFR
- DUET performs better than LQBP

41 Future work
- Testing of the DASSS source estimation technique
- Reconstruction of the signal from the TFR
- A more realistic mixing model, such as an echoic mixing model, is needed to account for reverberation effects

42 Acknowledgments
I would like to thank Nokia, India for providing financial support and technical inputs for the work reported here.

43 References
1. A. Jourjine, S. Rickard and O. Yilmaz, "Blind separation of disjoint orthogonal signals: demixing N sources from 2 mixtures," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2000.
2. R. Saab, O. Yilmaz, M. J. McKeown and R. Abugharbieh, "Underdetermined anechoic blind source separation via lq basis pursuit with q < 1," IEEE Transactions on Signal Processing, 2007.
3. A. S. Master, "Bayesian two source modelling for separation of N sources from stereo signal," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 4, pp. 281-284, 2004.

44 References
4. S. Rickard, "Sparse sources are separated sources," European Signal Processing Conference (EUSIPCO), 2006.
5. N. Hurley and S. Rickard, "Comparing measures of sparsity," IEEE Transactions on Information Theory, 2009.
6. D. L. Jones and T. Parks, "A high resolution data-adaptive time-frequency representation," IEEE Transactions on Acoustics, Speech and Signal Processing, 1990.
7. P. Basu, P. J. Wolfe, D. Rudoy, T. F. Quatieri and B. Dunn, "Adaptive short-time analysis-synthesis for speech enhancement," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2008.

45 Thank you. Questions?

