Published by Catherine Gordon. Modified over 9 years ago.
2
May 3rd, 2010 Update
3
Outline (Monday, May 3rd)
- Audio spatialization
- Performance evaluation (source separation)
- Source separation
- System overview
- Demonstration (system)
- Concentration measure and W-disjoint orthogonality
- Adaptive time-frequency representation (TFR)
- Demonstration (adaptive TFR)
4
Audio spatialization
Audio spatialization is a spatial rendering technique that converts the available audio into the desired listening configuration.
- Analysis: separating the individual sources
- Re-synthesis: re-creating the desired listener-end configuration
Signal flow: available spatial audio (speakers) -> analysis (source separation) -> separated sources -> re-synthesis (convolving with HRIRs) -> desired listener-end configuration (headphones)
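The re-synthesis step can be sketched as follows. This is a minimal illustration, not the presenter's implementation: it assumes each separated source is mono and that a left-ear and a right-ear HRIR are available for its target position. The function name `render_binaural` is hypothetical.

```python
import numpy as np

def render_binaural(sources, hrirs_left, hrirs_right):
    """Convolve each separated source with its HRIR pair and sum per ear.

    sources     : list of 1-D arrays, one per separated source
    hrirs_left  : list of 1-D arrays, left-ear HRIR for each source position
    hrirs_right : list of 1-D arrays, right-ear HRIR for each source position
    Returns an array of shape (2, L): row 0 is the left ear, row 1 the right.
    """
    # Full convolution of a length-a signal with a length-b filter has length a + b - 1.
    length = max(len(s) + max(len(hl), len(hr)) - 1
                 for s, hl, hr in zip(sources, hrirs_left, hrirs_right))
    out = np.zeros((2, length))
    for s, hl, hr in zip(sources, hrirs_left, hrirs_right):
        left = np.convolve(s, hl)
        right = np.convolve(s, hr)
        out[0, :len(left)] += left
        out[1, :len(right)] += right
    return out
```

For example, `render_binaural([s1, s2], [hl1, hl2], [hr1, hr2])` places two separated sources at two positions; real HRIRs would come from a measured set, which this sketch does not include.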
5
Performance evaluation [1]
Block diagram: estimated source and original source -> performance evaluation block -> performance measures (ISR, SIR, SAR, SDR)
- ISR = Image to Spatial-distortion Ratio
- SIR = Source to Interference Ratio
- SAR = Source to Artifacts Ratio
- SDR = Source to Distortion Ratio
6
Performance evaluation
The estimated source image can be decomposed into the true source image plus three error components: spatial distortion, interference, and artifacts.
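Given such a decomposition, the four measures are energy ratios in dB. The sketch below uses the definitions commonly given for this decomposition in the source-separation evaluation literature; it is assumed, not confirmed, that the slides use the same formulas. Computing the error components themselves (by projection onto the true sources) is not shown, and the function names are hypothetical.

```python
import numpy as np

def energy_ratio_db(num, den):
    """10*log10 of the energy ratio, summed over all channels and samples."""
    return 10.0 * np.log10(np.sum(num ** 2) / np.sum(den ** 2))

def bss_measures(s_img, e_spat, e_interf, e_artif):
    """Return ISR, SIR, SAR, SDR (dB) for one source.

    All inputs are arrays of the same shape (channels, samples):
    s_img    : true source image
    e_spat   : spatial distortion component
    e_interf : interference component
    e_artif  : artifacts component
    """
    return {
        "ISR": energy_ratio_db(s_img, e_spat),
        "SIR": energy_ratio_db(s_img + e_spat, e_interf),
        "SAR": energy_ratio_db(s_img + e_spat + e_interf, e_artif),
        "SDR": energy_ratio_db(s_img, e_spat + e_interf + e_artif),
    }
```

SDR measures all three error terms together, so it is the overall quality figure; the other three isolate one kind of error each.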
7
Performance evaluation (continued)
8
Source separation [2,3]
Source separation: obtaining estimates of the underlying sources from a set of observations at the sensors.
Pipeline: Mixtures (stereo) -> Time-frequency transform -> Source analysis -> Source synthesis -> Inverse time-frequency transform -> Separated sources (>= 2)
- Time-frequency transform
- Source analysis: estimation of mixing parameters
- Source synthesis: estimation of sources
- Inverse time-frequency transform
9
Mixing model
Anechoic mixing model
- Mixtures: x_i
- Sources: s_j
- Under-determined case (M < N), where M = number of mixtures and N = number of sources
Figure: Anechoic mixing model. Audio is observed at the microphones with differing intensities and arrival times (because of propagation delays) but with no reverberation.
Source: P. O'Grady, B. Pearlmutter and S. Rickard, "Survey of sparse and non-sparse methods in source separation," International Journal of Imaging Systems and Technology, 2005
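A minimal simulation of the anechoic model: each mixture is a sum of attenuated, delayed copies of the sources, x_i(t) = sum_j a_ij * s_j(t - d_ij). The sketch assumes non-negative integer sample delays for simplicity (real arrival-time differences are generally fractional), and the names `anechoic_mix`, `gains`, and `delays` are hypothetical.

```python
import numpy as np

def anechoic_mix(sources, gains, delays):
    """Mix N sources into M mixtures with per-path gain and delay, no reverberation.

    sources : array (N, T)
    gains   : array (M, N), attenuation a_ij of source j at microphone i
    delays  : array (M, N), non-negative integer delay d_ij in samples
    Returns an array (M, T).
    """
    sources = np.asarray(sources, dtype=float)
    gains = np.asarray(gains, dtype=float)
    delays = np.asarray(delays, dtype=int)
    n_src, n_samples = sources.shape
    n_mix = gains.shape[0]
    mixtures = np.zeros((n_mix, n_samples))
    for i in range(n_mix):
        for j in range(n_src):
            d = delays[i, j]
            # Shift source j by d samples and scale it; samples shifted past the end are dropped.
            mixtures[i, d:] += gains[i, j] * sources[j, :n_samples - d]
    return mixtures
```

With three sources and two rows in `gains` and `delays`, this produces the under-determined stereo case (M = 2, N = 3) discussed on the slide.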
10
Mixtures
Figure panels: Source 1, Source 2, Source 3; Mixtures (stereo)
11
function: TFRStereo
Inputs: mixture (stereo), sampling frequency, DFT size, window size, hop size
Outputs: mixture TFRs
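A function with this interface could look like the sketch below, which computes a short-time Fourier transform per channel. It is an assumption that the TFR is an STFT with a Hann window, and that window and hop sizes are given in milliseconds; `tfr_stereo` is a hypothetical stand-in, not the presenter's TFRStereo.

```python
import numpy as np

def tfr_stereo(mixture, fs, n_fft, win_ms, hop_ms):
    """STFT of a two-channel mixture.

    mixture : array (2, T)
    fs      : sampling frequency in Hz
    n_fft   : DFT size (frames are zero-padded to this length)
    win_ms  : window size in milliseconds
    hop_ms  : hop size in milliseconds
    Returns a complex array (2, n_fft // 2 + 1, n_frames).
    """
    mixture = np.asarray(mixture, dtype=float)
    win_len = int(round(win_ms * 1e-3 * fs))
    hop = int(round(hop_ms * 1e-3 * fs))
    if win_len > n_fft:
        raise ValueError("window must not be longer than the DFT size")
    window = np.hanning(win_len)
    n_frames = 1 + (mixture.shape[1] - win_len) // hop
    tfr = np.empty((2, n_fft // 2 + 1, n_frames), dtype=complex)
    for ch in range(2):
        for m in range(n_frames):
            frame = mixture[ch, m * hop:m * hop + win_len] * window
            tfr[ch, :, m] = np.fft.rfft(frame, n=n_fft)  # one-sided spectrum
    return tfr
```

A call such as `tfr_stereo(x, 22050, 2048, 50, 25)` mirrors the DFT size, window size, hop size, and sampling frequency listed on the system demonstration slide.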
12
Time-frequency transform
13
function: SourceAnalysis
Inputs: mixture TFRs
Outputs: 2-D histogram, mixing parameters
14
Source analysis (estimation of mixing parameters)
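One way to estimate the mixing parameters from a 2-D histogram is the DUET approach of [2]: for every time-frequency bin, compute the relative attenuation and delay between the two mixture TFRs, then accumulate a weighted histogram whose peaks indicate the sources. The sketch below is a simplified version under stated assumptions: frequencies are in radians per sample, delays are small enough to avoid phase wrapping, and only the single strongest peak is located. All names and default ranges are hypothetical.

```python
import numpy as np

def duet_histogram(X1, X2, omega, p=1.0, q=0.0, bins=35,
                   alpha_range=(-2.0, 2.0), delta_range=(-2.0, 2.0)):
    """Weighted 2-D histogram of symmetric attenuation and relative delay.

    X1, X2 : complex arrays (F, T), TFRs of the two mixtures
    omega  : array (F,), bin frequencies in rad/sample
    """
    omega_grid = np.broadcast_to(np.asarray(omega, dtype=float)[:, None], X1.shape)
    # Skip empty bins and DC, where the ratio or the delay is undefined.
    valid = (np.abs(X1) > 1e-12) & (np.abs(X2) > 1e-12) & (omega_grid != 0)
    ratio = X2[valid] / X1[valid]
    w = omega_grid[valid]
    a = np.abs(ratio)
    alpha = a - 1.0 / a              # symmetric attenuation
    delta = -np.angle(ratio) / w     # relative delay in samples
    weights = (np.abs(X1[valid]) * np.abs(X2[valid])) ** p * np.abs(w) ** q
    hist, a_edges, d_edges = np.histogram2d(
        alpha, delta, bins=bins, range=[alpha_range, delta_range], weights=weights)
    return hist, a_edges, d_edges

def strongest_peak(hist, a_edges, d_edges):
    """Return (attenuation, delay) at the centre of the highest histogram bin."""
    i, j = np.unravel_index(np.argmax(hist), hist.shape)
    alpha = 0.5 * (a_edges[i] + a_edges[i + 1])
    delta = 0.5 * (d_edges[j] + d_edges[j + 1])
    a = 0.5 * (alpha + np.sqrt(alpha ** 2 + 4.0))  # invert alpha = a - 1/a
    return a, delta
```

A full system would locate one peak per source (for example by smoothing the histogram and picking local maxima), which this sketch leaves out.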
15
function: SourceSynthesis
Inputs: mixing parameters, mixture TFRs, estimation technique (DUET/LQBP)
Outputs: estimated source masks, estimated source TFRs
16
Source synthesis (estimation of sources)
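For the DUET option, source synthesis amounts to binary time-frequency masking: each bin is assigned to the source whose mixing parameters explain it best, and the masked mixtures are combined into a source estimate. The sketch below follows the maximum-likelihood form described in the DUET literature [2]; it is an illustration under the assumption of known attenuations and delays, not the presenter's code, and the LQBP alternative [3] is not shown. The name `duet_synthesis` is hypothetical.

```python
import numpy as np

def duet_synthesis(X1, X2, omega, atten, delay):
    """Binary-mask source estimation from two mixture TFRs.

    X1, X2 : complex arrays (F, T)
    omega  : array (F,), bin frequencies in rad/sample
    atten  : sequence of N relative attenuations a_j
    delay  : sequence of N relative delays d_j in samples
    Returns (masks, estimates), each of shape (N, F, T).
    """
    w = np.asarray(omega, dtype=float)[:, None]
    # Cost of explaining each bin with source j's mixing parameters.
    costs = np.stack([
        np.abs(a * np.exp(-1j * w * d) * X1 - X2) ** 2 / (1.0 + a ** 2)
        for a, d in zip(atten, delay)
    ])
    labels = np.argmin(costs, axis=0)  # best-fitting source per bin
    masks, estimates = [], []
    for j, (a, d) in enumerate(zip(atten, delay)):
        mask = labels == j
        # Align the second mixture to the first, then combine both under the mask.
        est = mask * (X1 + a * np.exp(1j * w * d) * X2) / (1.0 + a ** 2)
        masks.append(mask)
        estimates.append(est)
    return np.array(masks), np.array(estimates)
```

When the sources do not overlap in the time-frequency plane (W-disjoint orthogonality), the masks recover each source TFR exactly; overlap shows up as interference and artifacts in the estimates.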
17
Source synthesis (estimation of sources), continued
18
Source synthesis (estimation of sources), continued
19
function: InverseTFR
Inputs: estimated source TFRs, sampling frequency
Outputs: estimated sources
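The inverse transform can be sketched as weighted overlap-add, assuming the TFR is a Hann-windowed STFT. The small forward `stft` helper is included only so the round trip can be checked; both function names are hypothetical, and the interface here takes the window and hop directly rather than the sampling frequency.

```python
import numpy as np

def stft(x, window, hop, n_fft):
    """Forward STFT of a 1-D signal; returns an array (n_fft // 2 + 1, n_frames)."""
    win_len = len(window)
    n_frames = 1 + (len(x) - win_len) // hop
    return np.stack([np.fft.rfft(x[m * hop:m * hop + win_len] * window, n=n_fft)
                     for m in range(n_frames)], axis=1)

def inverse_tfr(tfr, window, hop, n_fft):
    """Weighted overlap-add inverse of `stft` (least-squares normalization)."""
    win_len = len(window)
    n_frames = tfr.shape[1]
    out = np.zeros(win_len + hop * (n_frames - 1))
    norm = np.zeros_like(out)
    for m in range(n_frames):
        frame = np.fft.irfft(tfr[:, m], n=n_fft)[:win_len]
        out[m * hop:m * hop + win_len] += frame * window
        norm[m * hop:m * hop + win_len] += window ** 2
    covered = norm > 1e-12  # samples where the windows sum to zero cannot be recovered
    out[covered] /= norm[covered]
    return out
```

Dividing by the summed squared windows makes the reconstruction exact for an unmodified TFR wherever the windows overlap; after masking, the same formula gives a least-squares estimate of the time signal.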
20
Inverse time-frequency transform
Figure panels: Orig. source 1, Orig. source 2, Orig. source 3; Source 1, Source 2, Source 3
21
Demonstration (system)
Settings: DFT size = 2048, window size = 50 ms, hop size = 25 ms, sampling frequency = 22050 Hz
Mixture / Original
All values are in dB.

       No. of sources (2)     No. of sources (3)
SAR    15.55    8.17          13.71    9.09   13.23
SDR    15.10    7.64          13.11    8.62   11.61
SIR    25.71   19.83          22.06   21.20   23.65
ISR    27.65   19.76          24.37   20.47   18.48

SAR    51.41   44.28           8.49    3.88    7.34
SDR    51.33   44.21           7.77    3.37    5.58
SIR    69.45   69.15          13.66    6.84   10.66
ISR    76.28   62.32          17.03   10.73   24.62
22
Concentration measure
- Requirement for source separation: W-disjoint orthogonality (WDO)
- Sparsity is an indicator of WDO [4]
- Thus a sparser TFR is expected to satisfy the WDO criterion to a greater extent
- Commonly used sparsity measures [5]: kurtosis, Gini index
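Both measures can be computed directly from the TFR coefficient magnitudes. The sketch below uses the forms given for coefficient vectors in the sparsity-measure literature [5]; whether the presenter used exactly these normalizations is an assumption, and the function names are hypothetical.

```python
import numpy as np

def kurtosis_measure(c):
    """Kurtosis-type sparsity: sum(c^4) / (sum(c^2))^2. Higher means sparser."""
    c = np.abs(np.asarray(c)).astype(float).ravel()
    return np.sum(c ** 4) / np.sum(c ** 2) ** 2

def gini_index(c):
    """Gini index of the coefficient magnitudes: 0 for uniform, near 1 for very sparse."""
    c = np.sort(np.abs(np.asarray(c)).astype(float).ravel())  # ascending order
    n = c.size
    k = np.arange(1, n + 1)
    return 1.0 - 2.0 * np.sum((c / c.sum()) * (n - k + 0.5) / n)
```

Both functions accept a complex TFR frame or a whole TFR array, since they operate on magnitudes of the flattened input.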
23
Adaptive TFR
- Source separation demands WDO, and hence a sparse time-frequency representation (TFR)
- Some observations
- Music/speech signals: different frequency components are present at different time instants
- Different analysis window lengths provide different sparsity [4]
- Therefore, to obtain a sparser TFR, use at each time instant the analysis window length that gives the highest sparsity [6]
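The selection rule can be sketched as a search over candidate window lengths at each analysis instant. This is a simplified illustration: it assumes Hann windows centred on the given sample positions, a kurtosis-type sparsity score, and independent selection per frame (no invertibility constraint). The names `adapt_windows` and `sparsity` are hypothetical.

```python
import numpy as np

def sparsity(mag):
    """Kurtosis-type sparsity score of a magnitude spectrum (higher means sparser)."""
    return np.sum(mag ** 4) / (np.sum(mag ** 2) ** 2 + 1e-20)

def adapt_windows(x, fs, centers, candidates_ms, n_fft):
    """For each frame centre, pick the candidate window length with the sparsest spectrum.

    x             : 1-D signal
    fs            : sampling frequency in Hz
    centers       : frame centre positions in samples
    candidates_ms : candidate window lengths in milliseconds
    n_fft         : DFT size (must be at least the longest window in samples)
    Returns a list with the chosen length in ms per centre (None if no candidate fits).
    """
    chosen = []
    for c in centers:
        best_ms, best_score = None, -np.inf
        for ms in candidates_ms:
            win_len = int(round(ms * 1e-3 * fs))
            start = c - win_len // 2
            if start < 0 or start + win_len > len(x):
                continue  # window would run past the signal boundary
            frame = x[start:start + win_len] * np.hanning(win_len)
            score = sparsity(np.abs(np.fft.rfft(frame, n=n_fft)))
            if score > best_score:
                best_ms, best_score = ms, score
        chosen.append(best_ms)
    return chosen
```

For example, `adapt_windows(x, fs, centers, range(20, 100, 10), n_fft)` searches lengths from 20 to 90 ms in 10 ms steps.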
25
Adaptive TFR
26
function: TFRStereo (modified)
Inputs: mixture (stereo), sampling frequency, DFT size, window size, window size default, concentration measure
Outputs: mixture TFRs, adapted window sequence
27
Inverse adaptive TFR
- Constraint: the TFR should be invertible
- Solution: select analysis windows such that they satisfy the constant overlap-add (COLA) criterion [7]
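The COLA criterion can be checked numerically for a fixed window and hop: the shifted copies of the window must sum to a constant. The sketch below covers only that fixed-window case; how the adaptive scheme of [7] builds windows that remain COLA across changes in length is not shown. The function name `is_cola` is hypothetical.

```python
import numpy as np

def is_cola(window, hop, tol=1e-9):
    """Check whether shifted copies of `window`, spaced `hop` samples apart, sum to a constant."""
    window = np.asarray(window, dtype=float)
    folded = np.zeros(hop)
    # In steady state the overlap-add sum is periodic with period `hop`,
    # so folding the window modulo the hop gives one full period of that sum.
    for start in range(0, len(window), hop):
        seg = window[start:start + hop]
        folded[:len(seg)] += seg
    return bool(folded.max() - folded.min() < tol)

# Periodic Hann window of length 64: 0.5 - 0.5*cos(2*pi*n/64), n = 0..63.
hann = np.hanning(65)[:-1]
print(is_cola(hann, 32))  # 50% overlap
print(is_cola(hann, 48))  # 25% overlap
```

The periodic Hann window passes at 50% overlap and fails at 25% overlap, which is why the hop cannot be chosen independently of the window when the TFR must be invertible by plain overlap-add.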
28
Analysis windows (COLA)
29
function: InverseTFR (modified)
Inputs: estimated source TFRs, sampling frequency, adapted window sequence, window size default
Outputs: estimated sources
30
Demonstration (adaptive TFR)
All values are in dB.

                           Source 1   Source 2   Source 3
Original
ATFR (20:10:90 ms)   SAR     16.70       3.75       9.43
                     SDR     14.19       2.83       7.04
                     SIR     21.35      10.94      11.20
                     ISR     20.05       8.66      17.65
TFR (60 ms)          SAR     15.81       3.20       8.66
                     SDR     13.60       2.46       6.25
                     SIR     22.62      11.78      10.61
                     ISR     19.62       9.54      19.24
31
Demonstration (adaptive TFR)
All values are in dB.

                           Source 1   Source 2   Source 3
Original
ATFR (20:10:90 ms)   SAR     12.30       8.90       3.68
                     SDR     11.80       8.78       4.32
                     SIR     22.78      19.34      13.47
                     ISR     18.53      18.22      11.51
TFR (60 ms)          SAR     12.13       8.79       3.18
                     SDR     11.76       8.69       3.76
                     SIR     22.76      18.92      16.24
                     ISR     19.55      16.32      12.16
32
References
1. E. Vincent, R. Gribonval and C. Févotte, "Performance measurement in blind audio source separation," IEEE Transactions on Audio, Speech and Language Processing, 2006
2. A. Jourjine, S. Rickard and O. Yilmaz, "Blind separation of disjoint orthogonal signals: demixing N sources from 2 mixtures," IEEE Conference on Acoustics, Speech and Signal Processing, 2000
3. R. Saab, O. Yilmaz, M. J. McKeown and R. Abugharbieh, "Underdetermined anechoic blind source separation via ℓq-basis-pursuit with q < 1," IEEE Transactions on Signal Processing, 2007
33
References (continued)
4. S. Rickard, "Sparse sources are separated sources," European Signal Processing Conference, 2006
5. N. Hurley and S. Rickard, "Comparing measures of sparsity," IEEE Transactions on Information Theory, 2009
6. D. L. Jones and T. Parks, "A high resolution data-adaptive time-frequency representation," IEEE Transactions on Acoustics, Speech and Signal Processing, 1990
7. P. Basu, P. J. Wolfe, D. Rudoy, T. F. Quatieri and B. Dunn, "Adaptive short-time analysis-synthesis for speech enhancement," IEEE Conference on Acoustics, Speech and Signal Processing, 2008
34
Questions? Thank you.