

Data-Adaptive Source Separation for Audio Spatialization
M.Tech. project presentation by Pradeep Gaddipati
Supervisors: Prof. Preeti Rao and Prof. V. Rajbabu
Digital Audio Processing Lab, Dept. of EE, Thursday, June 17th

Outline
- Problem statement
- Audio spatialization
- Source separation
- Data-adaptive TFR
- Concentration measure (sparsity)
- Reconstruction of the signal from the TFR
- Performance evaluation
- Data-adaptive TFR for sinusoid detection
- Conclusions and future work

Problem statement
- Spatial audio (surround sound)
  - commonly used in movies, gaming, etc.
  - creates a suspension of disbelief
  - effective when the playback device is located at a considerable distance from the listener
- Mobile phones
  - headphones are used for playback
  - spatial audio is ineffective over headphones: it lacks body-reflection cues, causing in-the-head localization
  - content cannot be re-recorded, hence the need for audio spatialization

Audio spatialization
- Audio spatialization: a spatial rendering technique that converts the available audio into the desired listening configuration
- Analysis: separating the individual sources
- Re-synthesis: re-creating the desired listener-end configuration
Flow: available spatial audio (speakers) -> analysis (source separation) -> separated sources -> re-synthesis (convolving with HRIRs) -> desired listener-end configuration (headphones)
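The re-synthesis step above can be sketched as follows. This is a minimal illustration, not the talk's implementation: the function name and the assumption that HRIRs are supplied as plain arrays are mine.

```python
import numpy as np

def binaural_render(sources, hrirs_left, hrirs_right):
    """Re-synthesis: convolve each separated source with the HRIR pair
    of its desired virtual direction and sum, producing a two-channel
    binaural (headphone) signal."""
    n = max(len(s) + len(h) - 1 for s, h in zip(sources, hrirs_left))
    out = np.zeros((2, n))
    for s, hl, hr in zip(sources, hrirs_left, hrirs_right):
        yl = np.convolve(s, hl)   # left-ear contribution of this source
        yr = np.convolve(s, hr)   # right-ear contribution
        out[0, :len(yl)] += yl
        out[1, :len(yr)] += yr
    return out
```

With measured HRIRs per source direction, the summed output is what the headphones play back.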

Source separation
- Source separation: obtaining estimates of the underlying sources from a set of observations at the sensors
- Steps: time-frequency transform; source analysis (estimation of the mixing parameters); source synthesis (estimation of the sources); inverse time-frequency transform
Flow: mixtures (stereo) -> time-frequency transform -> source analysis -> source synthesis -> inverse time-frequency transform -> separated sources (>= 2)

Mixing model
- Anechoic mixing model
  - mixtures x_i, sources s_j
- Under-determined case (M < N)
  - M = number of mixtures, N = number of sources
- Mixing parameters
  - attenuation parameters a_ij
  - delay parameters d_ij
Figure: Anechoic mixing model. Audio is observed at the microphones with differing intensities and arrival times (because of propagation delays) but with no reverberation.
Source: P. O'Grady, B. Pearlmutter and S. Rickard, "Survey of sparse and non-sparse methods in source separation," International Journal of Imaging Systems and Technology, 2005.
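A minimal sketch of the anechoic mixing model, assuming integer sample delays; the function name and array layout are illustrative.

```python
import numpy as np

def anechoic_mix(sources, a, d, n_mix=2):
    """Anechoic mixing: each mixture x_i is a sum of attenuated, delayed
    copies of the sources (no reverberation).
    sources : list of N 1-D arrays s_j
    a       : (n_mix, N) attenuation parameters a_ij
    d       : (n_mix, N) integer delay parameters d_ij (in samples)"""
    length = max(len(s) + int(d[:, j].max()) for j, s in enumerate(sources))
    x = np.zeros((n_mix, length))
    for i in range(n_mix):
        for j, s in enumerate(sources):
            dij = int(d[i, j])
            x[i, dij:dij + len(s)] += a[i, j] * s   # x_i += a_ij * s_j(t - d_ij)
    return x
```

With M = 2 mixtures and N = 3 sources this reproduces the under-determined stereo setting used throughout the talk.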


Time-frequency transform

Source analysis (estimation of mixing parameters)
- Time-frequency representation of the mixtures
- Requirement for source separation [1]: W-disjoint orthogonality

Source analysis (estimation of mixing parameters)
- For every time-frequency bin, estimate the mixing parameters [1]
- Create a 2-dimensional histogram of these estimates; its peaks indicate the mixing parameters
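The per-bin estimation and histogram step can be sketched in the style of DUET; the exact weighting and symmetric-attenuation details of [1] are omitted, so treat this as an illustration rather than the talk's implementation.

```python
import numpy as np

def duet_histogram(X1, X2, freqs, n_bins=50):
    """Per-bin mixing-parameter estimates and their 2-D histogram.
    X1, X2 : STFTs of the two mixtures (freq x time), same shape
    freqs  : angular frequency of each STFT row (must be nonzero)"""
    eps = 1e-12
    ratio = (X2 + eps) / (X1 + eps)
    alpha = np.abs(ratio)                       # per-bin attenuation estimate
    delta = -np.angle(ratio) / freqs[:, None]   # per-bin delay estimate
    w = np.abs(X1 * X2)                         # weight bins by their energy
    hist, a_edges, d_edges = np.histogram2d(
        alpha.ravel(), delta.ravel(), bins=n_bins, weights=w.ravel())
    return hist, a_edges, d_edges
```

Peaks of `hist` give one (attenuation, delay) pair per source; the delay estimate is valid only while the per-bin phase does not wrap (|omega * delay| < pi).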

Source analysis (estimation of mixing parameters)

Source synthesis (estimation of sources)
Figure: mixture, source masks, and the separated sources 1-3

Source synthesis (estimation of sources)
Figure: mixture and the separated sources 1-3

Source synthesis (estimation of sources)
- Source estimation techniques
  - degenerate unmixing estimation technique (DUET) [1]
  - l_q-basis pursuit (LQBP) [2]
  - delay and scale subtraction scoring (DASSS) [3]

Source synthesis (DUET)
- Every time-frequency bin of the mixture is assigned to one of the sources based on a distance measure
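A sketch of the DUET-style mask assignment, using the likelihood-style distance |a e^(-j*omega*d) X1 - X2|^2 / (1 + a^2) per bin; the function name and parameter layout are assumptions.

```python
import numpy as np

def duet_masks(X1, X2, freqs, params):
    """Assign every T-F bin of the stereo mixture to the source whose
    mixing-parameter pair (a_j, d_j) best explains that bin, under the
    WDO assumption that only one source is active per bin.
    params : list of (attenuation, delay) pairs, e.g. histogram peaks."""
    cost = np.empty((len(params),) + X1.shape)
    for j, (a, d) in enumerate(params):
        phase = np.exp(-1j * freqs[:, None] * d)
        # distance of each bin to the anechoic model of source j
        cost[j] = np.abs(a * phase * X1 - X2) ** 2 / (1.0 + a ** 2)
    best = cost.argmin(axis=0)            # winning source index per bin
    return [best == j for j in range(len(params))]
```

Each returned boolean mask is applied to the mixture STFT and inverted to recover one source estimate.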

Source synthesis (LQBP)
- Relaxes the WDO assumption: at most M sources are assumed present in each T-F bin
  - M = number of mixtures, N = number of sources (M < N)
- An l_q measure decides which M sources are present

Source synthesis (DASSS)
- Identifies which bins have only one dominant source and uses DUET for those bins
- Assumes at most M sources are present in the rest of the bins
- An error threshold decides which M sources are present

Inverse time-frequency transform
Figure: stereo mixtures, estimated sources 1-3 and original sources 1-3

Scope for improvement
- Requirement for source separation: W-disjoint orthogonality (WDO) amongst the sources
- The sparser the TFR of the mixtures [4]
  - the less the sources overlap (i.e., the higher the WDO)
  - the easier their separation

Data-adaptive TFR
- For music/speech signals
  - different components (harmonics/transients/modulations) occur at different time instants
  - the best window differs for different components
  - this suggests a data-dependent, time-varying window function to achieve high sparsity [6]
- To obtain a sparser TFR of the mixture
  - use a different analysis window length at each time instant: the one that gives maximum sparsity
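The per-frame window selection can be sketched as follows, using kurtosis of the magnitude spectrum as the concentration measure; edge handling and the COLA constraint discussed later are ignored here, and the function name is mine.

```python
import numpy as np

def adaptive_window_choice(x, fs, win_ms=(30, 60, 90), hop_ms=10):
    """At each analysis instant, evaluate one Hamming-windowed FFT frame
    per candidate window length and keep the length whose magnitude
    spectrum has the highest kurtosis (the adaptation criterion)."""
    hop = int(fs * hop_ms / 1000)
    max_len = int(fs * max(win_ms) / 1000)
    choices = []
    for n0 in range(0, len(x) - max_len, hop):
        scores = []
        for w_ms in win_ms:
            L = int(fs * w_ms / 1000)
            frame = x[n0:n0 + L] * np.hamming(L)
            mag = np.abs(np.fft.rfft(frame))
            z = (mag - mag.mean()) / (mag.std() + 1e-12)
            scores.append(np.mean(z ** 4))          # spectral kurtosis
        choices.append(win_ms[int(np.argmax(scores))])
    return choices
```

The resulting window sequence is then reused when transforming the sources for the WDO evaluation, as noted later in the talk.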

Data-adaptive TFR
Figure: data-adaptive time-frequency representation of a singing voice; window function = Hamming, window sizes = 30, 60 and 90 ms, hop size = 10 ms, concentration measure = kurtosis

Sparsity measure (concentration measure)
- What is sparsity?
  - a small number of coefficients contain a large proportion of the energy
- Common sparsity measures [5]
  - kurtosis
  - Gini index
- Which sparsity measure to use for the adaptation?
  - the one that shows the same trend as WDO as a function of analysis window size

WDO and sparsity (some formulae)
- W-disjoint orthogonality [4]
- Kurtosis
- Gini index
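The formulae on this slide were rendered as images in the original transcript; the two sparsity measures can be sketched as follows, following my reading of Hurley and Rickard [5] (kurtosis as the fourth standardized moment of the coefficient magnitudes, Gini index over the sorted, l1-normalized magnitudes).

```python
import numpy as np

def kurtosis(c):
    """Fourth standardized moment of the coefficient magnitudes;
    larger for peaky (sparse) distributions."""
    m = np.abs(c)
    mu, sd = m.mean(), m.std()
    return np.mean(((m - mu) / sd) ** 4)

def gini_index(c):
    """Gini index of sparsity: 0 when energy is spread uniformly,
    approaching 1 as it concentrates in a few coefficients."""
    m = np.sort(np.abs(c))                 # ascending
    N = m.size
    k = np.arange(1, N + 1)
    return 1 - 2 * np.sum((m / m.sum()) * (N - k + 0.5) / N)
```

For a one-hot vector of length N the Gini index is 1 - 1/N, and for a constant vector it is 0, matching the intuition on the previous slide.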

Dataset description
- Dataset: BSS oracle
- Sampling frequency: Hz
- 10 sets each of music and speech signals
- One set: 3 signals
- Duration: 11 seconds

WDO and sparsity
- WDO vs. window size
  - obtain the TFR of the sources in a set
  - obtain source masks based on the magnitudes of the TFRs in each T-F bin
  - using the source masks and the TFRs of the sources, obtain the WDO measure
  - NOTE: for the data-adaptive TFR, obtain the TFR of the sources using the window sequence obtained from the adaptation on the mixture
- Sparsity vs. window size
  - obtain the TFR of one channel of the mixture
  - calculate the frame-wise sparsity of that TFR
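The WDO measure of a mask can be sketched, in the spirit of [4], as the preserved target energy minus the leaked interference energy, normalized by the total target energy; this is my reading of the measure, not code from the talk.

```python
import numpy as np

def wdo_measure(mask, S_target, S_interf):
    """Approximate WDO of a binary T-F mask.
    S_target : STFT of the source of interest
    S_interf : STFT of the sum of the remaining (interfering) sources"""
    pt = np.sum(np.abs(mask * S_target) ** 2)   # preserved target energy
    pi = np.sum(np.abs(mask * S_interf) ** 2)   # leaked interference energy
    return (pt - pi) / np.sum(np.abs(S_target) ** 2)
```

An ideal mask on perfectly disjoint sources scores 1; a mask that passes everything scores 0 when target and interference carry equal energy.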

26Digital Audio Processing LabThursday, June 17 th 26Digital Audio Processing LabThursday, June 17 th WDO vs. window size

27Digital Audio Processing LabThursday, June 17 th 27Digital Audio Processing LabThursday, June 17 th Kurtosis vs. window size

28Digital Audio Processing LabThursday, June 17 th 28Digital Audio Processing LabThursday, June 17 th Gini Index vs. window size

WDO and sparsity (observations)
- The highest sparsity (kurtosis/Gini index) is obtained with the data-adaptive TFR
- The highest WDO is obtained with the data-adaptive TFR (with kurtosis as the adaptation criterion)
- Kurtosis is observed to follow the same trend as WDO

Inverse data-adaptive TFR
- Constraint (introduced by source separation): the TFR should be invertible
- Solution: select analysis windows such that they satisfy the constant overlap-add (COLA) criterion [7]
- Techniques
  - transition window
  - modified (extended) window
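A small check of the COLA criterion for a sequence of per-frame windows, as both reconstruction techniques require; the function name and edge handling are illustrative.

```python
import numpy as np

def cola_check(windows, hops, tol=1e-8):
    """Verify constant overlap-add (COLA) for a sequence of per-frame
    analysis windows: overlap-added, they must sum to a constant over
    the interior of the signal, so that the adaptive STFT remains
    invertible by overlap-add synthesis.
    windows : list of 1-D window arrays (possibly different lengths)
    hops    : list of len(windows)-1 hop sizes in samples"""
    starts = np.concatenate(([0], np.cumsum(hops))).astype(int)
    total = int(starts[-1]) + len(windows[-1])
    acc = np.zeros(total)
    for s, w in zip(starts, windows):
        acc[s:s + len(w)] += w
    # ignore the ramp-up/ramp-down at the signal edges
    interior = acc[len(windows[0]):total - len(windows[-1])]
    return bool(interior.size == 0 or np.ptp(interior) < tol)
```

For example, a periodic Hann window at 50% overlap satisfies COLA exactly, while rectangular windows at an unmatched hop do not.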

31Digital Audio Processing LabThursday, June 17 th 31Digital Audio Processing LabThursday, June 17 th Transition window technique

32Digital Audio Processing LabThursday, June 17 th 32Digital Audio Processing LabThursday, June 17 th Modified window technique

Problems with reconstruction
- Transition window technique
  - adaptation is carried out only on alternate frames
  - the WDO obtained amongst the underlying sources is lower
- Modified window technique
  - the extended window has larger side-lobes than a normal Hamming window
  - these spread signal energy into neighboring bins, so the WDO measure decreases

Dataset description
- Dataset: BSS oracle
- Mixtures per set: 72 = 24 x 3
  - attenuation parameters: 24 = 4P3 permutations of {10, 30, 60, 80} degrees
  - delay parameters: {(0, 0, 0), (0, 1, 2), (0, 2, 1)} samples
- A total of 720 (72 x 10) mixtures (test cases) for each of the music and speech groups

Performance (mixing parameters)
Table: percentage of cases with correct estimation of the sources, and the error in the estimated mixing parameters (attenuation, in degrees; delay, in samples), for STFT with 30, 60 and 90 ms windows and for the adaptive TFR (ATFR, 30/60/90 ms); hop size = 10 ms in all cases

Performance (source estimation)
- Evaluate the source masks using one of the source estimation techniques (DUET or LQBP)
- Using the estimated source masks and the TFRs of the original sources, calculate the WDO measure of each source mask
- The WDO measure indicates how well the mask
  - preserves the source of interest
  - suppresses the interfering sources

Performance (source estimation)
Table: WDO measure obtained with DUET and with LQBP, for STFT with 30, 60 and 90 ms windows and for the adaptive TFR (ATFR, 30/60/90 ms); hop size = 10 ms

Data-adaptive TFR (for sinusoid detection)
Figure: data-adaptive time-frequency representation of a singing voice; window function = Hamming, window sizes = 20, 40 and 60 ms, hop size = 10 ms, concentration measure = kurtosis, frequency range = 1000 to 3000 Hz

Data-adaptive TFR (for sinusoid detection)
Table: percentage of true hits in three frequency bands (0-1500 Hz, 1000-3000 Hz, 2500-5000 Hz) for STFT with 20, 40 and 60 ms windows and for the adaptive TFR (ATFR, 20/40/60 ms); hop size = 10 ms

Conclusions
- Mixing model: anechoic
- Kurtosis can be used as the adaptation criterion for the data-adaptive TFR
- The data-adaptive TFR yields a higher WDO measure amongst the underlying sources than a fixed-window STFT
- Better estimates of the mixing parameters and of the sources are obtained with the data-adaptive TFR
- DUET performs better than LQBP

Future work
- Testing of the DASSS source estimation technique
- Reconstruction of the signal from the TFR
- A more realistic mixing model, such as an echoic mixing model, to account for reverberation effects

Acknowledgments
I would like to thank Nokia, India for providing financial support and technical inputs for the work reported here.

References
1. A. Jourjine, S. Rickard and O. Yilmaz, "Blind separation of disjoint orthogonal signals: demixing N sources from 2 mixtures," IEEE Conference on Acoustics, Speech and Signal Processing (ICASSP), 2000.
2. R. Saab, O. Yilmaz, M. J. McKeown and R. Abugharbieh, "Underdetermined anechoic blind source separation via l_q basis pursuit with q < 1," IEEE Transactions on Signal Processing, 2007.
3. A. S. Master, "Bayesian two source modelling for separation of N sources from stereo signal," IEEE Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 4, 2004.

References
4. S. Rickard, "Sparse sources are separated sources," European Signal Processing Conference (EUSIPCO), 2006.
5. N. Hurley and S. Rickard, "Comparing measures of sparsity," IEEE Transactions on Information Theory, 2009.
6. D. L. Jones and T. Parks, "A high resolution data-adaptive time-frequency representation," IEEE Transactions on Acoustics, Speech and Signal Processing, 1990.
7. P. Basu, P. J. Wolfe, D. Rudoy, T. F. Quatieri and B. Dunn, "Adaptive short-time analysis-synthesis for speech enhancement," IEEE Conference on Acoustics, Speech and Signal Processing (ICASSP), 2008.

Thank you. Questions?