Full-rank Gaussian modeling of convolutive audio mixtures applied to source separation Ngoc Q. K. Duong, Supervisor: R. Gribonval and E. Vincent METISS.

Presentation transcript:

Full-rank Gaussian modeling of convolutive audio mixtures applied to source separation
Ngoc Q. K. Duong, Supervisors: R. Gribonval and E. Vincent
METISS project team, INRIA, Centre de Rennes - Bretagne Atlantique, France Nov

Table of contents
- Problem introduction and motivation
- Considered framework and contributions
- Estimation of model parameters
- Conclusion and perspectives

Under-determined source separation
- Use the recorded mixture signals to separate the sources, where the number of sources exceeds the number of microphones.
- Convolutive mixing model: the vector of mixture signals is the sum of the source images. Each source image is the contribution of one source to all microphones, obtained by convolving the source signal with the vector of mixing filters from that source to the microphone array.
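
The convolutive mixing model can be illustrated with a minimal NumPy sketch. The sizes, the random sources and the exponentially decaying random filters are assumptions made for illustration only; they stand in for real speech signals and room impulse responses and are not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 3 sources, 2 microphones (more sources than microphones).
n_src, n_mic, n_samples, filt_len = 3, 2, 1000, 64

sources = rng.standard_normal((n_src, n_samples))

# Assumed stand-in for room impulse responses: decaying random filters.
decay = np.exp(-np.arange(filt_len) / 16.0)
filters = rng.standard_normal((n_src, n_mic, filt_len)) * decay

# Source image of source j: its contribution to every microphone,
# i.e. the source signal convolved with each of its mixing filters.
images = np.stack([
    np.stack([
        np.convolve(sources[j], filters[j, i])[:n_samples]
        for i in range(n_mic)
    ])
    for j in range(n_src)
])  # shape (n_src, n_mic, n_samples)

# The mixture observed at the microphones is the sum of the source images.
mixture = images.sum(axis=0)  # shape (n_mic, n_samples)
```

Separation then amounts to recovering `images` from `mixture` alone.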

Baseline approaches
Both baselines operate on the STFT under the narrowband approximation, which replaces the time-domain convolution by a multiplication in each frequency bin.
- Sparsity assumption: only FEW sources are active at each time-frequency point
- Binary masking (DUET): only ONE source is active at each time-frequency point
- L1-norm minimization
These techniques remain limited in realistic reverberant environments, since the narrowband approximation does not hold there.
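
The binary masking idea can be sketched as follows. This is a simplified illustration, not DUET itself: it assumes the mixing vectors `h` are already known, whereas DUET estimates them from level and phase differences. The function name and array layout are hypothetical.

```python
import numpy as np

def binary_mask_separate(x, h):
    """Assign each time-frequency point to the single best-matching source.

    x: (F, N, I) mixture STFT coefficients.
    h: (F, J, I) mixing vectors, assumed known for this illustration.
    Returns (J, F, N, I) estimated source images.
    """
    h_unit = h / np.linalg.norm(h, axis=-1, keepdims=True)
    # |h_j^H x|: how well each source direction explains each observation.
    score = np.abs(np.einsum('fji,fni->fnj', h_unit.conj(), x))
    winner = score.argmax(axis=-1)  # (F, N) index of the active source
    n_src = h.shape[1]
    mask = winner[None] == np.arange(n_src)[:, None, None]  # (J, F, N)
    return mask[..., None] * x[None]
```

Because the mask is one-hot at every time-frequency point, the estimated source images always sum back to the mixture.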

Considered framework
Model the STFT coefficients of the source images as zero-mean multivariate Gaussian random variables. The covariance of each source image is the product of two terms:
- A scalar source variance, encoding the spectro-temporal power of the source
- An I x I spatial covariance matrix, encoding the spatial position and spatial spread of the source
Spatial covariance models
- Rank-1 model (given by the narrowband assumption): the spatial covariance matrix is the outer product of the mixing vector with itself.
- Full-rank unconstrained model: the coefficients of the spatial covariance matrix are unrelated a priori. This is the most general possible model and allows more flexible modeling of the mixing process.
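
In symbols, the model can be written as follows. The notation is an assumption, since the slide equations did not survive extraction: n and f index time frames and frequency bins, c_j is the source image of source j, x the mixture, v_j the scalar source variance, R_j the I x I spatial covariance matrix and h_j the mixing vector.

```latex
\mathbf{x}(n,f) = \sum_{j=1}^{J} \mathbf{c}_j(n,f),
\qquad
\mathbf{c}_j(n,f) \sim \mathcal{N}_c\bigl(\mathbf{0},\; v_j(n,f)\,\mathbf{R}_j(f)\bigr)

\text{Rank-1 model:}\quad \mathbf{R}_j(f) = \mathbf{h}_j(f)\,\mathbf{h}_j^{H}(f)

\text{Full-rank model:}\quad \mathbf{R}_j(f)\ \text{is an unconstrained Hermitian positive definite matrix}
```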

Considered framework
Source separation can be achieved in two steps:
1. The model parameters are estimated in the maximum-likelihood (ML) sense. The Expectation-Maximization (EM) algorithm is a well-known and appropriate choice for ML estimation of the Gaussian mixing model.
2. The sources are separated by multichannel Wiener filtering.
Raised issues:
- Parameter initialization for EM
- Permutation alignment (well-known in frequency-domain BSS)
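
The second step, multichannel Wiener filtering, can be sketched for a single frequency bin as below. The sketch assumes the Gaussian model with covariance v_j R_j per source and a noiseless mixture; the function name and array layout are hypothetical.

```python
import numpy as np

def wiener_filter(x, v, R):
    """Multichannel Wiener filter in one frequency bin.

    x: (N, I) mixture STFT coefficients over N frames and I channels.
    v: (J, N) source variances.
    R: (J, I, I) spatial covariance matrices.
    Returns (J, N, I) estimated source images.
    """
    # Covariance of each source image at each frame: v_j(n) * R_j.
    sigma_c = v[:, :, None, None] * R[:, None, :, :]   # (J, N, I, I)
    # Mixture covariance: sum of the source image covariances.
    sigma_x = sigma_c.sum(axis=0)                      # (N, I, I)
    # Wiener gain W_j(n) = Sigma_cj(n) Sigma_x(n)^{-1}.
    gain = sigma_c @ np.linalg.inv(sigma_x)[None]      # (J, N, I, I)
    return np.einsum('jnab,nb->jna', gain, x)
```

With a noiseless model the gains sum to the identity, so the estimated source images add up exactly to the mixture.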

Proposed algorithm
Flow of the BSS algorithm:
1. STFT
2. Initialization by hierarchical clustering
3. Model parameter estimation by EM
4. Permutation alignment
5. Wiener filtering
6. ISTFT
In each step, we adapt the existing methods for the rank-1 model to our proposed full-rank unconstrained model.

Parameter initialization [S. Winter et al., EURASIP, vol. 2007]
Principle: perform hierarchical clustering of the mixture STFT coefficients in each frequency bin after a proper phase and amplitude normalization.
Adaptations to our algorithms:
1. The initial mixing vectors and spatial covariance matrices are computed from the phase-normalized STFT coefficients instead of from both phase- and amplitude-normalized coefficients.
2. We define the distance between clusters as the average distance between their samples instead of the minimum distance between them.
Source variance initialization:
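
The average-distance cluster criterion can be illustrated with a small agglomerative clustering sketch. It is a generic average-linkage procedure under simplifying assumptions: plain Euclidean distance, no phase or amplitude normalization, and a brute-force search that is only practical for small inputs. The function name is hypothetical.

```python
import numpy as np

def average_linkage(points, n_clusters):
    """Merge clusters until n_clusters remain, using average linkage.

    points: (M, D) array of real or complex feature vectors.
    Returns a list of lists of sample indices, one list per cluster.
    """
    clusters = [[i] for i in range(len(points))]
    # Pairwise distances between all samples.
    dist = np.linalg.norm(points[:, None] - points[None], axis=-1)
    while len(clusters) > n_clusters:
        best_d, best_a, best_b = np.inf, 0, 1
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Cluster distance = average distance between their samples.
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if d < best_d:
                    best_d, best_a, best_b = d, a, b
        clusters[best_a] += clusters.pop(best_b)
    return clusters
```

Replacing `.mean()` with `.min()` would give the minimum-distance criterion that the adaptation moves away from.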

EM algorithm
EM for the rank-1 model [C. Fevotte and J.-F. Cardoso, WASPAA 2005]
- Mixing model: a noise component must be considered
Adaptations to the full-rank model
- Apply EM directly to the noiseless mixing model, i.e. the mixture is the sum of the source images
- Derive alternating parameter update rules (M-step) by maximizing the likelihood of the complete data
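
One EM iteration for the full-rank model in a single frequency bin might look like the sketch below. The update formulas are not given on the slide; the sketch assumes the usual closed-form updates for a zero-mean Gaussian model with covariance v_j R_j and a noiseless mixture, with the source images as complete data. The function name and array layout are hypothetical.

```python
import numpy as np

def em_iteration(x, v, R):
    """One EM iteration in one frequency bin (assumed update form).

    x: (N, I) mixture STFT coefficients.
    v: (J, N) current source variances.
    R: (J, I, I) current full-rank spatial covariance matrices.
    Returns updated (v, R).
    """
    n_chan = x.shape[1]
    eye = np.eye(n_chan)

    # E-step: Wiener estimate and posterior second moment of each source image.
    sigma_c = v[:, :, None, None] * R[:, None]            # (J, N, I, I)
    sigma_x = sigma_c.sum(axis=0)                         # (N, I, I)
    gain = sigma_c @ np.linalg.inv(sigma_x)[None]         # (J, N, I, I)
    c_hat = np.einsum('jnab,nb->jna', gain, x)            # (J, N, I)
    moment = (np.einsum('jna,jnb->jnab', c_hat, c_hat.conj())
              + (eye - gain) @ sigma_c)                   # (J, N, I, I)

    # M-step, alternating: variances given the old R, then R given the new v.
    R_inv = np.linalg.inv(R)
    v_new = np.einsum('jab,jnba->jn', R_inv, moment).real / n_chan
    R_new = (moment / v_new[:, :, None, None]).mean(axis=1)
    return v_new, R_new
```

Running this for a fixed number of iterations per frequency bin, starting from the clustering-based initialization, gives the parameters used by the Wiener filter.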

Permutation alignment [H. Sawada et al., ICASSP 2006]
Principle: permute the source order based on the estimated source DoAs and the clustered phase-normalized mixing vectors.
Adaptation to the full-rank model: compute the first principal component of each spatial covariance matrix by PCA, then apply the algorithm to this "equivalent" mixing vector. The order of the source variances is permuted identically to that of the spatial covariance matrices.
Figure: phase before and after permutation alignment.
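
Extracting the "equivalent" mixing vector from a spatial covariance matrix can be sketched with an eigendecomposition. Scaling the principal eigenvector by the square root of its eigenvalue is an assumption made here so that a rank-1 matrix returns its own mixing vector up to a phase factor; the function name is hypothetical.

```python
import numpy as np

def equivalent_mixing_vector(R):
    """First principal component of a Hermitian spatial covariance matrix.

    R: (I, I) Hermitian positive semi-definite matrix.
    Returns an (I,) vector h such that h h^H is the best rank-1
    approximation of R.
    """
    # eigh returns eigenvalues in ascending order, so the last is the largest.
    eigvals, eigvecs = np.linalg.eigh(R)
    return np.sqrt(eigvals[-1]) * eigvecs[:, -1]
```

The resulting vectors can then be fed to a permutation alignment method designed for rank-1 mixing vectors.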

Experiment setup
Geometry setting (figure: sources s1, s2, s3 and microphones m1, m2; labeled distances r = 0.5 m, 1.8 m, 1.5 m)
- Source and microphone height: 1.4 m
- Room dimensions: 4.45 x 3.35 x 2.5 m
- Microphone distance: d = 0.05 m
- Reverberation time: 50, 130, 250, 500 ms
Parameter and program settings
- Number of stereo mixtures: 3
- Speech length: 8 s
- Sampling rate: 16 kHz
- STFT window type: Sine
- Window length: 1024
- Number of EM iterations: 10
- Number of clusters K: 30

Experimental result
The full-rank model outperforms both the rank-1 model and the baseline approaches on mixtures in realistic reverberant environments.

Conclusion & future work
Contributions
- Proposed to model the convolutive mixing process by full-rank unconstrained spatial covariance matrices
- Designed the model parameter estimation algorithms for the full-rank model by adapting the estimation algorithms for the rank-1 model
- Showed that the proposed algorithm using the full-rank unconstrained spatial covariance model outperforms state-of-the-art approaches
Current result (in collaboration with S. Arberet and A. Ozerov)
- Combined the proposed full-rank unconstrained covariance model with an NMF model for the source spectra (to appear in ISSPA, May 2010)
Future work
- Consider the full-rank unconstrained model in the context of source localization

Thanks for your attention! & Your comments…?