A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds Paris Smaragdis, Madhusudana Shashanka, Bhiksha Raj NIPS 2009.

Presentation transcript:

A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds Paris Smaragdis, Madhusudana Shashanka, Bhiksha Raj NIPS 2009

Introduction
Problem: single channel signal separation
– Separating out the signals from individual sources in a mixed recording.
General approach:
– Derive a generalizable model that captures the salient features of each source.
– Separation is achieved by abstracting components from the mixed signal that conform to the characterization of the individual sources.

Physical Intuition
– Recover the sources by reweighting the frequency subbands of a single recording.

Latent Variable Model
Given the magnitude spectrogram of a single source, each spectral frame is modeled as a histogram of repeated draws from a multinomial distribution over the frequency bins.
– At a given time frame t, P_t(f) represents the probability of drawing frequency f.
– The model assumes that P_t(f) is composed of bases indexed by a latent variable z.
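As a point of reference, the single-source model implied here can be written in the standard PLCA form below (a hedged reconstruction; the transcript itself does not show the equation):

```latex
P_t(f) \;=\; \sum_{z} P_t(z)\, P(f \mid z)
```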

Latent Variable Model (Contd.)
Now let the matrix V of size F×T, with entries v_ft, represent the magnitude spectrogram of the mixture sound, and let v_t denote time frame t (the t-th column vector of V).
First we assume that we have an already trained model in the form of basis vectors P_s(f|z).
– These bases represent a dictionary of spectra that best describe each source.
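For a mixture of several known sources, the corresponding model (again a hedged reconstruction in the paper's notation, with per-source priors P_t(s) and per-source mixture weights P_t(z|s)) is:

```latex
P_t(f) \;=\; \sum_{s} P_t(s) \sum_{z} P_s(f \mid z)\, P_t(z \mid s)
```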

Source separation
Decompose a new mixture of these known sources in terms of the contributions of the dictionaries of each source.
– Use the EM algorithm to estimate P_t(z|s) and P_t(s).
The reconstruction of the contribution of source s to the mixture is given below.
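The slide's reconstruction formula is not in the transcript; a hedged reconstruction, consistent with the Wiener-like reweighting used in this family of PLCA separators, is:

```latex
\hat{v}^{(s)}_{ft}
\;=\;
v_{ft}\,
\frac{P_t(s)\sum_{z} P_s(f \mid z)\, P_t(z \mid s)}
     {\sum_{s'} P_t(s')\sum_{z} P_{s'}(f \mid z)\, P_t(z \mid s')}
```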

Contribution of this paper
Use the training data directly as a dictionary.
– The authors argue that, given a sufficiently large collection of data from a source, the best possible characterization of that data is quite simply the data themselves (cf. non-parametric density estimation with Parzen windows).
– This side-steps the need for a separate model-training step.
– A large dictionary provides a better description of the sources than the less expressive learned basis models.
– Source estimates are guaranteed to lie on the source manifold, as opposed to trained approaches, which can produce arbitrary outputs that are not necessarily plausible source estimates.

Using Training Data as the Dictionary
Use each frame of the spectrograms of the training sequences as the bases P_s(f|z).
– Let W^(s) be the training spectrogram from source s. In this case, the latent variable z for source s takes T^(s) values, and the z-th basis function is given by the z-th column vector of W^(s).
With the above model, ideally one would want to use only one dictionary element per source at any point in time.
– This ensures that the output lies on the source manifold.
– It is similar to a nearest-neighbor model (whose search is computationally very expensive).
– In this paper the authors propose using sparsity instead.
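Concretely, each basis is simply a training frame renormalized to a distribution over frequency (a hedged formalization of the statement above; the normalization is implied by the probabilistic model):

```latex
P_s(f \mid z) \;=\; \frac{w^{(s)}_{fz}}{\sum_{f'} w^{(s)}_{f'z}},
\qquad z = 1, \dots, T^{(s)}
```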

Entropic prior
Given a probability distribution θ, the entropic prior is defined as shown below.
– α is a weighting factor that determines the level of sparsity.
– A sparse representation has low entropy (since only a few elements are "active").
– Imposing this prior during MAP estimation is a way to minimize entropy during estimation, which results in a sparse representation of θ.
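The definition itself is not in the transcript; a hedged reconstruction of the standard entropic prior (of the form used in Brand-style sparse estimation) is:

```latex
P_e(\theta) \;\propto\; e^{-\alpha\, H(\theta)}
\;=\; \exp\Big(\alpha \sum_{i} \theta_i \log \theta_i\Big),
\qquad H(\theta) = -\sum_i \theta_i \log \theta_i
```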

Sparse approximation
We would like to minimize the entropies of both the source-dependent mixture weights and the source priors at every frame.
However, the joint distribution factors as P_t(z, s) = P_t(z|s) P_t(s), so its entropy decomposes by the chain rule (see below).
– Thus, reducing the entropy of the joint distribution is equivalent to reducing the conditional entropy of the source-dependent mixture weights and the entropy of the source priors.
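The decomposition referred to on the slide is the chain rule for entropy applied to the joint distribution (a standard identity, restated here for completeness):

```latex
H\big(P_t(z, s)\big)
\;=\;
\sum_{s} P_t(s)\, H\big(P_t(z \mid s)\big) \;+\; H\big(P_t(s)\big)
```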

Sparse approximation (Contd.)
– The model is rewritten in terms of this joint parameter, P_t(z, s).
– To impose sparsity, the entropic prior is applied to P_t(z, s).
– EM is applied to estimate P_t(z, s).
– The reconstructed source is then obtained from the estimated parameters, as in the earlier reconstruction formula (see the sketch below).
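A minimal runnable sketch of the overall decomposition, assuming NumPy. The dictionaries are stacks of normalized training frames, and the exact entropic-prior M-step (which involves a Lambert-W fixed point) is replaced by a simple exponentiate-and-renormalize heuristic, so this illustrates the pipeline rather than the paper's exact estimator; the function name separate_known_sources and the sparsity parameter are hypothetical.

```python
import numpy as np

def separate_known_sources(V, dicts, n_iter=200, sparsity=1.2, eps=1e-12):
    """Decompose a mixture magnitude spectrogram V (F x T) onto fixed,
    per-source dictionaries and return one reconstruction per source.

    dicts    : list of F x Z_s arrays; each column is a training frame
               normalized to sum to 1 (i.e. one basis P_s(f|z)).
    sparsity : exponent > 1 applied to the mixture weights each iteration;
               a crude stand-in for the paper's entropic prior.
    """
    W = np.hstack(dicts)                               # F x Z, all bases from all sources
    src = np.concatenate([np.full(d.shape[1], i) for i, d in enumerate(dicts)])
    Z, T = W.shape[1], V.shape[1]

    H = np.random.rand(Z, T)                           # mixture weights P_t(z, s)
    H /= H.sum(axis=0, keepdims=True)

    for _ in range(n_iter):
        R = np.maximum(W @ H, eps)                     # current model of V
        H *= W.T @ (V / R)                             # EM update of the weights
        H = H ** sparsity                              # heuristic sparsity step
        H /= np.maximum(H.sum(axis=0, keepdims=True), eps)

    # Wiener-like split of V according to each source's share of the model
    R = np.maximum(W @ H, eps)
    return [V * (W[:, src == i] @ H[src == i, :]) / R for i in range(len(dicts))]
```

In practice, each entry of dicts would hold the column-normalized training spectrogram of one source, and the returned magnitude estimates would be paired with the mixture phase for resynthesis.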

Results on real data

Comments
– The use of sparsity ensures that the output is a plausible speech signal, devoid of artifacts such as distortion and musical noise.
– An unfortunate side effect is the need to use a very large dictionary.
– However, a significant reduction in dictionary size may be achieved by using an energy threshold to select the loudest frames of the training spectrogram as bases.
– The approach outperforms trained basis models of the same size.
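A hedged sketch of the dictionary-reduction idea mentioned above, assuming NumPy; the function name reduce_dictionary and the keep_fraction parameter are illustrative, not from the paper:

```python
import numpy as np

def reduce_dictionary(W, keep_fraction=0.2, eps=1e-12):
    """Keep only the highest-energy frames (columns) of a training
    spectrogram W (F x T) and renormalize them as bases P_s(f|z)."""
    energy = W.sum(axis=0)                       # per-frame energy proxy
    k = max(1, int(round(keep_fraction * W.shape[1])))
    idx = np.sort(np.argsort(energy)[::-1][:k])  # loudest frames, kept in time order
    D = W[:, idx]
    return D / np.maximum(D.sum(axis=0, keepdims=True), eps)
```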