
Latent Variable Framework for Modeling and Separating Single-Channel Acoustic Sources
Madhusudana Shashanka
Dissertation Defense, 17 August 2007
Department of Cognitive and Neural Systems, Boston University, Boston, Massachusetts
Committee Chair: Prof. Daniel Bullock. Readers: Prof. Barbara Shinn-Cunningham, Dr. Paris Smaragdis, Prof. Frank Guenther. Reviewers: Dr. Bhiksha Raj, Prof. Eric Schwartz.

2 Outline: Introduction; Time-Frequency Structure; Latent Variable Decomposition: A Probabilistic Framework; Sparse Overcomplete Decomposition; Conclusions (grouped on the slide as Background and Contributions)

3 Introduction The achievements of the ear are indeed fabulous. While I am writing, my elder son rattles the fire rake in the stove, the infant babbles contentedly in his baby carriage, the church clock strikes the hour, … … In the vibrations of air striking my ear, all these sounds are superimposed into a single extremely complex stream of pressure waves. Without doubt the achievements of the ear are greater than those of the eye. Wolfgang Metzger, in Gesetze des Sehens (1953) Abridged in English and quoted by Reinier Plomp (2002) Introduction

4 Cocktail Party Effect Introduction Cocktail Party Effect (Cocktail Party by SLAW, Maniscalco Gallery. From slides of Prof. Shinn-Cunningham, ARO 2006) Colin Cherry (1953) Our ability to follow one speaker in the presence of other sounds. The auditory system separates the input into distinct auditory objects. Challenging problem from a computational perspective.

5 Cocktail Party Effect Fundamental questions –How does the brain solve it? –Is it possible to build a machine capable of solving it in a satisfactory manner? Need not mimic the brain Two cases –Multi-channel (Human auditory system is an example with two sensors) –Single-Channel Introduction Cocktail Party Effect

6 Source Separation: Formulation. A single-channel mixture is the sum of its source signals, x(t) = \sum_{j=1}^{N} s_j(t). Given just x(t), how do we separate the sources s_1(t), \ldots, s_N(t)? Problem: indeterminacy – there are multiple ways in which the source signals can be reconstructed from the available information.

7 Source Separation: Approaches Exact solutions not possible, but can approximate –by utilizing information about the problem Psychoacoustically/Biologically inspired approach –Understand how the auditory system solves the problem –Utilize the insights gained (rules and heuristics) in the artificial system Engineering approach –Utilize probability and signal processing theories to take advantage of known or hypothesized structure/statistics of the source signals and/or the mixing process Introduction Source Separation

8 Source Separation: Approaches. Psychoacoustically inspired approach – Seminal work of Bregman (1990), Auditory Scene Analysis (ASA); Computational Auditory Scene Analysis (CASA), computational implementations of the views outlined by Bregman (Rosenthal and Okuno, 1998); limitations: how to reconcile subjective concepts (e.g., "similarity", "continuity") with strictly deterministic computational platforms? Difficulty incorporating statistical information. Engineering approach – most work has focused on multi-channel signals; Blind Source Separation: beamforming and ICA; unsuitable for single-channel signals.

9 Source Separation: This Work We take a machine learning approach in a supervised setting –Assumption: One or more sources present in the mixture are “known” –Analyze the sample waveforms of the known sources and extract characteristics unique to each one –Utilize the learned information for source separation and other applications Focus on developing a probabilistic framework for modeling single-channel sounds –Computational perspective, goal not to explain human auditory processing –Provide a framework grounded in theory that allows principled extensions –Aim is not just to build a particular separation system Introduction Source Separation

10 Outline Introduction Time-Frequency Structure –We need a representation of audio to proceed Latent Variable Decomposition: A Probabilistic Framework Sparse Overcomplete Decomposition Conclusions

11 Representation. Time-domain representation – sampled waveform: each sample represents the sound pressure level at a particular time instant. Time-frequency (TF) representation – shows the energy in TF bins, explicitly showing the variation along time and frequency. [Figure: waveform (amplitude vs. time) and spectrogram (frequency vs. time)]

12 TF Representations. Short Time Fourier Transform (STFT; Gabor, 1946). Time-frames: successive fixed-width snippets of the waveform (windowed and overlapping). Spectrogram: Fourier transforms of all time frames; the result for a given frame is a spectral vector. Other TF representations are possible (different filter banks), but only the STFT is considered in this work: Constant-Q (Brown, 1991), Gammatone (Patterson et al., 1995), Gamma-chirp (Irino and Patterson, 1997), TF distributions (Laughlin et al., 1994). [Figure: example spectrograms of piano, cymbals, and a female speaker]

13 Magnitude Spectrograms. Magnitude spectrogram: TF entries represent energy-like quantities that, for sound mixtures, can be approximated as adding linearly (see the sketch below). Phase information is ignored; enough information is present in the magnitude spectrogram alone. A simple test: [Audio demos: speech resynthesized with the cymbals' phase, and cymbals resynthesized with the piano's phase]
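The additivity approximation above can be checked numerically. Below is a small illustrative sketch (not from the thesis): it computes magnitude spectrograms with scipy's STFT for two synthetic signals and their mixture; the signals, sampling rate, and window settings are arbitrary choices.

```python
# Illustrative sketch: magnitude spectrograms and approximate additivity.
import numpy as np
from scipy.signal import stft

fs = 16000                                   # assumed sampling rate
t = np.arange(0, 2.0, 1.0 / fs)
s1 = np.sin(2 * np.pi * 440 * t)             # stand-in for source 1
s2 = 0.5 * np.random.randn(t.size)           # stand-in for source 2
x = s1 + s2                                  # single-channel mixture

def mag_spec(sig):
    """Magnitude spectrogram |STFT|, frequencies x frames."""
    _, _, Z = stft(sig, fs=fs, nperseg=1024, noverlap=768)
    return np.abs(Z)

V1, V2, V = mag_spec(s1), mag_spec(s2), mag_spec(x)

# Magnitudes are only approximately additive (phase interactions are ignored);
# the relative error below is typically small but nonzero.
rel_err = np.linalg.norm(V - (V1 + V2)) / np.linalg.norm(V)
print(f"relative additivity error: {rel_err:.3f}")
```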

14 Time-Frequency Masks. Popular in the CASA literature: assign higher weight to areas of the spectrogram where the target is dominant. Intuition: the dominant source masks the energy of weaker ones in any TF bin, so the "dominant" TF bins alone are sufficient for reconstruction. This reformulates the problem – the goal becomes estimating the TF mask (Ideal Binary Mask; Wang, 2005), using cues like harmonicity, F0 continuity, common onsets/offsets, etc.: synchrony strands (Cooke, 1991), TF maps (Brown and Cooke, 1994), correlograms (Weintraub, 1985; Slaney and Lyon, 1990). See the sketch below. [Figure: target, mixture, and "masked" mixture spectrograms]
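For concreteness, here is a minimal sketch of the ideal-binary-mask idea referenced above, assuming the target and interference magnitude spectrograms are available (i.e., ground truth is known); the function name and threshold are my own.

```python
# Minimal ideal-binary-mask sketch (assumes ground-truth spectrograms exist).
import numpy as np

def ideal_binary_mask(V_target, V_interf, local_snr_db=0.0):
    """1 where the target dominates the interference by local_snr_db, else 0."""
    eps = 1e-12
    local_snr = 20 * np.log10((V_target + eps) / (V_interf + eps))
    return (local_snr > local_snr_db).astype(float)

# mask * V_mixture keeps only the "dominant" target bins;
# resynthesis would reuse the mixture phase.
```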

15 Time-Frequency Masks: Limitations. They implement "fuzzy" rules and heuristics from ASA; ad-hoc methods with difficulty incorporating statistical information (Roweis, 2000). Assumption: energy is sparsely distributed, i.e., different sources are disjoint in their spectro-temporal content (Yilmaz and Rickard, 2004) – performs well only on mixtures that exhibit well-defined regions in the TF plane corresponding to the various sources (van der Kouwe et al., 2001).

16 Basis Decomposition Methods. Idea: an observed data vector can be expressed as a linear combination of a set of "basis components". Here the data vectors are spectral vectors. Intuition: every source exhibits characteristic structure that can be captured by a finite set of components. Model: v_t = \sum_{k=1}^{K} h_{kt} w_k, i.e., V = WH, where the columns of W are the basis components and H holds the mixture weights.

17 Basis Decomposition Methods. Many matrix factorization methods are available, e.g., PCA, ICA. Toy example: PCA components can have negative values, but spectrogram values are non-negative – how should negative components be interpreted? [Figure: PCA components of a toy spectrogram, with positive and negative values]

18 Basis Decomposition Methods. Non-negative Matrix Factorization (NMF; Lee and Seung, 1999) explicitly enforces non-negativity on both factored matrices, which makes it useful for analyzing spectrograms (Smaragdis, 2004; Virtanen, 2006). Issues: it cannot incorporate prior biases, and it is restricted to 2-D representations. [Figure: spectrogram V factored into basis components W and mixture weights H]
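As a reference point for the probabilistic framework that follows, here is a hedged sketch of NMF with the multiplicative updates of Lee and Seung for the generalized KL divergence, the variant commonly applied to magnitude spectrograms; the rank K, iteration count, and initialization are arbitrary choices.

```python
# Sketch of KL-divergence NMF with multiplicative updates.
import numpy as np

def nmf_kl(V, K, n_iter=200, seed=0):
    """V: non-negative spectrogram (frequencies x frames); returns W (F x K), H (K x T)."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, K)) + 1e-3        # basis components (spectra)
    H = rng.random((K, T)) + 1e-3        # mixture weights (activations)
    eps = 1e-12
    for _ in range(n_iter):
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1, keepdims=True).T + eps)
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0, keepdims=True).T + eps)
    return W, H
```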

19 Outline Introduction Time-Frequency Structure Latent Variable Decomposition: Probabilistic Framework –Our alternate approach: Latent variable decomposition treating spectrograms as histograms Sparse Overcomplete Decomposition Conclusions

20 Latent Variables Widely used in social and behavioral sciences –Traced back to Spearman (1904), factor analytic models for Intelligence Testing Latent Class Models ( Lazarsfeld and Henry, 1968 ) –Principle of local independence (or the common cause criterion) –If a latent variable underlies a number of observed variables, the observed variables conditioned on the latent variable should be independent Latent Variable Decomposition Background

21 Spectrograms as Histograms. Generative model: spectral vectors give the energy at the various frequency bins; treat them as histograms of multiple draws from a frame-specific multinomial distribution over frequencies. Each draw → "a quantum of energy". [Figure: urn analogy – pick a ball, note its color (frequency bin), add 1 to the histogram, place it back; the multinomial distribution P_t(f) underlies the t-th spectral vector]

22 Model. [Figure: the frame-specific multinomial distribution P_t(f) over frequency f]

23 Model. Generative model: a mixture multinomial, P_t(f) = \sum_z P(f|z) P_t(z). Procedure: pick a latent variable z (an urn) with probability P_t(z); pick a frequency f from that urn with probability P(f|z); repeat V_t times, where V_t is the total energy in the t-th frame.

24 Model. The same generative procedure, with the terms labeled: P_t(f) (frame-specific spectral distribution) = \sum_z P(f|z) (source-specific basis components) P_t(z) (frame-specific mixture weights). [Figure: the histogram built up by repeated draws]
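The urn procedure can be simulated directly. The sketch below is illustrative only (all parameter values are made up): it draws V_t energy quanta from the frame's mixture multinomial to produce one spectral vector.

```python
# Sampling one spectral frame from the mixture-multinomial generative model.
import numpy as np

rng = np.random.default_rng(0)
F, Z = 8, 3                                         # frequency bins, latent components
P_f_given_z = rng.dirichlet(np.ones(F), size=Z).T   # columns are the urns P(f|z)
P_t_z = np.array([0.5, 0.3, 0.2])                   # frame-specific mixture weights P_t(z)
V_t = 1000                                          # total "energy" (number of draws) in frame t

P_t_f = P_f_given_z @ P_t_z                         # frame-specific spectral distribution
frame_histogram = rng.multinomial(V_t, P_t_f)       # the generated spectral vector
print(frame_histogram, frame_histogram.sum())
```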

28 The mixture multinomial as a point in a simplex. [Figure: the basis components P(f|z) and the frame distribution P_t(f) as points in the probability simplex]

29 Learning the Model. Analysis: given the spectrogram V, estimate the parameters P(f|z) and P_t(z) of P_t(f) = \sum_z P(f|z) P_t(z). The basis components P(f|z) represent the latent structure; they underlie all the frames and hence characterize the source.

30 Learning the Model: Geometry Latent Variable Decomposition Model Geometry Spectral distributions and basis components are points in a simplex Estimation process: find corners of the convex hull that surrounds normalized spectral vectors in the simplex

31 Learning the Model: Parameter Estimation. Expectation-Maximization algorithm, with V_{ft} the entries of the training spectrogram:
E-step:  P_t(z|f) = \frac{P_t(z) P(f|z)}{\sum_{z'} P_t(z') P(f|z')}
M-step:  P_t(z) = \frac{\sum_f V_{ft} P_t(z|f)}{\sum_{z'} \sum_f V_{ft} P_t(z'|f)},   P(f|z) = \frac{\sum_t V_{ft} P_t(z|f)}{\sum_{f'} \sum_t V_{f't} P_t(z|f')}
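A minimal numpy sketch of these updates, assuming V is a non-negative magnitude spectrogram of shape (frequencies, frames); function and variable names are mine, and no convergence check is included.

```python
# EM sketch for the latent variable decomposition (illustrative).
import numpy as np

def plca_learn(V, Z, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    F, T = V.shape
    P_f_z = rng.dirichlet(np.ones(F), size=Z).T      # P(f|z), shape (F, Z)
    P_t_z = rng.dirichlet(np.ones(Z), size=T).T      # P_t(z), shape (Z, T)
    eps = 1e-12
    for _ in range(n_iter):
        # E-step: P_t(z|f) proportional to P_t(z) P(f|z)
        joint = P_f_z[:, :, None] * P_t_z[None, :, :]        # (F, Z, T)
        post = joint / (joint.sum(axis=1, keepdims=True) + eps)
        # M-step: re-estimate from the expected counts V_ft * P_t(z|f)
        counts = V[:, None, :] * post                        # (F, Z, T)
        P_t_z = counts.sum(axis=0)                           # (Z, T)
        P_t_z /= P_t_z.sum(axis=0, keepdims=True) + eps
        P_f_z = counts.sum(axis=2)                           # (F, Z)
        P_f_z /= P_f_z.sum(axis=0, keepdims=True) + eps
    return P_f_z, P_t_z
```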

32 Example Bases. [Figure: example basis components P(f|z), over frequency f and component index z, learned from speech, harp, and piano]

33 Source Separation. [Figure: spectrograms of the speech and harp examples]

34 Source Separation. [Figure: speech, harp, and mixture spectrograms, with the speech bases and harp bases learned from each source]

35 Source Separation. Mixture spectrogram model – a linear combination of the individual sources: P_t(f) = P_t(s_1) P(f|s_1) + P_t(s_2) P(f|s_2) = \left( P_t(s_1) \sum_{z \in \{z^{s_1}\}} P_t(z|s_1) P_{s_1}(f|z) \right) + \left( P_t(s_2) \sum_{z \in \{z^{s_2}\}} P_t(z|s_2) P_{s_2}(f|z) \right)

36 Source Separation. Expectation-Maximization algorithm, with V_{ft} the entries of the mixture spectrogram:
P_t(f, s) = P_t(s) \sum_{z \in \{z^s\}} P_t(z|s) P_s(f|z)
E-step:  P_t(s, z|f) = \frac{P_t(s) P_t(z|s) P_s(f|z)}{\sum_{s'} P_t(s') \sum_{z \in \{z^{s'}\}} P_t(z|s') P_{s'}(f|z)}
M-step:  P_t(s) = \frac{\sum_f V_{ft} \sum_{z \in \{z^s\}} P_t(s, z|f)}{\sum_{s'} \sum_f V_{ft} \sum_{z \in \{z^{s'}\}} P_t(s', z|f)},   P_t(z|s) = \frac{\sum_f V_{ft} P_t(s, z|f)}{\sum_{z' \in \{z^s\}} \sum_f V_{ft} P_t(s, z'|f)}

37 Source Separation. With the mixture model P_t(f) = P_t(s_1) P(f|s_1) + P_t(s_2) P(f|s_2) and the per-source terms P_t(f, s) estimated above, each source's magnitude spectrogram is reconstructed by redistributing the mixture energy V_{ft}: \hat{V}_{ft}(s) = \frac{P_t(f, s)}{P_t(f)} V_{ft}
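Putting slides 35-37 together, here is a hedged sketch of supervised separation with pre-learned, fixed bases. For brevity it folds P_t(s) and P_t(z|s) into a single weight vector over all components (an equivalent parameterization of the same mixture); variable names and iteration counts are my own.

```python
# Supervised separation sketch: EM over the mixture with fixed per-source bases,
# then energy redistribution V_hat_ft(s) = V_ft * P_t(f, s) / P_t(f).
import numpy as np

def separate(V, bases, n_iter=100, seed=0):
    """V: mixture spectrogram (F, T); bases: list of P_s(f|z) arrays, each (F, Z_s)."""
    rng = np.random.default_rng(seed)
    W = np.concatenate(bases, axis=1)                         # all components, (F, K)
    src = np.concatenate([np.full(b.shape[1], s) for s, b in enumerate(bases)])
    T = V.shape[1]
    H = rng.dirichlet(np.ones(W.shape[1]), size=T).T          # folded P_t(s) P_t(z|s), (K, T)
    eps = 1e-12
    for _ in range(n_iter):                                   # EM with W held fixed
        joint = W[:, :, None] * H[None, :, :]                 # (F, K, T)
        post = joint / (joint.sum(axis=1, keepdims=True) + eps)
        H = (V[:, None, :] * post).sum(axis=0)
        H /= H.sum(axis=0, keepdims=True) + eps
    P_tf_s = [W[:, src == s] @ H[src == s, :] for s in range(len(bases))]  # P_t(f, s)
    P_tf = sum(P_tf_s) + eps
    return [V * P / P_tf for P in P_tf_s]                     # per-source magnitude estimates
```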

38 Source Separation. Evaluation: V is the magnitude of the mixture, \Phi the phase of the mixture, O the magnitude of the original signal, and R the magnitude of the reconstruction. Define g(X) = 10 \log_{10} \left( \frac{\sum_{f,t} O_{ft}^2}{\sum_{f,t} |O_{ft} e^{j\Phi_{ft}} - X_{ft} e^{j\Phi_{ft}}|^2} \right); then SNR = g(R) - g(V). SNR improvements for the two reconstructed sources: 5.25 dB and 5.30 dB. [Audio demos: mixture and reconstructions]
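A small sketch of this evaluation metric, assuming O, R, V, and Phi are magnitude/phase arrays of matching shape; note that the common phase term cancels inside the magnitude.

```python
# SNR-improvement metric sketch (arrays assumed to have matching shapes).
import numpy as np

def g(X, O, Phi):
    """10 log10( sum O^2 / sum |O e^{jPhi} - X e^{jPhi}|^2 )."""
    num = np.sum(O ** 2)
    den = np.sum(np.abs(O * np.exp(1j * Phi) - X * np.exp(1j * Phi)) ** 2)
    return 10 * np.log10(num / den)

def snr_improvement(R, V, O, Phi):
    # Improvement of the reconstruction over the unprocessed mixture.
    return g(R, O, Phi) - g(V, O, Phi)
```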

39 Source Separation: Semi-supervised. Separation is possible even if only one source is "known" – the bases of the other source are estimated during separation. "Raise My Rent" by David Gilmour: background-music "bases" learned from 5 seconds of music-only clips in the song; lead-guitar bases learned from the rest of the song. "Sunrise" by Norah Jones: harder – the wave file is clipped; background-music bases learned from 5 seconds of music-only segments of the song. Further examples: [Audio demos: songs with separated foreground (FG) and background (BG)]; denoising, where only the speech is known.

40 Outline Introduction Time-Frequency Structure Latent Variable Decomposition: Probabilistic Framework Sparse Overcomplete Decomposition –Learning more structure than the dimensionality will allow Conclusions

41 Limitation of the Framework. Real signals exhibit complex spectral structure, so the number of components required to model this structure can be large. However, the latent variable framework has a limitation: the number of components that can be extracted is limited by the number of frequency bins in the TF representation (an arbitrary choice, unrelated to the true structure of the sources). Extracting an overcomplete set of components leads to the problem of indeterminacy.

42 Overcompleteness: Geometry Sparse Overcomplete Decomposition Geometry of Sparse Coding Overcomplete case –As the number of bases increases, basis components migrate towards the corners of the simplex –Accurately represent data, but lose data-specificity

43 Indeterminacy in the Overcomplete Case. Multiple solutions have zero error → indeterminacy.

44 Sparse Coding. Restriction → use the fewest number of corners. At least three are required for accurate representation; the number of possible solutions is greatly reduced, but the problem is still indeterminate. Instead, we minimize the entropy of the mixture weights. [Figure: candidate corner subsets of the simplex, e.g., ABD, ACE, ACD, ABE, ABG, ACG, ACF]

45 Sparsity. Sparsity originated as a theoretical basis for sensory coding (Kanerva, 1988; Field, 1994; Olshausen and Field, 1996), following Attneave (1954) and Barlow (1959, 1961) in using information-theoretic principles to understand perception; it also has utility in computational models and engineering methods. How to measure sparsity? Fewer active components → more sparsity. The number of non-zero mixture weights, i.e., the L0 norm, is the natural measure, but L0 is hard to optimize; L1 (or L2 in certain cases) is used as an approximation. We use the entropy of the mixture weights as the measure (see the sketch below).
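A tiny sketch comparing these sparsity measures on an illustrative mixture-weight vector (values made up); note that for a probability distribution the L1 norm is always 1, which is one reason entropy is the more useful measure here.

```python
# Sparsity measures on a hypothetical mixture-weight distribution P_t(z).
import numpy as np

theta = np.array([0.70, 0.25, 0.05, 0.0, 0.0])    # hypothetical P_t(z)

l0 = np.count_nonzero(theta)                       # number of active components
l1 = np.sum(np.abs(theta))                         # always 1 for a distribution
entropy = -np.sum(theta[theta > 0] * np.log(theta[theta > 0]))

print(l0, l1, round(entropy, 3))                   # lower entropy <-> sparser weights
```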

46 Learning Sparse Codes: Entropic Prior. Model: P_t(f) = \sum_z P(f|z) P_t(z); estimate P_t(z) such that the entropy of P_t(z) is minimized. Impose an entropic prior on P_t(z) (Brand, 1999): P_e(\theta) \propto e^{\beta H(\theta)}, where H(\theta) = -\sum_i \theta_i \log \theta_i is the entropy and \beta is a sparsity parameter that can be controlled; it is set so that P_t(z) with high entropy is penalized with low probability. The MAP formulation is solved using Lambert's W function.

47 Geometry of Sparse Coding. Sparse overcomplete case: sparse mixture weights → spectral vectors must be close to a small number of corners, forcing the convex hull to be compact; the simplex formed by the bases shrinks to fit the data.

48 Sparse-coding: Illustration. [Figure: basis components P(f|z) and mixture weights P_t(z), over f and t, estimated without sparsity]

50 Sparse-coding: Illustration. [Figure: the same decomposition estimated with sparse mixture weights]

51 Speech Bases. [Figure: speech bases trained without sparse mixture weights (compact code) vs. trained with sparse mixture weights (sparse-overcomplete code)]

52 Entropy Trade-off. Sparse mixture weights → bases that are holistic representations of the data. A decrease in mixture-weight entropy → an increase in basis-component entropy; the components become more "informative" (empirical evidence).

53 Source Separation: Results. CC – compact code, 100 components (red); SC – sparse-overcomplete code, 1000 components, \beta = 0.7 (blue). SNR improvements per separated source: Mixture 1 – CC 3.82 dB / 3.80 dB, SC 9.16 dB / 8.90 dB; Mixture 2 – CC 5.25 dB / 5.30 dB, SC 8.33 dB / 8.02 dB.

54 Source Separation: Results Sparse Overcomplete Decomposition Source Separation Results Sparse-overcomplete code leads to better separation Separation quality increases with increasing sparsity before tapering off at high sparsity values (> 0.7)

55 Other Applications Sparse Overcomplete Decomposition Other Applications Framework is general, operates on non-negative data –Text data (word counts), images etc. Examples

56 Other Applications Sparse Overcomplete Decomposition Other Applications Framework is general, operates on non-negative data –Text data (word counts), images etc. Image Examples: Feature Extraction

57 Other Applications Sparse Overcomplete Decomposition Other Applications Framework is general, operates on non-negative data –Text data (word counts), images etc. Image Examples: Hand-written digit classification

59 Outline Introduction Time-Frequency Structure Latent Variable Decomposition: Probabilistic Framework Sparse Overcomplete Decomposition Conclusions –In conclusion…

60 Thesis Contributions Modeling single-channel acoustic signals – important applications in various fields Provides a probabilistic framework – amenable to principled extensions and improvements Incorporates the idea of sparse coding in the framework Points to other extensions – in the form of priors Theoretical analysis of models and algorithms Applicability to other data domains Six refereed publications in international conferences and workshops (ICASSP, ICA, NIPS), two manuscripts under review (IEEE TPAMI, NIPS) Conclusions Thesis Contributions

61 Future Work. Representation – other TF representations (e.g., constant-Q, gammatone); multidimensional representations (correlograms, higher-order spectra); utilize phase information in the representation. Model and Theory. Applications.

62 Future Work. Representation. Model and Theory – employ priors on parameters to impose known/hypothesized structure (Dirichlet, mixture Dirichlet, logistic normal); explicitly model temporal structure using HMMs or other dynamic models; utilize a discriminative learning paradigm; extract components that form independent subspaces, which could be used for unsupervised separation; relation between sparse decomposition and non-negative ICA; extensions/improvements to inference algorithms (e.g., tempered EM). Applications.

63 Future Work. Representation. Model and Theory. Applications – other audio applications such as music transcription, speaker recognition, audio classification, language identification, etc.; explore applications in data mining, text semantic analysis, brain-imaging analysis, radiology, chemical spectral analysis, etc.

64 Acknowledgements Prof. Barbara Shinn-Cunningham Dr. Bhiksha Raj and Dr. Paris Smaragdis Thesis Committee Members Faculty/Staff at CNS and Hearing Research Center Scientists/Staff at Mitsubishi Electric Research Laboratories Friends and well-wishers –Supported in part by the Air Force Office of Scientific Research (AFOSR FA ), the National Institutes of Health (NIH R01 DC05778), the National Science Foundation (NSF SBE ), and the Office of Naval Research (ONR N ). –Arts and Sciences Dean’s Fellowship, Teaching Fellowship –Internships at Mitsubishi Electric Research Laboratories

65 Additional Slides

66 Publications. Refereed Publications and Manuscripts Under Review:
MVS Shashanka, B Raj, P Smaragdis. "Probabilistic Latent Variable Model for Sparse Decompositions of Non-negative Data," submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence.
MVS Shashanka, B Raj, P Smaragdis. "Sparse Overcomplete Latent Variable Decomposition of Counts Data," submitted to NIPS 2007.
P Smaragdis, B Raj, MVS Shashanka. "Supervised and Semi-Supervised Separation of Sounds from Single-Channel Mixtures," Intl. Conf. on ICA, London, Sep 2007.
MVS Shashanka, B Raj, P Smaragdis. "Sparse Overcomplete Decomposition for Single Channel Speaker Separation," Intl. Conf. on Acoustics, Speech and Signal Processing, Honolulu, Apr 2007.
B Raj, R Singh, MVS Shashanka, P Smaragdis. "Bandwidth Expansion with a Polya Urn Model," Intl. Conf. on Acoustics, Speech and Signal Processing, Honolulu, Apr 2007.
B Raj, P Smaragdis, MVS Shashanka, R Singh. "Separating a Foreground Singer from Background Music," Intl. Symposium on Frontiers of Research on Speech and Music, Mysore, India, Jan 2007.
P Smaragdis, B Raj, MVS Shashanka. "A Probabilistic Latent Variable Model for Acoustic Modeling," Workshop on Advances in Models for Acoustic Processing, NIPS 2006.
B Raj, MVS Shashanka, P Smaragdis. "Latent Dirichlet Decomposition for Single Channel Speaker Separation," Intl. Conf. on Acoustics, Speech and Signal Processing, Paris, May 2006.