STRUCTURED SPARSE ACOUSTIC MODELING FOR SPEECH SEPARATION
Afsaneh Asaei
Joint work with: Mohammad Golbabaee, Hervé Bourlard, Volkan Cevher



SPEECH SEPARATION PROBLEM
[Figure: sources s_1, ..., s_5 mixed into microphone signals x_1 and x_2 through channel coefficients φ_11, φ_21, φ_42, φ_52]
Sparsity is essential to deal with the ill-posed source separation problem.
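To illustrate why sparsity helps with the ill-posed problem above, here is a minimal sketch with two mixtures and five sources. The mixing matrix `Phi` is random and purely hypothetical (it is not the acoustic channel from the talk), and a single active source is assumed. The minimum-norm solution explains the measurements but spreads energy over every source, whereas a search restricted to 1-sparse solutions finds the active one.

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.standard_normal((2, 5))   # hypothetical 2-microphone, 5-source mixing matrix
s_true = np.zeros(5)
s_true[3] = 1.0                     # assumption: only one source is active
x = Phi @ s_true                    # the two observed mixtures

# Minimum-norm solution: fits x exactly, but is not sparse
s_min_norm = np.linalg.pinv(Phi) @ x

# 1-sparse search: find the single column of Phi that best explains x
residuals = []
for k in range(Phi.shape[1]):
    col = Phi[:, k]
    coef = (col @ x) / (col @ col)  # least-squares coefficient for this column
    residuals.append(np.linalg.norm(x - coef * col))
best = int(np.argmin(residuals))

print("active source found by sparse search:", best)
print("minimum-norm estimate:", np.round(s_min_norm, 3))
```

The exhaustive search is only practical for very small problems; it is used here to keep the sketch short.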

LISTENING RESULTS

KEY IDEA
Incorporation of an acoustic channel model for speech separation
• Cast the speech separation problem as spatio-spectral information recovery from compressive acoustic measurements
[Diagram labels: Structured Sparse Speech Representation, Acoustic Reverberation Models, Microphone Array Speech Separation, Structured Sparse Acoustic Modeling]

SPECTROGRAPHIC SPEECH
[Figure: spectrograms of Source 1, Source 2, Source 3, and the overlapping speech]
N sources, with fewer sensors than sources (M_sensor < M_source)
Spectral sparsity

SPECTRAL SPARSITY
• Compressibility of the information-bearing components of speech
• Enables high-accuracy speech recognition
[Figure: original spectrogram and auditory spectrogram. Ref.: R. Stern and N. Morgan, "Hearing is Believing," IEEE SPS Mag., Nov. 2012]

SPECTRAL SPARSITY
• Disjointness of overlapping spectrographic speech
• Histogram of the energy of the point-wise multiplication of two histograms of independent sources
• Diagonal Gram matrix
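The disjointness claim can be checked numerically. The sketch below uses synthetic sparse magnitude "spectrograms" in place of real speech (an assumption; the bin density of 5% is arbitrary) and shows that the Gram matrix of the normalized, vectorized spectrograms is close to diagonal, because independent sparse sources rarely occupy the same time-frequency bins.

```python
import numpy as np

def sparse_spectrogram(rng, n_freq, n_frames, density):
    """Synthetic magnitude spectrogram with only a few active time-frequency bins."""
    mask = rng.random((n_freq, n_frames)) < density
    return mask * rng.random((n_freq, n_frames))

def normalized_gram(spectrograms):
    """Gram matrix of the unit-norm vectorized spectrograms."""
    V = np.stack([s.ravel() / np.linalg.norm(s) for s in spectrograms])
    return V @ V.T

rng = np.random.default_rng(0)
sources = [sparse_spectrogram(rng, 128, 100, density=0.05) for _ in range(3)]
G = normalized_gram(sources)
print(np.round(G, 3))   # ones on the diagonal, small off-diagonal entries
```

Real speech is only approximately disjoint, so the off-diagonal entries measured on recordings would be small rather than exactly zero.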

SPATIAL SPARSITY
• Discretization of the planar area of the room
• Location of sound sources is sparse
[Figure: 5 × 5 grid of cells X_1, ..., X_25 covering the planar area of the room]
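A minimal sketch of the grid discretization: source positions are mapped to cells of a 5 × 5 grid, and the resulting occupancy vector has only as many nonzero entries as there are sources. The function names, the row-major cell ordering, and the two talker positions are assumptions made for illustration; the floor dimensions are taken from the room described in the numerical evaluations. Indices here are zero-based, whereas the slide labels cells X_1 to X_25.

```python
import numpy as np

def cell_index(x, y, room_w, room_d, n_cols, n_rows):
    """Map a planar position in metres to a zero-based, row-major grid-cell index."""
    col = min(int(x / room_w * n_cols), n_cols - 1)
    row = min(int(y / room_d * n_rows), n_rows - 1)
    return row * n_cols + col

def occupancy_vector(positions, room_w, room_d, n_cols=5, n_rows=5):
    """Sparse indicator vector with a 1 in every cell that contains a source."""
    s = np.zeros(n_cols * n_rows)
    for x, y in positions:
        s[cell_index(x, y, room_w, room_d, n_cols, n_rows)] = 1.0
    return s

# Two hypothetical talkers on an 8.2 m x 3.6 m floor plan
s = occupancy_vector([(1.0, 0.5), (6.0, 3.0)], room_w=8.2, room_d=3.6)
print("occupied cells:", np.flatnonzero(s), "out of", s.size)
```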

OBJECTIVE
• Spatio-spectral sparse representation of overlapping speech sources
• GOAL: Model the acoustic reverberant channel
[Equation labels: number of microphones; number of cells on the grid]

MULTIPATH CHANNEL
Image model and Green's function of sound propagation
[Equation labels: reflection coefficient; speed of sound; sensor location; source location; number of reflections; microphone array measurement matrix]
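The slide's equation did not survive, so the following is only a sketch of the image model it names, not the talk's exact formulation. It assumes a 2-D rectangular room, first-order reflections only, one common reflection coefficient for all walls, and the standard free-space Green's function exp(-jωd/c)/(4πd). The function names and parameter values are hypothetical.

```python
import numpy as np

C = 343.0  # assumed speed of sound in air, m/s

def first_order_images(src, room):
    """Source and its four first-order mirror images in a 2-D rectangular room.
    Returns the positions and the number of reflections of each path."""
    x, y = src
    w, d = room
    pos = np.array([[x, y], [-x, y], [2 * w - x, y], [x, -y], [x, 2 * d - y]])
    order = np.array([0, 1, 1, 1, 1])
    return pos, order

def channel(mics, src, room, freq_hz, refl=0.8):
    """Frequency response from one source to every microphone: the sum over
    image sources of refl**order * exp(-j*omega*dist/c) / (4*pi*dist)."""
    pos, order = first_order_images(src, room)
    dist = np.linalg.norm(mics[:, None, :] - pos[None, :, :], axis=2)
    omega = 2 * np.pi * freq_hz
    green = np.exp(-1j * omega * dist / C) / (4 * np.pi * dist)
    return green @ (refl ** order)

mics = np.array([[4.0, 1.8], [4.1, 1.8]])   # two hypothetical microphone positions
h = channel(mics, (1.0, 0.5), (8.2, 3.6), freq_hz=1000.0)
print(np.round(h, 4))
```

Stacking such responses for every candidate grid cell as columns would give a microphone array measurement matrix in the sense of the slide; higher-order reflections add more image sources with higher powers of the reflection coefficient.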

REVERBERANT ACOUSTIC
Structured sparsity underlying multipath propagation
• Spatial sparsity: actual sources
• Structured sparsity: actual-virtual sources
[Figure: image map]

FACTORIZED FORMULATION
New factorized formulation of multipath acquisition: X = O P S
• O: free-space Green's function matrix
• P: permutation map; actual sources → actual/virtual sources
• S: source matrix; spatio-spectral content of frames at a given frequency
[Figure label: image map of the i-th source]

MEASUREMENT CORRELATION
• Structured sparsity underlying the correlation matrix
• Goal: estimation of […]
• Enables source localization and estimation of absorption coefficients

GROUP SPARSE REPRESENTATION
• Kronecker product property [equation labels: Kronecker product; element-wise conjugate]
• […] (number of sources) groups of […] contain nonzero elements
• Identifying those groups determines the source locations
• Recovering the corresponding elements of […] and normalizing by source energy determines the absorption coefficients
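The symbols on this slide were lost, so the following only illustrates the standard identity that a "Kronecker product" combined with an "element-wise conjugate" suggests: vec(A Σ A^H) = (conj(A) ⊗ A) vec(Σ), with vec stacking columns. The names `O` and `Sigma` and the sizes are assumptions for illustration. The identity turns a correlation-matrix model into a linear system in the entries of Σ, which is what makes a sparse recovery formulation possible.

```python
import numpy as np

rng = np.random.default_rng(0)
n_mics, n_cells = 4, 6   # hypothetical sizes
O = rng.standard_normal((n_mics, n_cells)) + 1j * rng.standard_normal((n_mics, n_cells))
Sigma = rng.standard_normal((n_cells, n_cells)) + 1j * rng.standard_normal((n_cells, n_cells))

def vec(A):
    """Stack the columns of A into a single vector (column-major order)."""
    return A.reshape(-1, order="F")

lhs = vec(O @ Sigma @ O.conj().T)          # vectorized correlation-type product
rhs = np.kron(O.conj(), O) @ vec(Sigma)    # linear model in the entries of Sigma
print(np.allclose(lhs, rhs))
```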

JOINT LOCALIZATION & ABSORPTION COEFFICIENT ESTIMATION
Group sparse recovery
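The slide does not say which group sparse recovery algorithm is used. As one common option, the sketch below applies proximal gradient descent with group soft-thresholding (a group-lasso solver) to a synthetic, real-valued problem. The random matrix `A`, the group sizes, and the regularization weight `lam` are all hypothetical. The groups with the largest recovered energy play the role of the source locations, and the values inside them play the role of the quantities from which absorption coefficients would be derived.

```python
import numpy as np

def group_soft_threshold(x, groups, tau):
    """Shrink the Euclidean norm of each group by tau; zero out groups below tau."""
    out = np.zeros_like(x)
    for g in groups:
        norm = np.linalg.norm(x[g])
        if norm > tau:
            out[g] = (1.0 - tau / norm) * x[g]
    return out

def group_ista(A, y, groups, lam, n_iter=500):
    """Proximal gradient for 0.5 * ||y - A x||^2 + lam * sum_g ||x_g||_2."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        x = group_soft_threshold(x - step * grad, groups, step * lam)
    return x

rng = np.random.default_rng(1)
n_groups, group_size, n_meas = 20, 4, 48
groups = [np.arange(i * group_size, (i + 1) * group_size) for i in range(n_groups)]
A = rng.standard_normal((n_meas, n_groups * group_size)) / np.sqrt(n_meas)

x_true = np.zeros(n_groups * group_size)
for g in (3, 11):                            # two active groups, as for two sources
    x_true[groups[g]] = rng.uniform(1.0, 2.0, group_size)
y = A @ x_true

x_hat = group_ista(A, y, groups, lam=0.05)
energy = np.array([np.linalg.norm(x_hat[g]) for g in groups])
top_two = np.argsort(energy)[-2:]
print("groups with the most energy:", sorted(top_two.tolist()))
```

The real problem is complex-valued and its groups follow the image-map structure, so this sketch shows the recovery principle rather than the talk's solver.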

ROOM IMPULSE RESPONSE

NUMERICAL EVALUATIONS
Multichannel Overlapping Numbers Corpus (MONC)
• Utterances from the Numbers corpus are played back
• Recorded by an 8-channel circular array in a room of 8.2 m × 3.6 m × 2.4 m
• Reverberation time is 300 ms
• Inverse filtering of the acoustic channel, followed by linear post-filtering to enhance the separated signals

ABSORPTION COEFFICIENTS

WORD RECOGNITION RATE

PERCEPTUAL QUALITY

CONCLUDING REMARK
• Characterization of the acoustic measurements for reverberant enclosures enables acoustic-aware source separation
• High quality and recognition rate
• Estimation of the reflections and attenuations for an unconstrained environment
• Reconstruction of the sound field using the plenacoustic function
• Calibration of the acoustic measurement model
• Non-uniform sampling of the acoustic field
• Extension to continuous sources
• Incorporation of signal-dependent models and low-rank structures
• Post-processing of the signal recovery residual error

REFERENCES
• J. B. Allen and D. A. Berkley, "Image method for efficiently simulating small-room acoustics," Journal of Acoustical Society of America, vol. 60(s1).
• A. Asaei, M. Golbabaee, H. Bourlard, and V. Cevher, "Structured Sparsity Models for Multiparty Speech Recovery from Convolutive Recordings," TASL submission.
• I. Dokmanic, Y. M. Lu, and M. Vetterli, "Can one hear the shape of a room: The 2-D polygonal case," ICASSP.
• A. Asaei, H. Bourlard, and V. Cevher, "Model-based compressive sensing for multi-party distant speech recognition," in Intl. Conference on Acoustic Speech and Signal Processing (ICASSP).
• "The Multichannel Overlapping Numbers Corpus," Idiap resources available online.

THANK YOU!