Seungjin Choi Department of Computer Science and Engineering POSTECH, Korea Co-work with Frederic Berthommier ICP, INPG, France.

Subband cocktail-party speech separation: CASA vs. BSS

ST-Numbers95 Database (ICP/INP Grenoble; authors: E. Tessier and F. Berthommier)

A large database of binary mixtures of sentences (n=613) was recorded by [Tessier and Berthommier, 1999]. The Numbers95 signals are played through loudspeakers and recorded. The temporal overlap between words is about 75% and the relative level is 0 dB. The setup is static. Only 332 mixture sentences, truncated at 1 s, are used in the present study.

[Figure: left source, right source, and mixture from the Numbers95 stereo database.]

Filterbank decomposition and subband processing

[Figure: two panels of gain versus frequency; frequency axis from 100 Hz to 4000 Hz.]

The CASA Model

[Figure: block diagram with blocks labelled TDOA estimation and weighting, filterbank decomposition, and resynthesis.]
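The slide names TDOA (time difference of arrival) estimation but gives no estimator. As a minimal sketch, assuming a plain cross-correlation peak search between the left and right channels of one subband, the lag could be found as follows. The function name `estimate_tdoa` and the parameter `max_lag` are hypothetical and not taken from the slides; the actual CASA model may estimate and weight TDOA differently.

```python
def estimate_tdoa(left, right, max_lag):
    """Return the lag (in samples) that maximises the cross-correlation
    sum(left[n] * right[n + lag]).

    A positive result means the right channel is delayed relative to
    the left one. ASSUMPTION: a simple peak search, used here only to
    illustrate the idea; it is not the estimator from the slides.
    """
    best_lag = 0
    best_score = float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, sample in enumerate(left):
            m = n + lag
            if 0 <= m < len(right):
                score += sample * right[m]
        # Keep the first lag that reaches the highest correlation.
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Toy signals: an impulse that reaches the right channel 3 samples later.
left = [0.0] * 50
right = [0.0] * 50
left[10] = 1.0
right[13] = 1.0
lag = estimate_tdoa(left, right, max_lag=8)
```

In a subband system this search would run once per subband and per frame, so that each time-frequency region can be assigned to the left or right source from its lag.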

Reconstruction Accuracy (RA)

[Figure: time-frequency representations of the left source (reference, Rl) and the left output (Yl); labels RA (output) and RA (mixture); frames of 1024 bins with half overlap.]

Gain of CASA

Gain of CASA: relative level

[Figure: plot labels RAX, RAY, and gain left (dB).]

Subband effect for CASA

Effect of the number of subbands (nbsb) for the CASA model on the RA (in dB). From left to right: averaged left source RA, averaged right source RA, averaged left+right RA over all frames. The number of subbands varies from 1 to 5 and the two curves correspond to duration = 256 and 512 bins. The RA of the mixture, which is subtracted for gain evaluation, is labelled (*).

[Figure: three panels (RA left, RA right, RA left+right), each showing dB against nbsb.]
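The slides state that the gain is obtained by subtracting the RA of the mixture from the RA of the output, and that frames of 1024 bins with half overlap are used. They do not define RA itself. The sketch below therefore assumes a framewise, SNR-like measure in the time domain (reference energy over error energy, in dB) purely as a stand-in. The names `ra_db` and `gain_db` are hypothetical.

```python
import math


def frames(x, size=1024):
    """Yield frames of `size` samples with half overlap."""
    hop = size // 2
    for start in range(0, len(x) - size + 1, hop):
        yield x[start:start + size]


def ra_db(reference, estimate, size=1024):
    """Average framewise accuracy in dB.

    ASSUMPTION: RA is taken as 10*log10(reference energy / error
    energy) per frame; the slides do not give the actual definition.
    """
    values = []
    for r, y in zip(frames(reference, size), frames(estimate, size)):
        signal = sum(a * a for a in r)
        error = sum((a - b) ** 2 for a, b in zip(r, y))
        if signal > 0 and error > 0:  # skip silent or perfect frames
            values.append(10 * math.log10(signal / error))
    if not values:
        raise ValueError("no usable frames")
    return sum(values) / len(values)


def gain_db(reference, output, mixture, size=1024):
    """Gain = RA(output) - RA(mixture), as stated on the slides."""
    return ra_db(reference, output, size) - ra_db(reference, mixture, size)


# Toy example: two tones sampled at 8 kHz, mixed at equal level.
fs = 8000
source = [math.sin(2 * math.pi * 440 * k / fs) for k in range(4096)]
interferer = [math.sin(2 * math.pi * 700 * k / fs) for k in range(4096)]
mixture = [s + i for s, i in zip(source, interferer)]
# A separator that attenuates the interferer by a factor of 10.
output = [s + 0.1 * i for s, i in zip(source, interferer)]
gain = gain_db(source, output, mixture)
```

Under this assumed definition, attenuating the interferer by a factor of 10 gives a gain of about 20 dB, whatever the RA of the mixture is.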

Effect of nbsb: RA

[Figure: left RA (dB) and right RA (dB) for the mixture (Mixt.), nbsb=1, nbsb=2 and a further nbsb value; frames of 1024 bins with half overlap.]

Subband effect for CASA: Gain

[Figure: gain (dB) versus relative level (dB) for the left and right sources, with nbsb=1 and nbsb=4.]

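The BSS Model

[Figure: block diagram with demixing filters W_rl and W_lr, inputs X_l(t) and X_r(t), and outputs Y_l(t) and Y_r(t); blocks labelled Gain, Non linear function, and Delayed output (nbp). Time-frequency plots of Y_l(t) and Y_r(t) over 1 second.]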

Gain of BSS: relative level

[Figure: plot labels RAX, RAY, and gain left (dB).]

Subband effect for BSS

Effect of the number of subbands (nbsb) for the BSS model on the RA (in dB). From left to right: averaged left source RA, averaged right source RA, averaged left+right RA over all frames. The number of subbands varies from 1 to 4 and the curves correspond to nbp = 2, 3, 10, 100. The RA of the mixture is labelled (*). In each figure, two points are added at nbsb=1, one for the "BSS giv" condition and one for the "BSS ori" data.

[Figure: three panels (left, right, left+right), each showing dB against nbsb.]

RA and Gain for BSS

[Figure: left and right RA (dB) panels with labels Left, Right, Mixt., RAX, RAY and RL (dB); frames of 1024 bins with half overlap.]

Speech Separation Program (C++), POSTECH. Authors: S. Choi and H. Hong.

Subband effect for BSS: Gain

[Figure: gain of BSS (nbp=100), in dB, versus relative level (dB) for the left and right sources, with nbsb=1 and nbsb=2.]

Demixing filters

[Figure: demixing filters Wlr and Wrl for nbsb=1, shown against time (bin) and against frequency.]

Coherence spectrograms

[Figure: time-frequency coherence spectrograms for the left output (Yl(n), Yl(n+1)) and the right output (Yr(n), Yr(n+1)); frames of 256 bins with half overlap; NBP=10, Mean(Coh)=0.65.]

Effect of nbp: coherence spectrograms

[Figure: left, right, and coherence (Coh) panels for NBP=3, NBP=10 and NBP=100.]

Coherence statistic

Effect of the number of subbands (nbsb) on the coherence index for the BSS model. Left: average left+right RA over all frames. Right: coherence, defined as the mean of the coherence spectrogram. The number of subbands varies from 1 to 4 and the curves correspond to nbp = 2, 3, 10, 100. The RA of the mixture is labelled (*) in the left figure. The coherence CohX between the two mixture channels is labelled (*) in the right figure. In each figure, two points are added at nbsb=1, one for the "BSS giv" condition and one for the "BSS ori" data.

Summary results

[Figure: left, right, and mean results for CASA, BSS, …, Hearing and REF.]