Subband cocktail-party speech separation: CASA vs. BSS
Seungjin Choi, Department of Computer Science and Engineering, POSTECH, Korea
Joint work with Frederic Berthommier, ICP, INPG, France
Numbers95 Stereo Database (ST-Numbers95)
A large database of binary mixtures of sentences (n=613) was recorded by [Tessier and Berthommier, 1999]. The Numbers95 signals are played through loudspeakers and recorded. The temporal overlap between words is about 75% and the relative level is 0 dB. The setup is static. Only 332 mixture sentences, truncated at 1 s, are used in the present study.
[Figure: left source, mixture, right source, reference.]
ST-Numbers95 Database, ICP/INP Grenoble. Authors: E. Tessier and F. Berthommier.
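The "relative level" quoted above can be illustrated with a short sketch. It assumes the relative level is the energy ratio of the two source signals expressed in dB; the slide does not give the formula, and the function names `relative_level_db` and `scale_to_relative_level` are hypothetical. The database itself was recorded acoustically, so this only shows how such a level could be measured or set on digital signals.

```python
import numpy as np


def relative_level_db(left, right):
    """Energy ratio left/right in dB (assumed definition; 0 dB means equal energy)."""
    e_left = np.sum(np.asarray(left, dtype=float) ** 2)
    e_right = np.sum(np.asarray(right, dtype=float) ** 2)
    return 10.0 * np.log10(e_left / e_right)


def scale_to_relative_level(left, right, target_db=0.0):
    """Return a scaled copy of `right` so that the left/right level equals target_db."""
    current = relative_level_db(left, right)
    # Scaling `right` by g lowers the left/right level by 20*log10(g) dB.
    gain = 10.0 ** ((current - target_db) / 20.0)
    return np.asarray(right, dtype=float) * gain
```

For example, `scale_to_relative_level(a, b, 0.0)` returns a copy of `b` whose energy matches that of `a`.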
Filterbank decomposition: subband processing
[Figure: gain versus frequency of the subband filters, two panels, each spanning 100 Hz to 4000 Hz.]
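A minimal sketch of a subband split over the 100 Hz to 4000 Hz range shown on this slide. The actual filter shapes and band edges are not recoverable from the slide; the version below assumes brick-wall FFT masks and logarithmically spaced edges, and the names `subband_edges` and `split_into_subbands` are hypothetical.

```python
import numpy as np


def subband_edges(nbsb, f_lo=100.0, f_hi=4000.0):
    """Band edges in Hz; log spacing is an assumption, not taken from the slides."""
    return np.geomspace(f_lo, f_hi, nbsb + 1)


def split_into_subbands(x, fs, nbsb, f_lo=100.0, f_hi=4000.0):
    """Split x into nbsb band-limited signals using ideal (brick-wall) FFT masks."""
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    edges = subband_edges(nbsb, f_lo, f_hi)
    bands = []
    for i in range(nbsb):
        lo, hi = edges[i], edges[i + 1]
        if i == nbsb - 1:
            mask = (freqs >= lo) & (freqs <= hi)  # last band includes its upper edge
        else:
            mask = (freqs >= lo) & (freqs < hi)
        bands.append(np.fft.irfft(spectrum * mask, n=x.size))
    return np.stack(bands)  # shape: (nbsb, len(x))
```

Because the masks partition the 100 Hz to 4000 Hz range, summing the subbands gives back the same band-limited signal as a single-band split (nbsb=1).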
The CASA Model
Stages: filterbank decomposition, TDOA estimation and weighting, resynthesis.
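The TDOA (time difference of arrival) stage can be sketched with a plain cross-correlation search. This is a generic estimator, not necessarily the one used in the CASA model of the talk; the function name `estimate_tdoa` and the `max_lag` parameter are hypothetical, and the weighting step that follows TDOA estimation is not shown because the slide gives no detail on it.

```python
import numpy as np


def estimate_tdoa(left, right, max_lag):
    """Lag in samples by which `right` trails `left` (negative if it leads).

    Both inputs must have the same length. The lag maximizing the
    cross-correlation within [-max_lag, max_lag] is returned.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    lags = np.arange(-max_lag, max_lag + 1)
    scores = []
    for lag in lags:
        if lag >= 0:
            a, b = left[: left.size - lag], right[lag:]
        else:
            a, b = left[-lag:], right[: right.size + lag]
        scores.append(np.dot(a, b))
    return int(lags[int(np.argmax(scores))])
```

In a subband system the same search would be run per subband and per frame, which gives one TDOA estimate for each time-frequency region.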
Reconstruction Accuracy (RA)
[Figure: time-frequency panels of the left source (reference, Rl) and the left output (Yl); RA (output) and RA (mixture) are computed on frames of 1024 bins with half overlap.]
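The slide does not give the formula for RA. The sketch below assumes an SNR-like definition, the energy of the reference over the energy of the reference-minus-estimate error, evaluated in dB on frames of 1024 bins with half overlap. The gain is then taken as RA (output) minus RA (mixture), following the statement elsewhere in the deck that the mixture RA is subtracted for gain evaluation. All function names are hypothetical.

```python
import numpy as np


def frame_signal(x, frame_len=1024):
    """Cut x into frames of frame_len samples with half overlap."""
    hop = frame_len // 2
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (x.size - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])


def frame_ra_db(reference, estimate, frame_len=1024, eps=1e-12):
    """Per-frame RA in dB under the ASSUMED SNR-like definition."""
    ref = frame_signal(reference, frame_len)
    est = frame_signal(estimate, frame_len)
    signal_energy = np.sum(ref ** 2, axis=1)
    error_energy = np.sum((ref - est) ** 2, axis=1)
    return 10.0 * np.log10((signal_energy + eps) / (error_energy + eps))


def ra_gain_db(reference, output, mixture, frame_len=1024):
    """Per-frame gain: RA of the separated output minus RA of the raw mixture."""
    return frame_ra_db(reference, output, frame_len) - frame_ra_db(
        reference, mixture, frame_len
    )
```

With Rl as reference, `ra_gain_db(Rl, Yl, Xl)` would give one gain value per frame for the left channel, where `Xl` stands for the left mixture channel.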
Gain of CASA
Gain of CASA: Relative Level
[Figure: RAX, RAY and left gain (dB) plotted against relative level.]
Subband effect for CASA
Effect of the number of subbands (nbsb) for the CASA model on the RA (in dB). From left to right: averaged left-source RA, averaged right-source RA, and averaged left+right RA over all frames. The number of subbands varies from 1 to 5, and the two curves correspond to duration = 256 and 512 bins. The RA of the mixture, which is subtracted for gain evaluation, is labelled (*).
[Figure: three panels, RA left, RA right and RA left+right (dB) versus nbsb.]
Effect of nbsb: RA
[Figure: left RA (dB) and right RA (dB) per frame of 1024 bins with half overlap; legend: Mixt., nbsb=1, nbsb=2, nbsb=]
Subband effect for CASA: Gain
[Figure: gain (dB) versus relative level (dB), left and right panels, for nbsb=1 and nbsb=4.]
The BSS Model
[Figure: block diagram with inputs Xl(t), Xr(t), cross filters Wrl, Wlr and outputs Yl(t), Yr(t); blocks labelled Gain, Non-linear function, Delayed output; parameter nbp. Time-frequency panels of Yl(t) and Yr(t) over 1 second.]
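The block diagram is not recoverable in detail, so the following is only a sketch of one common cross-coupled feedback structure that matches the labels on the slide (cross filters Wlr and Wrl acting on delayed outputs). It assumes Wlr feeds delayed values of Yr into Yl and Wrl feeds delayed values of Yl into Yr. The adaptation rule (gain, non-linear function, nbp) is not sketched, and the function name `feedback_demix` is hypothetical.

```python
import numpy as np


def _delayed_sum(w, y, t):
    """sum_k w[k] * y[t-1-k], using only samples that already exist."""
    acc = 0.0
    for k, coeff in enumerate(w):
        past = t - 1 - k
        if past < 0:
            break
        acc += coeff * y[past]
    return acc


def feedback_demix(x_l, x_r, w_lr, w_rl):
    """Forward pass of an ASSUMED cross-coupled feedback network:

        y_l(t) = x_l(t) + sum_k w_lr[k] * y_r(t-1-k)
        y_r(t) = x_r(t) + sum_k w_rl[k] * y_l(t-1-k)
    """
    x_l = np.asarray(x_l, dtype=float)
    x_r = np.asarray(x_r, dtype=float)
    y_l = np.zeros(x_l.size)
    y_r = np.zeros(x_r.size)
    for t in range(x_l.size):
        y_l[t] = x_l[t] + _delayed_sum(w_lr, y_r, t)
        y_r[t] = x_r[t] + _delayed_sum(w_rl, y_l, t)
    return y_l, y_r
```

If each mixture channel contains its own source plus a delayed, attenuated copy of the other source, filters holding the negated cross-talk coefficients at the matching delays cancel the interference exactly under this structure.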
Gain of BSS: Relative Level
[Figure: RAX, RAY and left gain (dB) plotted against relative level.]
Subband effect for BSS
Effect of the number of subbands (nbsb) for the BSS model on the RA (in dB). From left to right: averaged left-source RA, averaged right-source RA, and averaged left+right RA over all frames. The number of subbands varies from 1 to 4, and the curves correspond to nbp = 2, 3, 10, 100. The RA of the mixture is labelled (*). In each figure, two points are added at nbsb=1 for the "BSS giv" condition ( ) and for the "BSS ori" data ( ).
[Figure: three panels, left, right and left+right RA (dB) versus nbsb.]
RA and Gain for BSS
[Figure: left and right RA (dB) per frame of 1024 bins with half overlap, for the left source, the right source and the mixture (Mixt); RAX, RAY and relative level RL (dB).]
Speech Separation Program (C++), POSTECH. Authors: S. Choi and H. Hong.
Subband effect for BSS: Gain
Gain of BSS (nbp=100).
[Figure: gain (dB) versus relative level (dB), left and right panels, for nbsb=1 and nbsb=2.]
Demixing filters
[Figure: demixing filters Wlr and Wrl for nbsb=1, shown against time (bin) and against frequency.]
Coherence spectrograms
NBP=10, Mean(Coh)=0.65.
[Figure: time-frequency panels, left (Yl(n), Yl(n+1)) and right (Yr(n), Yr(n+1)), from frames of 256 bins with half overlap.]
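A possible reading of this slide is a magnitude-squared coherence between the two channels, estimated locally by averaging the spectra of consecutive frames n and n+1 (frames of 256 bins with half overlap), with the mean of the resulting spectrogram used as a single index. That reading is an assumption, as are the Hann window and the names `stft_frames` and `coherence_spectrogram`.

```python
import numpy as np


def stft_frames(x, frame_len=256):
    """Hann-windowed spectra of half-overlapping frames, shape (n_frames, bins)."""
    hop = frame_len // 2
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (x.size - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def coherence_spectrogram(a, b, frame_len=256, eps=1e-12):
    """Magnitude-squared coherence per (frame pair, frequency bin), in [0, 1].

    Cross- and auto-spectra are averaged over frames n and n+1 (an assumption).
    """
    spec_a = stft_frames(a, frame_len)
    spec_b = stft_frames(b, frame_len)
    cross = spec_a * np.conj(spec_b)
    power_a = np.abs(spec_a) ** 2
    power_b = np.abs(spec_b) ** 2
    cross_avg = cross[:-1] + cross[1:]
    power_a_avg = power_a[:-1] + power_a[1:]
    power_b_avg = power_b[:-1] + power_b[1:]
    return np.abs(cross_avg) ** 2 / (power_a_avg * power_b_avg + eps)
```

A summary index comparable in spirit to Mean(Coh) is then `coherence_spectrogram(yl, yr).mean()`. With only two frames averaged, even unrelated signals give a mean well above zero, so the values are meaningful mainly in comparison with each other.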
Effect of nbp: coherence spectrograms
[Figure: left, right and Coh panels for NBP=3, NBP=10 and NBP=100.]
Coherence statistic
Effect of the number of subbands (nbsb) on the coherence index for the BSS model. Left: average left+right RA over all frames. Right: coherence, defined as the mean of the coherence spectrogram. The number of subbands varies from 1 to 4, and the curves correspond to nbp = 2, 3, 10, 100. The RA of the mixture is labelled (*). The coherence CohX between the two mixture channels is labelled (*) in the right figure. In each figure, two points are added at nbsb=1 for the "BSS giv" condition ( ) and for the "BSS ori" data ( ).
Summary results
[Figure: left and right results for CASA, BSS, …; labels: Hearing, REF, mean.]