Published by Hugh Curtis. Modified over 9 years ago.
2
Subband cocktail-party speech separation: CASA vs. BSS
Seungjin Choi, Department of Computer Science and Engineering, POSTECH, Korea (seungjin@postech.ac.kr)
Joint work with Frederic Berthommier, ICP, INPG, France
3
Numbers95 Stereo Database
A large database of binary mixtures of sentences (n=613) was recorded by [Tessier and Berthommier, 1999]. The Numbers95 signals are played through loudspeakers and recorded. The temporal overlap between words is about 75% and the relative level is 0 dB. The setup is static. Only 332 mixture sentences, truncated at 1 s, are used in the present study.
Figure: left source, mixture, right source.
Reference: ST-Numbers95 Database, ICP/INP Grenoble. Authors: E. Tessier and F. Berthommier.
4
Filterbank decomposition: subband processing
Figure: filterbank gain (0 to 1) versus frequency (100 to 4000 Hz).
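A minimal sketch of a subband split over the 100 to 4000 Hz range shown on the slide. The slide does not specify the filter shapes, so this uses ideal (brick-wall) FFT masks with equal-width bands purely for illustration. The function name `split_subbands`, the 8000 Hz sampling rate, and the band edges are assumptions, not the authors' design.

```python
import numpy as np

def split_subbands(x, fs, nbsb, f_lo=100.0, f_hi=4000.0):
    """Split x into nbsb equal-width bands between f_lo and f_hi (Hz).

    ASSUMPTION: ideal FFT masks stand in for the slide's filterbank,
    whose actual filter shapes are not given.
    """
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    edges = np.linspace(f_lo, f_hi, nbsb + 1)
    bands = []
    for i in range(nbsb):
        # Interior edges belong to the upper band; the last band keeps f_hi.
        if i == nbsb - 1:
            upper_ok = freqs <= edges[i + 1]
        else:
            upper_ok = freqs < edges[i + 1]
        mask = (freqs >= edges[i]) & upper_ok
        bands.append(np.fft.irfft(spectrum * mask, n=len(x)))
    return np.stack(bands)

fs = 8000                      # hypothetical sampling rate (Nyquist = 4000 Hz)
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
bands = split_subbands(x, fs, nbsb=4)

# Resynthesis: summing the subbands gives back the band-limited signal.
resynth = bands.sum(axis=0)
```

Because the masks are disjoint and cover the whole band, summing the subband signals is enough for resynthesis in this sketch. A real filterbank with overlapping filters would need its gains to sum to one for the same property to hold.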
5
The CASA Model
Block diagram: filterbank decomposition, TDOA estimation and weighting, resynthesis.
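The TDOA estimation step can be illustrated with a plain cross-correlation search between the two channels. This is only a sketch of the general technique. The slides do not say how the TDOA is estimated or how the weighting is derived from it, and the function name `estimate_tdoa` and the `max_lag` parameter are hypothetical.

```python
import numpy as np

def estimate_tdoa(left, right, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation.

    A positive result means `right` is a delayed copy of `left`.
    ASSUMPTION: generic cross-correlation search, not the slides' method.
    """
    n = len(left)
    lags = np.arange(-max_lag, max_lag + 1)
    scores = []
    for lag in lags:
        if lag >= 0:
            a, b = left[: n - lag], right[lag:]
        else:
            a, b = left[-lag:], right[: n + lag]
        scores.append(np.dot(a, b))
    return int(lags[int(np.argmax(scores))])

# Toy check: the right channel is the left channel delayed by 5 samples.
rng = np.random.default_rng(0)
source = rng.standard_normal(2000)
delay = 5
left = source
right = np.concatenate([np.zeros(delay), source[:-delay]])
tdoa = estimate_tdoa(left, right, max_lag=20)
```

In a subband system this search would presumably be run per subband and per frame, so that each time-frequency region can be assigned to the left or right source.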
6
Reconstruction Accuracy
Figure: spectrograms (time vs. frequency) of the left source (reference, Rl) and the left output (Yl), with the per-frame reconstruction accuracy RA (output) and RA (mixture). Frames of 1024 bins with half overlap.
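The slides evaluate gain by subtracting the RA of the mixture from the RA of the output, per frame of 1024 bins with half overlap. The sketch below shows that bookkeeping. The slides do not define RA itself, so `ra_db` uses a reference-to-error energy ratio in dB as an assumed stand-in. All function names are hypothetical.

```python
import numpy as np

FRAME = 1024          # frame length in bins, as on the slide
HOP = FRAME // 2      # half overlap

def frames(x, size=FRAME, hop=HOP):
    """Split a 1-D signal into overlapping frames (incomplete tail dropped)."""
    count = 1 + (len(x) - size) // hop
    return np.stack([x[i * hop : i * hop + size] for i in range(count)])

def ra_db(reference, estimate, eps=1e-12):
    """ASSUMED stand-in for RA: per-frame reference-to-error energy ratio (dB)."""
    ref = frames(reference)
    err = ref - frames(estimate)
    num = np.sum(ref ** 2, axis=1) + eps
    den = np.sum(err ** 2, axis=1) + eps
    return 10 * np.log10(num / den)

def gain_db(reference, mixture, output):
    """Per-frame gain: RA of the output minus RA of the mixture."""
    return ra_db(reference, output) - ra_db(reference, mixture)

# Toy signals: the "output" halves the interfering source.
rng = np.random.default_rng(0)
left = rng.standard_normal(8192)
right = rng.standard_normal(8192)
mixture = left + right
output = left + 0.5 * right
gain = gain_db(left, mixture, output)
```

With this stand-in measure, halving the interferer amplitude yields a gain of 20 log10(2), about 6 dB, in every frame, which makes the toy case easy to check by hand.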
7
Gain of CASA
8
Gain of CASA: Relative Level
Figure: plot with labels RAX, RAY, and Gain left (dB).
9
Subband effect for CASA
Effect of the number of subbands (nbsb) for the CASA model on the RA (in dB). From left to right: averaged left source RA, averaged right source RA, averaged left+right RA over all frames. The number of subbands varies from 1 to 5 and the two curves correspond to duration = 256 and 512 bins. The RA of the mixture, which is subtracted for gain evaluation, is labelled (*).
Figure: three panels (RA left, RA right, RA left+right) in dB versus nbsb, with curves for 256 and 512.
10
Effect of nbsb: RA
Figure: left and right RA (dB) per frame (1024 bins with half overlap) for the mixture and for nbsb=1, nbsb=2, and nbsb=4.
11
Subband effect for CASA: Gain
Figure: gain (dB) versus relative level (dB, from -50 to 50) for the left and right sources, with nbsb=1 and nbsb=4.
12
The BSS Model
Figure: demixing network with inputs Xl(t) and Xr(t), outputs Yl(t) and Yr(t), and cross filters Wrl and Wlr. The adaptation is annotated "Gain | Non linear function | Delayed output", with parameter nbp. Spectrograms (time vs. frequency) of Yl(t) and Yr(t) over 1 second.
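One possible reading of the diagram is a feedback demixing network in which each output subtracts a filtered copy of the other channel's delayed output, and each cross-filter tap is updated by gain × nonlinear function of the output × delayed output. The sketch below follows that reading under stated assumptions. The `tanh` nonlinearity, the filter length `taps`, the step size `step`, and the function name `feedback_bss` are all hypothetical. The parameter nbp is not modelled because the slides do not define it.

```python
import numpy as np

def feedback_bss(x_l, x_r, taps=8, step=1e-3):
    """Sample-by-sample adaptation of cross filters w_lr and w_rl.

    ASSUMED model (not confirmed by the slides):
      y_l(t) = x_l(t) - sum_k w_lr[k] * y_r(t-k)
      y_r(t) = x_r(t) - sum_k w_rl[k] * y_l(t-k)
      w_lr[k] += step * tanh(y_l(t)) * y_r(t-k)   (and symmetrically)
    """
    n = len(x_l)
    w_lr = np.zeros(taps)
    w_rl = np.zeros(taps)
    # Outputs are stored with `taps` leading zeros so early delays read 0.
    y_l = np.zeros(n + taps)
    y_r = np.zeros(n + taps)
    for t in range(n):
        past_l = y_l[t : t + taps][::-1]   # y_l(t-1), ..., y_l(t-taps)
        past_r = y_r[t : t + taps][::-1]   # y_r(t-1), ..., y_r(t-taps)
        out_l = x_l[t] - w_lr @ past_r
        out_r = x_r[t] - w_rl @ past_l
        w_lr += step * np.tanh(out_l) * past_r
        w_rl += step * np.tanh(out_r) * past_l
        y_l[t + taps] = out_l
        y_r[t + taps] = out_r
    return y_l[taps:], y_r[taps:], w_lr, w_rl

# Toy mixture: each channel picks up the other source, delayed by one sample.
rng = np.random.default_rng(0)
s_l = rng.laplace(size=4000)
s_r = rng.laplace(size=4000)
x_l = s_l + 0.5 * np.concatenate([[0.0], s_r[:-1]])
x_r = s_r + 0.5 * np.concatenate([[0.0], s_l[:-1]])
y_l, y_r, w_lr, w_rl = feedback_bss(x_l, x_r, taps=4, step=2e-3)
```

The update stops changing a tap on average once the nonlinear function of one output is uncorrelated with the delayed other output, which is the sense in which this kind of rule drives the outputs towards independence.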
13
Gain of BSS: Relative Level
Figure: plot with labels RAX, RAY, and Gain left (dB).
14
Subband effect for BSS
Effect of the number of subbands (nbsb) for the BSS model on the RA (in dB). From left to right: averaged left source RA, averaged right source RA, averaged left+right RA over all frames. The number of subbands varies from 1 to 4 and the four curves correspond to nbp = 2, 3, 10, and 100. The RA of the mixture is labelled (*). In each figure, two points are added at nbsb=1 for the "BSS giv" condition and for the "BSS ori" data.
Figure: three panels (left, right, left+right) in dB versus nbsb, with curves for nbp = 2, 3, 10, and 100.
15
RA and Gain for BSS
Figure: left and right RA (dB) per frame (1024 bins with half overlap) for the left output, right output, and mixture, with RAX, RAY, and RL (dB).
Speech Separation Program (C++), POSTECH. Authors: S. Choi and H. Hong.
16
Subband effect for BSS: Gain
Figure: gain of BSS (nbp=100) in dB versus relative level (dB, from -50 to 50) for the left and right sources, with nbsb=1 and nbsb=2.
17
Demixing filters
Figure: demixing filters Wlr and Wrl versus time (bin), and the same filters versus frequency, for nbsb=1.
18
Coherence spectrograms
Figure: coherence spectrograms (time vs. frequency) for the left and right outputs, NBP=10, Mean(Coh)=0.65. Frames of 256 bins with half overlap. Panel labels: Yl(n), Yl(n+1) (left) and Yr(n), Yr(n+1) (right).
19
Effect of nbp: Coherence spectrograms
Figure: left and right coherence spectrograms for NBP=3, NBP=10, and NBP=100.
NBP    Coh
3      0.60
10     0.65
100    0.68
20
Coherence statistic
Effect of the number of subbands (nbsb) on the coherence index for the BSS model. Left: average left+right RA over all frames. Right: coherence, defined as the mean of the coherence spectrogram. The number of subbands varies from 1 to 4 and the four curves correspond to nbp = 2, 3, 10, and 100. The RA of the mixture is labelled (*). The coherence CohX between the two mixture channels is labelled (*) in the right figure. In each figure, two points are added at nbsb=1 for the "BSS giv" condition and for the "BSS ori" data.
21
Summary results
Figure: left and right panels comparing CASA, BSS, …, Hearing, and REF, with the mean.