Look who’s talking? Project 3.1. Yannick Thimister, Han van Venrooij, Bob Verlinden. 27-01-2011, DKE, Maastricht University.

1 Look who’s talking? Project 3.1
Yannick Thimister, Han van Venrooij, Bob Verlinden
27-01-2011, DKE, Maastricht University

2 Contents
- Speaker recognition
- Speech samples
- Voice activity detection
- Feature extraction
- Speaker recognition
- Multi speaker recognition
- Experiments and results
- Discussion
- Conclusion

3 Speaker Recognition
- Speech contains several layers of info
  - Spoken words
  - Speaker identity
- Speaker-related differences are a combination of anatomical differences and learned speaking habits

4 Speech samples
- Self-recorded database
  - 55 sentences from 11 different people
  - 2x2 predefined and 1 random
  - Professional and built-in laptop microphone
- Database via Voxforge.org
  - 610 sentences from 61 different people
  - Varying recording microphones and environments

5 Voice activity detection
- Power-based
- Entropy-based
- Long term spectral divergence
- Frames
- Initial frames are noise
- Hangover
- Adaptive noise estimation
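The framing and hangover ideas on this slide can be sketched as follows. This is a minimal illustration, not the project's code: the function names, the hop size, and the hangover length are assumptions. A hangover keeps the speech decision active for a few frames after the detector last fired, so that weak word endings are not cut off.

```python
def split_into_frames(samples, frame_len, hop):
    """Cut a signal into (possibly overlapping) frames of frame_len samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def apply_hangover(raw_decisions, hangover_frames):
    """Extend every speech run by up to hangover_frames extra frames."""
    smoothed = []
    remaining = 0  # hangover frames still available after the last speech frame
    for is_speech in raw_decisions:
        if is_speech:
            remaining = hangover_frames
            smoothed.append(True)
        elif remaining > 0:
            remaining -= 1
            smoothed.append(True)
        else:
            smoothed.append(False)
    return smoothed
```

For example, `apply_hangover([False, True, False, False, False], 2)` keeps the two frames after the detected speech frame marked as speech.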

6 Voice activity detection
- Power-based
  - Assumes that the noise is normally distributed
  - Calculate mean, standard deviation
  - For each sample n
    - Calculate
  - For each frame j
    - The majority of the samples
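The slide's formulas were not preserved, so the following is only one plausible reading of the power-based detector. It assumes each sample is tested against the noise distribution with a normalised distance `|x - mean| / std > k`, and that a frame counts as speech when the majority of its samples pass that test. The threshold `k = 3` and all names are assumptions.

```python
import statistics


def power_vad(samples, frame_len, noise_len, k=3.0):
    """Return one speech/non-speech decision per non-overlapping frame.

    The first noise_len samples are assumed to contain only noise.
    """
    noise = samples[:noise_len]
    mu = statistics.fmean(noise)
    sigma = statistics.pstdev(noise) or 1e-12  # avoid division by zero

    # Per-sample test: is the sample unlikely under the noise model?
    flags = [abs(x - mu) / sigma > k for x in samples]

    # Per-frame decision: speech if the majority of samples were flagged.
    decisions = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        flagged = sum(flags[start:start + frame_len])
        decisions.append(flagged > frame_len / 2)
    return decisions
```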

7 Voice activity detection
- Entropy-based
  - Scale DFT coefficients
  - Entropy equals
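A minimal sketch of the entropy-based measure, assuming the usual spectral-entropy definition: the DFT magnitudes are scaled to sum to one and treated as a probability distribution p, and the entropy is H = -sum(p * log p). The slide's own formula is missing, so the choice of magnitudes over power, and of the natural logarithm, is an assumption. A naive DFT is used to keep the example dependency-free.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of DFT bins 0..N/2 (naive O(N^2) DFT, fine for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_entropy(frame):
    """Entropy of the scaled magnitude spectrum; 0.0 for an all-zero frame."""
    mags = dft_magnitudes(frame)
    total = sum(mags)
    if total == 0:
        return 0.0
    probs = [m / total for m in mags]
    return -sum(p * math.log(p) for p in probs if p > 0)
```

A flat, noise-like spectrum gives a high entropy, while a spectrum concentrated in a few bins gives a low one, so a frame would be marked as speech when its entropy falls below some threshold estimated from the noise frames.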

8 Voice activity detection
- Long term spectral divergence
  - L-frame window
  - Estimation
  - Divergence

9 Voice activity detection
- Long term spectral divergence
  - Estimate the noise spectrum
    - Averages of the DFT coefficients
  - Calculate mean (μ) LTSD of noise frames
  - For each frame f
    - Calculate the LTSD > c μ
    - Update
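The LTSD steps can be sketched roughly as below. The formulas on the slides were lost, so this follows the commonly used formulation as an assumption: the long-term spectral envelope is the per-bin maximum over a window of neighbouring frames, and the divergence is the mean squared ratio of that envelope to the noise spectrum, in dB. The window order, the factor `c`, and all names are illustrative, and the final "Update" step (refreshing the noise estimate on non-speech frames) is left out.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 (naive O(N^2) DFT, fine for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def ltsd_per_frame(spectra, noise_spectrum, order):
    """LTSD in dB for every frame, using a window of +/- order frames."""
    values = []
    for l in range(len(spectra)):
        window = spectra[max(0, l - order): l + order + 1]
        # Long-term spectral envelope: per-bin maximum over the window.
        envelope = [max(s[k] for s in window) for k in range(len(noise_spectrum))]
        ratio = sum((e / n) ** 2 for e, n in zip(envelope, noise_spectrum))
        ratio /= len(noise_spectrum)
        values.append(10.0 * math.log10(max(ratio, 1e-12)))
    return values


def ltsd_vad(frames, n_noise_frames, order=1, c=2.0, floor=1e-12):
    """Mark a frame as speech when its LTSD exceeds c times the noise mean."""
    spectra = [magnitude_spectrum(f) for f in frames]
    # Noise spectrum: average of the DFT magnitudes of the initial frames.
    noise_spectrum = [
        max(sum(s[k] for s in spectra[:n_noise_frames]) / n_noise_frames, floor)
        for k in range(len(spectra[0]))
    ]
    ltsd = ltsd_per_frame(spectra, noise_spectrum, order)
    mu = sum(ltsd[:n_noise_frames]) / n_noise_frames
    return [value > c * mu for value in ltsd]
```

Because the envelope looks at neighbouring frames, frames directly next to speech are also marked as speech, which acts like a built-in hangover.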

10 Feature extraction
- Representation of speakers
- Mel frequency cepstral coefficients
  - Imitates human hearing
- Linear predictive coding
  - Linear function of previous samples

11 MFCC
- Hamming window
- FFT
- Mel-scale
- Log
- FFT
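The pipeline above can be sketched for a single frame as follows. This is a sketch under stated assumptions, not the project's implementation: it uses triangular mel filters, a naive DFT in place of an FFT, and a DCT-II as the final transform (a common substitute for the last FFT step listed on the slide). The filter count and all function names are assumptions; the default of 10 coefficients follows the optimum reported later in the deck.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def hamming(n):
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def power_spectrum(frame):
    """Squared magnitudes of DFT bins 0..N/2 (naive O(N^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))) ** 2
        for k in range(n // 2 + 1)
    ]


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centres equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    points = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = points[j - 1], points[j], points[j + 1]
        weights = []
        for f in bin_hz:
            if left < f <= centre:
                weights.append((f - left) / (centre - left))
            elif centre < f < right:
                weights.append((right - f) / (right - centre))
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def mfcc(frame, sample_rate, n_filters=20, n_coeffs=10):
    n = len(frame)
    windowed = [x * w for x, w in zip(frame, hamming(n))]   # Hamming window
    power = power_spectrum(windowed)                        # (naive) FFT step
    bank = mel_filterbank(n_filters, n, sample_rate)        # mel scale
    energies = [sum(w * p for w, p in zip(row, power)) for row in bank]
    log_e = [math.log(e + 1e-10) for e in energies]         # log, floored
    # Final transform: DCT-II of the log filterbank energies.
    return [
        sum(le * math.cos(math.pi * c * (m + 0.5) / n_filters) for m, le in enumerate(log_e))
        for c in range(n_coeffs)
    ]
```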

12 LPC
- P-th order linear function estimated
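As an illustration of estimating a P-th order linear predictor, where each sample is approximated as x[n] ≈ a1·x[n-1] + ... + aP·x[n-P], here is a minimal sketch using the autocorrelation method with the Levinson-Durbin recursion. The slide does not say which estimation method the project used, so this choice is an assumption.

```python
def autocorrelation(x, max_lag):
    """r[lag] = sum over t of x[t] * x[t - lag], for lag = 0..max_lag."""
    return [
        sum(x[t] * x[t - lag] for t in range(lag, len(x)))
        for lag in range(max_lag + 1)
    ]


def lpc(x, order):
    """Return predictor coefficients [a1, ..., aP] via Levinson-Durbin."""
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    error = r[0]             # prediction error energy
    for i in range(1, order + 1):
        if error == 0:
            break
        # Reflection coefficient for step i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:]
```

For a signal that decays as x[n] = 0.9·x[n-1], a first-order fit recovers a coefficient very close to 0.9.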

13 Speaker recognition
- Nearest Neighbor
  - Euclidean distance
- Neural Network
  - Multilayer perceptron

14 Nearest neighbor
- Features compared pairwise
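A minimal sketch of nearest-neighbor speaker recognition with Euclidean distance: every query feature vector is compared pairwise against every stored training vector. The majority vote over the frames of an utterance is an assumption, since the slide does not say how per-frame results were combined, and the function names are illustrative.

```python
import math
from collections import Counter


def nearest_label(query, training):
    """training is a list of (feature_vector, speaker) pairs."""
    _, label = min(training, key=lambda pair: math.dist(query, pair[0]))
    return label


def identify_speaker(query_frames, training):
    """Classify each frame, then return the speaker with the most votes."""
    votes = Counter(nearest_label(frame, training) for frame in query_frames)
    return votes.most_common(1)[0][0]
```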

15 Neural network
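The slide content was not preserved, so the following only illustrates the forward pass of a multilayer perceptron with one hidden layer. The sizes (10 input features, 25 hidden nodes, 11 speakers) borrow numbers mentioned elsewhere in the deck, but combining them this way, the tanh and softmax activations, and the random untrained weights are all assumptions. Training is not shown.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """One weight row per node; the last entry of each row is the bias."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]


def layer_output(layer, inputs, activation):
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + row[-1])
        for row in layer
    ]


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_forward(features, hidden_layer, output_layer):
    """Return one probability per speaker."""
    hidden = layer_output(hidden_layer, features, math.tanh)
    scores = layer_output(output_layer, hidden, lambda v: v)
    return softmax(scores)


rng = random.Random(0)
hidden_layer = make_layer(n_in=10, n_out=25, rng=rng)
output_layer = make_layer(n_in=25, n_out=11, rng=rng)
probabilities = mlp_forward([0.1] * 10, hidden_layer, output_layer)
```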

16 Multi speaker recognition
- Preprocessing using VAD
- Consecutive speech frames
- Single speaker recognition per segment
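The segmentation idea can be sketched as below: runs of consecutive speech frames form segments, and each segment is passed to a single-speaker classifier. The `classify_segment` callback is a hypothetical placeholder for whichever recognizer is used, and the function names are assumptions.

```python
def speech_segments(vad_decisions):
    """Return (start, end) frame index pairs for runs of speech; end is exclusive."""
    segments = []
    start = None
    for i, is_speech in enumerate(vad_decisions):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(vad_decisions)))
    return segments


def label_segments(frame_features, vad_decisions, classify_segment):
    """Run single-speaker recognition on every speech segment."""
    return [
        (start, end, classify_segment(frame_features[start:end]))
        for start, end in speech_segments(vad_decisions)
    ]
```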

17 Experiments VAD
- Hand-labeled samples
- Percentage correctly classified
- False negatives
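The two measures can be computed from per-frame decisions and hand labels as in this sketch. It assumes that a false negative is a speech frame marked as non-speech, and that both figures are percentages of all frames; the slides do not state the denominator, so that part is an assumption.

```python
def vad_scores(predicted, hand_labels):
    """Return (percent correctly classified, percent false negatives)."""
    if not predicted or len(predicted) != len(hand_labels):
        raise ValueError("need two non-empty sequences of equal length")
    total = len(predicted)
    correct = sum(p == t for p, t in zip(predicted, hand_labels))
    # False negative: labeled as speech, but detected as non-speech.
    false_negatives = sum(t and not p for p, t in zip(predicted, hand_labels))
    return 100.0 * correct / total, 100.0 * false_negatives / total
```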

18 Results VAD
- Entropy-based
  - Correctly classified: 65.3%
  - False negatives: 9.3%
- Power-based
  - Correctly classified: 76.3%
  - False negatives: 6.2%
- Long term spectral divergence
  - Correctly classified: 79.0%
  - False negatives: 1.6%

19 Experiments Feature extraction
- Nr. of coefficients
- MFCC
  - Optimal: 10
  - 90.9%
- LPC
  - Optimal: 8
  - 77.3%

20 Experiments single speaker recognition
- Professional vs. built-in laptop microphone
- Silence removal

Trained   Tested   Neural network   Nearest neighbor
Pro       Pro      90.9%            100%
Laptop    Laptop   61.1%            94.4%
Pro       Laptop   16.7%            33.3%
Laptop    Pro      9.4%             21.4%

21 Experiments neural network
- Optimal number of nodes
  - Self-recorded database: 25 nodes
  - Voxforge database: 100 nodes

22 Experiments neural network
- Cycles

23 Experiments multi speaker recognition
- Self-made samples
- Optimal settings used
- Neural network: 66.7%
- Nearest neighbor: 76.5%

24 Discussion
- Nearest neighbor better than neural network?
- Neural network better applicable
- VAD gives no improvement

25 Conclusions
- LTSD is the best VAD method
- MFCC outperforms LPC
- Training and testing with different microphones gives significantly lower accuracy
- Nearest neighbor works better than an optimized neural network

26 Questions?

