Look who's talking?
Project 3.1, DKE, Maastricht University, 27-01-2011
Yannick Thimister, Han van Venrooij, Bob Verlinden

Contents
- Speaker recognition
- Speech samples
- Voice activity detection
- Feature extraction
- Speaker recognition
- Multi speaker recognition
- Experiments and results
- Discussion
- Conclusion

Speaker recognition
- Speech contains several layers of information: the spoken words and the speaker's identity
- Speaker-related differences are a combination of anatomical differences and learned speaking habits

Speech samples
- Self-recorded database: 55 sentences from 11 different people (2x2 predefined and 1 random each), recorded with both a professional microphone and a built-in laptop microphone
- Database via Voxforge.org: 610 sentences from 61 different people, with varying recording microphones and environments

Voice activity detection
- Three methods: power-based, entropy-based, long term spectral divergence
- The signal is split into frames; the initial frames are assumed to be noise
- Adaptive noise estimation
- Hangover to smooth the frame decisions

Voice activity detection: power-based
- Assumes that the noise is normally distributed
- Calculate the mean and standard deviation of the samples in the initial noise frames
- For each sample n: test how far it deviates from the noise mean, relative to the standard deviation
- For each frame j: classify as speech or noise according to the majority of its samples

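A minimal sketch of this power-based decision, assuming the per-sample test is |x(n) - μ| > k·σ followed by a majority vote per frame; the frame length, the number of noise frames and the factor k are illustrative choices, not the project's settings.

```python
import numpy as np

def power_based_vad(signal, frame_len=256, n_noise_frames=10, k=3.0):
    """Per-frame speech/noise decision, assuming the initial frames are noise.

    A sample is flagged as speech when it deviates more than k standard
    deviations from the noise mean; a frame is speech when the majority of
    its samples are flagged.  Frame length and k are illustrative values.
    """
    frames = signal[: len(signal) // frame_len * frame_len].reshape(-1, frame_len)
    noise = frames[:n_noise_frames].ravel()
    mu, sigma = noise.mean(), noise.std()
    sample_is_speech = np.abs(frames - mu) > k * sigma
    return sample_is_speech.mean(axis=1) > 0.5   # True = speech frame
```
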
Voice activity detection: entropy-based
- Scale the DFT coefficients so they sum to one: p_k = |X(k)|^2 / sum_j |X(j)|^2
- The entropy of a frame equals H = -sum_k p_k log p_k

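A short sketch of the per-frame entropy computation; only the scaling of the DFT coefficients and the entropy formula come from the slide, while the Hamming window and how the entropy would subsequently be thresholded against the noise frames are assumptions.

```python
import numpy as np

def spectral_entropy(frame, eps=1e-12):
    """Entropy of the normalised DFT power spectrum of one frame."""
    power = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2
    p = power / (power.sum() + eps)          # scale coefficients to sum to one
    return -np.sum(p * np.log(p + eps))      # H = -sum_k p_k log p_k
```
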
Voice activity detection: long term spectral divergence
- Uses an L-frame window around each frame
- Estimation: the long-term spectral envelope LTSE(k) is the maximum of the DFT magnitudes |X(k)| over the window
- Divergence: LTSD = 10 log10( (1/N_FFT) sum_k LTSE(k)^2 / N(k)^2 ), where N(k) is the estimated noise spectrum

Voice activity detection: long term spectral divergence
- Estimate the noise spectrum N(k) as the average of the DFT coefficients of the initial noise frames
- Calculate the mean (μ) of the LTSD over the noise frames
- For each frame f: calculate the LTSD and classify as speech if LTSD > c·μ
- Update the noise estimate on frames classified as noise

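A compact sketch of the LTSD decision on pre-framed audio, following the estimation and threshold steps above; the window half-length L, the threshold factor c and the noise-frame count are placeholders, and the adaptive update of the noise spectrum is left out for brevity.

```python
import numpy as np

def ltsd_vad(frames, n_noise_frames=10, L=3, c=1.5, eps=1e-12):
    """LTSD-based VAD on a 2-D array of frames (one row per frame)."""
    spectra = np.abs(np.fft.rfft(frames * np.hamming(frames.shape[1]), axis=1))
    noise = spectra[:n_noise_frames].mean(axis=0)           # noise spectrum estimate

    def ltsd(i):
        lo, hi = max(0, i - L), min(len(spectra), i + L + 1)
        ltse = spectra[lo:hi].max(axis=0)                    # long-term spectral envelope
        return 10 * np.log10(np.mean(ltse ** 2 / (noise ** 2 + eps)) + eps)

    mu = np.mean([ltsd(i) for i in range(n_noise_frames)])   # mean LTSD of noise frames
    return np.array([ltsd(i) > c * mu for i in range(len(spectra))])
```
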
Feature extraction
- Representation of speakers
- Mel frequency cepstral coefficients (MFCC): imitates human hearing
- Linear predictive coding (LPC): linear function of previous samples

MFCC
- Pipeline per frame: Hamming window -> FFT -> Mel-scale -> Log -> FFT

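A minimal numpy/scipy sketch of this pipeline for a single frame; the filterbank size and the use of a DCT for the final cepstral step are common implementation choices assumed here, not settings taken from the slides.

```python
import numpy as np
from scipy.fft import rfft, dct

def mel_filterbank(sr, n_fft, n_mels=26):
    """Triangular filters spaced evenly on the mel scale."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = inv_mel(np.linspace(mel(0.0), mel(sr / 2.0), n_mels + 2))
    bins = np.floor((n_fft + 1) * edges / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        fb[i - 1, center:right] = (right - np.arange(center, right)) / max(right - center, 1)
    return fb

def mfcc(frame, sr, n_coeff=10):
    windowed = frame * np.hamming(len(frame))             # Hamming window
    power = np.abs(rfft(windowed)) ** 2                   # FFT -> power spectrum
    mel_energy = mel_filterbank(sr, len(frame)) @ power   # mel-scale filterbank
    log_energy = np.log(mel_energy + 1e-10)               # log compression
    return dct(log_energy, norm="ortho")[:n_coeff]        # final cepstral transform (DCT)
```
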
LPC
- Each sample is estimated as a P-th order linear function of the previous samples: s(n) ≈ a_1·s(n-1) + ... + a_P·s(n-P)

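A sketch of estimating the coefficients with the autocorrelation (Yule-Walker) method; the slides do not say which estimation method was used, so this is an assumption, with the order set to the optimum of 8 reported later.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_coefficients(frame, order=8):
    """LPC coefficients a_1..a_P so that s(n) ~ sum_i a_i * s(n - i).

    Autocorrelation method: solve the Yule-Walker equations R a = r.
    """
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    return solve_toeplitz(r[:order], r[1:order + 1])
```
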
Speaker recognition
- Nearest neighbor, using the Euclidean distance
- Neural network: a multilayer perceptron

Nearest neighbor
- Features compared pairwise

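A minimal sketch of the pairwise comparison, assuming each training utterance is reduced to a single feature vector and the test utterance takes the label of the closest training vector in Euclidean distance; that per-utterance reduction is an assumption, not stated on the slide.

```python
import numpy as np

def nearest_neighbor_speaker(test_features, train_features, train_labels):
    """Return the label of the training feature vector at the smallest
    Euclidean distance.  train_features: (n_samples, n_features) array."""
    distances = np.linalg.norm(train_features - test_features, axis=1)
    return train_labels[int(np.argmin(distances))]
```
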
Neural network

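A sketch of such a multilayer perceptron using scikit-learn; the 25 hidden nodes match the optimum reported later for the self-recorded database, while everything else is a library default rather than the project's own configuration.

```python
from sklearn.neural_network import MLPClassifier

def train_mlp(train_features, train_labels, n_hidden=25):
    """Multilayer perceptron over the extracted speaker features."""
    mlp = MLPClassifier(hidden_layer_sizes=(n_hidden,), max_iter=500)
    mlp.fit(train_features, train_labels)
    return mlp
```
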
Multi speaker recognition
- Preprocessing using VAD
- Consecutive speech frames are grouped into segments
- Single speaker recognition per segment

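A small sketch of the segmentation step, assuming the VAD output is a boolean speech flag per frame; each resulting segment would then be passed to the single-speaker recogniser.

```python
def speech_segments(speech_flags):
    """Group consecutive speech frames (True values from the VAD) into
    (start, end) frame-index segments."""
    segments, start = [], None
    for i, is_speech in enumerate(speech_flags):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(speech_flags)))
    return segments
```
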
Experiments: VAD
- Hand-labeled samples
- Measured: percentage correctly classified and false negatives

Results: VAD
- Entropy-based: 65.3% correctly classified, 9.3% false negatives
- Power-based: 76.3% correctly classified, 6.2% false negatives
- Long term spectral divergence: 79.0% correctly classified, 1.6% false negatives

Experiments: feature extraction
- Number of coefficients varied
- MFCC: optimal at 10 coefficients, 90.9%
- LPC: optimal at 8 coefficients, 77.3%

Experiments: single speaker recognition
- Professional vs. built-in laptop microphone
- Silence removal

Trained   Tested   Neural network   Nearest neighbor
Pro       Pro      90.9%            100%
Laptop    Laptop   61.1%            94.4%
Pro       Laptop   16.7%            33.3%
Laptop    Pro      9.4%             21.4%

Experiments: neural network
- Optimal number of hidden nodes
- Self-recorded database: 25 nodes
- Voxforge database: 100 nodes

Experiments: neural network
- Number of training cycles

Experiments: multi speaker recognition
- Self-made samples, optimal settings used
- Neural network: 66.7%
- Nearest neighbor: 76.5%

Discussion
- Is nearest neighbor really better than a neural network?
- The neural network may be the more broadly applicable method
- VAD gives no improvement

Conclusions
- LTSD is the best VAD method
- MFCC outperforms LPC
- Training and testing with different microphones gives significantly lower accuracy
- Nearest neighbor works better than an optimized neural network

Questions?