A maximum likelihood estimation and training on the fly approach


1 Robust Speaker Verification in Reverberant Conditions Using Estimated Acoustic Parameters
A maximum likelihood estimation and training on the fly approach. By Khamis A. Al-Karawi & Francis Li

2 Presentation Outline
Speaker recognition definition
Main applications of speaker recognition
Speaker recognition challenges
The study aim
MLE and training on the fly method
Results of experiments
Conclusion

3 Speaker recognition: identifying a speaker from his or her voice
Speaker recognition divides into two tasks.
Speaker identification is the process of determining which talker, from a group of speakers, produced the speech.
Speaker verification is the process of accepting or rejecting the identity claim of a speaker.

4 Speaker Recognition Applications
Security and access applications
Telephone banking and web services
Forensic applications
Surveillance applications

5 Speaker Recognition Challenges
Robustness of automatic speaker recognition is crucial for real-world applications. Despite advances in speaker recognition, many challenges remain. Among these, environmental factors such as background noise and reverberation are known to be difficult to address. For speaker recognition, the best performance is achieved when the reverberation characteristics of the training speech and the test speech are closely matched.

6 Study Aim
Our study focused on reducing the effect of reverberation on speech signals obtained from a single microphone, and on improving the robustness of speaker recognition systems for practical applications.
Why use an estimation method? Removing or reducing channel effects through dereverberation can mitigate the mismatch problem to some extent, but it adds distortions to the speech signals themselves, so its effectiveness is limited. A successful single-channel dereverberation method for speaker recognition does not yet exist. Currently available single-channel blind dereverberation techniques reduce the effect of reverberation on the speech signal to some extent, but they do not seem to improve speaker recognition performance, because of the distortions imposed on the dereverberated speech signals.

7 Speech dataset
Speech samples collected from 110 speakers (58 male and 52 female) in the anechoic chamber at Salford University were used in this experiment as the source material for reverberation. We used a small database because our aim was not to investigate the performance of speaker recognition against talkers, but to investigate it against the environment.

8 Impulse Response database
The Aachen database consists of impulse responses measured in different rooms.
CATT-Acoustic software was used to generate impulse responses for different rooms.
Each speaker signal was convolved with impulse responses from different positions in the room to produce speech with different reverberation times.
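The convolution step can be illustrated with a minimal sketch. The function name `add_reverberation` and the toy signals are assumptions made for illustration; the slides do not say which tools were used to perform the convolution.

```python
import numpy as np

def add_reverberation(speech, impulse_response):
    """Convolve a dry (anechoic) speech signal with a room impulse response."""
    speech = np.asarray(speech, dtype=float)
    impulse_response = np.asarray(impulse_response, dtype=float)
    # Full linear convolution: output length is len(speech) + len(ir) - 1.
    return np.convolve(speech, impulse_response, mode="full")

# Toy data (hypothetical): a direct path plus one reflection at half
# amplitude arriving two samples later.
dry = np.array([1.0, 2.0])
ir = np.array([1.0, 0.0, 0.5])
wet = add_reverberation(dry, ir)
```

With real recordings, `dry` would be an anechoic utterance and `ir` a measured or simulated room impulse response sampled at the same rate.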

9 Training on the fly method
A maximum likelihood estimation method is proposed to blindly estimate the reverberation time from the submitted speech signals. The estimate is used to choose the closest match from a channel model bank. The selected model is then used to retrain the pattern recognition model on the fly.
The selected set of RTs is used in the training stage to generate a reference model for each speaker. Cepstral features are extracted from each speech sample in the enrollment phase.
Feature vectors are extracted from the reverberant speech signal (the speaker signal) and then used in the testing stage.
Scoring: the verification score is computed between the reference model of the claimed speaker and the recognition features of the input speech signal. The score is measured as the log-likelihood ratio between two models.
Decision-making: based on the score obtained from the scoring process and a specific threshold, a decision is made on whether the recognition features of the test speech signal belong to the claimed speaker.
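The selection, scoring and decision steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: each model is a single diagonal-covariance Gaussian standing in for whatever speaker model the system really uses, the "two models" of the log-likelihood ratio are assumed to be the claimed speaker model and a background model, and all names (`closest_rt`, `verify`, the bank values) are hypothetical.

```python
import numpy as np

def closest_rt(estimated_rt, bank_rts):
    """Pick the bank reverberation time nearest to the blind estimate."""
    bank = np.asarray(bank_rts, dtype=float)
    return float(bank[np.argmin(np.abs(bank - estimated_rt))])

def avg_log_likelihood(features, mean, var):
    """Mean per-frame log-likelihood under a diagonal Gaussian (assumed model)."""
    diff = features - mean
    per_frame = -0.5 * (np.log(2.0 * np.pi * var) + diff ** 2 / var).sum(axis=1)
    return float(per_frame.mean())

def verify(features, speaker_model, background_model, threshold):
    """Return the log-likelihood ratio score and the accept/reject decision."""
    score = (avg_log_likelihood(features, *speaker_model)
             - avg_log_likelihood(features, *background_model))
    return score, bool(score >= threshold)

# Hypothetical bank of reverberation times (seconds) and a blind estimate.
selected_rt = closest_rt(0.46, [0.2, 0.4, 0.6, 0.8])

# Toy 13-dimensional "cepstral" features: (frames, dimensions).
rng = np.random.default_rng(0)
dims = 13
speaker_model = (np.ones(dims), np.ones(dims))        # (mean, variance)
background_model = (np.zeros(dims), np.ones(dims))
genuine_features = rng.normal(1.0, 1.0, size=(200, dims))
impostor_features = rng.normal(0.0, 1.0, size=(200, dims))

genuine_score, genuine_accepted = verify(
    genuine_features, speaker_model, background_model, threshold=0.0)
impostor_score, impostor_accepted = verify(
    impostor_features, speaker_model, background_model, threshold=0.0)
```

In the training on the fly setting, the speaker model passed to `verify` would be the one trained on speech reverberated at `selected_rt`, so that training and test conditions match.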

10

11 The objective parameters were estimated from the reverberated signal, and the results were compared with the values calculated directly from the impulse response.
Room type      Source-mic distance   T60 (Schroeder method)   RT by MLE
Studio booth   0.50                  0.11                     0.15
Studio booth   1.0                   0.18                     0.23
Studio booth   1.50                  0.21                     0.25
Office room    1.00                  0.33                     0.37
Office room    2.00                  0.42                     0.48
Office room    3.00                  0.51                     0.56
Lecture        4.00                  0.71                     0.75
Lecture        5.56                  0.82                     0.88
Lecture        8.68                  1.1                      1.5
Rows with a single reverberation time in the transcript (column not recoverable):
Meeting room   1.70                  0.27
Meeting room   1.90                  0.29
Meeting room   2.80                  0.30
Lecture        7.10                  1.2
Lecture        10.2                  1.7
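The two kinds of estimate compared on this slide can be sketched on a synthetic decay. This is a hedged illustration: the impulse response is modelled as exponentially damped Gaussian noise, the Schroeder estimate uses a line fit between -5 dB and -25 dB of the backward-integrated energy curve, and the maximum likelihood estimate is a grid search under the same damped-noise model. These choices, and the names `schroeder_t60` and `mle_t60`, are assumptions; the slides do not give the details of the authors' estimator, which works blindly on reverberant speech rather than on a clean decay.

```python
import numpy as np

def schroeder_t60(h, fs, lo_db=-5.0, hi_db=-25.0):
    """T60 from an impulse response via Schroeder backward integration."""
    energy = np.asarray(h, dtype=float) ** 2
    edc = np.cumsum(energy[::-1])[::-1]            # energy remaining after n
    edc_db = 10.0 * np.log10(edc / edc[0])
    idx = np.where((edc_db <= lo_db) & (edc_db >= hi_db))[0]
    slope, _ = np.polyfit(idx / fs, edc_db[idx], 1)  # dB per second
    return -60.0 / slope

def mle_t60(y, fs, candidates):
    """Grid-search ML estimate assuming y[n] = a**n * w[n], w ~ N(0, sigma^2)."""
    y = np.asarray(y, dtype=float)
    n = np.arange(len(y))
    best_t60, best_ll = None, -np.inf
    for t60 in candidates:
        ln_a = -3.0 * np.log(10.0) / (t60 * fs)    # amplitude falls 60 dB in t60
        sigma2 = np.mean(y ** 2 * np.exp(-2.0 * n * ln_a))  # ML variance given a
        ll = (-0.5 * len(y) * np.log(2.0 * np.pi * sigma2)
              - ln_a * n.sum() - 0.5 * len(y))
        if ll > best_ll:
            best_t60, best_ll = float(t60), ll
    return best_t60

# Synthetic decay with a known reverberation time (all values hypothetical).
fs, true_t60 = 16000, 0.5
rng = np.random.default_rng(0)
n = np.arange(fs)                                  # one second of samples
h = 10.0 ** (-3.0 * n / (true_t60 * fs)) * rng.standard_normal(fs)

t60_schroeder = schroeder_t60(h, fs)
t60_mle = mle_t60(h, fs, np.arange(0.1, 2.0, 0.01))
```

Both estimates should land close to the 0.5 s used to generate the decay; on real rooms the two methods can disagree, as the values on the slide show.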

12 Result

13 System accuracy depends on both methods

14 Detection error trade-off curve

15 Conclusion
The results show that reverberation can, to different extents, degrade recognition performance. The paper has presented a new method for training a speaker recognition system with reverberant speech samples chosen according to the estimated reverberant conditions. This is achieved by using a maximum likelihood estimation method to determine the reverberation time. The proposed method notably improves the reliability of speaker recognition in terms of equal error rate and detection error trade-off. The improvement becomes more significant as the reverberation time becomes longer.
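For readers unfamiliar with the equal error rate, the following minimal sketch shows one common way to compute it from verification scores. The function name `equal_error_rate` and the toy scores are hypothetical, and the threshold sweep over observed scores is a simple approximation rather than the evaluation procedure used in the study.

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate EER: the point where false accepts equal false rejects."""
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    # Sweep every observed score as a candidate threshold (accept if >= t).
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor >= t).mean() for t in thresholds])  # false accept
    frr = np.array([(genuine < t).mean() for t in thresholds])    # false reject
    i = int(np.argmin(np.abs(far - frr)))
    return float(0.5 * (far[i] + frr[i])), float(thresholds[i])

# Toy scores (hypothetical): perfectly separated, then overlapping.
eer_separated, _ = equal_error_rate([2.0, 3.0, 4.0], [-1.0, 0.0, 1.0])
eer_overlap, thr_overlap = equal_error_rate([1.0, 3.0], [0.0, 2.0])
```

Plotting the false reject rate against the false accept rate over the same threshold sweep gives the points of a detection error trade-off curve.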

16 THANK YOU FOR YOUR ATTENTION

