A Baseline System for Speaker Recognition
C. Mokbel, H. Greige, R. Zantout, H. Abi Akl, A. Ghaoui, J. Chalhoub, R. Bayeh
University of Balamand - ELISA


1 A Baseline System for Speaker Recognition
C. Mokbel, H. Greige, R. Zantout, H. Abi Akl, A. Ghaoui, J. Chalhoub, R. Bayeh
University of Balamand - ELISA

2 Outline
Introduction
Baseline speaker recognition system
NIST 2002 evaluation
Conclusion and perspective
C. Mokbel - UOB - NIST2002

3 Introduction
A baseline system was built and used in the NIST 2002 speaker recognition evaluation
  – GMM-based system
  – Score normalization using z-norm
  – Adaptation technique used to estimate the speaker model starting from the world model

4 Baseline Speaker Recognition System
Feature extraction:
  – Feature vectors as used in speech recognition: 13 MFCC coefficients, including the energy on a logarithmic scale, plus first- and second-order derivatives
  – Leading to 39 feature parameters
Preprocessing using cepstral mean normalization
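
The 13 + 13 + 13 = 39 stacking and the cepstral mean normalization can be sketched as follows. This is only an illustration: it assumes the 13 static coefficients per frame are already computed, uses a simple two-frame central difference for the derivatives, and applies the mean normalization to the static coefficients before differentiation. None of these choices is stated on the slide, and the function names are hypothetical.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean over the utterance from every frame."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]


def deltas(frames):
    """Central difference over neighbouring frames (assumed window).

    The first and last frames are replicated at the edges.
    """
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def build_features(static):
    """Stack static, first- and second-order derivatives: 13 -> 39 per frame."""
    normalized = cepstral_mean_normalize(static)
    d1 = deltas(normalized)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(normalized, d1, d2)]
```

With 13 static coefficients per frame, `build_features` returns 39 values per frame, and the static part has zero mean over the utterance.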

5 Baseline Speaker Recognition System
GMM modeling for both hypotheses: speaker and non-speaker (world)
  – EM algorithm (Baum-Welch) to train the world model, initialized using LBG VQ
  – Speaker model: mean vectors adapted from the world model, using an approximation of the “unified adaptation approach” (“Online Adaptation of HMMs to Real-Life Conditions: A Unified Framework”, IEEE Trans. on SAP, Vol. 9, No. 4, May 2001)

6 Baseline Speaker Recognition System
Speaker adaptation:
  – The Gaussian distributions of the world model are grouped in a binary tree
  – The Gaussian classes are determined from the speaker data
  – MLLR is applied based on these classes: only the means of the Gaussian distributions are adapted
  – MAP is applied to the Gaussian distributions at the leaves

7 Baseline Speaker Recognition System
Building the Gaussian tree bottom-up:
  – The closest Gaussian distributions are grouped two by two
  – The distance between two Gaussian distributions is the loss in likelihood of the associated data if the two Gaussians are merged into a single Gaussian
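
One way to compute such a merge distance is sketched below. It assumes diagonal covariances, that each Gaussian is the maximum-likelihood fit of its own data, and that a data count is available for each Gaussian. Under those assumptions the constant terms cancel and the loss reduces to a difference of log-variances. The tuple layout `(count, mean, var)` and the function names are hypothetical.

```python
import math


def merge(n1, mean1, var1, n2, mean2, var2):
    """Moment-matched merge of two diagonal Gaussians with data counts n1, n2."""
    n = n1 + n2
    mean = [(n1 * a + n2 * b) / n for a, b in zip(mean1, mean2)]
    var = [
        (n1 * (va + a * a) + n2 * (vb + b * b)) / n - m * m
        for a, va, b, vb, m in zip(mean1, var1, mean2, var2, mean)
    ]
    return n, mean, var


def merge_loss(n1, mean1, var1, n2, mean2, var2):
    """Log-likelihood lost by the associated data when the pair is merged."""
    n, _, var = merge(n1, mean1, var1, n2, mean2, var2)
    return 0.5 * sum(
        n * math.log(v) - n1 * math.log(va) - n2 * math.log(vb)
        for v, va, vb in zip(var, var1, var2)
    )


def closest_pair(gaussians):
    """Indices (i, j) of the pair whose merge loses the least likelihood."""
    pairs = [
        (i, j)
        for i in range(len(gaussians))
        for j in range(i + 1, len(gaussians))
    ]
    return min(pairs, key=lambda p: merge_loss(*gaussians[p[0]], *gaussians[p[1]]))
```

Repeating `closest_pair` followed by `merge` until one node remains would give a bottom-up binary tree. Whether the original system grouped greedily in this way is not specified on the slide.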

8 Baseline Speaker Recognition System
After the E-step of the EM algorithm, the weights associated with the leaves of the tree are propagated up through the tree to the root
Going from the root to the leaves, nodes are selected whenever one of their two children has a weight below a threshold
  – This defines a partition that is used in the MLLR algorithm
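
The two passes over the tree can be sketched as follows. The `Node` class and function names are hypothetical, and the sketch assumes that a leaf reached during the descent is selected as its own class, a case the slide does not cover.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    weight: float = 0.0          # leaf weight accumulated in the E-step
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self):
        return self.left is None and self.right is None


def propagate(node):
    """Bottom-up pass: an inner node's weight is the sum of its children's."""
    if not node.is_leaf():
        node.weight = propagate(node.left) + propagate(node.right)
    return node.weight


def select_classes(node, threshold):
    """Top-down pass: stop at a node as soon as one child is below threshold."""
    if (
        node.is_leaf()
        or node.left.weight < threshold
        or node.right.weight < threshold
    ):
        return [node]
    return select_classes(node.left, threshold) + select_classes(node.right, threshold)
```

The selected nodes cover every leaf exactly once, so the Gaussians under each selected node can share one MLLR transform.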

9 Baseline Speaker Recognition System
MAP algorithm:
  – The Gaussian mean parameters estimated at the leaves are smoothed, using a fixed weight, with the parameters of the corresponding world Gaussian

10 Baseline Speaker Recognition System
Given a target speaker model $\lambda_s$, the world model $\lambda_w$ and a test utterance $X$, the score for this utterance is computed as the log-likelihood ratio:
$s = \log \left[ p(X \mid \lambda_s) / p(X \mid \lambda_w) \right]$
This score should be normalized because the world model is not precise
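
A minimal sketch of this log-likelihood ratio for diagonal-covariance GMMs follows. Each model is represented as a list of `(weight, mean, var)` tuples, a layout assumed here for illustration, and frames are treated as independent so that the utterance score is a sum over frames. Dividing by the number of frames is a common variant, but the slide does not say whether it was used.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_frame_loglik(x, gmm):
    """Log-likelihood of one frame under a GMM, using log-sum-exp for stability."""
    terms = [math.log(w) + log_gauss(x, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def llr_score(frames, speaker_gmm, world_gmm):
    """Sum over frames of log p(x | speaker) - log p(x | world)."""
    return sum(
        gmm_frame_loglik(x, speaker_gmm) - gmm_frame_loglik(x, world_gmm)
        for x in frames
    )
```

A positive score favours the target speaker hypothesis and a negative score favours the world model.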

11 Baseline Speaker Recognition System
Normalization using the z-norm:
  – A few impostor utterances are used
  – A score is computed for every utterance
  – The resulting scores define a distribution per target speaker
  – The target speakers' distributions should be similar so that a single threshold can be used for the decision
      Center and scale the distribution: $ns = a \cdot s + b$
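
Centering and scaling with the impostor statistics corresponds to a = 1/σ and b = −μ/σ, where μ and σ are the mean and standard deviation of the impostor scores obtained against the target speaker's model. The sketch below assumes the population standard deviation, since the slide does not say which estimator was used. The function names are hypothetical.

```python
import statistics


def znorm_params(impostor_scores):
    """Return (a, b) such that a * s + b has zero mean and unit variance
    over the impostor scores of one target speaker."""
    mu = statistics.mean(impostor_scores)
    sigma = statistics.pstdev(impostor_scores)  # assumed estimator
    return 1.0 / sigma, -mu / sigma


def znorm(score, a, b):
    """Normalized score ns = a * s + b."""
    return a * score + b
```

The pair `(a, b)` is computed once per target speaker and then applied to every test score against that speaker's model.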

12 Baseline Speaker Recognition System
Based on the data from the 2001 evaluation, a DET curve can be plotted
  – Find the optimal decision threshold that minimizes the cost defined by NIST 2002, i.e.:
$C_{det} = C_{Miss} \cdot P_{Miss|Target} \cdot P_{Target} + C_{FalseAlarm} \cdot P_{FalseAlarm|NonTarget} \cdot (1 - P_{Target})$
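
The threshold search can be sketched as an exhaustive scan over the observed development scores. The default cost parameters below (C_Miss = 10, C_FalseAlarm = 1, P_Target = 0.01) are an assumption: the slide does not give them, so they should be checked against the evaluation plan. The accept rule "score >= threshold" and the function names are also assumptions.

```python
def detection_cost(p_miss, p_fa, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Detection cost for given miss and false-alarm rates.

    Default parameter values are assumed, not taken from the slides.
    """
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)


def error_rates(target_scores, nontarget_scores, threshold):
    """Miss and false-alarm rates when accepting scores >= threshold."""
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    p_fa = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
    return p_miss, p_fa


def best_threshold(target_scores, nontarget_scores, **cost_params):
    """Return (minimum cost, threshold) over all observed scores,
    plus +inf, which corresponds to rejecting every trial."""
    candidates = sorted(set(target_scores) | set(nontarget_scores))
    candidates.append(float("inf"))
    return min(
        (
            detection_cost(
                *error_rates(target_scores, nontarget_scores, t), **cost_params
            ),
            t,
        )
        for t in candidates
    )
```

Each candidate threshold corresponds to one operating point on the DET curve, so the scan picks the point of minimum cost on the development data.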

13 NIST 2002 evaluation
Feature vector: 13 MFCCs + 13 Δ + 13 Δ²
Cepstral mean normalization
Gender-dependent GMM with 256 Gaussian components for the world model
  – Trained on a subset of the cellular data of the NIST 2001 evaluation

14 NIST 2002 evaluation
Target speaker model adapted from the world model
  – At every iteration, after the E-step:
      Threshold (cumulative probability = 3.0) to select tree nodes
      MLLR used to update the Gaussian means
      Approximated MAP to smooth the MLLR-estimated parameters: linear combination of the MLLR-estimated mean (0.8) and the world (a priori) mean (0.2)
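
The mean update for one Gaussian can be sketched as an affine MLLR transform followed by the 0.8 / 0.2 interpolation given above. The transform `(A, b)` is taken as given here; estimating it from the speaker data for each selected class is not shown. The function names are hypothetical.

```python
def apply_mllr(mean, A, b):
    """Affine MLLR mean transform: A @ mean + b, for one regression class."""
    return [
        sum(a_ij * m_j for a_ij, m_j in zip(row, mean)) + b_i
        for row, b_i in zip(A, b)
    ]


def map_smooth(mllr_mean, world_mean, weight=0.8):
    """Approximated MAP: fixed-weight interpolation between the
    MLLR-estimated mean and the world (a priori) mean."""
    return [
        weight * m + (1.0 - weight) * w
        for m, w in zip(mllr_mean, world_mean)
    ]
```

For example, a world mean of [1.0, 2.0] moved by MLLR to [1.5, 1.5] is smoothed to approximately [1.4, 1.6].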

15 NIST 2002 evaluation
16 male and 21 female speakers (NIST 2001) used as impostors (~8 test files from each)
  – The pseudo-impostor scores define a distribution used to z-normalize the score for a given target speaker
Global threshold estimated on NIST 2001 data in order to minimize the cost

16 NIST 2002 evaluation
System characteristics:
  – CPU time on a Pentium III 800 MHz:
      2.1 ms per frame and per speaker for speaker model adaptation
      0.92 ms per frame for the test
  – Memory usage: ~360 Kbytes per test

17 NIST 2002 evaluation
Results:
  – $C_{det}$ = 0.100292
  – Min $C_{det}$ = 0.097833
DET curve (figure not included in this transcript)

18 NIST 2002 evaluation

19 NIST 2002 evaluation

20 NIST 2002 evaluation

21 Conclusions and perspectives
A new baseline system has been developed and evaluated
Much work remains to be done, mainly:
  – Optimize the feature extraction module
  – Implement the complete Unified Adaptation approach
  – Investigate new normalization strategies
  – Integrate automatic labeling of speech segments

