Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization. Mohamad Hasan Bahari, Hugo Van hamme. July 2011.


1 Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization. Mohamad Hasan Bahari, Hugo Van hamme. July 2011.

2 Outline
- Introduction and Motivation
- Age and Gender Recognition
- Corpora
- Supervised Non-negative Matrix Factorization
- Proposed Method
- Results
- Conclusions and Future Research

3 Introduction
Confirming the identity of individuals
Biometric characteristics:
- Fingerprint
- Face
- Iris
- Hand geometry
- Ear shape
- Voice pattern
- …
Choosing a characteristic:
- Availability
- Reliability

4 Motivation
In many real-world cases, only speech patterns are available (kidnapping, threatening calls, …).
Speech patterns can carry a great deal of useful information:
- Gender
- Age
- Dialect (original or previous regions)
- Membership of a particular social group
- …
This information helps:
- To identify a criminal
- To narrow down the number of suspects

5 Goal
Goal: to extract different physical and psychological characteristics of the speaker from his/her voice patterns (speaker profiling).
Physical:
1. Gender
2. Age
3. Accent
4. …
Psychological:
1. Anxiousness
2. Stress
3. Confidence
4. …

6 Age and Gender Recognition
Three approaches:
I. Directly from the speech signal.
II. Modeling the speech generation system.
III. Modeling the hearing system.

7 Age and Gender Recognition
I. Directly from the speech signal.
- Different acoustic features vary with age:
  1) Fundamental frequency
  2) Speech rate
  3) Sound pressure level
  4) …
- Approach: find all acoustic features that vary with age, together with their exact relation to the speaker's age.
Advantage: conceptually simple and computationally inexpensive.
Drawback: these features are affected by many other parameters, such as weight, height, voice quality, emotional condition, …

8 Age and Gender Recognition
Effect of age and gender on speech (fundamental frequency) [1]
- Age is only one of the inputs affecting speech and, consequently, the acoustic features.
- It is impossible to estimate age without considering the other inputs.
- Perceptions of gender and age have a significant mutual impact on each other.
[1] W. S. Brown, R. J. Morris, H. Hollien, and E. Howell, Journal of Voice, vol. 5, pp. 310–315, 1991.

9 Age and Gender Recognition
II. Modeling the speech generation system.
- It is an input estimation problem.
Drawback: modeling the speech generation system of the speaker is very difficult.

10 Age and Gender Recognition
III. Modeling the hearing system.
- To solve the speech recognition problem, the hearing system is modeled using Hidden Markov Models (HMMs).
- Approach: use the tools applied in speech recognition problems (HMMs).
Advantages: well established; accurate in recognizing content.
Drawbacks:
- There is a difference between the age of a speaker as perceived and their actual age.
- Computationally complex.

11 Corpora
- 555 speakers from the N-Best evaluation corpus [1]
- The corpus contains live and read commentaries, news, interviews, and reports broadcast in Belgium
- Different age groups and genders
Category name     Age      Number of speakers
Young Male        18-35    85
Young Female      18-35    53
Middle Male       36-45    160
Middle Female     36-45    41
Senior Male       46-81    191
Senior Female     46-81    25
[1] D. A. Van Leeuwen, J. Kessens, E. Sanders, and H. van den Heuvel, In proc. Interspeech, pp. 2571-2574, 2009.

12 SNMF
- Non-negative Matrix Factorization (NMF) is a popular machine learning algorithm [1].
- It can be used in supervised or unsupervised mode.
- Supervised NMF (SNMF) is a pattern recognition method [1]:
  - It is very effective when the input space is high-dimensional.
  - It is a generative classifier.
  - It can directly classify patterns into multiple classes (no need to recast the problem as multiple binary classifications).
[1] H. Van hamme, In proc. Interspeech, Australia, pp. 2554-2557, 2008.

13 SNMF
Problem statement: given a training data set S_tr = {(x_1, y_1), ..., (x_n, y_n), ..., (x_N, y_N)}, where
- x_n is a vector of observed characteristics for the n-th data item,
- y_n is a label vector that represents the class to which x_n belongs.
Goal: approximate a classifier function g such that ŷ = g(x_tst) is as close as possible to the true label, where x_tst is an unseen observation.

14 SNMF
SNMF in training phase:
- First step:
- Second step:
- Extended Kullback-Leibler divergence:
- Multiplicative updating formula:
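As a hedged reference for the quantities named on this slide: the forms below are the extended Kullback-Leibler divergence and the multiplicative updates widely used for NMF (Lee and Seung), and they may differ in detail from the slide's own formulas. The symbols V (non-negative data matrix), W (basis matrix), and H (activation matrix) are assumptions introduced here; \odot and \oslash denote element-wise product and division, the fraction bars are element-wise, and \mathbf{1} is an all-ones matrix with the shape of V.

```latex
D(V \,\|\, WH) = \sum_{i,j} \left( V_{ij} \log \frac{V_{ij}}{(WH)_{ij}} - V_{ij} + (WH)_{ij} \right)

H \leftarrow H \odot \frac{W^{\top} \left( V \oslash (WH) \right)}{W^{\top} \mathbf{1}},
\qquad
W \leftarrow W \odot \frac{\left( V \oslash (WH) \right) H^{\top}}{\mathbf{1} H^{\top}}
```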

15 SNMF
SNMF in testing phase:
- First step:
- Second step:
- Extended Kullback-Leibler divergence:
- Multiplicative updating formula:
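The training and testing phases can be illustrated with a minimal sketch. It assumes one common formulation of supervised NMF: the one-hot label matrix is stacked on top of the data matrix and factorized jointly during training, and at test time activations are fitted on the data part of the basis only, then mapped through the label part. The function names (`kl_nmf`, `snmf_train`, `snmf_predict`), the toy data, and all parameter values are hypothetical and are not taken from the slides.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def kl_nmf(V, rank, W=None, update_W=True, n_iter=300, seed=0):
    """Approximate V by W @ H using multiplicative updates for the extended KL divergence."""
    rng = np.random.default_rng(seed)
    if W is None:
        W = rng.random((V.shape[0], rank)) + 0.1
    else:
        W = W.copy()  # leave the caller's basis untouched
    H = rng.random((rank, V.shape[1])) + 0.1
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + EPS))) / (W.sum(axis=0)[:, None] + EPS)
        if update_W:
            W *= ((V / (W @ H + EPS)) @ H.T) / (H.sum(axis=1)[None, :] + EPS)
    return W, H


def snmf_train(X, Y, rank):
    """X: (features, N) non-negative data; Y: (classes, N) one-hot labels."""
    W, _ = kl_nmf(np.vstack([Y, X]), rank)
    n_classes = Y.shape[0]
    return W[:n_classes], W[n_classes:]  # label part W_y, data part W_x


def snmf_predict(W_y, W_x, X_test):
    """Fit activations with W_x held fixed, then pick the class with the largest score."""
    _, H = kl_nmf(X_test, W_x.shape[1], W=W_x, update_W=False)
    return (W_y @ H).argmax(axis=0)


# Toy demo with two synthetic classes (hypothetical data, not the N-Best corpus).
rng = np.random.default_rng(1)
prototypes = np.array([[5, 5, 5, 0.1, 0.1, 0.1],
                       [0.1, 0.1, 0.1, 5, 5, 5]]).T  # (features, classes)


def sample(labels):
    """Draw noisy non-negative columns around each label's prototype."""
    return prototypes[:, labels] + 0.2 * rng.random((6, labels.size))


train_labels = np.repeat([0, 1], 20)
test_labels = np.repeat([0, 1], 5)
X_train, X_test = sample(train_labels), sample(test_labels)
Y_train = np.eye(2)[:, train_labels]

W_y, W_x = snmf_train(X_train, Y_train, rank=2)
predicted = snmf_predict(W_y, W_x, X_test)
accuracy = float((predicted == test_labels).mean())
```

Because row i of `W_y` corresponds to class i, the predicted class index does not depend on the order in which the factorization happens to learn its components.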

16 Proposed Method
1. Feature selection
2. Acoustic modeling
3. Supervector making procedure
4. Training phase
5. Testing phase

17 Proposed Method
1. Feature selection
- Mel spectra
- Mean normalization
- Vocal tract length normalization
- Augmented with their first- and second-order time derivatives
Pipeline: speech signal → feature selection → feature vectors
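Two of the steps listed on this slide, mean normalization and augmentation with time derivatives, can be sketched as follows. This is an illustration under stated assumptions, not the authors' front end: the `(frames, bands)` layout and the 22-band toy input are hypothetical, the derivatives use `np.gradient` as a simple finite-difference stand-in for whatever delta window was actually used, and vocal tract length normalization is omitted.

```python
import numpy as np


def normalize_and_add_deltas(mel):
    """mel: (frames, bands) array of mel spectra (assumed layout)."""
    # Mean normalization: subtract the per-band mean over time.
    centered = mel - mel.mean(axis=0, keepdims=True)
    # First- and second-order time derivatives via finite differences.
    delta = np.gradient(centered, axis=0)
    delta2 = np.gradient(delta, axis=0)
    # Each frame now carries static, delta, and delta-delta features.
    return np.hstack([centered, delta, delta2])


# Hypothetical input: 100 frames of 22 mel bands.
rng = np.random.default_rng(0)
mel = rng.random((100, 22))
feats = normalize_and_add_deltas(mel)
```

The output has three times as many columns as the input, one block each for the static features and the two derivative orders.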

18 Proposed Method
2. Acoustic modeling
Speaker-independent model: an HMM with a shared pool of 49740 Gaussians to model the observations in 3873 cross-word context-dependent tied triphone states.
Adaptation method: the speaker-dependent mixture weights for each speaker result from a re-estimation of the speaker-independent weights, based on a forced alignment of that speaker's training data using the speaker-independent acoustic model.
The result of this step is 555 speaker-adapted models.
Pipeline: speaker-independent model → speaker adaptation method → model of the speaker

19 Proposed Method
3. Supervector making procedure
The Gaussian Mixture Model (GMM) of each speaker-adapted HMM is:
Three types of supervectors:
1. Means
2. Variances
3. Weights
Weight supervectors:
The result of this step is 555 supervectors, one for each of the 555 speakers.
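A minimal sketch of how a weight supervector could be assembled, assuming it is simply the concatenation of every state's mixture weights in a fixed state order. That construction, the helper name `weight_supervector`, and the toy sizes are assumptions for illustration. Mixture weights are non-negative by definition, which is what makes such a vector a valid input to a non-negative factorization.

```python
import numpy as np


def weight_supervector(state_weights):
    """Concatenate per-state GMM mixture weights into one non-negative vector."""
    return np.concatenate([np.asarray(w, dtype=float) for w in state_weights])


# Hypothetical toy sizes: 3 speakers, 4 states, 3 Gaussians per state.
rng = np.random.default_rng(0)
speakers = []
for _ in range(3):
    raw = rng.random((4, 3))
    weights = raw / raw.sum(axis=1, keepdims=True)  # each state's weights sum to 1
    speakers.append(weight_supervector(weights))

# One column per speaker, the layout assumed here for a factorization input.
X = np.stack(speakers, axis=1)
```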

20 Proposed Method
4. Training phase
5. Testing phase

21 Results
Evaluation methodology: 5-fold cross-validation (five independent runs).
In each of the five runs:
- The training set is the speech data of 444 speakers.
- The testing set is the speech data of 111 speakers.
[Figure: the database is partitioned into training (TR) and testing (TST) portions; the TST portion changes from run to run (Run 1, Run 2, ...)]
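The split sizes follow from dividing 555 speakers into five equal folds. A minimal sketch, assuming a plain random partition of speaker indices (the slides do not say whether the folds were shuffled or stratified by category); `five_fold_splits` is a hypothetical helper name.

```python
import numpy as np


def five_fold_splits(n_speakers=555, n_folds=5, seed=0):
    """Yield (train, test) speaker-index arrays, one pair per run."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_speakers), n_folds)
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train, test


splits = list(five_fold_splits())
```

Each speaker appears in exactly one test fold, so the five runs together test every speaker once.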

22 Results
Gender recognition accuracy is 96%.
Age group recognition, relative confusion matrix:
CL AC YM YF MM MF SM SF
YM 1303580260
YF 027704110570
MM 06014401470
MF 05402241702
SM 0301190760
SF 020828 16
Category name     Prior (%)   Accuracy (%)
Young Male        15          13
Young Female      10          77
Middle Male       29          44
Middle Female     7           24
Senior Male       34          76
Senior Female     4           16
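The priors on this slide can be checked against the speaker counts on the Corpora slide (85, 53, 160, 41, 191, 25), and the per-category accuracies can be combined into a single speaker-weighted figure for comparison with a majority-class baseline. The sketch below is a derived calculation, not a result reported in the slides, and it inherits the rounding of the per-category percentages.

```python
# Speaker counts from the Corpora slide and per-category accuracies (%) from this slide.
counts = {"YM": 85, "YF": 53, "MM": 160, "MF": 41, "SM": 191, "SF": 25}
accuracy_pct = {"YM": 13, "YF": 77, "MM": 44, "MF": 24, "SM": 76, "SF": 16}

total = sum(counts.values())

# Prior of each category as a percentage of all speakers.
priors_pct = {name: 100 * n / total for name, n in counts.items()}

# Baseline: always guess the most frequent category.
majority_baseline_pct = max(priors_pct.values())

# Speaker-weighted average of the per-category accuracies.
overall_accuracy_pct = sum(counts[k] * accuracy_pct[k] for k in counts) / total
```

Comparing `overall_accuracy_pct` with `majority_baseline_pct` gives one concrete reading of "higher than chance level" under these assumptions.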

23 Conclusions and Future Research
Conclusions:
1. A new age-gender recognition method based on SNMF was proposed.
2. Supervectors of GMM weights were used.
3. The method was evaluated on the N-Best corpus.
4. Gender recognition accuracy is 96%.
5. Age group recognition accuracy is significantly higher than chance level.
Future research:
1. Age estimation instead of age group recognition.
2. Using supervectors of GMM means and variances, and combining these features.

24 Thank You for Your Attention

