1 Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization. Mohamad Hasan Bahari, Hugo Van hamme. July 2011.

2 Outline: Introduction and Motivations; Age and Gender Recognition; Corpora; Supervised Non-negative Matrix Factorization; Proposed Method; Results; Conclusions and Future Research.

3 Introduction: Confirming the identity of individuals. Biometric characteristics: fingerprint, face, iris, hand geometry, ear shape, voice pattern, … Criteria for choosing a characteristic: availability, reliability.

4 Motivation: In many real-world cases only speech patterns are available (kidnapping, threatening calls, …). Speech patterns can carry a great deal of useful information: gender, age, dialect (original or previous regions), membership of a particular social group, … This information can help identify a criminal and narrow down the number of suspects.

5 Goal: To extract physical and psychological characteristics of speakers from their voice patterns (speaker profiling). Physical: 1. gender, 2. age, 3. accent, 4. … Psychological: 1. anxiousness, 2. stress, 3. confidence, 4. …

6 Age and Gender Recognition: three approaches: I. directly from the speech signal; II. modeling the speech generation system; III. modeling the hearing system.

7 Age and Gender Recognition, I. Directly from the speech signal: Several acoustic features vary with age: 1) fundamental frequency, 2) speech rate, 3) sound pressure level, 4) … The approach is to find all acoustic features that vary with age and their exact relation to the speaker's age. Pro: conceptually simple and computationally inexpensive. Con: these features are also affected by many other parameters, such as weight, height, voice quality, emotional condition, …
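The slides do not say how the fundamental frequency is measured. As an illustration only, here is a minimal autocorrelation pitch estimator for a single voiced frame. The function name `estimate_f0` and the 50 to 500 Hz search range are assumptions, not part of the presented method.

```python
import numpy as np


def estimate_f0(frame, sample_rate, f0_min=50.0, f0_max=500.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame.

    Picks the autocorrelation peak inside the lag range that corresponds to
    [f0_min, f0_max]; these limits are illustrative assumptions.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()  # remove DC offset
    # Full autocorrelation, keeping only non-negative lags 0..len(frame)-1.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / f0_max)  # shortest period searched
    lag_max = min(int(sample_rate / f0_min), len(ac) - 1)  # longest period searched
    best_lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sample_rate / best_lag
```

For a 64 ms frame of a 200 Hz sine sampled at 16 kHz, the peak falls at a lag of 80 samples, so the estimate is 200 Hz. Real speech requires voicing detection and octave-error handling, which this sketch omits.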

8 Age and Gender Recognition: [Figure: Effect of age and gender on speech (fundamental frequency) [1]] Age is only one of the inputs affecting speech and, consequently, the acoustic features. It is impossible to estimate age without considering the other inputs. Perceptions of gender and age have a significant mutual impact on each other. [1] W. S. Brown, R. J. Morris, H. Hollien, and E. Howell, Journal of Voice, vol. 5, pp. 310–315.

9 Age and Gender Recognition, II. Modeling the speech generation system: Age and gender recognition becomes an input estimation problem. Con: modeling the speaker's speech generation system is very difficult.

10 Age and Gender Recognition, III. Modeling the hearing system: To solve the speech recognition problem, the hearing system is modeled using hidden Markov models (HMMs). This approach applies the same speech recognition tools (HMMs) to age and gender recognition. Pro: well established; accurate in recognizing content. Con: a speaker's perceived age can differ from their actual age; computationally complex.

11 Corpora: 555 speakers from the N-Best evaluation corpus [1]. The corpus contains live and read commentaries, news, interviews, and reports broadcast in Belgium, and covers different age groups and genders. [Table columns: category name, age, number of speakers. Categories: Young Male, Young Female, Middle Male, Middle Female, Senior Male, Senior Female.] [1] D. A. Van Leeuwen, J. Kessens, E. Sanders, and H. van den Heuvel, In Proc. Interspeech, 2009.

12 SNMF: Non-negative matrix factorization (NMF) is a popular machine learning algorithm [1]. It can be used in supervised or unsupervised mode. Supervised NMF (SNMF) is a pattern recognition method [1] with three useful properties. It is very effective for high-dimensional input spaces. It is a generative classifier. It can classify patterns directly into multiple classes, with no need to recast the problem as several binary classifications. [1] H. Van hamme, In Proc. Interspeech, Australia, 2008.

13 SNMF, Problem Statement: We are given a training data set $S_{tr} = \{(x_1, y_1), \ldots, (x_n, y_n), \ldots, (x_N, y_N)\}$. Here $x_n$ is a vector of observed characteristics for the $n$-th data item, and $y_n$ is a label vector representing the class to which $x_n$ belongs. Goal: approximate a classifier function $g$ such that $\hat{y} = g(x_{tst})$ is as close as possible to the true label, where $x_{tst}$ is an unseen observation.

14 SNMF in Training Phase: first step; second step; extended Kullback-Leibler divergence; multiplicative update formula.
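The training-phase equations did not survive in this transcript. The block below assumes the usual SNMF setup and may differ in detail from the authors' exact formulation. The label matrix $Y_{tr}$ (columns $y_n$) is stacked on top of the data matrix $V_{tr}$ (columns $x_n$), and the stack is factorized by minimizing the extended Kullback-Leibler divergence with the standard Lee-Seung multiplicative updates. In the updates, $V$ and $W$ denote the stacked matrices, $\odot$ and $\oslash$ are elementwise product and division, fraction bars are elementwise, and $\mathbf{1}$ is an all-ones matrix the size of $V$.

```latex
\begin{bmatrix} Y_{tr} \\ V_{tr} \end{bmatrix}
\approx
\begin{bmatrix} W_y \\ W_v \end{bmatrix} H_{tr},
\qquad
Y_{tr} = [y_1, \ldots, y_N], \quad V_{tr} = [x_1, \ldots, x_N]

D(V \,\|\, WH) = \sum_{i,j} \Big( V_{ij} \log \frac{V_{ij}}{(WH)_{ij}} - V_{ij} + (WH)_{ij} \Big)

H \leftarrow H \odot \frac{W^{\top} (V \oslash WH)}{W^{\top} \mathbf{1}},
\qquad
W \leftarrow W \odot \frac{(V \oslash WH)\, H^{\top}}{\mathbf{1}\, H^{\top}}
```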

15 SNMF in Testing Phase: first step; second step; extended Kullback-Leibler divergence; multiplicative update formula.
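A minimal NumPy sketch of both SNMF phases follows, assuming the stacked formulation with KL multiplicative updates. Training factorizes [Y_tr; V_tr] ≈ [W_y; W_v] H_tr. Testing keeps W_v fixed, fits V_tst ≈ W_v H_tst, and predicts Ŷ = W_y H_tst, assigning each column the class with the largest entry. The function names, the random initialization, and the iteration counts are assumptions. No relative weighting between label rows and data rows is applied, which is a simplification.

```python
import numpy as np

EPS = 1e-12  # avoids division by zero and log(0)


def kl_divergence(V, WH):
    """Extended (generalized) Kullback-Leibler divergence D(V || WH)."""
    V = np.asarray(V, dtype=float)
    WH = np.asarray(WH, dtype=float)
    return float(np.sum(V * np.log((V + EPS) / (WH + EPS)) - V + WH))


def update_H(V, W, H):
    """One multiplicative KL update of the activations H, with W held fixed."""
    ratio = V / (W @ H + EPS)
    return H * (W.T @ ratio) / (W.sum(axis=0)[:, None] + EPS)


def update_W(V, W, H):
    """One multiplicative KL update of the bases W, with H held fixed."""
    ratio = V / (W @ H + EPS)
    return W * (ratio @ H.T) / (H.sum(axis=1)[None, :] + EPS)


def snmf_train(V_tr, Y_tr, n_components, n_iter=200, seed=0):
    """Factorize [Y_tr; V_tr] ~ [W_y; W_v] H; columns are training items."""
    rng = np.random.default_rng(seed)
    X = np.vstack([Y_tr, V_tr]).astype(float)
    W = rng.random((X.shape[0], n_components)) + 0.1
    H = rng.random((n_components, X.shape[1])) + 0.1
    for _ in range(n_iter):
        H = update_H(X, W, H)
        W = update_W(X, W, H)
    n_classes = Y_tr.shape[0]
    return W[:n_classes], W[n_classes:]  # W_y, W_v


def snmf_test(V_tst, W_y, W_v, n_iter=200, seed=0):
    """Fit activations with W_v fixed, then return the predicted class per column."""
    rng = np.random.default_rng(seed)
    H = rng.random((W_v.shape[1], V_tst.shape[1])) + 0.1
    for _ in range(n_iter):
        H = update_H(np.asarray(V_tst, float), W_v, H)  # only H is updated
    Y_hat = W_y @ H  # reconstructed label vectors
    return np.argmax(Y_hat, axis=0)
```

In the age-group task, V_tr would hold one fold's training supervectors as columns, and Y_tr would hold the matching one-hot label vectors (six rows for the six age-gender categories).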

Proposed Method: 1. Feature extraction; 2. Acoustic modeling; 3. Supervector construction; 4. Training phase; 5. Testing phase.

Proposed Method, 1. Feature extraction: Mel spectra with mean normalization and vocal tract length normalization, augmented with their first- and second-order time derivatives. [Diagram: Speech signal → Feature extraction → Feature vectors]
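The sketch below shows the mean normalization and derivative steps on a matrix of already computed mel-spectral features (rows are frames). It assumes the common regression formula for time derivatives over ±N frames (N = 2), with edge frames repeated. The slides do not specify the derivative window, and vocal tract length normalization is not implemented here.

```python
import numpy as np


def mean_normalize(features):
    """Subtract the per-dimension mean over all frames (rows = frames)."""
    return features - features.mean(axis=0, keepdims=True)


def deltas(features, N=2):
    """Regression-based time derivative over +/- N frames; edges are padded by repetition."""
    T = features.shape[0]
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = np.zeros(features.shape, dtype=float)
    for n in range(1, N + 1):
        # padded[N + t] is frame t, so these slices are frames t+n and t-n.
        out += n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
    return out / denom


def augment(features, N=2):
    """Mean-normalized static features followed by first- and second-order derivatives."""
    static = mean_normalize(np.asarray(features, dtype=float))
    d1 = deltas(static, N)
    d2 = deltas(d1, N)
    return np.hstack([static, d1, d2])
```

For a D-dimensional input, each output frame has 3D dimensions. On a linear ramp, the first derivative is 1 and the second derivative is 0 away from the edges.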

Proposed Method, 2. Acoustic modeling: Speaker-independent model: an HMM with a shared pool of Gaussians that models the observations in 3873 cross-word context-dependent tied triphone states. Adaptation method: the speaker-dependent mixture weights are obtained by re-estimating the speaker-independent weights on a forced alignment of that speaker's training data, computed with the speaker-independent acoustic model. The result of this step is 555 speaker-adapted models. [Diagram: Speaker-independent model → Speaker adaptation method → Model of the speaker]
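The slides do not give the re-estimation formula. A hedged sketch for one tied state: the frames that the forced alignment assigns to the state contribute soft Gaussian occupation counts, and the new weights are the normalized counts. The `tau` parameter is a hypothetical addition that interpolates toward the speaker-independent weights (a MAP-style variant). With `tau=0` the update is plain maximum-likelihood re-estimation.

```python
import numpy as np


def reestimate_weights(si_weights, posteriors, tau=0.0):
    """Speaker-dependent mixture weights for one tied state.

    si_weights: (K,) speaker-independent weights over the shared Gaussian pool.
    posteriors: (T, K) Gaussian occupation probabilities of the T frames that
                the forced alignment assigned to this state.
    tau: hypothetical smoothing constant; 0 gives plain ML re-estimation.
    """
    si_weights = np.asarray(si_weights, dtype=float)
    counts = np.asarray(posteriors, dtype=float).sum(axis=0)  # soft count per Gaussian
    total = counts.sum()
    if total + tau == 0:  # state never visited by this speaker: keep SI weights
        return si_weights.copy()
    return (tau * si_weights + counts) / (tau + total)
```

If the speaker-independent weights and each posterior row sum to 1, the result also sums to 1. States with no aligned frames fall back to the speaker-independent weights.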

Proposed Method, 3. Supervector construction: Each speaker-adapted HMM is represented as a Gaussian mixture model (GMM). Three types of supervectors are possible: 1. means, 2. variances, 3. weights. Weight supervectors are used here. The result of this step is 555 supervectors, one for each of the 555 speakers.
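A minimal sketch of building weight supervectors, assuming each speaker-adapted model is available as a list of per-state weight vectors (one per tied state). The function names are hypothetical. Mixture weights are non-negative, so the resulting data matrix satisfies the non-negativity that NMF requires.

```python
import numpy as np


def weight_supervector(state_weights):
    """Concatenate the per-state mixture-weight vectors of one speaker-adapted model."""
    return np.concatenate([np.asarray(w, dtype=float).ravel() for w in state_weights])


def supervector_matrix(speakers):
    """Stack speaker supervectors as columns: a non-negative matrix of shape (dims, n_speakers)."""
    return np.column_stack([weight_supervector(s) for s in speakers])
```

With 3873 tied states, each supervector has 3873 times the length of a per-state weight vector. The size of the shared Gaussian pool is not given in the slides.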

Proposed Method: 4. Training phase; 5. Testing phase.

21 Results, Evaluation Methodology: 5-fold cross-validation (five independent runs). In each run, the training set is the speech data of 444 speakers and the test set is the speech data of the remaining 111 speakers. [Diagram: database partition per run, Run 1: TST | TR; Run 2: TR | TST | TR]
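The speaker split can be sketched as follows. The test block moves through the speaker list one fold at a time, matching the TST/TR layout in the diagram. Contiguous blocks over a fixed speaker order are an assumption. In practice the order would be shuffled, and the key property is that no speaker appears in both the training and the test set of a run.

```python
def five_fold_splits(n_speakers=555, n_folds=5):
    """Yield (train, test) speaker-index lists; every speaker is tested exactly once."""
    assert n_speakers % n_folds == 0, "sketch assumes equal-sized folds"
    fold_size = n_speakers // n_folds  # 111 when n_speakers = 555
    indices = list(range(n_speakers))
    for k in range(n_folds):
        test = indices[k * fold_size:(k + 1) * fold_size]
        train = indices[:k * fold_size] + indices[(k + 1) * fold_size:]
        yield train, test
```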

22 Results: Gender recognition accuracy is 96%. Age group recognition: [Table: relative confusion matrix over the categories YM, YF, MM, MF, SM, SF (Young Male, Young Female, Middle Male, Middle Female, Senior Male, Senior Female), with chance level (CL) and accuracy (AC), plus the prior and accuracy per category.]
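The sketch below computes the two quantities on this slide: a relative (row-normalized) confusion matrix, and accuracy compared against a chance level derived from the class priors. Defining chance level as the prior of the most frequent class is an assumption. The slides do not state how CL was computed.

```python
import numpy as np


def relative_confusion(true_labels, predicted, n_classes):
    """Row-normalized confusion matrix: entry [i, j] is the fraction of class i predicted as j."""
    counts = np.zeros((n_classes, n_classes))
    for t, p in zip(true_labels, predicted):
        counts[t, p] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows of classes absent from the test set stay all-zero instead of dividing by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


def accuracy_and_chance(true_labels, predicted, n_classes):
    """Overall accuracy and a majority-class chance level (assumed definition)."""
    true_labels = np.asarray(true_labels)
    predicted = np.asarray(predicted)
    accuracy = float(np.mean(true_labels == predicted))
    priors = np.bincount(true_labels, minlength=n_classes) / len(true_labels)
    return accuracy, float(priors.max())
```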

23 Conclusions and Future Research: Conclusions: 1. A new age and gender recognition method based on SNMF was proposed. 2. Supervectors of GMM weights were used as features. 3. The method was evaluated on the N-Best corpus. 4. Gender recognition accuracy is 96%. 5. Age group recognition accuracy is significantly higher than chance level. Future research: 1. Age estimation instead of age group recognition. 2. Using supervectors of GMM means and variances, and combining these features.

24 Thank you for your attention.