
Voice-Based Gender Classification Using Support Vector Machine
Project Presentation for Class COMS E6772, Fall 2006
Student: Wenwei Wang
Advisor: Prof. Tony Jebara
Columbia University
December 11, 2006

Columbia University, Electrical Engineering Department

Motivation
Gender classification plays an important role in:
- speech/speaker recognition;
- other applications, such as HCI, passive surveillance and smart living environments.
Bimodal (image plus voice) gender classification can improve overall performance:
- image-based gender classification performance varies with factors such as ambient lighting and face angle;
- voice-based gender classification can be degraded by factors such as environmental noise and the recording channel.
About this project
- Focuses on voice-based gender classification using Support Vector Machines.
- A Gaussian Mixture Model (GMM) method was used for comparison.
- Both text-dependent and text-independent cases were explored.

Voice Data Source
- Training and test voices: each of the 25 speakers was asked to read two different paragraphs, the longer one for the training voice and the shorter one for the test voice.
- Recording method: different offices with normal noise levels during working hours; an ordinary telephone microphone; Microsoft Sound Recorder, version 5.1; 16-bit, 16 kHz, mono; recording length 40 to 60 seconds.
- Speakers: 15 male, 10 female.

Voice Feature
- MFCC: the feature most commonly used for speech/speaker recognition and verification.
- Feature extraction:
  - pre-emphasis: $H(z) = 1 - 0.95 z^{-1}$;
  - framing: window size = 500 samples, overlap = 200 samples;
  - windowing: Hamming window.
- Training voice: 800 MFCC vectors of order 12 per speaker.
- Test voice: 400 MFCC vectors of order 12 per speaker.
- Delta and delta-delta MFCC vectors were also computed from each MFCC vector and tried alongside the plain MFCCs, but they gave no improvement for gender classification.
- A ground-truth gender label (1 for male, -1 for female) was created for each MFCC vector.
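The three front-end steps listed above (pre-emphasis, framing, Hamming windowing) can be sketched in plain Python as follows. This is a minimal sketch, not the project's code: the MFCC stage that follows windowing (FFT, mel filterbank, log, DCT) is omitted, and dropping a trailing partial frame is an assumption. The helper `samples_needed` is hypothetical and simply checks how much audio the stated frame counts require.

```python
import math
from typing import List, Sequence

SAMPLE_RATE = 16_000               # 16 kHz, from the recording setup
FRAME_SIZE = 500                   # samples per frame (slide value)
FRAME_OVERLAP = 200                # samples shared by consecutive frames (slide value)
HOP = FRAME_SIZE - FRAME_OVERLAP   # 300-sample step between frame starts


def pre_emphasize(x: Sequence[float], alpha: float = 0.95) -> List[float]:
    """H(z) = 1 - alpha z^-1, i.e. y[n] = x[n] - alpha * x[n-1], with y[0] = x[0]."""
    if not x:
        return []
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def hamming(n: int) -> List[float]:
    """Symmetric Hamming window w[k] = 0.54 - 0.46 cos(2 pi k / (n - 1)), for n >= 2."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def frame_signal(x: Sequence[float], size: int = FRAME_SIZE, hop: int = HOP) -> List[List[float]]:
    """Cut x into overlapping frames and apply a Hamming window to each.
    A trailing partial frame is dropped (assumption: no padding)."""
    w = hamming(size)
    return [[s * wk for s, wk in zip(x[start:start + size], w)]
            for start in range(0, len(x) - size + 1, hop)]


def samples_needed(n_frames: int, size: int = FRAME_SIZE, hop: int = HOP) -> int:
    """Minimum signal length (in samples) that yields n_frames full frames."""
    return (n_frames - 1) * hop + size


signal = pre_emphasize([0.0] * SAMPLE_RATE)  # 1 s of silence as a placeholder input
frames = frame_signal(signal)
print(len(frames), samples_needed(800) / SAMPLE_RATE)
```

With a 300-sample hop, 800 frames need 240,200 samples (about 15 s at 16 kHz) and 400 frames need 120,200 samples (about 7.5 s), so each training recording must be at least about 15 s long and each test recording about 7.5 s.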

SVC Implementation
SVC (support vector classification) model training
- 100 frames of training MFCC per speaker (100 frames gave the best overall classification performance).
- Only the first 3 coefficients of each MFCC vector were used (adding more coefficients, or combining them with delta and delta-delta MFCC coefficients, did not improve overall classification performance).
- The 1st frame from each speaker was used to train the 1st SVC model, the 2nd frame from each speaker the 2nd SVC model, and so on; the 100 frames produced 100 SVC models.
- Kernel: RBF with sigma = 0.1 and cost = inf, i.e. a hard-margin classifier (the RBF, ERBF and BSPLINE kernels all gave equally good models with 100% gender classification of the training frames; for RBF, sigma = 0.1 and cost = inf were selected based on overall classification performance).
SVC classification
- Text independent: 100 frames of test MFCC per speaker (100 frames and 3 coefficients gave the best results).
- Text dependent: 100 frames of training MFCC per speaker (100 frames and 3 coefficients gave the best results).
- The 100 gender predictions from the 100 SVC models were simply averaged to give the final gender score.
- SVC tool: SVM software written by Dr. Steve Gunn.
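The per-frame model ensemble and score averaging described above can be sketched in Python as follows. This is a hedged sketch under several assumptions: test frame k is scored by model k (our reading of "100 gender predictions from the 100 SVC models"); the RBF width convention exp(-||x - y||^2 / (2 sigma^2)) is assumed; and since Dr. Gunn's toolbox is not reproduced here, the trainer is pluggable. The hypothetical `kernel_vote_trainer` used in the toy example is a stand-in and NOT an SVM: it uses the SVM-style decision function with every support-vector weight fixed at 1 and no bias, so it needs no optimizer. A real run would plug in a trainer that fits an RBF SVM with sigma = 0.1 and a very large cost. All function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Model = Callable[[Vector], int]                        # one MFCC vector -> +1 (male) or -1 (female)
Trainer = Callable[[List[Vector], List[int]], Model]  # (vectors, labels) -> trained model


def rbf_kernel(x: Vector, y: Vector, sigma: float = 0.1) -> float:
    """Gaussian RBF kernel exp(-||x - y||^2 / (2 sigma^2)); the width convention is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_vote_trainer(sigma: float = 0.1) -> Trainer:
    """Hypothetical stand-in, NOT an SVM: predicts sign(sum_i y_i K(x_i, x)),
    i.e. an SVM-style decision function with every alpha_i = 1 and b = 0."""
    def train(xs: List[Vector], ys: List[int]) -> Model:
        def predict(x: Vector) -> int:
            s = sum(y * rbf_kernel(xi, x, sigma) for xi, y in zip(xs, ys))
            return 1 if s >= 0 else -1  # a tie goes to male here (arbitrary choice)
        return predict
    return train


def train_frame_models(train_mfcc: List[List[Vector]], labels: List[int],
                       trainer: Trainer, n_frames: int = 100,
                       n_coeffs: int = 3) -> List[Model]:
    """Model k is trained on frame k of every speaker (one point per speaker)."""
    models = []
    for k in range(n_frames):
        xs = [list(frames[k][:n_coeffs]) for frames in train_mfcc]  # first 3 MFCC coefficients
        models.append(trainer(xs, labels))
    return models


def gender_score(test_frames: List[Vector], models: List[Model], n_coeffs: int = 3) -> float:
    """Average of the +1/-1 predictions; test frame k is scored by model k (assumed pairing)."""
    preds = [model(list(frame[:n_coeffs])) for model, frame in zip(models, test_frames)]
    return sum(preds) / len(preds)


# Toy usage: 2 speakers (1 male, 1 female), 2 frames each; sigma = 1.0 suits the toy scale.
train = [[[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]], [[-1.0, -1.0, -1.0], [-1.0, -1.0, -1.0]]]
models = train_frame_models(train, [1, -1], kernel_vote_trainer(sigma=1.0), n_frames=2)
score = gender_score([[0.9, 1.0, 1.0], [1.0, 1.1, 1.0]], models)
print("male" if score > 0 else "female", score)  # prints: male 1.0
```

In this sketch a positive final score is read as male and a negative one as female. The slides do not say how a score of exactly 0 is resolved.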

SVC Model
SVC model performance
[Figure: SVC plots for the first 2 dimensions of the training MFCC features. Top: frame 1 from the 25 speakers; bottom: frame 100 from the 25 speakers. Blue: male; red: female.]
- Examined one by one, all 100 frame models classify their training frames with 100% accuracy; only two frames are shown here as examples.
- Overall, the SVC model is extremely accurate on the training frames!

SVC Text-Independent Classification
- 25 test voices used; MFCC: 100 frames per voice.
- 21 voices detected correctly; 3 male voices detected as female and 1 female voice detected as male.
- Overall gender detection accuracy: 84%.
[Table: detected vs. true gender label for voices 1 to 25; the per-voice entries were not recoverable.]

SVC Text-Dependent Classification
- 25 training voices used; MFCC: 100 frames per voice.
- All 25 voices detected correctly.
- Overall gender detection accuracy: 100%.
[Table: detected vs. true gender label for voices 1 to 25; the per-voice entries were not recoverable.]

GMM Implementation
GMM model training
- The 25 x 800 frames of training MFCC were divided into two groups, male and female.
- Only the first 2 coefficients of each MFCC vector were used (adding more coefficients, or combining them with delta and delta-delta MFCC coefficients, did not improve overall classification performance).
- A male GMM and a female GMM were trained on the two MFCC groups.
- GMM parameters: 2 dimensions, 5 mixtures, diagonal covariance, 20 EM iterations (selected based on overall classification results).
GMM classification
- Text independent: all 400 frames of test MFCC per speaker, first 2 coefficients.
- Text dependent: all 800 frames of training MFCC per speaker, first 2 coefficients.
- Each frame was fed into both the male GMM and the female GMM; the ratio of the two resulting likelihoods decides the gender of that frame.
- The 400 or 800 per-frame gender predictions were simply averaged to give the final gender score.
- GMM tool: Netlab software written by Dr. Ian Nabney and Dr. Christopher Bishop.
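The GMM classification rule above can be sketched as follows, assuming the male and female GMMs have already been trained (the project trained them with Netlab; EM is not reproduced here). The likelihood-ratio test "ratio > 1" is evaluated equivalently in the log domain for numerical stability, and averaging +1/-1 per-frame decisions is our reading of "per-frame gender predictions were simply averaged". The class and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DiagGMM:
    """Gaussian mixture with diagonal covariances ('diag' on the slide)."""
    weights: List[float]          # mixture weights, summing to 1
    means: List[List[float]]      # one mean vector per component
    variances: List[List[float]]  # diagonal covariance entries per component

    def log_likelihood(self, x: Sequence[float]) -> float:
        """log p(x) = log sum_k w_k N(x; mu_k, diag(var_k)), computed with log-sum-exp."""
        comps = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xi, m, v in zip(x, mu, var):
                ll -= 0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
            comps.append(ll)
        top = max(comps)
        return top + math.log(sum(math.exp(c - top) for c in comps))


def frame_decision(x: Sequence[float], male: DiagGMM, female: DiagGMM, n_coeffs: int = 2) -> int:
    """+1 (male) if p_male(x) / p_female(x) > 1, else -1; compared as log-likelihoods."""
    x = list(x[:n_coeffs])  # only the first 2 MFCC coefficients are used
    return 1 if male.log_likelihood(x) > female.log_likelihood(x) else -1


def gmm_gender_score(frames: List[Sequence[float]], male: DiagGMM, female: DiagGMM) -> float:
    """Average of the per-frame +1/-1 decisions over all 400 (or 800) frames."""
    decisions = [frame_decision(f, male, female) for f in frames]
    return sum(decisions) / len(decisions)
```

Working in log-likelihoods avoids underflow when many small densities are compared, and the decision is identical to thresholding the likelihood ratio at 1.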

GMM Model
GMM model performance
[Figure: top, PDF plots of the first 2 dimensions of the MFCC features (left: male; right: female); bottom, the combined GMM PDFs in 3D for the same 2 dimensions (red: male; blue: female).]
- The Gaussian peaks clearly show the differences between male and female, but the Gaussian bodies overlap.
- Overall, the GMM model does not separate the genders as well as the SVC model!

GMM Text-Independent Classification
- 25 test voices used; MFCC: 400 frames per voice.
- 21 voices detected correctly; 3 male voices detected as female and 1 female voice detected as male.
- Overall gender detection accuracy: 84%.
[Table: detected vs. true gender label for voices 1 to 25; the per-voice entries were not recoverable.]

GMM Text-Dependent Classification
- 25 training voices used; MFCC: 800 frames per voice.
- 22 voices detected correctly; 2 male voices detected as female and 1 female voice detected as male.
- Overall gender detection accuracy: 88%.
[Table: detected vs. true gender label for voices 1 to 25; the per-voice entries were not recoverable.]

Result Summary
- Model: SVC, a super model with 100% accuracy on the training frames; GMM, PDF peaks clearly separated but bodies overlapping to some degree.
- Text-independent classification: SVC 84%; GMM 84%.
- Text-dependent classification: SVC 100%; GMM 88%.
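The accuracies can be recomputed directly from the error counts on the individual result slides. The sketch below also derives per-gender accuracy from the 15 male and 10 female speakers; those per-gender figures are computed here and were not reported on the slides. The names `RESULTS` and `summarize` are hypothetical.

```python
from typing import Dict, Tuple

N_MALE, N_FEMALE = 15, 10  # speaker counts from the data-source slide

# (correct, male detected as female, female detected as male), copied from the result slides
RESULTS: Dict[Tuple[str, str], Tuple[int, int, int]] = {
    ("SVC", "text-independent"): (21, 3, 1),
    ("SVC", "text-dependent"): (25, 0, 0),
    ("GMM", "text-independent"): (21, 3, 1),
    ("GMM", "text-dependent"): (22, 2, 1),
}


def summarize(correct: int, m2f: int, f2m: int) -> Dict[str, float]:
    """Overall and per-gender accuracy, in percent, for one condition."""
    # Sanity check: correct voices = correctly detected males + correctly detected females.
    assert correct == (N_MALE - m2f) + (N_FEMALE - f2m), "counts must cover all 25 voices"
    return {
        "overall": 100.0 * correct / (N_MALE + N_FEMALE),
        "male": 100.0 * (N_MALE - m2f) / N_MALE,
        "female": 100.0 * (N_FEMALE - f2m) / N_FEMALE,
    }


for (model, mode), counts in RESULTS.items():
    s = summarize(*counts)
    print(f"{model} {mode}: overall {s['overall']:.0f}%, "
          f"male {s['male']:.1f}%, female {s['female']:.1f}%")
```

With 25 voices, each misclassified voice costs 4 percentage points, so overall accuracy can only move in steps of 4% (84%, 88%, 92%, ...); 22 correct out of 25 therefore corresponds to 88%.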

Conclusion
- The SVC model itself is a super accurate model, and hence has more potential than the GMM model for voice-based gender classification, and possibly for other classification applications.
- For text-dependent classification, SVC could be the best choice.
- For text-independent classification, SVC is one of the choices.

Future Work
- Investigate why such an accurate SVC model does not perform as well for text-independent gender classification.
- Explore voice features that might improve text-independent SVC classification performance.
- Compare SVC performance with other classification models, such as HMMs and neural networks (NNW).
- Examine the SVC model for other voice-based classification applications, such as age and spoken language.

References
- Steve R. Gunn, "Support Vector Machines for Classification and Regression," Technical report, University of Southampton, 1998.
- W. M. Campbell, J. P. Campbell, T. P. Gleason, D. A. Reynolds, and T. R. Leek, "High-Level Speaker Verification With Support Vector Machines," ICASSP, 2004.
- Others (to be listed in the final report).