An i-Vector PLDA based Gender Identification Approach for Severely Distorted and Multilingual DARPA RATS Data
Shivesh Ranjan, Gang Liu, and John H. L. Hansen


An i-Vector PLDA based Gender Identification Approach for Severely Distorted and Multilingual DARPA RATS Data
Shivesh Ranjan, Gang Liu and John H. L. Hansen
{Shivesh.Ranjan, Gang.Liu,
Center for Robust Speech Systems (CRSS), Erik Jonsson School of Engineering & Computer Science, The University of Texas at Dallas, Richardson, Texas, USA

Why do female and male speech differ?
- Vocal tract length (14 cm vs. 17.5 cm).
- Length of the vocal folds (the ratio of vocal fold lengths is 0.8).
- Larynx anatomy (differences in thickness).

Applications of Gender Identification
- Improving speech and speaker recognition accuracy.
- Accent identification; speaker health identification.
- Emotion recognition, surveillance, call-center business applications, and human-computer intelligent interaction.

Motivations for an i-Vector based Gender ID approach
- An i-vector offers a compact representation of an utterance while preserving speaker-specific attributes.
- Gender is an important speaker-specific attribute.
- i-Vector based systems are the current state of the art in speaker ID and language ID.

GMM-UBM based Gender ID systems

Gender ID framework

Fundamentals of the i-Vector G-PLDA framework

Gender Separability in the i-Vector Space
[Figure: first 2 dimensions of an MMI-based 3-D projection of 2,600 i-vectors from the FE test set.]

Training and Test Data-Sets
- Fisher English (FE) training data: 20,652 gender-labeled FE utterances (89% of the total corpus) were used to train the UBM and the T matrix for i-vector extraction.
- Fisher English (FE) test data: 2,600 utterances selected randomly from the FE corpus (11% of the total corpus); smaller test sets of duration 20 s, 10 s, and 3 s were also created.
- DARPA RATS test data: 438 test utterances from the different channels (A, B, C, D, E, F, G, H) and the clean (SRC) source, in 5 different languages.
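The "Fundamentals of the i-Vector G-PLDA framework" slide survives only as a heading in this transcript; the standard formulation behind it can be sketched as follows. This is a summary of the usual i-vector/G-PLDA equations from the literature, with conventional symbol names, not notation taken verbatim from the slides.

```latex
% i-vector extraction: the GMM mean supervector M of an utterance is modeled as
\[
  \mathbf{M} = \mathbf{m} + \mathbf{T}\,\mathbf{w},
  \qquad \mathbf{w} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}),
\]
% where m is the UBM mean supervector, T is the total-variability matrix, and
% the posterior mean of w given the utterance is its i-vector.

% Gaussian PLDA (G-PLDA) on the i-vectors: the j-th i-vector of class i is
\[
  \mathbf{w}_{ij} = \boldsymbol{\mu} + \boldsymbol{\Phi}\,\boldsymbol{\beta}_i
                    + \boldsymbol{\varepsilon}_{ij},
  \qquad \boldsymbol{\beta}_i \sim \mathcal{N}(\mathbf{0}, \mathbf{I}),\;
  \boldsymbol{\varepsilon}_{ij} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}),
\]
% with Phi the class subspace and Sigma the residual covariance. A test
% i-vector w can then be assigned the gender that maximizes the PLDA
% likelihood:
\[
  \hat{g} = \arg\max_{g \in \{\text{male},\, \text{female}\}}
            p(\mathbf{w} \mid g).
\]
```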
DARPA RATS Unlabeled Development Set
- 502 utterances per channel for all channels except H.
- 480 utterances for channel H.

Results on FE Data

Duration Mismatch Compensation
- Retrain the gender ID system on correspondingly shorter-duration segments.

Unsupervised Domain Adaptation: issues with the RATS test set
- The gender ID system is trained only on FE data, and no gender-labeled data is available for the RATS test set.
- 4 of the 5 RATS languages are not present in the FE training set.

Unsupervised Clustering
- Use unsupervised clustering (Label-Generating Max-Margin Clustering) to assign labels to the unlabeled RATS development data.
- Estimate an in-domain PLDA model using the estimated labels.

Out-of-Domain PLDA Model Adaptation

Gender ID Results on RATS Data

i-Vector based Gender ID: Conclusions
- On the FE test sets, the proposed approach achieves accuracy and EER of up to 97.62% and 2.31%, respectively.
- Duration mismatch compensation yields significantly smaller performance degradation on shorter-duration test segments.
- On the RATS test set, the unsupervised domain adaptation strategy offered a 6.8% relative (5.25% absolute) gain in classification accuracy and a 14.75% relative (3.08% absolute) reduction in EER.
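The unsupervised domain adaptation steps above (cluster unlabeled development i-vectors, pseudo-label them, then adapt the PLDA model) can be sketched in a toy form. The slides use Label-Generating Max-Margin Clustering; this sketch substitutes a plain 2-means clusterer as a stand-in, and the synthetic 2-D "i-vectors", the covariance-only adaptation, and the interpolation weight `alpha` are all illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for unlabeled development i-vectors: two well-separated
# gender clusters in a toy 2-D space (real i-vectors are ~400-600 dimensional).
dev_ivecs = np.vstack([
    rng.normal(loc=-2.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=2.0, scale=0.5, size=(50, 2)),
])

def two_class_kmeans(x, n_iter=20):
    """Plain 2-means clustering (a simple stand-in for LG-MMC)."""
    # Initialize centroids with the two points farthest apart along dim 0.
    c = np.array([x[np.argmin(x[:, 0])], x[np.argmax(x[:, 0])]])
    for _ in range(n_iter):
        # Assign each point to its nearest centroid, then recompute centroids.
        d = np.linalg.norm(x[:, None, :] - c[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for k in (0, 1):
            if np.any(labels == k):
                c[k] = x[labels == k].mean(axis=0)
    return labels

# Step 1: assign gender pseudo-labels to the unlabeled development data.
labels = two_class_kmeans(dev_ivecs)

# Step 2: estimate an in-domain within-class covariance from the
# pseudo-labeled clusters (class-centered, pooled).
centered = np.vstack([
    dev_ivecs[labels == k] - dev_ivecs[labels == k].mean(axis=0)
    for k in (0, 1)
])
sigma_in = centered.T @ centered / len(centered)

# Out-of-domain (FE-trained) within-class covariance: hypothetical values.
sigma_out = np.eye(2) * 0.3

# Step 3: adapt the out-of-domain PLDA covariance by linear interpolation.
alpha = 0.5  # adaptation weight; a tunable hyper-parameter
sigma_adapted = alpha * sigma_in + (1.0 - alpha) * sigma_out
```

With the seeded toy data, the two pseudo-label clusters recover the two generating groups, and `sigma_adapted` is a symmetric blend of the in-domain and out-of-domain statistics.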