Intra-Class Variability Modeling for Speech Processing
H. Aronowitz (IBM), June 08
Sections: Introduction, Mapping, Modeling, Speaker Diarization, Summary

Presentation transcript:

Intra-Class Variability Modeling for Speech Processing
Dr. Hagai Aronowitz
IBM Haifa Research Lab
Presentation is available online at:

Speech Classification: Proposed framework

Given labeled training segments from class + and class -, classify unlabeled test segments.

Classification framework
1. Represent speech segments in segment space.
2. Learn a classifier in segment space:
   - SVMs
   - NNs
   - Bayesian classifiers
   - …

Outline
1. Introduction to GMM based classification
2. Mapping speech segments into segment space
3. Intra-class variability modeling
4. Speaker diarization
5. Summary

Text-Independent Speaker Recognition: GMM-Based Algorithm [Reynolds 1995]

GMM based speaker recognition: estimate Pr(y_t | S)
1. Train a universal background model (UBM) GMM using EM.
2. For every target speaker S: train a GMM G_S by applying MAP adaptation.

Assuming frame independence: [equation]

[Figure: the UBM and the adapted models Q_1 (speaker #1) and Q_2 (speaker #2), with Gaussian means μ_1, μ_2, μ_3, drawn in the MFCC feature space R^26]
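Under the frame independence assumption, the session likelihood factorizes over frames, so log Pr(Y | S) is a sum of per-frame log densities. The following is a minimal sketch of that scoring for diagonal-covariance GMMs. The function names, the `(weights, means, variances)` tuple layout, and the use of an average log-likelihood ratio against the UBM as the final score are assumptions made for illustration, not details taken from the slides.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def log_gmm(x, weights, means, variances):
    """Log density of a GMM at frame x, using log-sum-exp for stability."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def session_log_likelihood(frames, weights, means, variances):
    """Frame independence: log Pr(Y|S) is the sum of per-frame log densities."""
    return sum(log_gmm(x, weights, means, variances) for x in frames)

def llr_score(frames, speaker_gmm, ubm):
    """Average log-likelihood ratio between a speaker GMM and the UBM (assumed scoring rule)."""
    diff = (session_log_likelihood(frames, *speaker_gmm)
            - session_log_likelihood(frames, *ubm))
    return diff / len(frames)
```

Every frame is visited once per Gaussian, so the cost grows linearly with the length of the audio.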

GMM Based Algorithm: Analysis
1. Invalid frame independence assumption: factors such as channel, emotion, lexical variability, and speaker aging cause frame dependency.
2. GMM scoring is inefficient: linear in the length of the audio.
3. GMM scoring does not support indexing.

Mapping Speech Segments into Segment Space: GMM scoring approximation 1/4

Definitions
- X: training session for target speaker
- Y: test session
- Q: GMM trained for X
- P: GMM trained for Y

Goal
Compute Pr(Y | Q) using GMMs P and Q only.

Motivation
1. Efficient speaker recognition and indexing
2. More accurate modeling

Mapping Speech Segments into Segment Space: GMM scoring approximation 2/4

Negative cross entropy: [equation (1)]

Approximating the cross entropy between two GMMs
1. Matching based lower bound [Aronowitz 2004]
2. Unscented-transform based approximation [Goldberger & Aronowitz 2005]
3. Other options in [Hershey 2007]
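As a sketch of why the negative cross entropy is the natural target here (an assumption built from the definitions of Y, P and Q on the previous slide, not the slide's own equation): if the frames of Y are treated as independent samples from P, the average frame log-likelihood under Q approaches an expectation under P.

```latex
\frac{1}{|Y|}\log \Pr(Y \mid Q)
  = \frac{1}{|Y|}\sum_{t=1}^{|Y|} \log Q(y_t)
  \approx \int P(y)\,\log Q(y)\,dy
  = -H(P, Q)
```

The right-hand side depends only on the two GMMs, which is what allows scoring without revisiting the frames of Y.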

Mapping Speech Segments into Segment Space: GMM scoring approximation 3/4

Matching based approximation: [equation (2)]

Assuming weights and covariance matrices are speaker independent (+ some approximations): [equation (3)]

Mapping T is induced: [equation (4)]
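One common way to realize such a mapping, shown here as a hedged sketch rather than as the slide's equation (4), is to stack the Gaussian means after scaling each by the square root of its shared weight and by the inverse standard deviation. With shared weights and diagonal covariances (both assumptions of this sketch), half the negated squared Euclidean distance between two mapped GMMs equals the weighted sum of Mahalanobis distances between matched Gaussians. The names `supervector` and `approx_score` are hypothetical.

```python
import math

def supervector(means, weights, variances):
    """Stack sqrt(w_g) * mu_g / sigma_g over all Gaussians g (diagonal covariances)."""
    vec = []
    for mu, w, var in zip(means, weights, variances):
        vec.extend(math.sqrt(w) * m / math.sqrt(v) for m, v in zip(mu, var))
    return vec

def approx_score(sv_p, sv_q):
    """Score two sessions by -0.5 * squared Euclidean distance between supervectors."""
    return -0.5 * sum((a - b) ** 2 for a, b in zip(sv_p, sv_q))
```

Because the score depends only on two fixed-length vectors, its cost is independent of the audio length and the vectors can be stored in an index.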

Mapping Speech Segments into Segment Space: GMM scoring approximation 4/4

Results
[Figure and table taken from: H. Aronowitz, D. Burshtein, "Efficient Speaker Recognition Using Approximated Cross Entropy (ACE)", in IEEE Trans. on Audio, Speech & Language Processing, September 2007.]

Other Mapping Techniques
1. Anchor modeling projection [Sturim 2001]: efficient but inaccurate
2. MLLR transforms [Stolcke 2005]: accurate but inefficient
3. Kernel-PCA-based mapping [Aronowitz 2007c]: accurate & efficient
   Given
   - a set of objects
   - a kernel function (a dot product between each pair of objects)
   it finds a mapping of the objects into R^n which preserves the kernel function.

Kernel-PCA Based Mapping
[Figure: sessions x and y in session space are sent by the kernel-induced map f to f(x) and f(y) in feature space. K-PCA on the anchor sessions gives Tx and Ty in the common speaker subspace (R^n); u_x and u_y lie in the speaker unique subspace.]

Intra-Class Variability Modeling [Aronowitz 2005b]: Introduction

The classic GMM algorithm does not explicitly model intra-speaker inter-session variability:
- channel, noise
- language
- stress, emotion, aging

The frame independence assumption does not hold in these cases! [equation (1)]

Instead, we can use a more relaxed assumption: [equation (2)]

which leads to: [equation (3)]

Old vs. New Generative Models

Old model: speaker (a GMM) → frame sequence (generated independently)

New model: speaker (a PDF over GMM space) → session GMM (a GMM) → frame sequence (generated independently)
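One way to write the two generative models as likelihoods, offered as a sketch under assumed notation (G_S for the single speaker GMM of the old model, Q for a session GMM drawn from the speaker's distribution over GMM space):

```latex
\text{Old model:}\quad
  \Pr(y_1,\dots,y_T \mid S) = \prod_{t=1}^{T} \Pr(y_t \mid G_S)

\text{New model:}\quad
  \Pr(y_1,\dots,y_T \mid S) = \int \Pr(Q \mid S)\,\prod_{t=1}^{T} \Pr(y_t \mid Q)\,dQ
```

In the new model the frames are independent only given the session GMM Q; once Q is integrated out, frames of the same session are dependent, which is the relaxation the slides describe.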

Session-GMM Space
[Figure: session-GMM space with regions for speaker #1, speaker #2 and speaker #3; two points mark the GMM for session A of speaker #1 and the GMM for session B of speaker #1.]

Modeling in Session-GMM Space 1/2

- Recall the mapping T induced by the GMM approximation analysis. The image of a GMM under T is called a supervector.
- A speaker is modeled by a multivariate normal distribution in supervector space: [equation (3)]
- A typical dimension of the covariance matrix is 50,000 × 50,000.
- The covariance matrix is estimated robustly using PCA + regularization: the covariance is assumed to be a low rank matrix with an additional non-zero (noise) diagonal.

Modeling in Session-GMM Space 2/2: Estimating covariance matrix
[Figure: sessions of speaker #1, speaker #2 and speaker #3 in supervector space, and the same sessions pooled in delta supervector space.]
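With the 2048 Gaussians and 26-dimensional features listed in the experimental setup, a supervector of stacked means has 2048 × 26 = 53,248 coordinates, so a full covariance matrix cannot be estimated directly. The sketch below shows one way to get a low rank plus diagonal estimate with NumPy. It assumes that a delta supervector is a session supervector minus its speaker's mean supervector, and the function names and the `noise_floor` parameter are hypothetical.

```python
import numpy as np

def delta_supervectors(sessions_by_speaker):
    """Subtract each speaker's mean supervector from that speaker's sessions."""
    deltas = []
    for sessions in sessions_by_speaker:
        arr = np.asarray(sessions, dtype=float)
        deltas.append(arr - arr.mean(axis=0))
    return np.vstack(deltas)

def low_rank_plus_diagonal(deltas, rank, noise_floor=1e-3):
    """PCA on pooled deltas: keep `rank` directions, put the remainder on a diagonal."""
    n = deltas.shape[0]
    # SVD of the data yields the covariance eigenvectors without forming the covariance
    _, sing, vt = np.linalg.svd(deltas, full_matrices=False)
    eigvals = sing ** 2 / n
    basis = vt[:rank]                              # (rank, dim) principal directions
    top = eigvals[:rank]
    total_var = (deltas ** 2).sum(axis=0) / n      # per-coordinate variance
    explained = (basis ** 2 * top[:, None]).sum(axis=0)
    noise = np.maximum(total_var - explained, noise_floor)
    return basis, top, noise

def full_covariance(basis, top, noise):
    """Materialize the covariance; only sensible for small toy dimensions."""
    return basis.T @ (top[:, None] * basis) + np.diag(noise)
```

In practice only `basis`, `top` and `noise` would be stored; the full matrix is built here just to make the structure visible on toy data.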

Experimental Setup

Datasets
- The covariance matrix is estimated from the NIST-2006-SRE corpus
- Evaluation is done on the NIST-2004-SRE corpus

System description
- ETSI MFCC (13-cep + 13-delta-cep)
- Energy based voice activity detector
- Feature warping
- 2048 Gaussians
- Target models are adapted from GI-UBM
- ZT-norm score normalization

Results
38% reduction in EER

Other Modeling Techniques
- NAP+SVMs [Campbell 2006]
- Factor Analysis [Kenny 2005]
- Kernel-PCA [Aronowitz 2007c]

Kernel-PCA based algorithm
- Model each supervector as s + u, with s ∈ S (common speaker subspace) and u ∈ U (speaker unique subspace)
- S is spanned by a set of development supervectors (700 speakers)
- U is the orthogonal complement of S in supervector space
- Intra-speaker variability is modeled separately in S and in U
- U was found to be more discriminative than S
- EER was reduced by 44% compared to baseline GMM

Kernel-PCA Based Modeling
[Figure: sessions x and y in session space are sent by the kernel-induced map f to f(x) and f(y) in feature space. K-PCA on the anchor sessions gives Tx and Ty in the common speaker subspace (R^n); u_x and u_y lie in the speaker unique subspace.]

Trainable Speaker Diarization [Aronowitz 2007d]

Goals
- Detect speaker changes: "speaker segmentation"
- Cluster speaker segments: "speaker clustering"

Motivation for new method
Current algorithms do not exploit available training data (besides tuning thresholds, etc.)!

Method
Explicitly model inter-segment intra-speaker variability from labeled training data, and use it in the metric used by change-detection / clustering algorithms.

Speaker recognition on pairs of 3s segments

Dev data: BNAD05 (5hr), Arabic, broadcast news
Eval data: BNAT05, Arabic, broadcast news (207 target models, 6756 test segments)

System                                                              EER (%)
Anchor modeling (baseline)                                          15.1
Anchor modeling - Kernel based scoring                              10.8
Kernel-PCA projection (CSS)                                         8.8
Kernel-PCA projection (CSS) + inter-segment variability modeling    7.4

Speaker Diarization System & Experiments

Speaker change detection
- 2 adjacent sliding windows (3s each)
- Speaker verification scoring + normalization

Speaker clustering
- Speaker verification scoring + normalization
- Bottom-up clustering

Speaker Error Rate (SER) on BNAT05
- Anchor modeling (baseline): 12.9%
- Kernel-PCA based method: 7.9%
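The change-detection step can be sketched as follows: two adjacent windows slide over the frame sequence, a same-speaker score is computed for each position, and local minima below a threshold are reported as speaker changes. This is an illustration only. `score_fn` stands in for the speaker verification scoring and normalization named on the slide, and `neg_mean_distance` is a toy scorer invented for the example, not the method from the talk.

```python
def detect_changes(frames, window, score_fn, threshold):
    """Slide two adjacent windows over the frames; report local score minima below threshold.

    score_fn(left, right) is a hypothetical same-speaker score (higher = more similar).
    """
    scores = []
    for t in range(window, len(frames) - window + 1):
        scores.append((t, score_fn(frames[t - window:t], frames[t:t + window])))
    changes = []
    for i, (t, s) in enumerate(scores):
        left_ok = i == 0 or s <= scores[i - 1][1]
        right_ok = i == len(scores) - 1 or s < scores[i + 1][1]
        if s < threshold and left_ok and right_ok:
            changes.append(t)
    return changes

def neg_mean_distance(left, right):
    """Toy stand-in score: negative absolute difference of window means (1-D frames)."""
    return -abs(sum(left) / len(left) - sum(right) / len(right))
```

For example, `detect_changes([0.0] * 10 + [5.0] * 10, 3, neg_mean_distance, -1.0)` reports a single change at frame index 10, where the two toy "speakers" meet.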

Summary 1/2
- A method for mapping speech segments into a GMM supervector space was described
- Intra-speaker inter-session variability is modeled in GMM supervector space
- Speaker recognition EER was reduced by 38% on the NIST-2004 SRE
- A corresponding kernel-PCA based approach reduces EER by 44%

Speaker diarization
- SER for speaker diarization was reduced by 39%.
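The 39% figure appears to be a relative reduction. As a quick check, using the SER values reported for BNAT05 in the diarization experiments (12.9% for the anchor modeling baseline, 7.9% for the kernel-PCA based method), a small helper (the name `relative_reduction` is hypothetical) reproduces it:

```python
def relative_reduction(baseline, improved):
    """Relative error reduction in percent."""
    return 100.0 * (baseline - improved) / baseline

# SER on BNAT05: (12.9 - 7.9) / 12.9 is about 38.8%, which rounds to 39%
ser_reduction = relative_reduction(12.9, 7.9)
```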

Summary 2/2: Algorithms based on the proposed framework
- Speaker recognition [Aronowitz 2005b; Aronowitz 2007c]
- Speaker diarization ("who spoke when") [Aronowitz 2007d]
- VAD (voice activity detection) [Aronowitz 2007a]
- Language identification [Noor & Aronowitz 2006]
- Gender identification [Bocklet 2008]
- Age detection [Bocklet 2008]
- Channel/bandwidth classification [Aronowitz 2007d]

Bibliography 1/2
[1] D. A. Reynolds et al., "Speaker identification and verification using Gaussian mixture speaker models", Speech Communication, 17.
[2] D. E. Sturim et al., "Speaker indexing in large audio databases using anchor models", in Proc. ICASSP.
[3] H. Aronowitz, D. Burshtein, A. Amir, "Speaker indexing in audio archives using test utterance Gaussian mixture modeling", in Proc. ICSLP.
[4] H. Aronowitz, D. Burshtein, A. Amir, "A session-GMM generative model using test utterance Gaussian mixture modeling for speaker verification", in Proc. ICASSP.
[5] P. Kenny et al., "Factor Analysis Simplified", in Proc. ICASSP.
[6] H. Aronowitz, D. Irony, D. Burshtein, "Modeling Intra-Speaker Variability for Speaker Recognition", in Proc. Interspeech.
[7] J. Goldberger and H. Aronowitz, "A distance measure between GMMs based on the unscented transform and its application to speaker recognition", in Proc. Interspeech.
[8] H. Aronowitz, D. Burshtein, "Efficient Speaker Identification and Retrieval", in Proc. Interspeech.

Bibliography 2/2
[9] A. Stolcke et al., "MLLR Transforms as Features in Speaker Recognition", in Proc. Interspeech.
[10] E. Noor, H. Aronowitz, "Efficient Language Identification using Anchor Models and Support Vector Machines", in Proc. ISCA Odyssey Workshop.
[11] W. M. Campbell et al., "SVM Based Speaker Verification Using a GMM Supervector Kernel and NAP Variability Compensation", in Proc. ICASSP.
[12] H. Aronowitz, "Segmental modeling for audio segmentation", in Proc. ICASSP.
[13] J. R. Hershey and P. A. Olsen, "Approximating the Kullback Leibler Divergence Between Gaussian Mixture Models", in Proc. ICASSP.
[14] H. Aronowitz, D. Burshtein, "Efficient Speaker Recognition Using Approximated Cross Entropy (ACE)", in IEEE Trans. on Audio, Speech & Language Processing, September 2007.
[15] H. Aronowitz, "Speaker Recognition using Kernel-PCA and Intersession Variability Modeling", in Proc. Interspeech.
[16] H. Aronowitz, "Trainable Speaker Diarization", in Proc. Interspeech.
[17] T. Bocklet et al., "Age and Gender Recognition for Telephone Applications Based on GMM Supervectors and Support Vector Machines", in Proc. ICASSP.

Thanks!
Presentation is available online at:

Backup slides

Kernel-PCA Based Mapping 2/5

Goals:
- Map sessions into feature space
- Model in feature space

[Figure: the kernel trick. The map f() sends sessions x and y, together with the anchor sessions, from session space to f(x) and f(y) in the dot-product feature space.]

Kernel-PCA Based Mapping 3/5

Given
- kernel K
- n anchor sessions

Find an orthonormal basis for [expression]

Method
1) Compute eigenvectors of the centralized kernel-matrix k_{i,j} = K(A_i, A_j).
2) Normalize eigenvectors by square-roots of corresponding eigenvalues → {v_i}
3) [equation] is the requested basis
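Steps 1 and 2 can be sketched with NumPy as below. The sketch assumes the anchor kernel matrix is already computed, drops directions with near-zero eigenvalues (the `tol` parameter is an assumption), and only projects the anchor sessions themselves. The function names are hypothetical.

```python
import numpy as np

def fit_kernel_pca(kernel_matrix, tol=1e-10):
    """Return the centered anchor kernel matrix and the normalized eigenvector coefficients."""
    k = np.asarray(kernel_matrix, dtype=float)
    n = k.shape[0]
    ones = np.full((n, n), 1.0 / n)
    # Step 1: center the kernel matrix, then take its eigen-decomposition
    centered = k - ones @ k - k @ ones + ones @ k @ ones
    eigvals, eigvecs = np.linalg.eigh(centered)
    keep = eigvals > tol                      # drop null directions
    # Step 2: dividing by sqrt(eigenvalue) makes the implied feature-space basis orthonormal
    coeffs = eigvecs[:, keep] / np.sqrt(eigvals[keep])
    return centered, coeffs

def project_anchors(centered, coeffs):
    """Coordinates of the anchor sessions in the kernel-PCA basis (one row per anchor)."""
    return centered @ coeffs
```

A useful sanity check is that the dot products between projected anchors reproduce the centered kernel matrix, which is the sense in which the mapping preserves the kernel.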

Kernel-PCA Based Mapping 4/5

T is a mapping x → R^n with the property: [equation]

Given sessions x, y, each of f(x) and f(y) may be uniquely represented as the sum of two components:
- a component in the common speaker subspace
- a component in the speaker unique subspace

Kernel-PCA Based Mapping 5/5
[Figure: sessions x and y in session space are sent by f to f(x) and f(y) in feature space. K-PCA on the anchor sessions gives Tx and Ty in the common speaker subspace (R^n); u_x and u_y lie in the speaker unique subspace.]

Modeling in Segment-GMM Supervector Space
[Figure: the frame sequences of segment #1, segment #2, …, segment #n are mapped into segment-GMM supervector space, which contains regions for music, speech and silence.]

Segmental Modeling for Audio Segmentation

Goal
Segment audio accurately and robustly into speech / silence / music segments.

Novel idea
- Acoustic modeling is usually done on a frame basis.
- Segmentation/classification is usually done on a segment basis (using smoothing).
- Why not explicitly model whole segments?

Note: speaker, noise, music-context, channel (etc.) are constant during a segment.

Speech / Silence Segmentation: Results 1/2

                   FR=0.5%   FA=1%   FR=0.25%
GMM baseline       2.9%      7.9%    29.6%
Segmental          1.7%      5.1%    2.7%
Error reduction    41%       35%     91%

Speech / Silence Segmentation: Results 2/2

                   FR=0.5%   FA=1%   FR=0.25%
GMM baseline       1.43%     3.4%    3.2%
Segmental          1.27%     2.0%    1.9%
Error reduction    11%       41%

LID in Session Space
[Figure: session space with regions for English, Arabic and French; markers distinguish training sessions from test sessions.]

LID in Session Space: Algorithm
1. Front end: shifted delta cepstrum (SDC).
2. Represent every train/test session by a GMM super-vector.
3. Train a linear SVM to classify GMM super-vectors.

Results
EER=4.1% on the NIST-03 Eval (30sec sessions).

Anchor Modeling Projection
- Speaker indexing [Sturim et al., 2001]
- Intersession variability modeling in projected space [Collet et al., 2005]
- Speaker clustering [Reynolds et al., 2004]
- Speaker segmentation [Collet et al., 2006]
- Language identification [Noor and Aronowitz, 2006]

Given: anchor models λ_1, …, λ_n and session X = x_1, …, x_F
Average normalized log-likelihood: [equation]
Projection: [equation]
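A minimal sketch of an anchor projection: the session is represented by one average per-frame log-likelihood for each anchor model, normalized here by subtracting the same quantity under a background model. The normalization by a background model, the use of 1-D Gaussians in place of anchor GMMs, and the function names are all assumptions made for illustration.

```python
import math

def avg_log_likelihood(frames, mean, var):
    """Average per-frame log-likelihood under a 1-D Gaussian (toy stand-in for a GMM)."""
    total = sum(
        -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)
        for x in frames
    )
    return total / len(frames)

def anchor_projection(frames, anchors, background):
    """Map a session to one normalized average log-likelihood per anchor model."""
    base = avg_log_likelihood(frames, *background)
    return [avg_log_likelihood(frames, *anchor) - base for anchor in anchors]
```

The output has one coordinate per anchor model, so sessions of any length land in the same n-dimensional space.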

Intra-Class Variability Modeling: Introduction

The classic GMM algorithm does not explicitly model intra-speaker inter-session variability:
- Noise
- Channel
- Language
- Changing speaker characteristics: stress, emotion, aging

The frame independence assumption does not hold in these cases! [equation (1)]

Instead, we get: [equation (2)]