3. Adversarial Teacher-Student Learning (AT/S)

Adversarial Teacher-Student Learning for Unsupervised Adaptation
Zhong Meng (1, 2), Jinyu Li (1), Yifan Gong (1), Biing-Hwang (Fred) Juang (2)
(1) Microsoft AI and Research, USA; (2) Georgia Institute of Technology, USA

1. Introduction

Problem: ASR performance degrades significantly when the domains of the training and test data are mismatched.

Solution: purely unsupervised adaptation.
- Adapt a well-trained source-domain acoustic model to data from the target domain.
- No alignments or decoding lattices are available for the target-domain adaptation data.

Teacher-student learning [Li et al., 2014]
- Requires parallel data from the source and target domains.
- The student mimics the behavior of a well-trained source-domain teacher model.

Adversarial learning (GRL, DSN) [Ganin et al., 2015]
- Does not require parallel data from the source and target domains.
- Explicitly suppresses the condition variability in the speech signal.
- Gradient reversal layer (GRL): multiplies the gradient by a negative number, $-\lambda$, in the backward pass.

2. Teacher-Student (T/S) Adaptation [Li et al., 2017]

[Figure: T/S adaptation. Teacher LSTM acoustic model $M_T$: teacher input feature $x^T$ → teacher senone posterior. Student LSTM acoustic model $M_S$: student input feature $x^S$ → student senone posterior. The senone loss compares the two posteriors.]

The student input $X^S$ is parallel to the teacher input $X^T$, i.e., the two are synchronized frame by frame.

Minimize the KL divergence between the output distributions of the teacher and student models:

$$KL(p_T \,\|\, p_S) = \sum_i \sum_{q \in \mathcal{Q}} p_T(q \mid x_i^T; \theta_T) \log \frac{p_T(q \mid x_i^T; \theta_T)}{p_S(q \mid x_i^S; \theta_S)}$$

Use the teacher senone posteriors in lieu of hard labels to train the student model:

$$\mathcal{L}(\theta_S) = -\sum_i \sum_{q \in \mathcal{Q}} p_T(q \mid x_i^T; \theta_T) \log p_S(q \mid x_i^S; \theta_S)$$

where $q$ is one of the senones in the senone set $\mathcal{Q}$. The KL divergence equals $\mathcal{L}(\theta_S)$ minus the entropy of the teacher posteriors, which does not depend on $\theta_S$, so minimizing $\mathcal{L}(\theta_S)$ minimizes the KL divergence.

3. Adversarial Teacher-Student Learning (AT/S)

[Figure: AT/S architecture. Student acoustic model: feature extractor $M_f$ (student input feature $x^S$ → deep feature $f^S$) followed by the senone classifier $M_y$ (student senone posterior). Condition classifier $M_d$: attached to $f^S$ through a GRL; outputs the condition posterior, compared with the condition label in the condition loss $\mathcal{L}_d$. Teacher acoustic model: teacher input feature $x^T$ → teacher senone posterior, compared with the student senone posterior in the senone loss $\mathcal{L}_y$.]

AT/S augments T/S learning with adversarial learning to achieve condition-robust unsupervised adaptation.

Goal: learn a condition-invariant and senone-discriminative deep feature $f^S$.

Senone classifier: $M_y(M_f(x_i^S)) = p_y(y = q \mid x_i^S; \theta_f, \theta_y)$, $q \in \mathcal{Q}$
Condition classifier: $M_d(M_f(x_i^S)) = p_d(d = a \mid x_i^S; \theta_f, \theta_d)$, $a \in \mathcal{A}$

T/S senone loss:

$$\mathcal{L}_y(\theta_f, \theta_y) = -\sum_i \sum_{q \in \mathcal{Q}} p_T(q \mid x_i^T; \theta_T) \log p_y(q \mid x_i^S; \theta_f, \theta_y)$$

Condition loss, over the $N$ student frames, where $d_i \in \mathcal{A}$ is the condition label of $x_i^S$:

$$\mathcal{L}_d(\theta_f, \theta_d) = -\sum_{i=1}^{N} \log p_d(d_i \mid x_i^S; \theta_f, \theta_d)$$

Adversarial multi-task learning (with the gradient reversal layer):

$$\max_{\theta_d} \min_{\theta_y, \theta_f} \; \mathcal{L}_y(\theta_f, \theta_y) - \lambda \mathcal{L}_d(\theta_f, \theta_d)$$

The condition classifier ($\theta_d$) is trained to minimize $\mathcal{L}_d$, while the feature extractor ($\theta_f$) is trained to minimize $\mathcal{L}_y$ and maximize $\mathcal{L}_d$; the GRL lets all parameters be updated jointly by ordinary back-propagation.

Multi-factorial (MFA) AT/S: simultaneously suppresses multiple factors (e.g., speaker and environment) that cause condition variability.

4. Experiments

- Source-domain teacher acoustic model: LSTM trained with 375 hours of Microsoft Cortana voice assistant data.
- Adaptation data (CHiME-3): 9137 clean and noisy parallel utterances.

WER (%) on the CHiME-3 environments BUS, CAF, PED and STR. Conditions: the condition labels used by the condition classifier (env. = environment, spk. = speaker). WERR: relative reduction (%) of the average WER over the T/S baseline.

System            Conditions         BUS    CAF    PED    STR    Avg.   WERR
Un-adapted        -                  27.93  24.93  18.53  21.38  23.16  -
T/S (baseline)    -                  15.96  14.32  11.00  13.04  13.56  -
Adversarial T/S   2 env.             15.24  13.95  10.71  12.76  13.15  3.02
Adversarial T/S   6 env.             15.58  13.23  10.65  13.10  13.12  3.24
Adversarial T/S   87 spk.            14.97  13.63  10.84  12.24  12.90  4.87
MFA T/S           6 env., 87 spk.    15.38  13.08  10.47  12.45  12.83  5.38

5. Conclusions

- AT/S achieves 3.24%, 4.87% and 5.38% relative WER reductions over T/S by suppressing environment, speaker and multi-factor variability, respectively.
- AT/S for speaker-robust unsupervised adaptation is more effective than environment-robust AT/S.
- MFA T/S further improves ASR performance over single-factor AT/S.
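The T/S objective of Section 2 can be checked numerically. The sketch below is a minimal illustration, assuming senone posteriors are given as plain Python lists with one probability vector per frame; the function names (`ts_loss`, `kl_divergence`, `teacher_entropy`) and the toy numbers are hypothetical. It shows why training on the teacher's soft posteriors minimizes the KL divergence: the two quantities differ only by the teacher entropy, which does not depend on $\theta_S$.

```python
import math


def _check_parallel(teacher_post, student_post):
    # T/S needs frame-synchronized parallel data: frame i of the teacher input
    # x_i^T and frame i of the student input x_i^S must correspond.
    if len(teacher_post) != len(student_post):
        raise ValueError("teacher and student inputs must be frame-synchronized")


def ts_loss(teacher_post, student_post):
    """L(theta_S) = -sum_i sum_q p_T(q|x_i^T) log p_S(q|x_i^S)."""
    _check_parallel(teacher_post, student_post)
    return -sum(pt * math.log(ps)
                for t_frame, s_frame in zip(teacher_post, student_post)
                for pt, ps in zip(t_frame, s_frame) if pt > 0.0)


def kl_divergence(teacher_post, student_post):
    """KL(p_T || p_S), summed over frames i and senones q."""
    _check_parallel(teacher_post, student_post)
    return sum(pt * math.log(pt / ps)
               for t_frame, s_frame in zip(teacher_post, student_post)
               for pt, ps in zip(t_frame, s_frame) if pt > 0.0)


def teacher_entropy(teacher_post):
    """Summed entropy of the teacher posteriors; it does not depend on theta_S."""
    return -sum(pt * math.log(pt) for frame in teacher_post for pt in frame if pt > 0.0)


# Hypothetical posteriors over a toy senone set of size 3, for 2 parallel frames.
teacher = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]   # p_T(q | x_i^T)
student = [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2]]   # p_S(q | x_i^S)

# KL and (soft cross-entropy - teacher entropy) print the same value.
print(kl_divergence(teacher, student), ts_loss(teacher, student) - teacher_entropy(teacher))
```

Because the teacher entropy is a constant with respect to the student parameters, a gradient step on the soft cross-entropy is exactly a gradient step on the KL divergence.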

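The gradient reversal layer and the adversarial objective of Section 3 can be sketched on a deliberately tiny model. This is an illustration under simplifying assumptions, not the authors' implementation: the feature extractor, senone classifier and condition classifier are each a single scalar weight (hypothetical stand-ins `w_f`, `w_y`, `w_d` for $\theta_f$, $\theta_y$, $\theta_d$), the senone set and the condition set are both binary, and gradients are written out by hand instead of using a deep-learning framework.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def grl_backward(grad: float, lam: float) -> float:
    """Gradient reversal layer: identity in the forward pass; in the backward
    pass the incoming gradient is multiplied by -lam."""
    return -lam * grad


def ats_losses(params, xs, teacher, conds):
    """Toy senone loss L_y and condition loss L_d.

    params = (w_f, w_y, w_d); xs are student frames x_i^S (scalars);
    teacher holds soft targets p_T(q=1 | x_i^T); conds holds labels d_i in {0, 1}.
    """
    w_f, w_y, w_d = params
    l_y = l_d = 0.0
    for x, t, d in zip(xs, teacher, conds):
        f = w_f * x              # deep feature f^S = M_f(x_i^S)
        p = sigmoid(w_y * f)     # p_y(q=1 | x_i^S)
        q = sigmoid(w_d * f)     # p_d(a=1 | x_i^S)
        l_y -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
        l_d -= math.log(q) if d == 1 else math.log(1.0 - q)
    return l_y, l_d


def ats_gradients(params, xs, teacher, conds, lam):
    """Hand-derived gradients for one AT/S step.

    g_y = dL_y/dw_y, g_d = dL_d/dw_d, and g_f is the gradient reaching w_f after
    the condition branch passes through the GRL: dL_y/dw_f - lam * dL_d/dw_f.
    """
    w_f, w_y, w_d = params
    g_f = g_y = g_d = 0.0
    for x, t, d in zip(xs, teacher, conds):
        f = w_f * x
        p = sigmoid(w_y * f)
        q = sigmoid(w_d * f)
        # Sigmoid + cross-entropy: dL/dlogit = posterior - target.
        g_y += (p - t) * f
        g_d += (q - d) * f
        grad_f = (p - t) * w_y + grl_backward((q - d) * w_d, lam)
        g_f += grad_f * x        # df/dw_f = x
    return g_f, g_y, g_d


def ats_step(params, xs, teacher, conds, lam, lr):
    """Plain gradient descent on every weight. Since the GRL has already flipped
    the condition gradient reaching w_f, this realizes
    max_{theta_d} min_{theta_y, theta_f} L_y - lam * L_d."""
    g_f, g_y, g_d = ats_gradients(params, xs, teacher, conds, lam)
    w_f, w_y, w_d = params
    return (w_f - lr * g_f, w_y - lr * g_y, w_d - lr * g_d)


if __name__ == "__main__":
    xs = [0.5, -1.0, 1.5, -0.3]          # hypothetical student frames
    teacher = [0.8, 0.2, 0.9, 0.4]       # hypothetical teacher soft targets
    conds = [1, 0, 1, 0]                 # hypothetical condition labels
    params = (0.7, -0.4, 0.3)
    for _ in range(5):
        params = ats_step(params, xs, teacher, conds, lam=0.5, lr=0.1)
        print(params, ats_losses(params, xs, teacher, conds))
```

Because the GRL flips the sign of the condition gradient before it reaches `w_f`, one ordinary descent step moves `w_y` to reduce $\mathcal{L}_y$, moves `w_d` to reduce $\mathcal{L}_d$, and makes the condition part of the `w_f` update push toward a larger $\mathcal{L}_d$, which is the min-max game above.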

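For multi-factorial AT/S, one way to picture the objective is one condition classifier per factor (e.g., environment and speaker), each attached to the shared deep feature $f^S$ through its own GRL. The sketch below assumes a separate hypothetical weight $\lambda_k$ per factor and that the per-factor condition losses are simply summed; the poster does not state how the factors are weighted, and the helper names are illustrative only.

```python
def mfa_objective(senone_loss, cond_losses, lams):
    """L_y - sum_k lam_k * L_{d_k}, one condition loss per factor.
    Per-factor weights lam_k are an assumption of this sketch."""
    if len(cond_losses) != len(lams):
        raise ValueError("need exactly one weight per condition factor")
    return senone_loss - sum(lam * l_d for lam, l_d in zip(lams, cond_losses))


def mfa_feature_grad(senone_grad, cond_grads, lams):
    """Gradient reaching the shared deep feature f^S: the senone-branch gradient
    plus each condition-branch gradient after its own GRL (times -lam_k)."""
    if len(cond_grads) != len(lams):
        raise ValueError("need exactly one weight per condition factor")
    return senone_grad + sum(-lam * g for lam, g in zip(lams, cond_grads))


# Hypothetical loss values for two factors: environment and speaker.
losses = {"env": 1.2, "spk": 3.4}
weights = {"env": 0.5, "spk": 0.5}
factors = sorted(losses)
print(mfa_objective(2.0, [losses[k] for k in factors], [weights[k] for k in factors]))
```

With a single factor, `mfa_objective` reduces to the single-factor AT/S objective $\mathcal{L}_y - \lambda \mathcal{L}_d$.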

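The WERR column and the figures quoted in the conclusions are relative reductions of the average WER with respect to the T/S baseline (13.56%). A short sketch of that arithmetic, with the average WERs copied from the table; `werr` is a hypothetical helper name:

```python
def werr(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction (%) of new_wer with respect to baseline_wer."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


TS_BASELINE_AVG = 13.56  # Avg. WER (%) of the T/S baseline in the table

AVG_WER = {
    "Adversarial T/S, 2 env.": 13.15,
    "Adversarial T/S, 6 env.": 13.12,
    "Adversarial T/S, 87 spk.": 12.90,
    "MFA T/S, 6 env., 87 spk.": 12.83,
}

for system, wer in AVG_WER.items():
    print(f"{system}: WERR = {werr(TS_BASELINE_AVG, wer):.2f}%")
```

It prints 3.02%, 3.24%, 4.87% and 5.38%, matching the WERR column.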