Adversarial Teacher-Student Learning for Unsupervised Adaptation
Zhong Meng (1,2), Jinyu Li (1), Yifan Gong (1), Biing-Hwang (Fred) Juang (2)
(1) Microsoft AI and Research, USA; (2) Georgia Institute of Technology, USA

1. Introduction
Problem: ASR performance degrades significantly when the domains of the training and test data mismatch.
Solution: purely unsupervised adaptation. Adapt a well-trained source-domain acoustic model to data from the target domain, with no alignments or decoding lattices available for the target-domain adaptation data.
Teacher-student (T/S) learning [Li et al., 2014]: parallel data from the source and target domains is required; the student mimics the behavior of a well-trained source-domain teacher model.
Adversarial learning (GRL, DSN) [Ganin et al., 2015]: no parallel data from the source and target domains is required; condition variability in the speech signal is explicitly suppressed. A gradient reversal layer (GRL) multiplies the gradient by a negative number (−α) in the backward pass.
[Figure: AT/S architecture. The student acoustic model is split into a feature extractor M_f, which maps the student input feature x_i^S to a deep feature f_i, followed by a senone classifier M_y that produces the student senone posterior. A condition classifier M_c is attached to f_i through a GRL and produces the condition posterior. The senone loss L_y compares the student senone posterior with the teacher senone posterior (computed by the teacher acoustic model from the teacher input feature x_i^T); the condition loss L_c compares the condition posterior with the condition label c_i.]
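The gradient reversal layer described above can be sketched in a few lines. A minimal NumPy toy with a hand-rolled forward/backward interface; the class name, interface, and α value are illustrative, not the paper's implementation:

```python
import numpy as np

class GradientReversal:
    """Identity in the forward pass; scales the gradient by -alpha backward."""
    def __init__(self, alpha=0.5):
        self.alpha = alpha

    def forward(self, x):
        # Features pass through unchanged.
        return x

    def backward(self, grad_output):
        # The condition-loss gradient is flipped (and scaled) before it reaches
        # the feature extractor, so the extractor is pushed to *increase* the
        # condition loss, i.e. to confuse the condition classifier.
        return -self.alpha * grad_output

grl = GradientReversal(alpha=0.5)
f = np.array([1.0, -2.0, 3.0])          # toy deep feature f_i
out = grl.forward(f)                    # identical to f
grad = grl.backward(np.ones_like(f))    # [-0.5, -0.5, -0.5]
```

In frameworks with autograd, the same effect is typically obtained by defining a custom op whose forward is the identity and whose backward negates and scales the incoming gradient.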
2. Teacher-Student (T/S) Adaptation [Li et al., 2017]
The student input x_i^S is parallel to the teacher input x_i^T, i.e., frame-by-frame synchronized.
Minimize the KL divergence between the output distributions of the teacher and student models:
KL(p_T || p_S) = Σ_i Σ_{q∈Q} p_T(q | x_i^T; θ_T) log [ p_T(q | x_i^T; θ_T) / p_S(q | x_i^S; θ_S) ]
Since the teacher parameters θ_T are fixed, this amounts to minimizing the T/S loss, with the teacher senone posteriors used in lieu of hard labels to train the student model:
L_TS(θ_S) = − Σ_i Σ_{q∈Q} p_T(q | x_i^T; θ_T) log p_S(q | x_i^S; θ_S),
where q is one of the senones in the senone set Q.

3. Adversarial Teacher-Student Learning (AT/S)
Advance T/S learning with adversarial learning to achieve condition-robust unsupervised adaptation.
Goal: learn a condition-invariant and senone-discriminative deep feature f_i.
Senone classifier: M_y(M_f(x_i^S)) = p_y(q | x_i^S; θ_f, θ_y), q ∈ Q
Condition classifier: M_c(M_f(x_i^S)) = p_c(c | x_i^S; θ_f, θ_c), c ∈ C
T/S senone loss: L_y(θ_f, θ_y) = − Σ_i Σ_{q∈Q} p_T(q | x_i^T; θ_T) log p_y(q | x_i^S; θ_f, θ_y)
Condition loss: L_c(θ_f, θ_c) = − Σ_{i=1}^N log p_c(c_i | x_i^S; θ_f, θ_c)
Adversarial multi-task learning (with the gradient reversal layer):
max_{θ_c} min_{θ_y, θ_f} [ L_y(θ_f, θ_y) − α L_c(θ_f, θ_c) ]

4. Experiments
Source-domain teacher acoustic model: an LSTM trained on 375 hours of Microsoft Cortana voice-assistant data.
Adaptation data (CHiME-3): 9137 parallel clean and noisy utterances.
Multi-factorial (MFA) AT/S: simultaneously suppresses multiple factors (e.g., speaker and environment) that cause the condition variability.

WER (%) on the CHiME-3 test environments (BUS, CAF, PED, STR); WERR = relative WER reduction (%) over the T/S baseline:

System            Conditions        BUS    CAF    PED    STR    Avg.   WERR
Un-adapted        -                 27.93  24.93  18.53  21.38  23.16  -
T/S (baseline)    -                 15.96  14.32  11.00  13.04  13.56  -
Adversarial T/S   2 env.            15.24  13.95  10.71  12.76  13.15  3.02
Adversarial T/S   6 env.            15.58  13.23  10.65  13.10  13.12  3.24
Adversarial T/S   87 spk.           14.97  13.63  10.84  12.24  12.90  4.87
MFA T/S           6 env., 87 spk.   15.38  13.08  10.47  12.45  12.83  5.38
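The T/S senone loss L_y and condition loss L_c can be illustrated numerically. A hedged NumPy sketch on random toy posteriors; the shapes, function names, and α value are illustrative, not from the paper:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def ts_senone_loss(p_teacher, p_student, eps=1e-12):
    """L_y: cross-entropy of student senone posteriors against the teacher's
    posteriors (soft labels), averaged over frames."""
    return -np.mean(np.sum(p_teacher * np.log(p_student + eps), axis=-1))

def condition_loss(p_condition, labels, eps=1e-12):
    """L_c: cross-entropy of condition posteriors against hard condition labels."""
    return -np.mean(np.log(p_condition[np.arange(len(labels)), labels] + eps))

rng = np.random.default_rng(0)
p_T = softmax(rng.normal(size=(4, 10)))   # teacher senone posteriors: 4 frames, 10 senones
p_S = softmax(rng.normal(size=(4, 10)))   # student senone posteriors
p_c = softmax(rng.normal(size=(4, 3)))    # condition posteriors (e.g. 3 environments)
c = np.array([0, 2, 1, 0])                # frame-level condition labels

alpha = 0.5
objective = ts_senone_loss(p_T, p_S) - alpha * condition_loss(p_c, c)
```

By Gibbs' inequality the senone loss is smallest when the student posteriors match the teacher's, which is what T/S training drives toward; the −α L_c term is what the GRL realizes for the feature extractor.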
5. Conclusions
AT/S achieves 3.24%, 4.87%, and 5.38% relative WER reductions over T/S by suppressing environment, speaker, and multi-factor variability, respectively. AT/S for speaker-robust unsupervised adaptation is more effective than the environment-robust variant. MFA T/S further improves ASR performance over single-factor AT/S.
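The relative reductions quoted above follow directly from the average WERs in the results table; a small arithmetic check (the helper name is chosen here for illustration):

```python
def werr(wer_baseline, wer_system):
    """Relative WER reduction (%) of a system over a baseline."""
    return round(100.0 * (wer_baseline - wer_system) / wer_baseline, 2)

ts_avg = 13.56                    # T/S baseline average WER (%)
print(werr(ts_avg, 13.12))        # 6-env. AT/S   -> 3.24
print(werr(ts_avg, 12.90))        # 87-spk. AT/S  -> 4.87
print(werr(ts_avg, 12.83))        # MFA T/S       -> 5.38
```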