Adapting Hybrid ANN/HMM to Speech Variations

Stefano Scanzio, Pietro Laface (Politecnico di Torino)
Dario Albesano, Roberto Gemello, Franco Mana
Acoustic Model Adaptation

- Adaptation tasks
- Linear Input Network (LIN)
- Linear Hidden Network (LHN)
- Catastrophic forgetting
- Conservative Training
- Results on several adaptation tasks

ICASSP 2006
Acoustic Model Adaptation

Adaptation targets:
- A specific speaker
- Speaking style (spontaneous speech, regional accents)
- Audio channel (telephone, cellular, microphone)
- Environment (car, office, ...)
- A specific vocabulary

(Figure: data logged by the voice application's ASR feed ANN adaptation, turning task-independent models into adapted models.)
Linear Input Network (LIN) Adaptation

(Figure: a LIN is inserted between the speech parameters and the input layer of the speaker/task-independent MLP; the MLP's output layer gives the emission probabilities of the acoustic-phonetic units.)
Linear Hidden Network (LHN)

(Figure: the LHN is inserted above hidden layer 2 of the MLP, before the output layer; speech parameters feed the input layer, hidden layers 1 and 2 follow, and the output layer gives the emission probabilities of the acoustic-phonetic units.)
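Both schemes amount to inserting an extra trainable linear layer into the trained MLP, initialized to the identity so that adaptation starts from the original network's behavior. A minimal NumPy sketch (the layer sizes, the `forward` helper, and the tiny single-hidden-layer network are illustrative assumptions, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, layers):
    """Run x through a list of (W, b, activation) layers."""
    for W, b, act in layers:
        x = act(W @ x + b)
    return x

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
identity = lambda z: z

# Tiny frozen speaker-independent MLP: 4 inputs -> 8 hidden -> 3 outputs.
W1, b1 = rng.standard_normal((8, 4)), np.zeros(8)
W2, b2 = rng.standard_normal((3, 8)), np.zeros(3)
mlp = [(W1, b1, sigmoid), (W2, b2, identity)]

# LIN: identity-initialized linear layer before the input layer.
lin = (np.eye(4), np.zeros(4), identity)
lin_net = [lin] + mlp

# LHN: identity-initialized linear layer after the hidden layer,
# before the output layer.
lhn = (np.eye(8), np.zeros(8), identity)
lhn_net = [mlp[0], lhn, mlp[1]]

# At identity initialization both adapted networks reproduce the
# original outputs; only the inserted layer is then trained.
x = rng.standard_normal(4)
assert np.allclose(forward(x, mlp), forward(x, lin_net))
assert np.allclose(forward(x, mlp), forward(x, lhn_net))
```

During adaptation only the inserted layer's weights are updated, so the number of adapted parameters stays small compared with retraining the whole MLP.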
Catastrophic Forgetting

- Acquiring new information can damage previously learned information when the new data do not adequately represent the knowledge captured by the original training data.
- The effect is most evident when the adaptation data contain no examples for a subset of the output classes.
- The problem is more severe in the ANN framework than in the Gaussian-mixture HMM framework.
Catastrophic Forgetting

- The back-propagation algorithm penalizes classes with no adaptation examples by setting their target value to zero for every adaptation frame.
- During adaptation, the ANN weights are therefore biased to favor the activations of the classes that have samples in the adaptation set, weakening all the other classes.
16-Class Training Example

- 2 input nodes, 20 x 2 hidden nodes, 16 output nodes
- 2500 training patterns per class
- The adaptation set includes 5000 patterns belonging only to classes 6 and 7

(Figures: decision regions around classes 6 and 7 before adaptation; error rates 1.5% and 6.7%, total error rate 4.1%.)
Adaptation of 2 Classes

(Figures: decision regions after standard adaptation on classes 6 and 7 only; error rates 0% and 2.0% on the adapted classes, but the total error rate grows to 16.9%.)
Conservative Training Target Assignment Policy

Px: class with examples in the adaptation set; Mx: missing class. P2 is the class corresponding to the current input frame.

Class                           M1     P1     P2     P3     M2
Standard target assignment      0.00   0.00   1.00   0.00   0.00
Conservative Training targets   0.03   0.00   0.95   0.00   0.02

Under Conservative Training, the targets of the missing classes are the posterior probabilities computed by the original network.
Adaptation of 2 Classes: Standard vs. Conservative Training

- Standard adaptation: error rates 0% and 2% on classes 6 and 7, but total error rate 16.9%
- Conservative Training adaptation: error rates 2.2% and 5.2% on classes 6 and 7, and total error rate 10.2%
Adaptation Tasks

- Application data adaptation: Directory Assistance — 9325 Italian city names; 53713 training and 3917 test utterances
- Vocabulary adaptation: command words — 30 command words; 6189 training and 3094 test utterances
- Channel-environment adaptation: Aurora-3 — 2951 training and 654 test utterances
- Speaker adaptation: WSJ0 — 8 speakers, 16 kHz; 40 training and 40 test sentences
Results on Different Tasks (% WER)

Adaptation method   Application:            Vocabulary:      Channel-Environment:
                    Directory Assistance    Command Words    Aurora-3 CH1
No adaptation       14.6                    3.8              24.0
LIN                 11.2                    3.4              11.0
LIN + CT            12.4                    -                15.3
LHN                  9.6                    2.1               9.8
LHN + CT            10.1                    2.3              10.4
Mitigation of Catastrophic Forgetting Using Conservative Training

Tests of the adapted models on Italian continuous speech (% WER); without adaptation: 29.3.

Adaptation method   Adapted on:             Adapted on:      Adapted on:
                    Directory Assistance    Command Words    Aurora-3 CH1
LIN                 36.3                    42.7             108.6
LIN + CT            36.5                    35.2              42.1
LHN                 40.6                    63.7             152.1
LHN + CT            40.7                    45.3              44.2
Networks Used in the Speaker Adaptation Task

- STD (Standard): 2-hidden-layer hybrid MLP-HMM model; 273 input features (39 parameters x 7 context frames)
- IMP (Improved): uses a wider input window spanning a time context of 25 frames, and includes an additional hidden layer
Results on the WSJ0 Speaker Adaptation Task (% WER, trigram LM)

Adaptation method   STD    IMP
No adaptation       8.4    6.5
LIN                 7.9    -
LIN + CT            7.1    -
LHN + CT            6.6    5.6
LIN + LHN + CT      6.3    5.0
Conclusions

- LHN adaptation outperforms LIN adaptation.
- Linear transformations at different levels produce different positive effects: LIN + LHN performs better than LHN alone.
- In adaptation tasks with missing classes, Conservative Training reduces the catastrophic forgetting effect, preserving performance on another generic task.
- Conservative Training also improves performance in speaker adaptation when few sentences are available.
Weight Merging

(Figure: the trained LIN, sitting between the input I and Hidden 1, is folded into the Hidden 1 weights; layers shown: I, LIN, Hidden 1, Hidden 2, O.)
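Because the LIN and the first hidden layer's affine transform compose into a single affine transform, the trained LIN can be folded into the hidden-layer weights, restoring the original topology at no runtime cost. A hypothetical NumPy sketch of this merging (sizes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Frozen first hidden layer of the original MLP, and a trained LIN
# (illustrative sizes: 4 input features, 8 hidden nodes).
W1, b1 = rng.standard_normal((8, 4)), rng.standard_normal(8)
A, a = rng.standard_normal((4, 4)), rng.standard_normal(4)  # trained LIN

# Both transforms are affine, so they compose:
#   W1 @ (A @ x + a) + b1 == (W1 @ A) @ x + (W1 @ a + b1)
W_merged = W1 @ A
b_merged = W1 @ a + b1

# The merged layer reproduces LIN followed by Hidden 1 exactly.
x = rng.standard_normal(4)
h_two_layers = sigmoid(W1 @ (A @ x + a) + b1)
h_merged = sigmoid(W_merged @ x + b_merged)
assert np.allclose(h_two_layers, h_merged)
```

The same algebra applies to an LHN placed between two layers: the identity-initialized linear transform can be absorbed into the weights of the layer above it.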
Conservative Training (CT)

For each observation frame:
1) Set the target value of each class that has no (or few) adaptation data to its posterior probability computed by the original network.
2) Set to zero the target value of each class that has adaptation data but does not correspond to the input frame.
3) Set the target value of the class corresponding to the input frame to 1 minus the sum of the posterior probabilities assigned by rule 1.
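The three rules above can be sketched directly in a few lines. The function name and the posteriors of the present classes are illustrative; the missing-class posteriors 0.03 and 0.02 follow the earlier example with classes M1, P1, P2, P3, M2:

```python
def conservative_targets(posteriors, present_classes, current_class):
    """Conservative Training targets for one observation frame.

    posteriors      -- outputs of the ORIGINAL (unadapted) network
    present_classes -- indices of classes that have adaptation data
    current_class   -- index of the class of the current input frame
    """
    targets = [0.0] * len(posteriors)
    missing_mass = 0.0
    for c, p in enumerate(posteriors):
        if c not in present_classes:       # rule 1: missing class
            targets[c] = p
            missing_mass += p
        # rule 2: present classes other than the current one keep target 0
    targets[current_class] = 1.0 - missing_mass   # rule 3
    return targets

# Classes [M1, P1, P2, P3, M2]; the current frame belongs to P2.
# Posteriors of the present classes (0.60, 0.30, 0.05) are illustrative.
t = conservative_targets([0.03, 0.60, 0.30, 0.05, 0.02],
                         present_classes={1, 2, 3}, current_class=2)
# t is approximately [0.03, 0.0, 0.95, 0.0, 0.02]
```

Since the targets still sum to 1, the adapted network keeps assigning probability mass to the missing classes instead of driving their outputs to zero.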
Conclusions on LHN

- LHN outperforms LIN.
- Linear transformations at different levels produce different positive effects: LIN + LHN performs better than LHN alone.
- For continuous speech, the wide-input IMP network is better than the STD one.