Adapting Hybrid ANN/HMM to Speech Variations

Stefano Scanzio, Pietro Laface (Politecnico di Torino)
Dario Albesano, Roberto Gemello, Franco Mana

ICASSP 2006
Acoustic Model Adaptation (outline)

- Adaptation tasks
- Linear Input Network (LIN)
- Linear Hidden Network (LHN)
- Catastrophic forgetting
- Conservative Training
- Results on several adaptation tasks
Acoustic Model Adaptation

What the models are adapted to:
- Specific speaker
- Speaking style (spontaneous speech, regional accents)
- Audio channel (telephone, cellular, microphone)
- Environment (car, office, ...)
- Specific vocabulary

(Diagram: a voice application feeds an ASR data log; ANN adaptation turns the task-independent models into adapted models.)
Linear Input Network Adaptation

A Linear Input Network (LIN) is inserted between the speech parameters and the input layer of the speaker/task-independent MLP, which estimates the emission probabilities of the acoustic-phonetic units. During adaptation only the LIN weights are trained; the weights of the original MLP are kept frozen.
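As a sketch of the idea (illustrative layer sizes and data, not the authors' implementation), the following numpy snippet inserts an identity-initialized linear layer in front of a frozen two-layer MLP and updates only the LIN weights by gradient descent on a cross-entropy loss:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Frozen speaker-independent MLP: input -> hidden -> output (toy sizes).
n_in, n_hid, n_out = 8, 16, 4
W1 = rng.normal(0, 0.3, (n_in, n_hid));  b1 = np.zeros(n_hid)
W2 = rng.normal(0, 0.3, (n_hid, n_out)); b2 = np.zeros(n_out)

# LIN: a square linear layer on the input, initialized to the identity
# so that before adaptation the network behaves exactly as the original.
A = np.eye(n_in)

def forward(x):
    z = x @ A                      # LIN transform of the speech parameters
    h = sigmoid(z @ W1 + b1)       # frozen hidden layer
    return softmax(h @ W2 + b2)    # emission (posterior) probabilities

def adapt_step(x, target, lr=0.1):
    """One gradient step on the LIN weights only (MLP stays frozen)."""
    global A
    z = x @ A
    h = sigmoid(z @ W1 + b1)
    p = softmax(h @ W2 + b2)
    d_out = p - target                 # dLoss/dLogits for cross-entropy
    d_h = d_out @ W2.T * h * (1 - h)   # back through the hidden layer
    d_z = d_h @ W1.T                   # gradient at the LIN output
    A -= lr * np.outer(x, d_z)         # update the LIN weights only

x = rng.normal(size=n_in)
t = np.eye(n_out)[1]                   # hypothetical target class
before = forward(x)[1]
for _ in range(50):
    adapt_step(x, t)
after = forward(x)[1]
print(before, after)  # posterior of the target class should increase
```

Initializing the LIN to the identity is what makes the scheme safe: adaptation starts from the unmodified speaker-independent network.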
Linear Hidden Network (LHN)

The same idea applied to an internal representation: a linear transformation layer (LHN) is inserted after a hidden layer of the network (in the slide, above hidden layer 2, below the output layer that produces the emission probabilities of the acoustic-phonetic units), and only the LHN weights are trained during adaptation.
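A minimal sketch of the same trick at the hidden level (toy sizes, hypothetical code): an identity-initialized linear layer placed after the hidden activations leaves the network output unchanged at initialization, so adaptation again starts from the original model:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_in, n_hid, n_out = 8, 16, 4
W1 = rng.normal(0, 0.3, (n_in, n_hid))
W2 = rng.normal(0, 0.3, (n_hid, n_out))

# LHN: identity-initialized linear layer inserted after the hidden layer.
H = np.eye(n_hid)

def forward(x):
    h = sigmoid(x @ W1)
    a = h @ H            # adaptable LHN transform of the hidden activations
    return a @ W2        # output logits (softmax omitted for brevity)

x = rng.normal(size=n_in)
baseline = sigmoid(x @ W1) @ W2
print(np.allclose(forward(x), baseline))  # True: LHN is transparent at init
```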
Catastrophic Forgetting

Acquiring new information can damage previously learned information when the new data do not adequately represent the knowledge contained in the original training data. The effect is evident when the adaptation data contain no examples for a subset of the output classes, and it is more severe in the ANN framework than in the Gaussian-mixture HMM framework.
Catastrophic Forgetting (cont.)

The back-propagation algorithm penalizes the classes that have no adaptation examples: their target value is set to zero for every adaptation frame. During adaptation the ANN weights are therefore biased to favor the activations of the classes that have samples in the adaptation set and to weaken the other classes.
16-Class Training Example

- Network: 2 input nodes, 2 hidden layers of 20 nodes each, 16 output nodes
- Training set: 2,500 patterns per class
- Adaptation set: 5,000 patterns belonging only to classes 6 and 7
- After training, before adaptation: error rate 1.5% on class 6, 6.7% on class 7; total error rate 4.1%
Adaptation of 2 Classes

After standard adaptation on classes 6 and 7: error rate 0% on class 6 and 2.0% on class 7, but the total error rate over all 16 classes rises to 16.9%. The classes without adaptation data are forgotten.
Conservative Training Target Assignment Policy

Notation: Px = class present in the adaptation set; Mx = class missing from the adaptation set; P2 = class corresponding to the current input frame.

Under the standard target assignment policy, P2 gets target 1 and every other class (P1, P3, M1, M2) gets target 0. Conservative Training instead assigns to each missing class (M1, M2) the posterior probability computed by the original network.
Adaptation of 2 Classes: Standard vs. Conservative Training

  Method                    Class 6   Class 7   Total
  Standard adaptation        0%        2.0%     16.9%
  Conservative Training      2.2%      5.2%     10.2%
Adaptation Tasks

- Application data adaptation (Directory Assistance): 9,325 Italian city names; 53,713 training/test utterances
- Vocabulary adaptation (command words): 30 command words; 6,189 training/test utterances
- Channel-environment adaptation (Aurora-3): 2,951 training/test utterances
- Speaker adaptation (WSJ0): 8 speakers, 16 kHz; 40 training + 40 test sentences per speaker
Results on Different Tasks (% WER)

  Adaptation method   Application           Vocabulary      Channel-Environment
                      (Directory Assist.)   (Command Words) (Aurora-3 CH1)
  No adaptation        14.6                  3.8             24.0
  LIN                  11.2                  3.4             11.0
  LIN + CT             12.4                  -               15.3
  LHN                   9.6                  2.1              9.8
  LHN + CT             10.1                  2.3             10.4
Mitigation of Catastrophic Forgetting using Conservative Training

Tests using the adapted models on Italian continuous speech (% WER). Baseline without adaptation: 29.3.

  Adaptation method   Models adapted on:
                      Directory Assist.   Command Words   Aurora-3 CH1
  LIN                  36.3                42.7            108.6
  LIN + CT             36.5                35.2             42.1
  LHN                  40.6                63.7            152.1
  LHN + CT             40.7                45.3             44.2
Networks Used in the Speaker Adaptation Task

- STD (standard): hybrid MLP-HMM model with 2 hidden layers; 273 input features (39 parameters x 7 context frames)
- IMP (improved): wider input window spanning a time context of 25 frames; an additional hidden layer
Results on the WSJ0 Speaker Adaptation Task (% WER, trigram LM)

  Adaptation method   STD    IMP
  No adaptation        8.4    6.5
  LIN                  7.9     -
  LIN + CT             7.1    5.6
  LHN + CT             6.6     -
  LIN + LHN + CT       6.3    5.0
Conclusions

- LHN adaptation outperforms LIN adaptation: linear transformations at different levels produce different positive effects, and LIN + LHN performs better than LHN alone.
- In adaptation tasks with missing classes, Conservative Training reduces the catastrophic forgetting effect: it preserves the performance on a generic task and improves the performance in speaker adaptation when few sentences are available.
Weight Merging

(Diagram: input I, LIN, hidden layers 1 and 2, output O.) Because the LIN/LHN is a linear layer, after adaptation its weights can be merged into the adjacent weight matrix of the network, so the adapted model keeps the topology and run-time cost of the original.
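The merging step is plain matrix algebra; a small numpy sketch (illustrative sizes, hypothetical code) shows that folding the adapted LIN into the first weight matrix leaves the computation unchanged:

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, n_hid = 8, 16

W1 = rng.normal(size=(n_in, n_hid))  # first weight matrix of the original MLP
# A hypothetical adapted LIN transform (identity plus a small perturbation).
A = np.eye(n_in) + 0.1 * rng.normal(size=(n_in, n_in))

x = rng.normal(size=n_in)

# Fold the LIN into W1: (x @ A) @ W1 == x @ (A @ W1).
W1_merged = A @ W1

# The merged network computes exactly the same pre-activations,
# with no extra layer at run time.
print(np.allclose((x @ A) @ W1, x @ W1_merged))  # True
```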
Conservative Training (CT)

For each observation frame:
1) Set the target value of each class that has no (or few) adaptation data to its posterior probability computed by the original network.
2) Set to zero the target value of each class that has adaptation data but does not correspond to the input frame.
3) Set the target value of the class corresponding to the input frame to 1 minus the sum of the posterior probabilities assigned by rule 1.
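The three rules above can be sketched as a target-construction function (hypothetical numpy code; the class indices and posterior values are illustrative, mirroring the M1, P1, P2, P3, M2 layout of the earlier slide with P2 as the current frame's class):

```python
import numpy as np

def conservative_targets(orig_posteriors, frame_class, present_classes):
    """Build CT targets for one adaptation frame.

    orig_posteriors : posteriors computed by the original (unadapted) network
    frame_class     : index of the class the current frame belongs to
    present_classes : set of class indices that have adaptation data
    """
    n = len(orig_posteriors)
    targets = np.zeros(n)
    missing = [c for c in range(n) if c not in present_classes]
    # Rule 1: missing classes keep the original network's posterior.
    targets[missing] = orig_posteriors[missing]
    # Rule 2: present classes other than the frame's class stay at zero.
    # Rule 3: the frame's class absorbs the remaining probability mass.
    targets[frame_class] = 1.0 - targets[missing].sum()
    return targets

# 5-class example: classes 1, 2, 3 have adaptation data (P1, P2, P3),
# classes 0 and 4 are missing (M1, M2); the frame belongs to class 2 (P2).
post = np.array([0.10, 0.20, 0.40, 0.20, 0.10])
t = conservative_targets(post, frame_class=2, present_classes={1, 2, 3})
print(t)  # M1 and M2 keep their posteriors, P1 and P3 get 0, P2 gets the rest
```

By construction the targets always sum to 1, so the missing classes are never pushed toward zero output.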
Conclusions on LHN

- LHN outperforms LIN: linear transformations at different levels produce different positive effects, and LIN + LHN performs better than LHN alone.
- For continuous speech, the wide-input IMP network is better than the STD one.