Slide 1: Less is More?
Yi Wu
Advisor: Alex Rudnicky
Slide 2
People say: "There is no data like more data!"
Slide 3: Goal: Use Less to Perform More
Identify an informative subset of a large corpus for Acoustic Model (AM) training.
Expectations for the selected set:
- Good performance
- Fast selection
Slide 4: Motivation
The improvement from adding more data becomes increasingly small.
Training an acoustic model is time-consuming.
We need guidance on which data is most needed.
Slide 5: Approach Overview
Applied to well-transcribed data; selection is based on the transcriptions.
Choose a subset that has a "uniform" distribution over speech units (words, phonemes, characters).
Slide 6: How to Sample Data Wisely? A Simple Example
Consider k Gaussian classes with known priors ω_i and unknown density functions f_i(μ_i, σ_i).
Slide 7: How to Sample Wisely? A Simplified Example
We are given access to at most N examples and may choose how many to draw from each class.
We train the model with the MLE estimator.
When a new sample is generated, we use our model to determine its class.
Question: how should we sample to achieve minimum error?
Slide 8: The Optimal Bayes Classifier
If we have the exact form of each f_i(x), the classification rule (sketched below) is optimal.
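The rule itself appeared as a figure that is not preserved here; given the setup on slides 6 and 7 (known priors ω_i, class densities f_i), it is presumably the standard Bayes decision rule:

$$\hat y(x) \;=\; \arg\max_{1 \le i \le k}\; \omega_i\, f_i(x)$$

i.e., assign x to the class with the largest posterior probability, since the posterior is proportional to the prior times the class-conditional density.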
Slide 9: To Approximate the Optimal Classifier
We plug our MLE estimates \hat{f}_i into the decision rule in place of the true f_i.
The true error is then bounded by the optimal Bayes error plus a term governed by our worst density estimate (one standard form of the bound is sketched below).
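The slides do not state the bound explicitly; one standard form for a plug-in classifier with known priors (an assumption here, not taken from the slides) is

$$R(\hat g) \;\le\; R^{*} \;+\; \sum_{i=1}^{k} \omega_i \int \bigl|f_i(x) - \hat f_i(x)\bigr|\,dx \;\le\; R^{*} \;+\; \max_{i} \int \bigl|f_i(x) - \hat f_i(x)\bigr|\,dx,$$

where R^{*} is the Bayes error. The excess error is controlled by the worst per-class density estimate, which motivates spreading the sampling budget across classes.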
Slide 10: Sample Uniformly
We want to sample each class equally.
The selected data then has good coverage of every class, which gives a robust estimate of each class density.
With a fixed budget of N examples, allocating them evenly keeps the worst per-class estimate, and hence the error bound above, small.
Slide 11: The Real ASR System
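This slide's content was a figure that is not preserved; it presumably showed the standard ASR decoding equation, which connects the toy example to speech recognition:

$$\hat W \;=\; \arg\max_{W} P(W \mid A) \;=\; \arg\max_{W} P(A \mid W)\, P(W),$$

where the acoustic model supplies P(A | W) and the language model supplies the prior P(W), playing the roles of the densities f_i and the priors ω_i in the simplified example.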
Slide 12: Data Selection for the ASR System
The prior P(W) is estimated independently by the language model.
To make the acoustic model accurate, we want to sample the word sequences W uniformly.
The unit can be a phoneme, a character, or a word; we want its distribution in the selected data to be uniform.
Slide 13: Entropy: A Measure of "Uniformness"
Use the entropy of the word (phoneme) distribution as the evaluation measure.
Suppose the units have sample distribution p_1, p_2, ..., p_n.
Choose the subset that maximizes -p_1 log(p_1) - p_2 log(p_2) - ... - p_n log(p_n).
Maximizing entropy is equivalent to minimizing the KL divergence from the uniform distribution (see the identity below).
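The equivalence follows from a one-line identity (standard, though not spelled out on the slide). With u the uniform distribution over the n units,

$$D_{\mathrm{KL}}(p \,\|\, u) \;=\; \sum_{i=1}^{n} p_i \log \frac{p_i}{1/n} \;=\; \log n \;-\; H(p),$$

so for a fixed unit inventory of size n, the subset with maximum entropy H(p) is exactly the one whose unit distribution is closest, in KL divergence, to uniform.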
Slide 14: Computational Issue
It is computationally intractable to find the transcription subset that maximizes the entropy.
Instead, use forward greedy search: starting from the empty set, repeatedly add the utterance whose transcription most increases the entropy of the selected set (a sketch follows below).
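A minimal Python sketch of this forward greedy selection (the function and variable names are mine, not from the slides; utterances are assumed to be given as lists of unit tokens, e.g., words, and the budget is counted in tokens as a rough stand-in for hours of audio):

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy of a count vector (a collections.Counter)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)

def greedy_select(utterances, budget_tokens):
    """Forward greedy search: repeatedly add the utterance whose unit counts
    most increase the entropy of the selected subset, until the budget is used."""
    selected, counts, used = [], Counter(), 0
    remaining = list(range(len(utterances)))
    while remaining and used < budget_tokens:
        base = entropy(counts)
        best_idx, best_gain = None, float("-inf")
        for idx in remaining:
            gain = entropy(counts + Counter(utterances[idx])) - base
            if gain > best_gain:
                best_idx, best_gain = idx, gain
        selected.append(best_idx)
        counts += Counter(utterances[best_idx])
        used += len(utterances[best_idx])
        remaining.remove(best_idx)
    return selected

# Toy example: pick roughly 6 tokens' worth from word-level transcriptions.
utts = [["ni", "hao"], ["ni", "hao", "ma"], ["xin", "wen", "lian", "bo"], ["ni"]]
print(greedy_select(utts, budget_tokens=6))
```

Each iteration scans all remaining utterances, so the sketch is quadratic in the corpus size; a practical implementation would update the entropy gains incrementally.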
Slide 15: Combination
There are multiple entropies we want to maximize (e.g., over words and over phonemes).
Combination methods (a sketch of the first follows below):
- Weighted sum of the entropies
- Add data sequentially, selecting with one unit's entropy and then extending the selection with another's
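A self-contained sketch of the weighted-sum scoring, which the greedy loop above could use in place of the single-unit gain; the 0.5/0.5 weights are an illustrative assumption, not values from the slides:

```python
import math
from collections import Counter

def entropy(counts):
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)

def combined_gain(word_counts, phone_counts, utt_words, utt_phones,
                  w_word=0.5, w_phone=0.5):
    """Weighted sum of the word-entropy and phoneme-entropy gains obtained by
    adding one candidate utterance to the already selected set."""
    word_gain = entropy(word_counts + Counter(utt_words)) - entropy(word_counts)
    phone_gain = entropy(phone_counts + Counter(utt_phones)) - entropy(phone_counts)
    return w_word * word_gain + w_phone * phone_gain

# Example: score one candidate against an empty selection.
print(combined_gain(Counter(), Counter(), ["ni", "hao"], ["n", "i", "h", "ao"]))
```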
Slide 16: Experiment Setup
System: Sphinx III
Features: 39-dimensional MFCC
Training corpus: Chinese BN 97 (30 hr) + GALE Y1 (810 hr)
Test set: RT04 (60 min)
Slide 17: Experiment 1 (using the word distribution)

Table 1
Training time (hours)   30     50     100    840
Random (all)            27.6   27.1   26.1   24.3
Max-entropy             27.0   26.2   24.8   -
Slide 18: More Results

                     30 h        50 h        100 h       150 h       840 h
Random (all)         27.6        27.1        26.1        25.0        24.3
  CCTV (BN)          17          15.7        13.2        13.6        12.9
  NTDTV (BN)         24.7        24.2        23.3        22.2        21.0
  RFA (BC)           42.9        43.6        44          41.1        41.0
  BC/BN (hours)      15.4/14.6   25.7/24.3   51.2/49.8   76.8/73.2   431/409
Max-entropy (all)    27          26.2        24.8        -           -
  CCTV (BN)          15          14          13          -           -
  NTDTV (BN)         23          22.3        21.1        -           -
  RFA (BC)           45.8        44.8        42.7        -           -
  BC/BN (hours)      11.0/19.0   18.2/31.8   50.6/49.8   -           -
Slide 19: Experiment 2 (adding phoneme and character entropy sequentially, 150 hr)

Table 2
                          CCTV   NTDTV   RFA    ALL
Random (150 hr)           13.6   22.2    44.1   25.0
Max-entropy (word+char)   12.2   21.8    42.3   24.7
Max-entropy (word+phone)  13.1   20.5    41.8   24.4
All data (840 hr)         12.9   21.0    41.0   24.3
Slide 20: Experiments 1 and 2
Slide 21: Experiment 3 (with VTLN)

Table 3
                        CCTV   NTDTV   RFA    ALL
150 hr (word+phone)     13.1   20.5    41.8   24.4
With VTLN               11.8   17.8    40.1   22.5
Slide 22: Summary
- Choose data uniformly with respect to speech units.
- Maximize entropy with a greedy algorithm.
- Add data sequentially.
Future work:
- Combine multiple sources.
- Select un-transcribed data.