Improved Tone Modeling for Mandarin Broadcast News Speech Recognition
Xin Lei, Manhung Siu, Mei-Yuh Hwang, Mari Ostendorf, Tan Lee


1 Improved Tone Modeling for Mandarin Broadcast News Speech Recognition
Xin Lei (1), Manhung Siu (2), Mei-Yuh Hwang (1), Mari Ostendorf (1), Tan Lee (3)
(1) SSLI Lab, Univ. of Washington
(2) Hong Kong Univ. of Science and Tech.
(3) Chinese Univ. of Hong Kong
9/19/2006, Interspeech’06

2 Why Model Tones?
- In Mandarin, recognition of tones is needed for decoding words
- Ng et al. (SPL 2005) showed that word context alone (modeled by an n-gram) cannot disambiguate tones
- In our work: upper-bound evaluation
  - Pruning lattice arcs with the wrong tone improves CER 12.0% → 8.2%

3 Background & Approach
- Previous work:
  - Embedded tone modeling:
    - augment observation vector with F0 features
    - extend phone set to include tonal units
  - Explicit tone modeling: separate detection of tones and phones
- Our approach:
  - Improve pitch features for embedded modeling
  - Combine embedded approach with explicit tone models in lattice rescoring

4 Baseline Mandarin BN System
- Train and test data
  - Acoustic: 28 hrs of Hub4
  - Text: 121M words from Hub4, TDT[2,3,4], Gigaword (Xinhua)
  - Test: 1 hr of eval04 (CTV, RFA and NTDTV)
- Features and models
  - 39-dim MFCC + 3-dim pitch/delta/double delta
  - Maximum likelihood AM 2000x32, bigram LM
- Decoding structure
  - Auto segmentation, speaker clustering
  - First pass decoding → 3-class MLLR in second pass decoding
- Evaluation in terms of character error rate (CER)
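The 39 + 3 feature layout on this slide can be illustrated with a short sketch. It assumes a standard regression-based delta over ±2 frames; the slides do not state the delta window, and the function names `deltas` and `append_pitch` are hypothetical.

```python
import numpy as np


def deltas(x, n=2):
    """Regression-based delta over +/- n frames (assumed window), edges repeated."""
    x = np.asarray(x, dtype=float)
    padded = np.pad(x, n, mode="edge")
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out = np.zeros_like(x)
    for k in range(1, n + 1):
        # Weighted difference between the frame k ahead and the frame k behind.
        out += k * (padded[n + k : n + k + len(x)] - padded[n - k : n - k + len(x)])
    return out / denom


def append_pitch(mfcc, pitch):
    """Append pitch, delta and double delta to a (T, 39) MFCC matrix, giving (T, 42)."""
    d1 = deltas(pitch)
    d2 = deltas(d1)
    pitch_block = np.stack([np.asarray(pitch, dtype=float), d1, d2], axis=1)
    return np.hstack([mfcc, pitch_block])
```

With a 39-dimensional MFCC matrix and one pitch value per frame, `append_pitch` returns the 42-dimensional observation vectors used for embedded tone modeling.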

5 Smoothing and Normalization of F0
- Composition of sentence F0 contour: phrase intonation + lexical tone + segmental effects + tracking errors
- F0 processing algorithm
  1. Spline interpolate F0 contour with piecewise cubic Hermite interpolating polynomial (PCHIP)
  2. Take the log of F0
  3. Moving window normalization (MWN) for phrase level effects
  4. 5-point moving average (MA) smoother
  5. Mean/var normalization as for MFCC features
[Figure: raw F0 contour and final feature]
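The five processing steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: MWN is assumed to subtract a long-window moving average of log F0, the 101-frame window length is a placeholder (the slides give no value), unvoiced frames are assumed to be marked with 0, and mean/variance normalization is done per utterance. The names `moving_average` and `process_f0` are hypothetical.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator


def moving_average(x, width):
    """Centered moving average; width must be odd, edges padded by repetition."""
    half = width // 2
    padded = np.pad(x, half, mode="edge")
    return np.convolve(padded, np.ones(width) / width, mode="valid")


def process_f0(f0_hz, mwn_width=101, ma_width=5):
    """Map a raw frame-level F0 track (0 = unvoiced) to a normalized pitch feature."""
    f0_hz = np.asarray(f0_hz, dtype=float)
    frames = np.arange(len(f0_hz))
    voiced = f0_hz > 0
    if voiced.sum() < 2:
        raise ValueError("need at least two voiced frames")

    # 1. PCHIP interpolation through the voiced frames; frames before the first
    #    or after the last voiced frame are held at the nearest voiced value.
    pchip = PchipInterpolator(frames[voiced], f0_hz[voiced])
    first, last = frames[voiced][0], frames[voiced][-1]
    contour = pchip(np.clip(frames, first, last))

    # 2. Log of F0.
    log_f0 = np.log(contour)

    # 3. Moving window normalization (assumed form): remove the slowly varying
    #    phrase-level trend by subtracting a long-window moving average.
    mwn = log_f0 - moving_average(log_f0, mwn_width)

    # 4. 5-point moving average smoother.
    smoothed = moving_average(mwn, ma_width)

    # 5. Mean/variance normalization (per utterance here, an assumption).
    centered = smoothed - smoothed.mean()
    std = smoothed.std()
    return centered / std if std > 0 else centered
```

PCHIP preserves monotonicity between data points, so the interpolated contour stays within the range of the neighboring voiced values and the logarithm in step 2 is always defined.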

6 Embedded Modeling Experiments
- Compare CER using different F0 processing techniques to ASR without F0
- Baseline IBM-style smoothing interpolates unvoiced regions with waveform F0 average

  CER (%) by feature set:

  Feature                    CTV    RFA    NTDTV  Overall
  MFCC only                  14.0   38.5   21.5   24.1
  + IBM-style F0             13.0   35.4   19.8   22.2
  + spline F0                12.9   35.0   19.7   22.0
  + spline + MWN + MA F0     12.0   35.2   18.8   21.4

- Observations:
  - Biggest win from using some F0 (vs. none)
  - Next biggest gain is from normalization
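To put the overall CER figures from this slide in relative terms, a small calculation helps. The helper name `relative_reduction` is hypothetical; the numbers are the overall column reported on the slide.

```python
# Overall CER (%) per feature set, as reported on slide 6.
overall_cer = {
    "MFCC only": 24.1,
    "+ IBM-style F0": 22.2,
    "+ spline F0": 22.0,
    "+ spline + MWN + MA F0": 21.4,
}


def relative_reduction(baseline, new):
    """Relative error reduction in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline - new) / baseline


baseline = overall_cer["MFCC only"]
reductions = {
    name: relative_reduction(baseline, cer) for name, cer in overall_cer.items()
}
for name, reduction in reductions.items():
    print(f"{name:24s} {reduction:5.1f}% relative")
```

The full pipeline (spline + MWN + MA) works out to roughly an 11% relative reduction in overall CER against MFCC-only features, most of which already comes from adding any F0 feature at all.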

7 Explicit Tone Modeling Approach
- 4-way tone T_i classification
  - Neural net using features f_i:
    - Sampled F0
    - Duration
    - Polynomial regression coefficients (not very helpful)
  - Train on and score only longer syllables
- Lattice rescoring combines
  - Acoustic score from embedded model
  - Language model score
  - Duration-weighted tone posterior: d_i log p(T_i | f_i), where d_i is the syllable duration
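The score combination can be sketched as below. Several things here are assumptions made for illustration: the lattice is simplified to a list of competing hypotheses, the three scores are combined as a weighted sum of log scores, and the weights and the `min_duration` cutoff for "longer syllables" are hypothetical parameters that the slides do not specify.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Hypothesis:
    text: str
    acoustic_logprob: float  # from the embedded (tonal phone + F0) model
    lm_logprob: float        # language model score
    # One (duration d_i in frames, posterior p(T_i | f_i)) pair per syllable,
    # where T_i is the tone this hypothesis assigns to the syllable.
    syllables: List[Tuple[int, float]]


def tone_score(syllables, min_duration=0):
    """Sum of d_i * log p(T_i | f_i) over syllables at least min_duration long."""
    return sum(d * math.log(p) for d, p in syllables if d >= min_duration)


def rescore(hyp, lm_weight=1.0, tone_weight=1.0, min_duration=0):
    """Combined score: acoustic + weighted LM + weighted duration-scaled tone term."""
    return (
        hyp.acoustic_logprob
        + lm_weight * hyp.lm_logprob
        + tone_weight * tone_score(hyp.syllables, min_duration)
    )


def best_hypothesis(hyps, **weights):
    """Pick the hypothesis with the highest combined score."""
    return max(hyps, key=lambda h: rescore(h, **weights))
```

Scaling each log posterior by the syllable duration gives longer syllables more influence, which roughly matches how acoustic scores accumulate over frames.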

8 Explicit Tone Model Experiments
- 4-way tone classification results on CTV

  Units               Features            Dim   # of NN nodes   % Acc.
  CI tone             Final F0 + dur        4              35     70.6
  CI tone             Syllable F0 + dur     7              40     74.4
  Left tone context   Syllable F0 + dur    14             100     76.2

- Impact after lattice rescoring for CTV:
  - 12.0% CER → 11.5% CER
  - Significant with p<.04

9 Summary and Future Work
- Contributions
  - Upper-bound evaluation of explicit tone modeling in lattice rescoring
  - Improved F0 features for embedded tone modeling via new smoothing and normalization methods
  - Combining explicit tone models in lattice rescoring gives significant CER reduction
- Future work
  - Modeling the coarticulation effects of tones
  - Tone model adaptation

