ADVANCES IN MANDARIN BROADCAST SPEECH RECOGNITION

M.Y. Hwang (1), W. Wang (2), X. Lei (1), J. Zheng (2), O. Cetin (3), G. Peng (1)
(1) Department of Electrical Engineering, University of Washington, Seattle, WA, USA
(2) SRI International, Menlo Park, CA, USA
(3) ICSI Berkeley, Berkeley, USA

Overview
Goal: build a highly accurate Mandarin speech recognizer for broadcast news (BN) and broadcast conversation (BC).
Improvements over the previous version:
– Increased training data
– Discriminative features
– Frame-level discriminative training criterion
– Multiple-pass acoustic model (AM) adaptation
– System combination
– Language model (LM) adaptation
In total, these changes reduce the character error rate (CER) by 24% to 64% relative, depending on the test set.

Increased Training Data
– Acoustic training data increased from 97 hours to 465 hours (2/3 BN, 1/3 BC).
– Chinese text was segmented into words with maximum-likelihood (ML) word segmentation.
– LM training text increased from 420M words to 849M words.
– Lexicon size increased from 49K to 60K words (including 1,700 English words).
– 6 LMs were trained and interpolated into one.
– Bigram (qLM_2), trigram (LM_3, qLM_3) and 5-gram (LM_5a, LM_5b) LMs were trained; LM_5b uses count-based smoothing.

Two Acoustic Models
1. MFCC, 39-dim, cross-word (CW) + speaker-adaptive training (SAT), fMPE+MPE, 3000x128 Gaussians.
2. MFCC + MLP phoneme posterior features, 74-dim, non-cross-word (nonCW), fMPE+MPE, 3000x64 Gaussians.
MLP features:
[1] Zheng, "Combining discriminative feature, transform, and model training for large vocabulary speech recognition", ICASSP 2007.
[2] Chen, "Learning long-term temporal features in LVCSR using neural networks", ICSLP 2004.

Future Work
– Topic-dependent language model adaptation.
– Machine-translation (MT) targeted error rates.
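The poster only states that ML word segmentation was applied to the Chinese training text. One common form of maximum-likelihood segmentation picks, among all ways to split a character string into lexicon words, the split with the highest product of unigram word probabilities, found with a Viterbi-style dynamic program. The sketch below assumes that unigram model; the lexicon, its probabilities and the `ml_segment` name are hypothetical, and out-of-vocabulary handling is left out.

```python
import math

# HYPOTHETICAL toy lexicon with made-up unigram probabilities; a real
# system would estimate these from the LM training text.
LEXICON = {
    "中国": 0.02, "中": 0.01, "国": 0.01,
    "人民": 0.015, "人": 0.02, "民": 0.005,
    "银行": 0.01, "银": 0.002, "行": 0.008,
}
MAX_WORD_LEN = 4


def ml_segment(text, lexicon=LEXICON, max_len=MAX_WORD_LEN):
    """Return the segmentation of `text` that maximizes the product of
    unigram word probabilities (dynamic programming over split points)."""
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best log-prob of text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in lexicon and best[start] > -math.inf:
                score = best[start] + math.log(lexicon[word])
                if score > best[end]:
                    best[end] = score
                    back[end] = start
    if best[n] == -math.inf:
        raise ValueError("text cannot be segmented with this lexicon")
    # Follow back-pointers from the end to recover the word sequence.
    words, end = [], n
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    return list(reversed(words))


print(" ".join(ml_segment("中国人民银行")))
```

With these toy probabilities, the three two-character words outscore any split into single characters because each extra word multiplies in another probability well below 1.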
Test data
Test set | BN | BC
Dev04 | 0.5 hr | -
Eval04 | 1 hr | -
Ext06 | 1 hr | -
Dev05bc | - | 2.7 hr

Training text (words)
Source | BN | BC
(1) TDT+ | 17.7M | -
(2) GALE | 3.0M | 2.7M
(3) Giga-CNA | 451.4M | -
(4) Giga-XIN | 260.9M | -
(5) Giga-ZBN | 15.8M | -
(6) NTU-Web | 95.5M | 2.1M
Final LM | 844.3M | 4.8M

Perplexity
LM | Word perplexity
49K LM_4 | 243.8
60K qLM_2 | 359.5
60K qLM_3 | 228.7
60K LM_3 | 193.0
60K LM_5a | 77.9

Character Error Rates
("||" denotes combination of the two acoustic model systems.)
BN test sets:
AM data | LM text | Lexicon | Acoustic model | Dev04 | Eval04
97 hr | 420M words | 49K | MFCC CW+SAT MPE | 6.0% | 16.0%
465 hr | 420M words | 49K | MFCC CW+SAT MPE | 5.3% | 15.1%
465 hr | 850M words | 60K | MFCC CW+SAT fMPE+MPE || MLP fMPE+MPE | 3.7% | 12.2%

Ext06 (BN) and Dev05bc (BC):
AM data | LM text | Lexicon | Acoustic model | Ext06 | Dev05bc
97 hr | 420M words | 49K | MFCC CW+SAT MPE | 15.0% | 34.0%
465 hr | 850M words | 60K | MFCC CW+SAT fMPE+MPE || MLP fMPE+MPE | 5.4% | 22.5%

LM Adaptation for BC (Dev05bc)
– LM_BN = BN text (1) to (6) + EARS Conversational Telephony Speech text (159M words)
– LM_BC = BC text (2) + (6)
– LM_ALL = interpolation of LM_BN and LM_BC
– LM_BN-C = LM_BN adapted by (2) GALE-BC
– One LM adaptation per show i: the weight λ_i is chosen to maximize the likelihood of h_i, the recognition hypothesis of show i.
– The entire recognition process is restarted after LM adaptation.
– LM_BN' = LM_BN adapted dynamically by h_i for each show i.
– The same strategy gave no improvement on the BN test data, since plenty of BN training text is available.

Adaptation setup | First-pass CER | Final CER
LM_ALL (no LM adaptation) | 24.9% | 21.9%
λ_i LM_BC + (1 - λ_i) LM_BN | 24.4% | 21.2%
λ_i LM_BC + (1 - λ_i) LM_BN' | 24.3% | 21.0%
λ_i LM_BC + (1 - λ_i) LM_BN-C | 24.0% | 20.6%

Decoding Architecture
Acoustic segmentation → speaker clustering → VTLN/CMN/CVN, then three decoding passes:
1. nonCW MLP, qLM_3
2. CW MFCC, MLLR, LM_3, then LM_5a, LM_5b rescoring
3. nonCW MLP, MLLR, LM_3, then LM_5a, LM_5b rescoring
The two rescored outputs → Confusion Network Combination → top-1 hypothesis.
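The last stage of the decoding architecture merges the two rescored system outputs with Confusion Network Combination. The sketch below shows only the voting idea behind it: per-slot posteriors from each system are averaged with system weights, and the best token per slot is kept. It assumes the networks are already aligned slot by slot (real CNC must first align them, which is omitted), and the equal weights, token posteriors and the `combine_networks` name are hypothetical.

```python
from collections import defaultdict

# A confusion network is a list of slots; each slot maps a token to its
# posterior probability, and "" stands for the null (skip) arc.
# ASSUMPTION: networks are already aligned slot by slot; real CNC first
# aligns them, which this sketch omits.


def combine_networks(networks, weights):
    """Weighted posterior voting over pre-aligned confusion networks;
    returns the top-1 token sequence with null arcs removed."""
    if len(networks) != len(weights):
        raise ValueError("need one weight per network")
    n_slots = len(networks[0])
    if any(len(net) != n_slots for net in networks):
        raise ValueError("networks must have the same number of slots")
    total = sum(weights)
    top1 = []
    for i in range(n_slots):
        combined = defaultdict(float)
        for net, w in zip(networks, weights):
            for token, post in net[i].items():
                combined[token] += (w / total) * post
        token = max(combined, key=combined.get)
        if token != "":  # a winning null arc means "no word in this slot"
            top1.append(token)
    return top1


# HYPOTHETICAL networks standing in for the two rescored system outputs.
sys_cw_mfcc = [{"我": 0.9, "": 0.1}, {"们": 0.6, "门": 0.4}, {"好": 0.5, "": 0.5}]
sys_noncw_mlp = [{"我": 0.8, "窝": 0.2}, {"门": 0.45, "们": 0.55}, {"": 0.7, "好": 0.3}]
print("".join(combine_networks([sys_cw_mfcc, sys_noncw_mlp], [0.5, 0.5])))
```

Voting on posteriors rather than on 1-best strings lets a system that is unsure in one slot be outvoted by a more confident one, which is why combining two systems with different features (MFCC vs. MLP) can help.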

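The BC adaptation picks, per show i, the weight λ_i in λ_i LM_BC + (1 - λ_i) LM_BN that maximizes the likelihood of the hypothesis h_i. A standard way to fit a linear-interpolation weight is EM, sketched below under the assumption that each word of h_i has already been scored by both component LMs (so only per-word probabilities are needed, and all are positive, as with smoothed LMs). The probabilities and function names are hypothetical; the perplexity helper uses the usual definition exp(-log-likelihood / N).

```python
import math

# HYPOTHETICAL per-word probabilities that LM_BC and LM_BN assign to the
# words of a recognition hypothesis h_i (made-up numbers).
P_BC = [0.02, 0.001, 0.05, 0.01]
P_BN = [0.005, 0.004, 0.01, 0.002]


def interp_loglik(lam, p_bc, p_bn):
    """Log-likelihood of the words under lam * LM_BC + (1 - lam) * LM_BN."""
    return sum(math.log(lam * a + (1.0 - lam) * b) for a, b in zip(p_bc, p_bn))


def estimate_lambda(p_bc, p_bn, lam=0.5, iters=50):
    """EM re-estimation of the interpolation weight lam; no iteration
    decreases the likelihood of the hypothesis words."""
    for _ in range(iters):
        # E-step: posterior probability that each word came from LM_BC.
        resp = [lam * a / (lam * a + (1.0 - lam) * b) for a, b in zip(p_bc, p_bn)]
        # M-step: the new weight is the average posterior.
        lam = sum(resp) / len(resp)
    return lam


def perplexity(loglik, n_words):
    """Word perplexity exp(-log-likelihood / N)."""
    return math.exp(-loglik / n_words)


lam_i = estimate_lambda(P_BC, P_BN)
for lam in (0.5, lam_i):
    ll = interp_loglik(lam, P_BC, P_BN)
    print(f"lambda={lam:.3f}  perplexity={perplexity(ll, len(P_BC)):.1f}")
```

Because the hypothesis h_i contains recognition errors, the fitted λ_i only approximates the weight that would be best for the true transcript; the poster's table shows this still lowers the BC CER.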

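All results on the poster are character error rates, and the overview's 24% to 64% figure is a relative reduction. The sketch below shows how both can be computed: CER as the character-level edit distance summed over utterances and divided by the total number of reference characters, and the relative reduction from the baseline and final CERs in the tables. Removing whitespace before scoring is an assumption (so that word segmentation does not affect the count), and the function names are hypothetical.

```python
def char_errors(ref: str, hyp: str) -> int:
    """Levenshtein distance over characters: the minimum number of
    substitutions, deletions and insertions that turn `ref` into `hyp`."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of reference char r
                cur[j - 1] + 1,          # insertion of hypothesis char h
                prev[j - 1] + (r != h),  # substitution (or match if equal)
            )
        prev = cur
    return prev[-1]


def cer(refs, hyps) -> float:
    """Corpus-level CER = total character errors / total reference characters.
    ASSUMPTION: whitespace (e.g. from word segmentation) is removed first."""
    refs = ["".join(r.split()) for r in refs]
    hyps = ["".join(h.split()) for h in hyps]
    n_ref = sum(len(r) for r in refs)
    if n_ref == 0:
        raise ValueError("references contain no characters")
    return sum(char_errors(r, h) for r, h in zip(refs, hyps)) / n_ref


def relative_reduction(baseline: float, new: float) -> float:
    """Relative error reduction, e.g. 16.0% -> 12.2% gives about 0.2375."""
    return (baseline - new) / baseline


# Baseline and final CERs (%) copied from the poster's CER tables.
POSTER_CER = {
    "Dev04": (6.0, 3.7),
    "Eval04": (16.0, 12.2),
    "Ext06": (15.0, 5.4),
    "Dev05bc": (34.0, 22.5),
}

for test_set, (old, new) in POSTER_CER.items():
    print(f"{test_set}: {relative_reduction(old, new):.1%} relative CER reduction")
```

Run on the table values, Eval04 (about 24%) and Ext06 (64%) give the two ends of the range quoted in the overview.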