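Chinese Continuous Speech Recognition ---- Visit to Beijing Polytechnic University, January 4, 1999. 973 Project, 2019/4/17. Fang Zheng (郑方), Speech Laboratory, Department of Computer Science and Technology, Tsinghua University, 100084. fzheng@sp.cs.tsinghua.edu.cn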


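1 Chinese Continuous Speech Recognition ---- Visit to Beijing Polytechnic University, January 4, 1999
973 Project 2019/4/17
Fang Zheng (郑方), Speech Laboratory, Department of Computer Science and Technology, Tsinghua University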

2 Outline
- Framework for Continuous Speech Recognition
- Existing Applications of Speech Lab
- The Future Cooperation
Speech Lab., CST, THU

3 Framework for CSR
- Two Layers
  - Acoustic Modeling and Recognition
  - Language Modeling and Processing
- The Acoustic Processing
  - AIM: turn the signals into Chinese syllables

4 Framework for CSR (cont’d)
- The Acoustic Processing (cont’d)
  - Feature Extraction
    - Formant, E, ZCT, pitch, ...
    - LPC-derived Cepstrum: consonant
    - Mel-frequency Cepstrum: selection of filters
    - Auditory: not so efficient
    - How to combine different features?
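As a rough illustration of two of the simpler frame-level features listed on this slide, the sketch below computes short-time energy and a zero-crossing count per frame. It assumes that "E" and "ZCT" on the slide refer to energy and a zero-crossing measure; the frame length, hop size, and the synthetic test signal are arbitrary choices made for this example, not values from the talk.

```python
import math

def split_frames(signal, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossings(frame):
    """Number of sign changes between neighbouring samples (0.0 counts as positive)."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))

# Synthetic input (an assumption for the demo): 0.1 s of silence followed by
# 0.1 s of a 200 Hz sine tone, sampled at 8 kHz.
SAMPLE_RATE = 8000
silence = [0.0] * 800
tone = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(800)]
signal = silence + tone

frames = split_frames(signal, frame_len=200, hop=100)
energies = [short_time_energy(f) for f in frames]
crossings = [zero_crossings(f) for f in frames]

for idx, (e, z) in enumerate(zip(energies, crossings)):
    print(f"frame {idx:2d}  energy={e:.3f}  zero_crossings={z}")
```

Frames drawn from the silent part have zero energy and no crossings, while frames inside the tone have an energy near 0.5, so even these two cheap features separate the segments. How to weight them against cepstral features is exactly the open question the slide raises.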

5 Framework for CSR (cont’d)
- The Acoustic Processing (cont’d)
  - Pattern Recognition
    - Statistics: HMM and its derivatives
      - State transition: time-invariant, dwell, last-state-trap
      - Description of feature space (MGD, NN)
      - Inaccurate independence assumptions (for state and observation)
    - ANN
      - Design (node and layer numbers, structure, …)?
      - Training problem when data are insufficient?
    - How about combining HMM & ANN?
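One way to read the "time-invariant" and "dwell" remarks on this slide: when the self-transition probability of a state is fixed over time, the number of frames spent in that state follows a geometric distribution. The symbols $a_{ii}$ (self-transition probability of state $i$) and $d$ (dwell, in frames) are notation introduced here for illustration and do not appear in the slides.

```latex
P_i(d) = a_{ii}^{\,d-1}\,(1 - a_{ii}), \qquad d = 1, 2, \dots
\qquad\qquad
\mathrm{E}[d] = \sum_{d=1}^{\infty} d\,P_i(d) = \frac{1}{1 - a_{ii}}
```

Because $P_i(d)$ decreases monotonically in $d$, the most probable dwell is always a single frame, which fits real speech segments poorly. This is a common motivation for modeling state duration explicitly.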

6 Framework for CSR (cont’d)
- The Acoustic Processing (cont’d)
  - Time Alignment
    - Is Dynamic Programming better for SR?
      - It always finds a best match, no matter what is being matched, e.g., /ia/ matched with /a/
    - How about knowledge-based searching?
      - Syllable detection, accompanied by search?
      - Definite speech segments (syllable string) by acoustic information (E, Z, pitch, …)
      - Knowledge-navigated search (dwell, delta dwell, separation point, lexicon/language knowledge, …)
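The concern about dynamic programming can be made concrete with a minimal dynamic time warping (DTW) sketch. DTW is used here as a stand-in for DP-based time alignment in general; the one-dimensional "feature" sequences and the template names are made up for the example and are not from the talk.

```python
def dtw_distance(a, b):
    """Classic DTW between two 1-D sequences with absolute-difference local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # stretch b[j-1]
                                     cost[i][j - 1],      # stretch a[i-1]
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]

# Toy 1-D "features": the utterance glides from a low value to a high one,
# loosely imitating /ia/; the templates are hypothetical.
utterance = [1, 3, 5, 5, 5]
templates = {"a": [5, 5, 5], "i": [1, 1, 1]}

scores = {name: dtw_distance(utterance, t) for name, t in templates.items()}
best = min(scores, key=scores.get)
print(scores, "->", best)  # {'a': 6.0, 'i': 14.0} -> a
```

With only /a/ and /i/ templates available, DP still returns a winner (/a/) for an utterance that is neither, simply because some path always has the lowest cost. Adding a template such as `[1, 3, 5]` gives a distance of 0.0, which shows that the weakness lies in what is offered for matching rather than in the recursion itself.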

7 Framework for CSR (cont’d)
- The Language Processing
  - Statistics: N-Gram based
    - Basic theory
      $W^* = \arg\max_W P(W \mid A) = \arg\max_W \dfrac{P(A \mid W)\,P(W)}{P(A)}$
      $P(W) = P(w_1 \dots w_K) = P(w_1)\,P(w_2 \mid w_1) \cdots P(w_K \mid w_1 \dots w_{K-1})$
      $P(w_k \mid w_1 \dots w_{k-1}) \approx P(w_k \mid w_{k-N} \dots w_{k-1})$
    - Problems
      - Sparseness
        - Training materials (collecting sufficient materials)
        - N-Gram probability matrix (smoothing technology)
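The chain-rule factorization and the sparseness problem can be sketched with a maximum-likelihood bigram model. The three-sentence corpus and the `<s>` / `</s>` boundary markers are assumptions made for this example; a real system would train on a large segmented corpus.

```python
from collections import Counter

def train_bigram(sentences):
    """Count histories and word pairs, with sentence boundary markers added."""
    history, pairs = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]
        history.update(padded[:-1])             # every token that has a successor
        pairs.update(zip(padded, padded[1:]))
    return history, pairs

def bigram_prob(prev, word, history, pairs):
    """Maximum-likelihood estimate of P(word | prev); 0.0 for unseen histories."""
    if history[prev] == 0:
        return 0.0
    return pairs[(prev, word)] / history[prev]

def sentence_prob(words, history, pairs):
    """Chain rule with a one-word history: product of P(w_k | w_{k-1})."""
    padded = ["<s>"] + words + ["</s>"]
    prob = 1.0
    for prev, word in zip(padded, padded[1:]):
        prob *= bigram_prob(prev, word, history, pairs)
    return prob

# Toy segmented corpus (hypothetical).
corpus = [["我", "吃", "萝卜"], ["我", "吃", "红烧肉"], ["他", "吃", "萝卜"]]
history, pairs = train_bigram(corpus)

p_seen = sentence_prob(["我", "吃", "萝卜"], history, pairs)      # sentence in the corpus
p_new = sentence_prob(["他", "吃", "红烧肉"], history, pairs)     # new sentence, seen pairs
p_sparse = sentence_prob(["我", "吃", "火锅"], history, pairs)    # contains an unseen pair
print(p_seen, p_new, p_sparse)
```

The second sentence never occurs in the corpus but still gets a non-zero probability because each of its word pairs was observed. The third is driven to exactly zero by a single unseen pair, which is the sparseness problem that smoothing is meant to address.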

8 Framework for CSR (cont’d)
- The Language Processing (cont’d)
  - Statistics: N-Gram based (cont’d)
    - Problems (cont’d)
      - Does equal (bigger) occurrence mean equal (bigger) probability?
        - E.g.: “我 吃 红烧肉” (I eat braised pork) vs. “我 吃 红小豆” (I eat adzuki beans)
      - Equivalent word class (word clustering)? How?
      - Estimation of probability when there is a new word
        - E.g.: “我 吃 萝卜” (I eat radish) vs. “我 吃 [火锅]” (I eat [hot pot])
      - Computation and storage: O(W^N)
        - Word clustering: by data? by rule?
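A minimal sketch of the word-class idea, assuming the common class-based factorization P(w | w_prev) ≈ P(c(w) | c(w_prev)) · P(w | c(w)). The slides do not say which formulation the lab used; the class inventory, the hand-made `WORD_CLASS` mapping, and the toy corpus below are all hypothetical.

```python
from collections import Counter

# Hypothetical hand-assigned classes ("by rule"); clustering "by data" is the alternative.
WORD_CLASS = {"我": "PRON", "他": "PRON", "吃": "VERB", "买": "VERB",
              "萝卜": "FOOD", "红烧肉": "FOOD", "红小豆": "FOOD"}

CORPUS = [["我", "吃", "萝卜"], ["我", "吃", "红烧肉"], ["他", "买", "红小豆"]]

def train(sentences, word_class):
    """Collect word-pair counts plus the counts needed for the class-based model."""
    model = {name: Counter() for name in
             ("word_pairs", "class_pairs", "class_history", "class_total", "word_total")}
    for words in sentences:
        classes = [word_class[w] for w in words]
        model["word_pairs"].update(zip(words, words[1:]))
        model["class_pairs"].update(zip(classes, classes[1:]))
        model["class_history"].update(classes[:-1])
        model["class_total"].update(classes)
        model["word_total"].update(words)
    return model

def class_bigram_prob(prev, word, model, word_class):
    """P(c(word) | c(prev)) * P(word | c(word)), both as relative frequencies."""
    c_prev, c_word = word_class[prev], word_class[word]
    if model["class_history"][c_prev] == 0 or model["class_total"][c_word] == 0:
        return 0.0
    p_class = model["class_pairs"][(c_prev, c_word)] / model["class_history"][c_prev]
    p_word = model["word_total"][word] / model["class_total"][c_word]
    return p_class * p_word

model = train(CORPUS, WORD_CLASS)
unseen_pair_count = model["word_pairs"][("吃", "红小豆")]
p_class_based = class_bigram_prob("吃", "红小豆", model, WORD_CLASS)
print(unseen_pair_count, p_class_based)
```

The pair (吃, 红小豆) never occurs in the toy corpus, so a word bigram would assign it zero, while the class-based estimate is non-zero because VERB followed by FOOD was observed. The table of class pairs grows with the number of classes rather than the vocabulary size, which is the storage argument behind the O(W^N) remark.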

9 Framework for CSR (cont’d)
- The Language Processing (cont’d)
  - Linguistic knowledge: rule-based
    - Is it mature? I mean, for use in speech recognition?
    - How to use it?
      - Lexicon?
      - Syntax/grammar?
      - Semantics?
      - Sentential form?
      - ...

10 Framework for CSR (cont’d)
- The Language Processing (cont’d)
  - Our Point of View
    - Giving priority to the N-Gram based LM
    - Using the rule-based LM to smooth N-Gram probabilities
      - Impossible 0 probability vs. sparseness-caused 0 probability
      - Word pairs in the same grammar/semantics position
    - LM search on the basis of N-Gram probabilities
    - Error locating and correcting on the basis of rules
    - Estimating N-Gram probabilities, when new words are added, according to semantic knowledge
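The distinction between an "impossible" zero and a sparseness-caused zero could be sketched as follows. This is only one possible scheme, not the lab's method: additive smoothing is applied to the successors that a rule base permits, and everything else stays at zero. The `allowed` table stands in for a rule-based LM and, like the counts and the `alpha` value, is hypothetical.

```python
from collections import Counter

def rule_smoothed_prob(prev, word, pairs, allowed, alpha=0.5):
    """Additive smoothing restricted to the successors the rules permit for `prev`.

    Pairs ruled out by the (hypothetical) rule base keep probability 0.0;
    permitted but unseen pairs receive a small non-zero share.
    Assumes every observed successor of `prev` is also permitted by the rules.
    """
    permitted = allowed.get(prev, set())
    if word not in permitted:
        return 0.0                      # "impossible" zero: stays zero
    total = sum(pairs[(prev, v)] for v in permitted)
    return (pairs[(prev, word)] + alpha) / (total + alpha * len(permitted))

# Hypothetical bigram counts and rule table.
pairs = Counter({("吃", "萝卜"): 2, ("吃", "红烧肉"): 1})
allowed = {"吃": {"萝卜", "红烧肉", "火锅"}}   # the rules say a food noun may follow 吃

p_seen = rule_smoothed_prob("吃", "萝卜", pairs, allowed)
p_sparse = rule_smoothed_prob("吃", "火锅", pairs, allowed)    # unseen but permitted
p_impossible = rule_smoothed_prob("吃", "我", pairs, allowed)  # ruled out
print(p_seen, p_sparse, p_impossible)
```

The probabilities over the permitted successors still sum to one, so the rule base only redistributes mass among candidates it considers grammatical. The same table also gives a newly added word such as 火锅 a starting probability before any counts for it exist.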

11 Framework for CSR (cont’d)
- The Important Robustness Issues
  - Speaker
    - Gender, accent, style (speed, loudness), stress, …
  - Environment
    - Background noises, microphone, channel, …
  - Domain
    - How to modify the N-Gram probabilities?

12 Existing Applications
- Application directions
  - Language Recognition
  - Speaker Recognition
  - S2E - Speaking Skill Evaluation
  - Keyword Spotting
  - Voice Command (Isolated SR)
  - Dictation (Continuous SR)

13 Existing Applications (cont’d)
- Existing applications/products
  - BigMouth English (大嘴英语) for Golden Disc Co., Beijing
  - Speaking English As You Wish (随心所欲) for Human Co., Beijing
  - Voice Command and Voice Phonebook for InstDict (快译通), a product of Group Sense Ltd. (GSL), Hong Kong
  - Voice Dialler (拨号器) for SoundTek Co., Guangdong
  - EasyCmd: Voice Command Navigator
  - EasyTalk: Chinese Dictation Machine

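14 The Future Cooperation
- Existing technologies
  - Chinese word segmentation platform (including the word list)
  - Unsegmented and segmented corpora
  - Collocation information for stray single characters that cannot be paired into words
  - Rule statistics derived from prior knowledge, e.g., the statistical regularities and analysis methods for characters used in Chinese and foreign personal names and in place names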

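15 The Future Cooperation (cont’d)
- Sub-projects open to further cooperation
  - Analysis of word equivalence classes, e.g., based on part of speech
  - Smoothing statistical probabilities with linguistic knowledge
  - Error-locating algorithms (using the existing proofreading system)
  - Confirming the identity of newly added words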

16 Thanks for Your Patience
Beijing Polytechnic University, Jan. 1999

