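Chinese Continuous Speech Recognition: Visit to Beijing Polytechnic University, January 4, 1999. 973 Project, 2019/4/17. Zheng Fang (郑方), Speech Laboratory, Department of Computer Science and Technology, Tsinghua University, 100084. fzheng@sp.cs.tsinghua.edu.cn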

Presentation transcript:


Outline
  Framework for Continuous Speech Recognition
  Existing Applications of the Speech Lab
  The Future Cooperation
Speech Lab., CST, THU

Framework for CSR
  Two layers
    Acoustic modeling and recognition
    Language modeling and processing
  The Acoustic Processing
    Aim: turn the speech signals into Chinese syllables

Framework for CSR (cont’d)
  The Acoustic Processing (cont’d)
    Feature Extraction
      Formant, E, ZCT, pitch, ...
      LPC-derived cepstrum: consonant
      Mel-frequency cepstrum: selection of filters
      Auditory: not so efficient
      How to combine different features?
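As a rough illustration of the simplest features on this slide, the sketch below computes per-frame short-time energy and a zero-crossing count. It assumes that "E" stands for energy and that "ZCT" refers to zero crossings. The frame length, hop size, and toy signal are illustrative choices, not values from the talk.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split samples into frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Sum of squared sample values in one frame."""
    return sum(x * x for x in frame)

def zero_crossings(frame):
    """Number of sign changes between consecutive samples in one frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))

# Toy signal (assumed): 0.1 s of silence, then 0.1 s of a 200 Hz sine at 8 kHz.
rate = 8000
silence = [0.0] * 800
tone = [math.sin(2 * math.pi * 200 * n / rate) for n in range(800)]

frames = frame_signal(silence + tone, frame_len=200, hop=200)
energies = [short_time_energy(f) for f in frames]
crossings = [zero_crossings(f) for f in frames]

for idx, (e, z) in enumerate(zip(energies, crossings)):
    print(f"frame {idx}: energy={e:.2f} zero_crossings={z}")
```

Both values are zero on the silent frames and clearly non-zero on the tone frames, which is why features of this kind are often used for crude speech/silence decisions.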

Framework for CSR (cont’d)
  The Acoustic Processing (cont’d)
    Pattern Recognition
      Statistics: HMM and derivation
        State transition: time-invariant, dwell, last-state-trap
        Description of feature space (MGD, NN)
        Inaccurate independence assumptions (for state and observation)
      ANN
        Design (node and layer numbers, structure, ...)?
        Training problem when data are insufficient?
      How about combining HMM and ANN?
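To make the HMM item concrete, here is a minimal Viterbi decoder for a discrete-observation HMM with time-invariant transition probabilities. It is a textbook sketch, not the Speech Lab's recognizer. The two-state left-to-right model and its probabilities are invented for illustration.

```python
import math

def viterbi(obs, start_p, trans_p, emit_p):
    """Return (best state path, its log-probability) for a discrete HMM."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    n_states = len(start_p)
    # score[s]: best log-probability of any path that ends in state s
    score = [log(start_p[s]) + log(emit_p[s][obs[0]]) for s in range(n_states)]
    back = []  # back[t][s]: best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev, score, ptr = score, [], []
        for s in range(n_states):
            best = max(range(n_states),
                       key=lambda r: prev[r] + log(trans_p[r][s]))
            score.append(prev[best] + log(trans_p[best][s]) + log(emit_p[s][o]))
            ptr.append(best)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1], max(score)

# Hypothetical 2-state left-to-right model; state 1 cannot be left once entered.
start_p = [1.0, 0.0]
trans_p = [[0.6, 0.4],
           [0.0, 1.0]]
emit_p = [{"a": 0.9, "b": 0.1},
          {"a": 0.2, "b": 0.8}]

path, logp = viterbi(["a", "a", "b", "b"], start_p, trans_p, emit_p)
print(path, math.exp(logp))
```

Because the transition matrix is fixed over time, the time spent in a state (its dwell) is implicitly geometric. That is one reason explicit duration modeling is sometimes added on top of a basic HMM.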

Framework for CSR (cont’d)
  The Acoustic Processing (cont’d)
    Time Alignment
      Is dynamic programming better for SR?
        It always finds a best match, no matter what is being matched, e.g., /ia/ matched with /a/
      How about knowledge-based searching?
        Syllable detection, accompanied by search?
        Definite speech segments (syllable string) from acoustic information (E, Z, pitch, ...)
        Knowledge-navigated search (dwell, delta dwell, separation point, lexicon/language knowledge, ...)
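The concern that dynamic programming "always finds a best match" can be seen in a small dynamic time warping (DTW) sketch. DTW is used here as one common form of DP-based time alignment; the slide does not say which variant was meant. The one-dimensional sequences are made-up stand-ins for feature vectors.

```python
def dtw_distance(a, b):
    """Dynamic time warping cost between two 1-D sequences (absolute difference)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j]: minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] aligned again
                                 cost[i][j - 1],      # b[j-1] aligned again
                                 cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m]

template = [1.0, 2.0, 3.0]
stretched = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]  # same shape, spoken more slowly
unrelated = [9.0, 9.0, 9.0]                 # nothing like the template

print(dtw_distance(template, stretched))  # 0.0
print(dtw_distance(template, unrelated))  # 21.0
```

The second call still returns a finite alignment cost although the sequences have nothing in common. The algorithm never refuses to match, so rejection has to come from a threshold or from additional knowledge.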

Framework for CSR (cont’d)
  The Language Processing
    Statistics: N-gram based
      Basic theory
        $W^* = \arg\max_W P(W \mid A) = \arg\max_W \dfrac{P(A \mid W)\,P(W)}{P(A)}$
        $P(W) = P(w_1 \ldots w_K) = P(w_1)\,P(w_2 \mid w_1) \cdots P(w_K \mid w_1 \ldots w_{K-1})$
        $P(w_k \mid w_1 \ldots w_{k-1}) \approx P(w_k \mid w_{k-N+1} \ldots w_{k-1})$
      Problems
        Sparseness
          Training materials (collecting sufficient materials)
          N-gram probability matrix (smoothing technology)
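A minimal sketch of the N-gram idea for N = 2 follows: bigram probabilities are estimated from counts, and add-one smoothing keeps unseen pairs from getting zero probability. Add-one smoothing is an assumption chosen for brevity; the slide only says "smoothing technology". The toy corpus and the `<s>` / `</s>` boundary tokens are likewise illustrative.

```python
from collections import Counter

def train_bigram(sentences):
    """Count bigrams and their left contexts, with sentence boundary tokens."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        contexts.update(tokens[:-1])          # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
        vocab.update(tokens[1:])              # every token that can be predicted
    return contexts, bigrams, vocab

def bigram_prob(prev, word, contexts, bigrams, vocab):
    """Add-one smoothed estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))

def sentence_prob(words, contexts, bigrams, vocab):
    """P(W) as a product of smoothed bigram probabilities."""
    tokens = ["<s>"] + words + ["</s>"]
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        p *= bigram_prob(prev, word, contexts, bigrams, vocab)
    return p

corpus = [["I", "eat", "pork"], ["I", "eat", "beans"], ["I", "eat", "pork"]]
contexts, bigrams, vocab = train_bigram(corpus)

print(bigram_prob("eat", "pork", contexts, bigrams, vocab))   # 0.375
print(bigram_prob("eat", "beans", contexts, bigrams, vocab))  # 0.25
print(bigram_prob("eat", "I", contexts, bigrams, vocab))      # 0.125 (unseen pair)
```

In a recognizer this P(W) would be combined with the acoustic score P(A|W). P(A) can be ignored because it does not depend on W.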

Framework for CSR (cont’d)
  The Language Processing (cont’d)
    Statistics: N-gram based (cont’d)
      Problems (cont’d)
        Does equal (or bigger) occurrence mean equal (or bigger) probability?
          E.g.: “我 吃 红烧肉” ("I eat braised pork") vs. “我 吃 红小豆” ("I eat adzuki beans")
        Equivalent word classes (word clustering)? How?
        Estimating probability when there is a new word
          E.g.: “我 吃 萝卜” ("I eat radish") vs. “我 吃 [火锅]” ("I eat [hot pot]")
        Computation and storage: $O(W^N)$
          Word clustering: by data? by rule?
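One standard way to use equivalent word classes is a class-based bigram, P(w_k | w_{k-1}) ≈ P(c_k | c_{k-1}) · P(w_k | c_k). This formulation is an assumption: the slide raises word clustering only as an open question. In the sketch below the class map `WORD_CLASS` is hand-assigned and hypothetical. For N = 2 the table shrinks from roughly W² entries to C² + W, where C is the number of classes.

```python
from collections import Counter

# Hypothetical hand-assigned word classes (clustering "by rule").
WORD_CLASS = {"I": "PRON", "you": "PRON",
              "eat": "VERB", "cook": "VERB",
              "pork": "FOOD", "beans": "FOOD"}

def train(sentences):
    """Collect word-level and class-level counts from tokenized sentences."""
    m = {name: Counter() for name in
         ("word_ctx", "word_bi", "class_ctx", "class_bi", "word", "cls")}
    for words in sentences:
        classes = [WORD_CLASS[w] for w in words]
        m["word"].update(words)
        m["cls"].update(classes)
        m["word_ctx"].update(words[:-1])
        m["class_ctx"].update(classes[:-1])
        m["word_bi"].update(zip(words, words[1:]))
        m["class_bi"].update(zip(classes, classes[1:]))
    return m

def word_bigram_prob(prev, word, m):
    """Unsmoothed maximum-likelihood P(word | prev)."""
    return m["word_bi"][(prev, word)] / m["word_ctx"][prev]

def class_bigram_prob(prev, word, m):
    """P(class(word) | class(prev)) * P(word | class(word))."""
    c_prev, c = WORD_CLASS[prev], WORD_CLASS[word]
    p_class = m["class_bi"][(c_prev, c)] / m["class_ctx"][c_prev]
    p_word = m["word"][word] / m["cls"][c]
    return p_class * p_word

model = train([["I", "eat", "pork"], ["you", "cook", "beans"]])
print(word_bigram_prob("eat", "beans", model))   # 0.0, the pair was never seen
print(class_bigram_prob("eat", "beans", model))  # 0.5, via VERB -> FOOD
```

The pair "eat beans" never occurs in the toy corpus, yet the class model still gives it a sensible probability because both words share statistics with their class mates.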

Framework for CSR (cont’d)
  The Language Processing (cont’d)
    Linguistic knowledge: rule-based
      Is it mature enough for use in speech recognition?
      How to use it?
        Lexicon? Syntax/grammar? Semantics? Sentential form? ...

Framework for CSR (cont’d)
  The Language Processing (cont’d)
    Our Point of View
      Giving priority to the N-gram based LM
      Using a rule-based LM to smooth N-gram probabilities
        Impossible zero probability vs. zero probability caused by sparseness
        Word pairs in the same grammatical/semantic position
      LM search on the basis of N-gram probabilities
      Error locating and correcting on the basis of rules
      Estimating N-gram probabilities for newly added words according to semantic knowledge
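The distinction between an "impossible" zero and a zero caused by sparseness can be sketched as rule-gated smoothing: only word pairs that a rule allows receive smoothing mass, and pairs the rule forbids stay at zero. This is one possible reading of the slide, not the authors' method. The part-of-speech table `POS`, the rule set `ALLOWED`, and add-delta smoothing are all hypothetical. The sketch also assumes that every observed bigram obeys the rules.

```python
from collections import Counter

# Hypothetical part-of-speech tags and allowed tag sequences.
POS = {"I": "PRON", "eat": "VERB", "pork": "NOUN", "beans": "NOUN"}
ALLOWED = {("PRON", "VERB"), ("VERB", "NOUN")}

def rule_smoothed_prob(prev, word, bigrams, contexts, vocab, delta=0.5):
    """Add-delta smoothing restricted to successors that the rules allow."""
    if (POS[prev], POS[word]) not in ALLOWED:
        return 0.0  # "impossible" zero: excluded by a rule, never smoothed
    allowed_words = [w for w in vocab if (POS[prev], POS[w]) in ALLOWED]
    total = contexts[prev] + delta * len(allowed_words)
    return (bigrams[(prev, word)] + delta) / total

# Toy counts, as if collected from "I eat pork" seen twice.
bigrams = Counter({("I", "eat"): 2, ("eat", "pork"): 2})
contexts = Counter({"I": 2, "eat": 2})
vocab = ["I", "eat", "pork", "beans"]

print(rule_smoothed_prob("eat", "pork", bigrams, contexts, vocab))   # seen pair
print(rule_smoothed_prob("eat", "beans", bigrams, contexts, vocab))  # sparse zero, smoothed
print(rule_smoothed_prob("eat", "I", bigrams, contexts, vocab))      # impossible zero, stays 0.0
```

Restricting the smoothing mass to rule-allowed successors keeps the distribution normalized under the stated assumption, and the recognizer never spends probability on sequences the grammar excludes.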

Framework for CSR (cont’d)
  The Important Robustness Issues
    Speaker: gender, accent, style (speed, loudness), stress, ...
    Environment: background noise, microphone, channel, ...
    Domain: how to modify the N-gram probabilities?

Existing Applications
  Application directions
    Language Recognition
    Speaker Recognition
    S2E - Speaking Skill Evaluation
    Keyword Spotting
    Voice Command (Isolated SR)
    Dictation (Continuous SR)

Existing Applications (cont’d)
  Existing applications/products
    BigMouth English (大嘴英语) for Golden Disc Co., Beijing
    Speaking English As You Wish (随心所欲) for Human Co., Beijing
    Voice Command and Voice Phonebook for InstDict (快译通), a product of Group Sense Ltd. (GSL), Hong Kong
    Voice Dialler (拨号器) for SoundTek Co., Guangdong
    EasyCmd: Voice Command Navigator
    EasyTalk: Chinese Dictation Machine

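The Future Cooperation
  Existing technologies
    Chinese word segmentation platform (including the word list)
    Unsegmented and segmented corpora
    Collocation information for leftover single characters that cannot be paired into words
    Rule statistics derived from prior knowledge, e.g., statistical regularities and analysis methods for the characters used in Chinese and foreign personal names and in place names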

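The Future Cooperation (cont’d)
  Sub-projects for further cooperation
    Analysis of word equivalence classes, e.g., based on part of speech
    Smoothing statistical probabilities with linguistic knowledge
    Error-locating algorithms (using the existing proofreading system)
    Identity confirmation for newly added words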

Thanks for Your Patience
Beijing Polytechnic University, Jan. 1999