Applying Connectionist Temporal Classification Objective Function to Chinese Mandarin Speech Recognition Pengrui Wang, Jie Li, Bo Xu Interactive Digital.

Presentation transcript:

Applying Connectionist Temporal Classification Objective Function to Chinese Mandarin Speech Recognition
Pengrui Wang, Jie Li, Bo Xu
Interactive Digital Media Technology Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing, China
wangpengrui2015@ia.ac.cn

Outline
- Intention and work
- Brief review of CTC function and search graphs
- Experiments
- Summary

Intention and Work
Intention:
- To improve the CTC-based end-to-end ASR system on Chinese Mandarin
- Can a CTC-trained CD-Phn model match the hybrid CD-state model on Chinese Mandarin?
Our work:
- Three levels of output units: characters (Chars), context-independent phonemes (CI-Phns), and context-dependent phonemes (CD-Phns)
- Training strategy and posterior normalization
- Implementation of UniLSTM with row convolution

Review of CTC
- In ASR, CTC can learn the alignments between speech frames and their transcript label sequences.
- Observation sequence: X = (x1, x2, …, xT)
- Symbol sequence: z = (z1, z2, …, zU), with U ≤ T
- HMM-like model in the CTC function (example: z = (A, B, B))
- Pr(z|X) is computed efficiently by the forward-backward algorithm.
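The forward half of that computation can be sketched in a few lines of Python. This is only an illustration of the standard CTC forward recursion, not the authors' code: the function name `ctc_forward`, the choice of index 0 for the blank label, and the toy uniform posteriors are all assumptions made for the example.

```python
import numpy as np

def ctc_forward(probs, labels, blank=0):
    """Return Pr(z|X) for label sequence `labels` (no blanks).

    probs: array of shape (T, K), per-frame posteriors over K symbols.
    blank: index of the blank symbol (assumed to be 0 here).
    """
    T = probs.shape[0]
    # Extended sequence: blank, z1, blank, z2, ..., zU, blank
    ext = [blank]
    for lab in labels:
        ext += [lab, blank]
    S = len(ext)

    alpha = np.zeros((T, S))
    alpha[0, 0] = probs[0, ext[0]]
    if S > 1:
        alpha[0, 1] = probs[0, ext[1]]

    for t in range(1, T):
        for s in range(S):
            total = alpha[t - 1, s]
            if s >= 1:
                total += alpha[t - 1, s - 1]
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[t - 1, s - 2]
            alpha[t, s] = total * probs[t, ext[s]]

    # Valid paths end on the last label or on the trailing blank.
    result = alpha[T - 1, S - 1]
    if S > 1:
        result += alpha[T - 1, S - 2]
    return result

# Toy example: symbols 0 = blank, 1 = A, 2 = B, uniform posteriors.
uniform4 = np.full((4, 3), 1.0 / 3.0)
p_abb = ctc_forward(uniform4, [1, 2, 2])
```

With four frames, z = (A, B, B) has exactly one valid path (A, B, blank, B), because the repeated B must be separated by a blank; with three frames the probability is zero.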

WFST in CTC
Three types of search graphs:
- S_Chars = T ◦ min(det(Ls ◦ G))
- S_CI-Phns = T ◦ min(det(Lp ◦ G))
- S_CD-Phns = T ◦ min(det(C ◦ (Lp ◦ G)))
T: token WFST; G: grammar WFST; Ls: spelling WFST; Lp: phoneme-lexicon WFST; C: CD-Phn to CI-Phn WFST
[Figure: spelling WFST]

WFST in CTC
- Figures: character token WFST; CD-Phn token WFST
- Consume blank labels
- Map tied CD-Phns to untied CD-Phns
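The effect of a token WFST on a frame-level label sequence can be mimicked with a plain function: merge consecutive repeats, then drop blanks. The sketch below is a functional stand-in, not a WFST implementation; the name `ctc_collapse` and the blank symbol `"<blk>"` are hypothetical.

```python
def ctc_collapse(frame_labels, blank="<blk>"):
    """Collapse a frame-level label sequence into an output unit sequence.

    Consecutive repeats are merged first, then blank labels are consumed,
    which is the mapping a CTC token transducer is meant to perform.
    """
    units = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            units.append(lab)
        prev = lab
    return units

collapsed = ctc_collapse(["<blk>", "A", "A", "<blk>", "B", "B", "<blk>", "B"])
```

The blank between the two runs of B is what keeps them as two separate output units.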

Experiments: Setup
- Feature: 40-dimensional (LFB)
- LSTM: 3 hidden layers, 800 memory cells
- Max-norm regularization (1.0); gradients limited to (-50, 50)
- Corpus: HKUST

Set    Training  Development  Testing
Calls  851       22           24
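The two regularization settings above can be illustrated with NumPy. The slide gives only the values (1.0 and the range -50 to 50), so this sketch assumes element-wise gradient clipping and a max-norm constraint applied to each row of a weight matrix; both function names are hypothetical.

```python
import numpy as np

def clip_gradient(grad, limit=50.0):
    """Limit every gradient element to the range [-limit, limit] (assumed element-wise)."""
    return np.clip(grad, -limit, limit)

def apply_max_norm(weights, max_value=1.0):
    """Rescale each row whose L2 norm exceeds max_value (assumed per-row constraint)."""
    norms = np.linalg.norm(weights, axis=1, keepdims=True)
    scale = np.minimum(1.0, max_value / np.maximum(norms, 1e-12))
    return weights * scale

clipped = clip_gradient(np.array([-120.0, 3.0, 75.0]))
constrained = apply_max_norm(np.array([[3.0, 4.0], [0.3, 0.4]]))
```

Rows already inside the norm bound are left unchanged; only the row with norm 5 is scaled down to norm 1.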

Experiments: Learning Rate Adjustment Strategy
- Newbob: the learning rate is halved whenever label accuracy drops. LAcc = 1 - LER (label error rate)
- Possible reason: with CTC, the development set represents the training set poorly.
- Solution: use "Newbob-Trn" so that the model is trained more fully.
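The halving rule can be sketched as follows. The function name `newbob_learning_rates` is hypothetical, and the slide does not define "Newbob-Trn" precisely; the sketch assumes the only difference is which data set the label accuracies are measured on (for example the training set instead of the development set), so the same function serves both cases.

```python
def newbob_learning_rates(label_accuracies, initial_lr, factor=0.5):
    """Return the learning rate in effect after each evaluation.

    The rate is multiplied by `factor` (0.5 = halved) whenever the label
    accuracy, LAcc = 1 - LER, drops relative to the previous evaluation.
    """
    rates = []
    lr = initial_lr
    previous = None
    for acc in label_accuracies:
        if previous is not None and acc < previous:
            lr *= factor
        rates.append(lr)
        previous = acc
    return rates

# Made-up accuracies, for illustration only.
rates = newbob_learning_rates([0.50, 0.60, 0.55, 0.70, 0.65], initial_lr=1.0)
```

A noisy accuracy curve triggers halving early and often, which is one way a poorly matched evaluation set could cut training short.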

Experiments: Blank Label Prior Cost
- A decoder does best when it satisfies del ≈ 2 * ins
- The blank label prior is large

Setting          WER (%)  ins   del   sub
BlankPrior*1     36.96    3456  1638  15658
BlankPrior*0.2   33.48    1852  2748  14200
BlankPrior*0.1   33.10    1462  3497  13626
BlankPrior*0.05  33.32    988   4909  12811
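One plausible reading of the "BlankPrior*s" settings is that posteriors are divided by label priors before decoding, with the blank prior first multiplied by s. The slide does not spell this out, so the sketch below is an assumption; the function name and the toy numbers are hypothetical.

```python
import numpy as np

def normalize_posteriors(log_posteriors, priors, blank=0, blank_scale=1.0):
    """Subtract log priors from log posteriors, scaling the blank prior first.

    log_posteriors: array of shape (T, K); priors: length-K label priors.
    blank_scale < 1 shrinks the blank prior, so blank is penalized less.
    """
    scaled = np.asarray(priors, dtype=float).copy()
    scaled[blank] *= blank_scale
    return log_posteriors - np.log(scaled)

# Toy frame whose posteriors equal the priors (blank dominates).
priors = np.array([0.8, 0.1, 0.1])
frame = np.log(np.array([[0.8, 0.1, 0.1]]))
scores = normalize_posteriors(frame, priors, blank_scale=0.1)
```

Under this reading, a smaller scale makes blank cheaper, so the decoder emits fewer labels: insertions fall and deletions rise, the same direction as the ins and del columns in the table.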

Experiments
- Compared systems: baseline (hybrid model) and this work
- The Char model (end-to-end) performs well.
- The CD-Phns model outperforms the hybrid CD-state model.

Experiments: UniLSTM with Row Convolution
- All three output units show a performance gain.
- The UniLSTM-RC model even matches the BiLSTM model.
- It is useful for online recognition systems.
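Row convolution gives a unidirectional model a small, fixed look-ahead: each output frame is a per-dimension weighted sum of the current hidden state and a few future ones. The slides do not give the formula, so this sketch follows the commonly used definition and assumes zero padding at the end of the utterance; the function name and the context size are hypothetical.

```python
import numpy as np

def row_convolution(hidden, weights):
    """Combine each frame with its next C-1 frames, per feature dimension.

    hidden: array of shape (T, D), hidden states of a unidirectional LSTM.
    weights: array of shape (C, D); row j weights the state j frames ahead.
    Frames past the end of the utterance are treated as zeros.
    """
    T, D = hidden.shape
    C = weights.shape[0]
    padded = np.vstack([hidden, np.zeros((C - 1, D))])
    out = np.zeros((T, D))
    for j in range(C):
        out += weights[j] * padded[j:j + T]
    return out

# Toy input: 3 frames, 2 dimensions, one frame of future context.
h = np.arange(6, dtype=float).reshape(3, 2)
r = row_convolution(h, np.ones((2, 2)))
```

Because the look-ahead is bounded by C-1 frames, the added latency is fixed, which is what makes the approach suitable for online recognition.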

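[Figure: frame-level alignment example. Frame ranges: 1-85, 86-107, 108-219, 220-243, 244-253, 254-261, 262-267, 268-284, 285-301, 302-329, 330-380. Chars: SIL 呃我觉得他挺好的 (roughly "Uh, I think he is pretty good")]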

Summary
- Three levels of output units are explored: Chars, CI-Phns, and CD-Phns.
- The training strategy and posterior normalization are improved:
  - the Newbob-Trn strategy is proposed to make training stable and adequate;
  - an extra cost is added on the blank label prior when decoding.
- A CTC-trained UniLSTM-RC model is established that meets the real-time requirement of an online system while improving on the UniLSTM model.