Prof. Lin-shan Lee TA. Roy Lu


專題研究 WEEK 4 - LIVE DEMO
Prof. Lin-shan Lee, TA: Roy Lu (R05942070@ntu.edu.tw)

Outline
- Review
- Live Demo introduction
- Restrictions
- To Do
- To Think
- FAQ

Review: Building an ASR System
An Automatic Speech Recognition (ASR) system takes an input waveform and produces output text.

Review (Week 1): Feature Extraction
- compute-mfcc-feats
- add-deltas
- compute-cmvn-stats
- apply-cmvn
File formats: scp, ark
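The Week 1 steps above can be sketched as a small pipeline. This is a hedged sketch, assuming the Kaldi binaries are on your PATH; the utterance IDs and wav paths are hypothetical placeholders, and the Kaldi calls are guarded so the script also runs where Kaldi is not installed.

```shell
# Sketch of the Week 1 feature pipeline (hypothetical utterance IDs/paths).
mkdir -p exp/feat

# A wav.scp maps each utterance ID to its audio file.
printf '%s\n' \
  'utt001 data/wav/utt001.wav' \
  'utt002 data/wav/utt002.wav' > exp/feat/wav.scp

if command -v compute-mfcc-feats >/dev/null 2>&1; then
  # Raw 13-dim MFCCs, then append delta and delta-delta (39 dims total).
  compute-mfcc-feats scp:exp/feat/wav.scp ark:- |
    add-deltas ark:- ark,scp:exp/feat/mfcc.ark,exp/feat/mfcc.scp
  # Per-utterance CMVN statistics, then apply the normalization.
  compute-cmvn-stats scp:exp/feat/mfcc.scp ark:exp/feat/cmvn.ark
  apply-cmvn ark:exp/feat/cmvn.ark scp:exp/feat/mfcc.scp \
    ark:exp/feat/final.ark
fi
```

The `ark,scp:` output specifier writes the binary archive and its index at once, which is why Week 1 produced both file formats.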

Review (Week 2): Training the Acoustic Model
- monophone
- clustering tree
- triphone
Models: final.mdl, tree

Review (Week 3): Decoding and Training the LM
- SRILM (ngram-count with Kneser-Ney smoothing)
- Kaldi: WFST decoding
- HTK: Viterbi decoding
- Vulcan (converts Kaldi-format models to HTK format)
Models: final.mmf, tiedlist
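The SRILM step above can be sketched as follows. This assumes `ngram-count` is on your PATH (the call is guarded so the script runs without it), and the two training sentences and output paths are hypothetical placeholders.

```shell
# Sketch of LM training with SRILM (hypothetical toy training text).
mkdir -p exp/lm
printf '%s\n' \
  '今天 天氣 很 好' \
  '今天 心情 很 好' > exp/lm/LM_train.text

if command -v ngram-count >/dev/null 2>&1; then
  # Train a trigram LM with Kneser-Ney discounting, as in Week 3.
  # (Real training data is needed; KN discounting can fail on toy text.)
  ngram-count -order 3 -kndiscount -interpolate \
    -text exp/lm/LM_train.text -lm exp/lm/trigram.lm || true
fi
```

The resulting ARPA-format LM file is one of the components you upload to the Live Demo system.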

Live Demo
We have integrated these components into a real-world ASR system for you. You can upload your own models. Give it a shot, and experience your own ASR system in a real setting.

Live Demo
https://140.112.21.35:54285/
Remember to use https, and ignore the certificate warnings.

Restrictions
- MFCC features with dimension 39 only.
- Fixed phone set (Chinese phones).
- The LM must be a unigram, bigram, or trigram model.
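The 39-dim restriction corresponds to the usual 13 static MFCC coefficients plus their delta and delta-delta (13 × 3 = 39), which is what the Week 1 `add-deltas` step produces. A minimal Kaldi-style `mfcc.conf` sketch with the common default values (the option names are standard Kaldi; check them against your setup):

```
--use-energy=false
--num-ceps=13
--sample-frequency=16000
```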

To Do
1. Sign up for Live Demo with your account. Inform the TA of your account name for activation (FB group: 數位語音專題).
2. Test with the basic model embedded in the system.
3. Upload your own models: LM / LEX / TREE / MDL. For better performance, you may re-train your models.
4. Test with your own models.

To Think
- Compare the basic models with your own models. What is the main difference?
- Do you know what kind of training data you used? (train.text / dev.text / test.text)
- How about manually tagging your own lexicon and training your own language model?
- Guess what training data the basic models used, and give examples to support your guess.

Language Model: Training Text (2/2)
cut -d ' ' -f 1 --complement $train_text > ./exp/lm/LM_train.text
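The `cut` command above drops the first space-separated field of each line, i.e. the utterance ID, leaving only the transcription for LM training. A small runnable illustration (the sample transcript line is a hypothetical placeholder):

```shell
# Drop the utterance ID, keep only the words for LM training.
mkdir -p exp/lm
printf '%s\n' 'utt001 今天 天氣 很 好' > train.text
cut -d ' ' -f 1 --complement train.text > exp/lm/LM_train.text
cat exp/lm/LM_train.text
```

Note that `--complement` is a GNU coreutils extension; on the course workstation (Linux) it is available.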

Language Model: ngram-count (3/3)
Lexicon: lexicon=material/lexicon.train.txt

FAQ
Q: How do I download the models from the workstation?
A: Use FileZilla, MobaXterm, or the sftp/scp command in your Linux OS.
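For the `scp` route, a dry-run sketch is below. The username, host, and paths are hypothetical placeholders (substitute your own workstation account); the script only prints the command it would run, so nothing is actually transferred.

```shell
# Dry-run sketch: print the scp command to fetch your trained models.
# user/host/remote_dir are hypothetical placeholders.
user=b01234567
host=workstation.example.ntu.edu.tw
remote_dir=/home/$user/hw/exp/model
echo "scp -r $user@$host:$remote_dir ./models"
```

Running the printed command (with `-r` for a whole directory) copies the model files to your local machine, ready for upload to the Live Demo page.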

FAQ
Q: Why do I always get a server error?
A: Make sure your models have been uploaded. Is the timestamp empty?

FAQ
Q: In corpus mode, why do I always get an error or 0 accuracy?
A: Make sure your corpus is saved with UTF-8 encoding. In Notepad the default is ANSI; in Vim the default is UTF-8.
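A quick way to check a file's encoding before uploading: `iconv` exits with a non-zero status if the input is not decodable as UTF-8. The corpus file name below is a hypothetical placeholder.

```shell
# Write a sample UTF-8 corpus line, then verify it decodes as UTF-8.
printf '今天 天氣 很 好\n' > corpus.txt
if iconv -f UTF-8 -t UTF-8 corpus.txt > /dev/null 2>&1; then
  echo "corpus.txt is valid UTF-8"
else
  echo "corpus.txt is NOT valid UTF-8; convert it first"
fi
```

If the check fails on a Notepad-saved file, re-save it as UTF-8 (or convert it, e.g. `iconv -f BIG5 -t UTF-8 in.txt > out.txt` for Big5/ANSI Chinese text) before uploading.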

FAQ
Q: In corpus mode, why did I get negative accuracy?
A: Accuracy is calculated as (length - errors) / length. Since errors include insertions, the error count can exceed the reference length, which makes the accuracy negative.
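The formula above can be worked through numerically. This sketch assumes "errors" counts substitutions, deletions, and insertions together, which is why the count can exceed the reference length:

```shell
# Accuracy = (reference length - errors) / reference length.
accuracy() {
  awk -v len="$1" -v err="$2" 'BEGIN { printf "%.2f\n", (len - err) / len }'
}
accuracy 10 3   # 3 errors over 10 reference words -> 0.70
accuracy 10 14  # 14 errors (many insertions) over 10 words -> -0.40
```

So a negative score does not mean the system is broken; it means the hypothesis contained more errors (typically inserted words) than there are words in the reference.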

Q&A
This system has just gone online, so bugs are to be expected. If you have any questions, contact the TA through the FB group or by email right away. We also need your feedback on the UI and functionality; feel free to tell us anything.
Email: R05942070@ntu.edu.tw