Prof. Lin-shan Lee TA. Roy Lu


專題研究 WEEK 4 - LIVE DEMO
Prof. Lin-shan Lee, TA: Roy Lu (R05942070@ntu.edu.tw)

Outline
- Review
- Live Demo introduction
- Restrictions
- To Do
- To Think
- FAQ

Review: Building an ASR System
An Automatic Speech Recognition (ASR) system takes an input waveform and produces output text.

Review (Week 1): Feature Extraction
- compute-mfcc-feats
- add-deltas
- compute-cmvn-stats
- apply-cmvn
File formats: scp, ark
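The Week 1 steps above can be sketched as a small pipeline. This is a hedged sketch, assuming the Kaldi binaries are on your PATH; the utterance IDs and wav paths are hypothetical placeholders, and the Kaldi calls are guarded so the script also runs where Kaldi is not installed.

```shell
# Sketch of the Week 1 feature pipeline (hypothetical utterance IDs/paths).
mkdir -p exp/feat

# A wav.scp maps each utterance ID to its audio file.
printf '%s\n' \
  'utt001 data/wav/utt001.wav' \
  'utt002 data/wav/utt002.wav' > exp/feat/wav.scp

if command -v compute-mfcc-feats >/dev/null 2>&1; then
  # Raw 13-dim MFCCs, then append delta and delta-delta (39 dims total).
  compute-mfcc-feats scp:exp/feat/wav.scp ark:- |
    add-deltas ark:- ark,scp:exp/feat/mfcc.ark,exp/feat/mfcc.scp
  # Per-utterance CMVN statistics, then apply the normalization.
  compute-cmvn-stats scp:exp/feat/mfcc.scp ark:exp/feat/cmvn.ark
  apply-cmvn ark:exp/feat/cmvn.ark scp:exp/feat/mfcc.scp \
    ark:exp/feat/final.ark
fi
```

The `ark,scp:` output specifier writes the binary archive and its index at once, which is why Week 1 produced both file formats.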

Review (Week 2): Training the Acoustic Model
- monophone
- clustering tree
- triphone
Models: final.mdl, tree

Review (Week 3): Decoding and Training the LM
- SRILM (ngram-count with Kneser-Ney smoothing)
- Kaldi: WFST decoding
- HTK: Viterbi decoding
- Vulcan (converts Kaldi-format models to HTK format)
Models: final.mmf, tiedlist
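The SRILM step above can be sketched as follows. This assumes `ngram-count` is on your PATH (the call is guarded so the script runs without it), and the two training sentences and output paths are hypothetical placeholders.

```shell
# Sketch of LM training with SRILM (hypothetical toy training text).
mkdir -p exp/lm
printf '%s\n' \
  '今天 天氣 很 好' \
  '今天 心情 很 好' > exp/lm/LM_train.text

if command -v ngram-count >/dev/null 2>&1; then
  # Train a trigram LM with Kneser-Ney discounting, as in Week 3.
  # (Real training data is needed; KN discounting can fail on toy text.)
  ngram-count -order 3 -kndiscount -interpolate \
    -text exp/lm/LM_train.text -lm exp/lm/trigram.lm || true
fi
```

The resulting ARPA-format LM file is one of the components you upload to the Live Demo system.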

Live Demo
We have integrated these components into a real-world ASR system for you. You can upload your own models. Give it a shot, and experience your own ASR system in a real setting.

Live Demo
https://140.112.21.35:54285/
Remember to use https, and ignore the certificate warnings.

Restrictions
- MFCC features with dimension 39 only.
- Fixed phone set (Chinese phones).
- The LM must be a unigram, bigram, or trigram model.
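The 39-dim restriction corresponds to the usual 13 static MFCC coefficients plus their delta and delta-delta (13 × 3 = 39), which is what the Week 1 `add-deltas` step produces. A minimal Kaldi-style `mfcc.conf` sketch with the common default values (the option names are standard Kaldi; check them against your setup):

```
--use-energy=false
--num-ceps=13
--sample-frequency=16000
```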

To Do
1. Sign up for Live Demo with your account. Inform the TA of your account name for activation (FB group: 數位語音專題).
2. Test with the basic model embedded in the system.
3. Upload your own models: LM / LEX / TREE / MDL. For better performance, you may re-train your models.
4. Test with your own models.

To Think
- Compare the basic models with your own models. What is the main difference?
- Do you know what kind of training data you used? (train.text / dev.text / test.text)
- How about manually tagging your own lexicon and training your own language model?
- Guess what training data the basic models used, and give examples to support your guess.

Language Model: Training Text (2/2)
cut -d ' ' -f 1 --complement $train_text > ./exp/lm/LM_train.text
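The `cut` command above drops the first space-separated field of each line, i.e. the utterance ID, leaving only the transcription for LM training. A small runnable illustration (the sample transcript line is a hypothetical placeholder):

```shell
# Drop the utterance ID, keep only the words for LM training.
mkdir -p exp/lm
printf '%s\n' 'utt001 今天 天氣 很 好' > train.text
cut -d ' ' -f 1 --complement train.text > exp/lm/LM_train.text
cat exp/lm/LM_train.text
```

Note that `--complement` is a GNU coreutils extension; on the course workstation (Linux) it is available.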

Language Model: ngram-count (3/3)
Lexicon: lexicon=material/lexicon.train.txt

FAQ
Q: How do I download the models from the workstation?
A: Use FileZilla, MobaXterm, or the sftp/scp command in your Linux OS.
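For the `scp` route, a dry-run sketch is below. The username, host, and paths are hypothetical placeholders (substitute your own workstation account); the script only prints the command it would run, so nothing is actually transferred.

```shell
# Dry-run sketch: print the scp command to fetch your trained models.
# user/host/remote_dir are hypothetical placeholders.
user=b01234567
host=workstation.example.ntu.edu.tw
remote_dir=/home/$user/hw/exp/model
echo "scp -r $user@$host:$remote_dir ./models"
```

Running the printed command (with `-r` for a whole directory) copies the model files to your local machine, ready for upload to the Live Demo page.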

FAQ
Q: Why do I always get a server error?
A: Make sure your models have been uploaded. Is the timestamp empty?

FAQ
Q: In corpus mode, why do I always get an error or 0 accuracy?
A: Make sure your corpus is saved with UTF-8 encoding. In Notepad the default is ANSI; in Vim the default is UTF-8.
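A quick way to check a file's encoding before uploading: `iconv` exits with a non-zero status if the input is not decodable as UTF-8. The corpus file name below is a hypothetical placeholder.

```shell
# Write a sample UTF-8 corpus line, then verify it decodes as UTF-8.
printf '今天 天氣 很 好\n' > corpus.txt
if iconv -f UTF-8 -t UTF-8 corpus.txt > /dev/null 2>&1; then
  echo "corpus.txt is valid UTF-8"
else
  echo "corpus.txt is NOT valid UTF-8; convert it first"
fi
```

If the check fails on a Notepad-saved file, re-save it as UTF-8 (or convert it, e.g. `iconv -f BIG5 -t UTF-8 in.txt > out.txt` for Big5/ANSI Chinese text) before uploading.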

FAQ
Q: In corpus mode, why did I get negative accuracy?
A: Accuracy is calculated as (length - errors) / length. Since errors include insertions, the error count can exceed the reference length, which makes the accuracy negative.
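The formula above can be worked through numerically. This sketch assumes "errors" counts substitutions, deletions, and insertions together, which is why the count can exceed the reference length:

```shell
# Accuracy = (reference length - errors) / reference length.
accuracy() {
  awk -v len="$1" -v err="$2" 'BEGIN { printf "%.2f\n", (len - err) / len }'
}
accuracy 10 3   # 3 errors over 10 reference words -> 0.70
accuracy 10 14  # 14 errors (many insertions) over 10 words -> -0.40
```

So a negative score does not mean the system is broken; it means the hypothesis contained more errors (typically inserted words) than there are words in the reference.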

Q&A
This system has just gone online, so bugs are to be expected. If you have any questions, contact the TA through the FB group or by email right away. We also need your feedback on the UI and functionality; feel free to tell us anything.
Email: R05942070@ntu.edu.tw