ASR System & LIBDNN Yen-Chen Wu

Yen-Chen Wu, r03942044@ntu.edu.tw
NTU Speech Lab Summer Research Project

Outline
DNN in Speech Recognition
DNN
TIMIT Introduction
How to use libdnn

DNN IN SPEECH RECOGNITION

Speech Recognition
In speech processing:
Each word consists of syllables.
Each syllable consists of phonemes.
Each time frame has an observation (a feature vector) that is mapped to a phoneme.
Example: “青色” → “青(ㄑㄧㄥ) 色(ㄙㄜˋ)” (syllables) → “ㄑ” (phonemes)
青: TSI, I, N (phonemes)

Observation Sequences
Sample rate: 16000 Hz
A 25 ms sliding window, shifted by 10 ms each step, cuts the signal into frames of features (Frame 1, Frame 2, Frame 3, ...).
Source: Digital Speech Processing, Lect. 2.0
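The framing above can be sketched in a few lines. This assumes the 25 ms figure is the window length and the 10 ms figure is the shift between frames. The function name `frame_boundaries` and the choice to drop a trailing partial window are illustration-only assumptions, not something the slides specify.

```python
def frame_boundaries(num_samples, sample_rate=16000, window_ms=25, shift_ms=10):
    """Return (start, end) sample indices for every full frame.

    Assumption: a trailing partial window is dropped rather than padded.
    """
    window = sample_rate * window_ms // 1000  # 400 samples at 16 kHz
    shift = sample_rate * shift_ms // 1000    # 160 samples at 16 kHz
    frames = []
    start = 0
    while start + window <= num_samples:
        frames.append((start, start + window))
        start += shift
    return frames


# One second of audio sampled at 16 kHz.
frames = frame_boundaries(16000)
print(len(frames))   # 98
print(frames[:3])    # [(0, 400), (160, 560), (320, 720)]
```

Neighbouring frames overlap by 240 samples because the shift is shorter than the window.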

DNN in Speech Recognition
Goal: predict the phoneme from the features in each time frame (frame-wise prediction).
Input: acoustic features (MFCC, FBANK, or ...)
Output: pronunciation units (phonemes or ...)
To learn more about Automatic Speech Recognition (ASR), please refer to

Training Deep Neural Network

Main Problems
Model initialize
Feedforward
Backpropagate
Update
Predict

Model Initialize
DNN training can get stuck in a poor local optimum, so initialization matters. In practice, unsupervised pre-training techniques exist for initialization. However, in this homework we recommend initializing the parameters randomly, for simplicity and efficiency.
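A minimal sketch of random initialization, written in plain Python rather than with libdnn. The uniform range `[-0.1, 0.1]`, the zero biases, the hidden size 128, and the names `init_layer` and `init_model` are all assumptions chosen for illustration. Only the input dimension 69 and output dimension 39 come from the examples later in these slides.

```python
import random


def init_layer(n_in, n_out, scale=0.1, rng=None):
    """Return (weights, biases) for one fully connected layer.

    weights is an n_out x n_in list of lists drawn uniformly from
    [-scale, scale]; biases start at zero. Both choices are assumptions.
    """
    rng = rng or random.Random()
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def init_model(layer_sizes, seed=0):
    """Initialize every layer of a network given sizes such as [69, 128, 39]."""
    rng = random.Random(seed)
    return [init_layer(n_in, n_out, rng=rng)
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]


# 69 input features, one hypothetical hidden layer of 128 units, 39 outputs.
model = init_model([69, 128, 39])
print([(len(w), len(w[0])) for w, _ in model])  # [(128, 69), (39, 128)]
```

Fixing the seed makes a run reproducible, which helps when comparing learning rates or network structures.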

Feedforward
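The feedforward pass can be sketched as follows, assuming sigmoid hidden layers and a softmax output layer. The slides do not state which activations are used, so both are assumptions, as are the function names and the tiny example weights.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softmax(zs):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(zs)  # subtract the max to avoid overflow in exp
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def affine(weights, biases, x):
    """Compute W x + b, where weights holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def feedforward(layers, x):
    """layers is a list of (weights, biases); the last layer uses softmax."""
    a = x
    for weights, biases in layers[:-1]:
        a = [sigmoid(z) for z in affine(weights, biases, a)]
    weights, biases = layers[-1]
    return softmax(affine(weights, biases, a))


# Hypothetical toy network: 2 inputs -> 2 hidden units -> 3 output classes.
layers = [
    ([[0.5, -0.5], [0.25, 0.75]], [0.0, 0.0]),
    ([[1.0, -1.0], [-1.0, 1.0], [0.0, 0.0]], [0.0, 0.0, 0.0]),
]
probs = feedforward(layers, [1.0, 2.0])
print(probs)
print(probs.index(max(probs)))  # index of the predicted class
```

In frame-wise prediction, `x` would be the feature vector of one frame and the output one probability per phoneme.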

Backpropagate

Update
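A minimal sketch of one backpropagate-and-update step, restricted to a single softmax output layer. It assumes a cross-entropy loss and plain stochastic gradient descent. The slides do not state the loss or the update rule, so these, the learning rate 0.1, and all names are illustration-only assumptions. For softmax with cross-entropy, the gradient with respect to the k-th pre-activation is `p_k - y_k`, where `y` is the one-hot target.

```python
import math


def softmax(zs):
    m = max(zs)  # subtract the max to avoid overflow in exp
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def forward(weights, biases, x):
    """Class probabilities of a single softmax layer."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) + b
                    for row, b in zip(weights, biases)])


def sgd_step(weights, biases, x, label, learning_rate=0.1):
    """Update weights and biases in place; return the loss before the update."""
    probs = forward(weights, biases, x)
    loss = -math.log(probs[label])  # cross-entropy for one example
    for k, row in enumerate(weights):
        # Backpropagate: gradient of the loss w.r.t. the k-th pre-activation.
        delta = probs[k] - (1.0 if k == label else 0.0)
        # Update: move each parameter against its gradient.
        for i, xi in enumerate(x):
            row[i] -= learning_rate * delta * xi
        biases[k] -= learning_rate * delta
    return loss


# Hypothetical toy problem: 2 features, 3 classes, one training example.
weights = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
biases = [0.0, 0.0, 0.0]
x, label = [1.0, -2.0], 2
losses = [sgd_step(weights, biases, x, label) for _ in range(5)]
print(losses)  # the loss shrinks with every step
```

In a deeper network the same `delta` is propagated backwards through each hidden layer by the chain rule; the update line stays the same.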

Evaluation
Frame-wise phoneme prediction
Metric: frame accuracy
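Frame accuracy is the fraction of frames whose predicted phoneme matches the reference label. A minimal sketch follows; the function name and the example label sequences are made up for illustration.

```python
def frame_accuracy(predicted, reference):
    """Fraction of frames where the predicted label equals the reference."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same number of frames")
    if not reference:
        raise ValueError("no frames to score")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


# Hypothetical labels for 8 consecutive frames.
reference = ["sil", "sil", "ch", "ch", "ih", "ih", "ng", "sil"]
predicted = ["sil", "ch", "ch", "ch", "ih", "eh", "ng", "sil"]
print(frame_accuracy(predicted, reference))  # 0.75
```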

WHY DNN?
Basic Model in Deep Learning
Feature Extraction (Representation)
Variety of Structures (CNN, RNN, LSTM, NTM, etc.)

Network Structure
How many layers?
Number of neurons in each layer

Training Parameter
Learning Rate
Batch Size

Dataset and Format

Dataset
TIMIT (Texas Instruments and Massachusetts Institute of Technology)
Well-transcribed speech of American English speakers of different sexes and dialects.
Designed for the development and evaluation of ASR systems.

Dataset
Each instance consists of 3 parts:
speaker faem0, sentence si1392, the 37th frame
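A small sketch of splitting an instance ID into its three parts. It assumes the parts are joined with underscores, as in `faem0_si1392_37`. The slide only names the three parts, so the separator, the `InstanceId` type, and the function name are assumptions.

```python
from typing import NamedTuple


class InstanceId(NamedTuple):
    speaker: str
    sentence: str
    frame: int


def parse_instance_id(instance_id, sep="_"):
    """Split an ID such as 'faem0_si1392_37' (assumed layout) into its parts."""
    speaker, sentence, frame = instance_id.split(sep)
    return InstanceId(speaker, sentence, int(frame))


print(parse_instance_id("faem0_si1392_37"))
# InstanceId(speaker='faem0', sentence='si1392', frame=37)
```

Grouping frames by `(speaker, sentence)` and sorting by `frame` recovers the frame sequence of one utterance.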

Data Format
WAV file: Speaker-Sentence ID + .wav (check by ear)
ARK file: Instance ID + features
TODO

HOW TO USE LIBDNN

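LIBDNN
libdnn is a lightweight, readable, and user-friendly deep learning library. It is written in C++ and CUDA, and its goal is to let developers, researchers, and anyone interested easily try out and harness the power of deep learning.
Ref: 以深層與卷積類神經網路建構聲學模型之大字彙連續語音辨識 (Deep and Convolutional Neural Networks for Acoustic Modeling in Large Vocabulary Continuous Speech Recognition)
Already installed on the project students' workstation.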

Data Format (1)
Sparse matrix (LibSVM): most values of the vector are 0, and only a few dimensions have the value 1.

Data Format (2)
Densely packed (dense): the format provided for this exercise.
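The difference between the two formats can be sketched with a small converter. A LibSVM line is a label followed by `index:value` pairs for the non-zero entries, with indices starting at 1. The dense form here is simply a Python list holding every value; the exact on-disk layout libdnn expects for dense files is not shown in the slides, so that part and the function names are assumptions.

```python
def dense_to_libsvm(label, values):
    """Encode a dense vector as one LibSVM line, skipping zero entries."""
    parts = [str(label)]
    parts += [f"{i}:{v}" for i, v in enumerate(values, start=1) if v != 0]
    return " ".join(parts)


def libsvm_to_dense(line, dim):
    """Decode one LibSVM line into (label, dense list of length dim)."""
    fields = line.split()
    label = fields[0]
    values = [0.0] * dim
    for field in fields[1:]:
        index, value = field.split(":")
        values[int(index) - 1] = float(value)  # LibSVM indices are 1-based
    return label, values


line = dense_to_libsvm(3, [0, 0, 1, 0, 0, 1, 0])
print(line)                      # 3 3:1 6:1
print(libsvm_to_dense(line, 7))  # ('3', [0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0])
```

The sparse form pays off when most entries are zero. Acoustic features such as MFCC are rarely zero, which fits the use of the dense format in this exercise.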

How to Use
There are three main programs: nn-init, nn-train, and nn-predict.
The commands are written in a shell script, so you only need to modify the parameters.

nn-init
nn-init [train_set_file] <-o> <--input-dim> <--struct> [options]
EX: nn-init -o init.model --input-dim 69 --struct output-dim 39

nn-train
nn-train <training_set_file> <model_in> [valid_set_file] [model_out] <--input-dim> [options]
EX: nn-train train.dat init.model --input-dim 69

nn-predict
nn-predict <testing_set_file> <model_file> [output_file] <--input-dim> [options]
EX: nn-predict test.dat train.model --input-dim 69

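WORK STATION
Project students who need a workstation account should contact the lab network administrator: 廖宜修
ssh -p 2822 your_account@140.112.21.35
After logging in to the workstation, first check the data location:
/home/wyc2010/DNN_practice
Copy run.sh to your own home directory:
cp /home/wyc2010/DNN_practice/run.sh
Start the experiment!
sh run.sh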
