ASR System & LIBDNN Yen-Chen Wu


1 ASR System & LIBDNN
Yen-Chen Wu r03942044@ntu.edu.tw
NTU Speech Lab (台大語音實驗室), Summer Research Project

2 Outline
DNN in Speech Recognition
DNN
TIMIT Introduction
How to use libdnn

3 DNN IN SPEECH RECOGNITION

4 Speech Recognition
In speech processing:
Each word consists of syllables.
Each syllable consists of phonemes.
In each time frame, an observation (a feature vector) is mapped to a phoneme.
Example: “青色” (word) → “青 (ㄑㄧㄥ) 色 (ㄙㄜˋ)” (syllables) → “ㄑ” (phonemes)
青: TSI -- I -- N (phonemes)

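5 Observation Sequences
Sample rate: 16000 Hz
A 25 ms sliding window, advanced in 10 ms steps, cuts the signal into frames of features (Frame 1, Frame 2, Frame 3, ...). Each frame yields one observation vector, and together they form the observation sequence.
Source: Digital Speech Processing, Lect. 2.0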
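
The framing arithmetic can be sketched as follows. This is a minimal illustration, assuming the 25 ms figure is the window length and the 10 ms figure is the shift between frames, and that no padding is applied at the end of the signal; the function names are hypothetical.

```python
def num_frames(num_samples, sample_rate=16000, window_ms=25, shift_ms=10):
    """Count the full analysis windows that fit in a signal (no padding)."""
    window = sample_rate * window_ms // 1000   # 400 samples at 16 kHz
    shift = sample_rate * shift_ms // 1000     # 160 samples at 16 kHz
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // shift


def frame_bounds(index, sample_rate=16000, window_ms=25, shift_ms=10):
    """Return the (start, end) sample range of a 0-based frame; end is exclusive."""
    window = sample_rate * window_ms // 1000
    shift = sample_rate * shift_ms // 1000
    start = index * shift
    return start, start + window


# One second of 16 kHz audio and the first three (overlapping) frames.
print(num_frames(16000))
print(frame_bounds(0), frame_bounds(1), frame_bounds(2))
```

Because the shift is shorter than the window, neighbouring frames overlap by 15 ms.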

6 DNN in Speech Recognition
Goal: predict the phoneme from the features in each time frame (frame-wise prediction).
Input: acoustic features (MFCC, FBANK, or ...)
Output: pronunciation units (phonemes, or ...)
To learn more about Automatic Speech Recognition (ASR), please refer to

7 Training Deep Neural Network

9 Main Problems
Model initialize
Feedforward
Backpropagate
Update
Predict

10 Model Initialize
DNN training can get stuck in a poor local optimum, so initialization matters. In practice, unsupervised pre-training techniques exist for initialization. However, in this homework we recommend initializing the parameters randomly, for simplicity and efficiency.
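
A minimal sketch of random initialization, written in plain Python rather than with libdnn. The input dimension 69 and output dimension 39 come from the nn-init example in these slides; the hidden size 128, the uniform range, and the function names are assumptions chosen for illustration.

```python
import random


def init_layer(n_in, n_out, scale=0.1, rng=random):
    """Return (weights, biases) for one fully connected layer.

    weights[j][i] connects input i to output neuron j.
    """
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def init_model(layer_sizes, seed=0):
    """Build one (weights, biases) pair per consecutive pair of layer sizes."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    return [init_layer(n_in, n_out, rng=rng)
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]


# 69-dim input, one hypothetical hidden layer of 128 neurons, 39 outputs.
model = init_model([69, 128, 39])
print([(len(w), len(w[0])) for w, _ in model])
```

Small random weights break the symmetry between neurons; starting every weight at the same value would make all neurons in a layer learn the same thing.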

11 Feedforward
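
The feedforward pass can be sketched in plain Python as below. The slide text does not say which activation functions are used, so sigmoid hidden layers and a softmax output are assumptions here, and the tiny hand-written model is purely illustrative.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softmax(zs):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(zs)  # subtract the max to avoid overflow
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def affine(weights, biases, x):
    """Compute W x + b for one layer; weights is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def feedforward(model, x):
    """model is a list of (weights, biases); the last layer uses softmax."""
    for weights, biases in model[:-1]:
        x = [sigmoid(z) for z in affine(weights, biases, x)]
    weights, biases = model[-1]
    return softmax(affine(weights, biases, x))


# Tiny model: 2 inputs -> 2 hidden neurons -> 3 output classes.
model = [
    ([[0.5, -0.5], [0.25, 0.75]], [0.0, 0.1]),
    ([[1.0, -1.0], [0.0, 0.5], [-1.0, 1.0]], [0.0, 0.0, 0.0]),
]
probs = feedforward(model, [1.0, 2.0])
print(probs)
```

For frame-wise phoneme prediction, the input would be one frame's feature vector and the output one probability per phoneme class.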

12 Backpropagate
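
As a sketch of one backpropagation step, the code below computes the gradients of the output layer only. It assumes a softmax output trained with cross-entropy loss, which the slide text does not state; under that assumption the error signal at the output is simply the predicted probabilities minus the one-hot target. All names are hypothetical.

```python
import math


def softmax(zs):
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def logits(weights, biases, h):
    """Compute W h + b; weights is a list of rows."""
    return [sum(w * hi for w, hi in zip(row, h)) + b
            for row, b in zip(weights, biases)]


def cross_entropy(weights, biases, h, target):
    """Loss for one frame whose correct class index is `target`."""
    return -math.log(softmax(logits(weights, biases, h))[target])


def output_layer_gradients(weights, biases, h, target):
    """Return (grad_weights, grad_biases, grad_h) of the cross-entropy loss."""
    probs = softmax(logits(weights, biases, h))
    # Softmax + cross-entropy: error signal is (probabilities - one-hot target).
    delta = [p - (1.0 if j == target else 0.0) for j, p in enumerate(probs)]
    grad_w = [[d * hi for hi in h] for d in delta]
    grad_b = list(delta)
    # Gradient with respect to the layer input, to pass to the layer below.
    grad_h = [sum(d * row[i] for d, row in zip(delta, weights))
              for i in range(len(h))]
    return grad_w, grad_b, grad_h


weights = [[0.2, -0.1], [0.0, 0.3], [-0.4, 0.1]]
biases = [0.0, 0.1, -0.1]
h = [0.5, 0.8]
grad_w, grad_b, grad_h = output_layer_gradients(weights, biases, h, target=1)
print(grad_b)
```

To continue into a sigmoid hidden layer, each entry of grad_h would be multiplied by that neuron's derivative h_i (1 - h_i) before repeating the same computation one layer down.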

13 Update
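
The update step can be illustrated with plain gradient descent: each parameter moves against its gradient, scaled by the learning rate. The slide text does not specify the update rule, so this rule is an assumption, and the quadratic toy objective below stands in for the network loss.

```python
def sgd_update(params, grads, learning_rate=0.1):
    """Return params - learning_rate * grads, element by element."""
    return [p - learning_rate * g for p, g in zip(params, grads)]


# Toy objective f(w) = (w0 - 3)^2 + (w1 + 1)^2, minimised at w = [3, -1].
w = [0.0, 0.0]
for _ in range(100):
    grad = [2 * (w[0] - 3.0), 2 * (w[1] + 1.0)]
    w = sgd_update(w, grad, learning_rate=0.1)
print(w)
```

In mini-batch training, the gradient passed to the update is the average over one batch of frames, so the batch size and the learning rate together control how large and how noisy each step is.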

14 Evaluation
Frame-wise phoneme prediction
Metric: frame accuracy
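
Frame accuracy is the fraction of frames whose predicted phoneme matches the reference label. A minimal sketch follows; the phoneme labels in the example are made up for illustration.

```python
def frame_accuracy(predicted, reference):
    """Fraction of frames where the predicted label equals the reference."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    correct = sum(1 for p, r in zip(predicted, reference) if p == r)
    return correct / len(reference)


# Hypothetical labels for five consecutive frames; four of them match.
predicted = ["sil", "aa", "aa", "t", "sil"]
reference = ["sil", "aa", "ae", "t", "sil"]
print(frame_accuracy(predicted, reference))
```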

15 WHY DNN?
Basic model in deep learning
Feature extraction (representation)
Variety of structures (CNN, RNN, LSTM, NTM, etc.)
Network structure: How many layers? How many neurons in each layer?
Training parameters: learning rate, batch size

16 Dataset and Format

17 Dataset
TIMIT (Texas Instruments and Massachusetts Institute of Technology)
Well-transcribed speech of American English speakers of different sexes and dialects.
Designed for the development and evaluation of ASR systems.

18 Dataset
Each instance is identified by 3 parts: the speaker, the sentence, and the frame index.
Example: speaker faem0, sentence si1392, the 37th frame
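
A small sketch of splitting an instance ID into its three parts. The slide text does not show the literal ID string, so the underscore-joined form `faem0_si1392_37` used here is an assumption, as are the names `InstanceId` and `parse_instance_id`.

```python
from typing import NamedTuple


class InstanceId(NamedTuple):
    speaker: str
    sentence: str
    frame: int


def parse_instance_id(instance_id, sep="_"):
    """Split an ID such as 'faem0_si1392_37' (assumed layout) into its parts."""
    speaker, sentence, frame = instance_id.split(sep)
    return InstanceId(speaker, sentence, int(frame))


parsed = parse_instance_id("faem0_si1392_37")
print(parsed.speaker, parsed.sentence, parsed.frame)
```

Grouping frames by speaker and sentence this way makes it possible to rebuild a whole utterance from its individual frames.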

19 Data Format
WAV file: Speaker-Sentence ID + .wav (check it by ear)
ARK file: Instance ID + features

20 HOW TO USE LIBDNN

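21 LIBDNN
libdnn is a lightweight, readable, and user-friendly deep learning library. It is written in C++ and CUDA, and its goal is to let developers, researchers, and anyone who is interested easily experience and harness the power of deep learning.
Ref: 以深層與卷積類神經網路建構聲學模型之大字彙連續語音辨識 (Deep and Convolutional Neural Networks for Acoustic Modeling in Large Vocabulary Continuous Speech Recognition)
libdnn is already installed on the workstation for project students.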

22 Data Format (1)
Sparse matrix (LibSVM)
Most values in this vector are 0; only a few dimensions have the value 1.

23 Data Format (2)
Dense layout
This is the format provided for this exercise.
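
The relationship between the sparse and dense formats can be sketched as a conversion in both directions. The sparse side follows the LibSVM text convention (`label index:value ...`, with indices starting at 1). The dense side is represented here simply as a label plus a full list of values; the exact column layout that libdnn expects is not shown in these slides, so treat that part as an assumption.

```python
def libsvm_to_dense(line, dim):
    """Expand one LibSVM line 'label idx:val ...' into (label, dense list)."""
    fields = line.split()
    label = fields[0]
    dense = [0.0] * dim
    for item in fields[1:]:
        index, value = item.split(":")
        dense[int(index) - 1] = float(value)  # LibSVM indices start at 1
    return label, dense


def dense_to_libsvm(label, dense):
    """Write only the non-zero entries, as LibSVM does."""
    items = [f"{i + 1}:{v:g}" for i, v in enumerate(dense) if v != 0]
    return " ".join([str(label)] + items)


# A 6-dimensional vector with ones in dimensions 2 and 5.
label, dense = libsvm_to_dense("3 2:1 5:1", dim=6)
print(label, dense)
print(dense_to_libsvm(label, dense))
```

The sparse form saves space when most entries are zero; acoustic features such as MFCC or FBANK are rarely zero, which makes the dense layout the natural choice for them.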

24 How to Use
There are three main programs: nn-init, nn-train, and nn-predict. The commands are collected in a shell script, so you only need to edit the parameters.
nn-init
nn-init [train_set_file] <-o> <--input-dim> <--struct> [options]
EX: nn-init -o init.model --input-dim 69 --struct output-dim 39
nn-train
nn-train <training_set_file> <model_in> [valid_set_file] [model_out] <--input-dim> [options]
EX: nn-train train.dat init.model --input-dim 69
nn-predict
nn-predict <testing_set_file> <model_file> [output_file] <--input-dim> [options]
EX: nn-predict test.dat train.model --input-dim 69

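25 WORK STATION
Project students who need a workstation account should contact the lab network administrator: 廖宜修
Log in: ssh -p 2822 your_account@140.112.21.35
After logging in, first check the data location: /home/wyc2010/DNN_practice
Copy run.sh to your own home directory: cp /home/wyc2010/DNN_practice/run.sh
Start the experiment: sh run.sh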

