Published by Blaise Evans. Modified over 9 years ago.

Intelligent Database Systems Lab, 國立雲林科技大學 National Yunlin University of Science and Technology
A review of large-vocabulary continuous-speech recognition
Advisor: Dr. Hsu
Graduate: Yu-Cheng Chen
IEEE Signal Processing Magazine, 1996

Outline
Introduction
System Overview
Front-End Parameterization
Acoustic Modeling
Language Modeling
Decoder
Current LVR
Current Issues
Conclusion
Personal Opinion

Introduction
Large-vocabulary recognition (LVR) has made considerable progress.
Current systems can transcribe continuous speech from any speaker with average word error rates between 5% and 10%.
If speaker adaptation is allowed, the error rate falls below 5%.
We discuss the principles and architecture of an LVR system, illustrated by the Cambridge University HTK system.

System Overview
X denotes an utterance and Y the corresponding sequence of acoustic vectors.
The job of an LVR system is to find the most probable word sequence W given the observed acoustic signal Y.
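The task stated above is commonly written with Bayes' rule. As a sketch, assuming the usual notation (the hat on W for the chosen word sequence, P(W) for the language model and P(Y | W) for the acoustic model, none of which are defined on the slide), this is presumably the "equation 1" that the Decoding slide refers to:

```latex
\hat{W} \;=\; \arg\max_{W} P(W \mid Y)
        \;=\; \arg\max_{W} \frac{P(W)\, P(Y \mid W)}{P(Y)} \tag{1}
```

Because P(Y) does not depend on W, it can be dropped from the maximization, leaving the product of the language-model and acoustic-model terms.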

[Figure: System Overview]

Front-End Parameterization
The main function of the front end is to divide the input speech into blocks and to derive a smoothed spectral estimate from each block.
The mel scale is designed to approximate the frequency resolution of the human ear.
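
Fast Fourier transform (FFT): the characteristics of a signal are usually hard to see from its variation in the time domain, so the signal is converted to an energy distribution in the frequency domain. Different energy distributions represent the characteristics of different speech sounds.
Mel triangular filters: smooth the spectrum and remove the effect of harmonics, which highlights the formants of the original speech and reduces the amount of data.
Log: turns multiplication into addition and reduces error.
Discrete cosine transform (DCT): intended to bring the result back to something resembling the time domain.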
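The four steps above (FFT, mel triangular filters, log, DCT) can be sketched in NumPy. This is a minimal illustration, not the HTK front end: the function names, the 16 kHz sample rate, the 25 ms window (400 samples), the 10 ms hop (160 samples), the 24 filters and the 12 output coefficients are all assumptions chosen for the example.

```python
import numpy as np

def hz_to_mel(f):
    # A common mel-scale formula (assumed here; variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centers spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank

def dct_ii(x, n_out):
    """Unnormalized type-II DCT along the last axis, keeping n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    return x @ basis.T

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=24, n_ceps=12):
    """Block the signal, then apply FFT -> mel filters -> log -> DCT."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)              # windowed blocks
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 # FFT power spectrum
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)                   # small floor avoids log(0)
    return dct_ii(log_energies, n_ceps)
```

Calling `mfcc` on a one-second NumPy array sampled at 16 kHz returns one row of coefficients per 10 ms frame. Real front ends usually add pre-emphasis, liftering and delta coefficients, which are left out here.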

Acoustic Modeling
The acoustic model calculates the likelihood of any acoustic vector sequence Y given a word w.
Word sequences are decomposed into basic sounds called phones.
Each phone is represented by a hidden Markov model (HMM).

Acoustic Modeling
Example: the pronunciation of the digit "nine" (九).
[Figure: acoustic vectors and the corresponding HMM]
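
Acoustic Modeling
The transition probabilities are usually represented by a matrix A, where A(i, j) is the probability of moving from state i to state j. For example, in the figure the probability of moving from state 1 to state 2 is 0.3, so A(1, 2) = 0.3.
The state probabilities are usually represented by a matrix B, where B(i, j) is the probability that frame i belongs to state j.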
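Given A and B as defined above, the likelihood of a sequence of frames under one HMM can be computed with the forward algorithm. The sketch below is illustrative only: the function name, the `initial` vector of starting probabilities and the two-state example values (apart from A(1, 2) = 0.3) are assumptions, and indices are 0-based, so the slide's A(1, 2) is `A[0, 1]`.

```python
import numpy as np

def forward_likelihood(A, B, initial):
    """Likelihood of the observed frames under one HMM (forward algorithm).

    A[i, j]    : probability of moving from state i to state j
    B[t, j]    : probability that frame t belongs to state j
    initial[j] : probability of starting in state j (assumed, not on the slide)
    """
    alpha = initial * B[0]                 # frame 0
    for t in range(1, B.shape[0]):
        alpha = (alpha @ A) * B[t]         # sum over predecessor states, then emit
    return float(alpha.sum())              # any state may be the final one

# Hypothetical left-to-right model with two states and three frames.
A = np.array([[0.7, 0.3],
              [0.0, 1.0]])
B = np.array([[0.9, 0.1],
              [0.5, 0.5],
              [0.2, 0.8]])
initial = np.array([1.0, 0.0])
likelihood = forward_likelihood(A, B, initial)
```

For long utterances the products underflow, so practical implementations work with log probabilities or rescale `alpha` at every frame.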

Acoustic Modeling
So far we have assumed that only one HMM is required per phone. However, contextual effects cause large variations in the way that different sounds are produced.
For example, "Beat it" would be represented by the phone sequence "sil b iy t ih t sil".
We therefore use triphones, in which every phone has a distinct HMM for each unique pair of left and right neighbors:
sil sil-b+iy b-iy+t iy-t+ih t-ih+t ih-t+sil sil
This gives the best accuracy but makes the system considerably more complex.
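The expansion from phones to triphones in the "Beat it" example can be reproduced with a few lines of Python. The helper name `to_triphones` is hypothetical, and the sketch assumes the first and last phones (here `sil`) stay context-independent, as they do on the slide.

```python
def to_triphones(phones):
    """Expand a phone sequence into left-center+right triphone labels.

    The first and last phones are kept as they are (assumed to be silence).
    """
    if len(phones) < 3:
        return list(phones)
    middle = [f"{left}-{center}+{right}"
              for left, center, right in zip(phones, phones[1:], phones[2:])]
    return [phones[0], *middle, phones[-1]]

labels = to_triphones("sil b iy t ih t sil".split())
print(" ".join(labels))
# prints: sil sil-b+iy b-iy+t iy-t+ih t-ih+t ih-t+sil sil
```

Note that the two occurrences of `t` map to different models (`iy-t+ih` and `ih-t+sil`), which is where the growth in the number of HMMs comes from.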

Acoustic Modeling
Having too many parameters is a central problem in a statistical speech recognizer.
The solution, the so-called tied-mixture system, is to form a pool of Gaussian components that is shared among all HMM states.
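A rough sketch of the tied-mixture idea: every state scores a frame with the same pool of Gaussians, and only the mixture weights differ from state to state. The function name, the diagonal-covariance assumption and the array shapes are illustrative choices, not details taken from the slides.

```python
import numpy as np

def tied_mixture_likelihoods(y, means, variances, state_weights):
    """Output likelihood of one frame for every state of a tied-mixture system.

    y             : (D,)   acoustic vector
    means         : (M, D) means of the shared Gaussian pool
    variances     : (M, D) diagonal variances of the shared pool
    state_weights : (N, M) per-state mixture weights, each row summing to 1
    """
    diff = y - means
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=1)
                        + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    pool = np.exp(log_gauss)        # each pooled Gaussian is evaluated once per frame
    return state_weights @ pool     # every state reuses the same pool values
```

The Gaussians are evaluated once per frame regardless of the number of states, and each state contributes only M weights to the parameter count.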

Acoustic Modeling
Phonetic decision trees are used to choose which states are tied.

Language Modeling
The purpose of the language model is to estimate the probability of a word w_k in an utterance given the preceding words w_1 ... w_{k-1}.
N-grams: assume that w_k depends only on the preceding n-1 words.
If N = 3 (a trigram), w_k is conditioned only on w_{k-2} and w_{k-1}.
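For N = 3, the simplest estimate divides the count of a word triple by the count of its two-word history. The sketch below uses plain maximum-likelihood counts. The function names, the `<s>` and `</s>` padding symbols and the toy sentences are assumptions made for illustration.

```python
from collections import Counter

def train_trigram(sentences):
    """Count trigrams and their two-word histories over tokenized sentences."""
    tri, hist = Counter(), Counter()
    for words in sentences:
        padded = ["<s>", "<s>", *words, "</s>"]
        for a, b, c in zip(padded, padded[1:], padded[2:]):
            tri[(a, b, c)] += 1
            hist[(a, b)] += 1
    return tri, hist

def trigram_prob(tri, hist, a, b, c):
    """Maximum-likelihood estimate of P(c | a, b); 0.0 for an unseen history."""
    seen = hist[(a, b)]
    return tri[(a, b, c)] / seen if seen else 0.0

tri, hist = train_trigram([["beat", "it"], ["beat", "it"], ["beat", "them"]])
p = trigram_prob(tri, hist, "<s>", "beat", "it")   # 2 of the 3 histories continue with "it"
```

Raw counts assign zero probability to any unseen triple, so practical systems combine this estimate with discounting and backing off to bigram and unigram estimates.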

Decoding
To perform recognition, the word sequence W that maximizes Equation 1 must be found.
This is a search problem, and solving it is the job of the decoder.
There are two main approaches: depth-first and breadth-first.
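A breadth-first decoder advances all hypotheses one frame at a time. The sketch below shows this on a single small HMM using the Viterbi recursion with an optional beam. It is a toy illustration under assumed names and inputs (log-domain matrices `log_A`, `log_B`, `log_initial` and a `beam` width), not the HTK decoder, which searches a network of word models.

```python
import numpy as np

def viterbi_beam(log_A, log_B, log_initial, beam=np.inf):
    """Time-synchronous Viterbi search with beam pruning.

    log_A[i, j]  : log transition probability from state i to state j
    log_B[t, j]  : log probability of frame t in state j
    log_initial  : log probability of starting in each state
    beam         : states scoring more than `beam` below the best are dropped
    """
    n_frames, n_states = log_B.shape
    score = log_initial + log_B[0]
    back = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        # Prune hypotheses that fall outside the beam before extending them.
        score = np.where(score >= score.max() - beam, score, -np.inf)
        cand = score[:, None] + log_A          # cand[i, j]: reach j from i
        back[t] = cand.argmax(axis=0)          # best predecessor of each state
        score = cand.max(axis=0) + log_B[t]
    path = [int(score.argmax())]
    for t in range(n_frames - 1, 0, -1):       # trace back the best path
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(score.max())
```

With `beam=np.inf` the search is exact. A finite beam trades accuracy for speed, since a pruned hypothesis can no longer become the best path.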

[Figure: Decoding]

Current State of LVR
Performance of the HTK LV recognizer.

Current Issues
Speaker Adaptation
Environmental Robustness
Task Independence
Spontaneous Speech
Real-Time Operation

Conclusions
This paper has reviewed the main components of a large-vocabulary continuous-speech recognition system.

Personal Opinion
This presentation gives only an overview.
Much more needs to be done before robust, general-purpose LVR is achieved.