Perceptual Linear Predictive Analysis of Speech. Hynek Hermansky, Speech Technology Laboratory. J. Acoustical Society of America, April 1990. Presenter: 張志豪.



2 Outline Linear Prediction Coding Mel-scale Frequency Cepstral Coefficients Perceptual Linear Predictive

3 Introduction Feature Extraction –Speech Production Model Linear Prediction Coding –Speech Perception Model Mel-scale Frequency Cepstral Coefficients

4 Linear Prediction Coding Property –Approximates the areas of high-energy concentration while smoothing out the fine harmonic structure and other less-relevant spectral details. –The approximated high-energy spectral areas often correspond to the resonance frequencies of the vocal tract (formants).

5 Linear Prediction Coding Autocorrelation –Levinson-Durbin Recursion Impulse Response [Figure panels: Speech; LPC; Speech and LPC, shown in the time domain and the frequency domain]
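The autocorrelation method with the Levinson-Durbin recursion named on this slide can be sketched as follows. This is a minimal illustration, not code from the presentation: the function names `autocorrelation` and `levinson_durbin` and the sign convention (predictor polynomial with a[0] = 1) are assumptions chosen for the example.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation sums r[0..max_lag] of one frame x."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values r[0..order].

    Returns (a, err): a[0] == 1.0 and the prediction is
    x[n] ~ -sum(a[k] * x[n - k] for k in 1..order); err is the residual energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] + sum(a[k] * r[i - k] for k in range(1, i))
        k_i = -acc / err
        # Update a[1..i-1] from the previous stage, then append a[i].
        prev = a[:]
        for k in range(1, i):
            a[k] = prev[k] + k_i * prev[i - k]
        a[i] = k_i
        # The residual energy shrinks by (1 - k_i^2) at every stage.
        err *= (1.0 - k_i * k_i)
    return a, err


# Autocorrelation of an ideal first-order process with pole at 0.5.
coeffs, residual = levinson_durbin([1.0, 0.5, 0.25], 2)
```

For this input the recursion recovers a single pole: the second coefficient comes out as zero, which is the sense in which the all-pole model keeps only the dominant spectral peaks.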

6 Linear Prediction Coding Disadvantage –LPC approximates speech equally well at all frequencies of the analysis band. This property is inconsistent with human hearing: beyond about 800 Hz, the spectral resolution of hearing decreases with frequency. –At the amplitude levels typically encountered in conversational speech, hearing is more sensitive in the middle frequency range of the audible spectrum. –As a result, the spectral details of speech are not always preserved or discarded by LPC analysis according to their auditory prominence.

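7 Mel-scale Frequency Cepstral Coefficients Mel-scale – At low frequencies, the human ear is more sensitive – At high frequencies, the ear's perception becomes progressively coarser – The ear's perception of frequency varies logarithmically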
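The roughly logarithmic behaviour described on this slide can be illustrated with a Hz-to-mel conversion. The slides do not say which mel formula was used, so the widely used form 2595 · log10(1 + f/700) is an assumption here, as are the function names.

```python
import math


def hz_to_mel(f_hz):
    """Map frequency in Hz to mels (assumed formula: 2595*log10(1 + f/700))."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# The same 100 Hz step spans fewer mels at high frequencies than at low ones.
low_step = hz_to_mel(1100.0) - hz_to_mel(1000.0)
high_step = hz_to_mel(4100.0) - hz_to_mel(4000.0)
```

Spacing filter-bank centres uniformly in mels and mapping them back with `mel_to_hz` gives narrow filters at low frequencies and wide filters at high frequencies.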

8 Mel-scale Frequency Cepstral Coefficients

9 Discrete cosine transform – Converts from the frequency domain back to the time domain – frequency of frequency
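A minimal sketch of this step, assuming the input is a list of log filter-bank energies and an unnormalized DCT-II. Real front ends often add a scale factor and start their indices differently, so the function name `dct_cepstrum` and these conventions are illustrative only.

```python
import math


def dct_cepstrum(log_energies, num_ceps):
    """Unnormalized DCT-II of log filter-bank energies.

    Returns coefficients c[0..num_ceps-1]; c[0] is the sum of the inputs.
    """
    n_bands = len(log_energies)
    return [
        sum(e * math.cos(math.pi * n * (j + 0.5) / n_bands)
            for j, e in enumerate(log_energies))
        for n in range(num_ceps)
    ]


# A flat log spectrum has no spectral shape, so only c[0] is non-zero.
flat = dct_cepstrum([1.0, 1.0, 1.0, 1.0], 3)
```

Low-index coefficients capture the smooth spectral envelope, and higher indices capture faster ripples across the bands, which is the "frequency of frequency" reading.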

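10 MFCC & LPC Mel-scale Frequency Cepstral Coefficients –Advantage: emphasizes the spectral characteristics of speech, so it maintains a better recognition rate even in noisy environments –Disadvantage: higher computational cost Linear Prediction Coding –Advantage: low computational cost –Disadvantage: does not take the spectral characteristics of speech into account; the recognition rate drops as noise increases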

11 Perceptual Linear Predictive MFCC LPC

12 Perceptual Linear Predictive Equal-Loudness Preemphasis

13 Perceptual Linear Predictive Equal-Loudness Preemphasis (cont.) – Is the effect the same as that of pre-emphasis? Frequency domain
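The equal-loudness weighting can be sketched with the approximation given in the Hermansky (1990) paper, E(ω) = (ω² + 56.8·10⁶)·ω⁴ / ((ω² + 6.3·10⁶)²·(ω² + 0.38·10⁹)). The slides do not reproduce the formula, so treating it as the curve used here is an assumption, and the function name is hypothetical.

```python
import math


def equal_loudness(f_hz):
    """Approximate equal-loudness weight at frequency f_hz (in Hz)."""
    w2 = (2.0 * math.pi * f_hz) ** 2  # squared angular frequency
    numerator = (w2 + 56.8e6) * w2 * w2
    denominator = (w2 + 6.3e6) ** 2 * (w2 + 0.38e9)
    return numerator / denominator


# Weights applied to the power spectrum at a few frequencies.
weights = [equal_loudness(f) for f in (100.0, 1000.0, 4000.0)]
```

Over this range the weight grows with frequency, so low frequencies are attenuated relative to higher ones. That is qualitatively similar to time-domain pre-emphasis, although here the weighting is applied to the spectrum directly.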

14 Perceptual Linear Predictive Intensity-Loudness Power Law – Frequency domain

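15 Perceptual Linear Predictive Intensity-Loudness Power Law (cont.) –Power spectrum: no further square root is needed, so this step is not required: ek = (float)sqrt((double)(t1*t1 + t2*t2)); –Filter-bank outputs: no log is needed, so this step is not required: bins[bin] = log((double)t1);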
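As a sketch of the intensity-loudness power law, the compression can be written as raising each equal-loudness-weighted power value to a fixed exponent. The default of 0.33 (cubic-root compression) follows the Hermansky paper; the function name and the list-based interface are assumptions.

```python
def intensity_to_loudness(power_values, exponent=0.33):
    """Compress power values with a power law instead of a logarithm."""
    return [p ** exponent for p in power_values]


# With an exact cube root, powers of 0, 1 and 8 map to 0, 1 and 2.
compressed = intensity_to_loudness([0.0, 1.0, 8.0], exponent=1.0 / 3.0)
```

Unlike the logarithm, the power law is defined at zero, and it operates on the power spectrum directly. This matches the slide's point that neither the square root nor the log step is needed.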

16 Perceptual Linear Predictive Inverse Discrete Fourier Transform – Converts from the frequency domain back to the time domain [Figure panels: Frequency domain; Time domain]
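The inverse transform step can be sketched as turning a sampled, compressed power spectrum into autocorrelation values. The example assumes the spectrum is real and even, and sampled at equally spaced points from 0 to π with both endpoints included. The function name and the normalization are illustrative choices, not taken from the slides.

```python
import math


def spectrum_to_autocorrelation(power, num_lags):
    """Inverse DFT of an even power spectrum sampled on [0, pi].

    power[0] is the value at 0 and power[-1] the value at pi; at least
    two samples are required. Returns r[0..num_lags].
    """
    k_last = len(power) - 1
    r = []
    for n in range(num_lags + 1):
        # Endpoints count once; interior samples count twice by symmetry.
        total = power[0] + power[k_last] * math.cos(math.pi * n)
        total += 2.0 * sum(power[k] * math.cos(math.pi * n * k / k_last)
                           for k in range(1, k_last))
        r.append(total / (2.0 * k_last))
    return r


# A flat spectrum corresponds to an impulse: r[0] = 1 and all other lags are 0.
lags = spectrum_to_autocorrelation([1.0] * 5, 2)
```

The resulting values play the role of the autocorrelation sequence in the autoregressive modeling step that follows, where the same normal equations as in ordinary LPC are solved.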

17 Perceptual Linear Predictive Autoregressive Modeling (LPC) Time domain

18 Experiment [Table: MFCC compared with five PLP configurations (PLP_); column labels and numeric entries not recoverable]

19 Thanks


21 Choice Of The Order Of The Autoregressive PLP Model Introduction Spectral distortion measure of PLP Single-frame phoneme identification Isolated-word identification

22 Choice Of The Order Of The Autoregressive PLP Model Introduction –With increasing model order the spectrum of the all-pole model asymptotically approaches the auditory spectrum.

23 Choice Of The Order Of The Autoregressive PLP Model Spectral Distortion Measure of PLP –group-delay distortion measure The spectral peaks of the model are enhanced and its spectral slope is suppressed. The group-delay metric is more sensitive to distance between narrow peaks. The group-delay measure is more sensitive to the actual value of the spectral peak width. –Exponential measure Allows for various degrees of peak enhancement.
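One common way to realize such measures is as an index-weighted distance between cepstral coefficient vectors. The slides do not give the formulas, so the form below is an assumption: weighting coefficient n by n^s, with s = 1 standing in for a group-delay-like measure and 0 < s < 1 for intermediate degrees of peak enhancement. The function name is hypothetical.

```python
def weighted_cepstral_distance(c_a, c_b, s=1.0):
    """Squared distance between cepstra with index weighting n**s.

    c_a[0] and c_b[0] hold the coefficient of index n = 1 (c0 excluded).
    s = 0 gives the plain squared Euclidean distance; larger s gives more
    weight to higher-index coefficients, which emphasizes sharp peaks.
    """
    return sum(((n ** s) * (a - b)) ** 2
               for n, (a, b) in enumerate(zip(c_a, c_b), start=1))


# The same pair of vectors under three weighting exponents.
plain = weighted_cepstral_distance([1.0, 1.0], [0.0, 0.0], s=0.0)
mild = weighted_cepstral_distance([1.0, 1.0], [0.0, 0.0], s=0.6)
strong = weighted_cepstral_distance([1.0, 1.0], [0.0, 0.0], s=1.0)
```

Raising s increases the contribution of higher-index coefficients, which carry the fine spectral detail around peaks. This is one way to read the "various degrees of peak enhancement" on the slide.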

24 Choice Of The Order Of The Autoregressive PLP Model Single-Frame Phoneme Identification –As is evident, the PLP identification accuracy increases up to about the 5th order of the autoregressive model and then starts decreasing with further increases in the model order.

25 Choice Of The Order Of The Autoregressive PLP Model Isolated-Word Identification

26 Choice Of The Order Of The Autoregressive PLP Model Discussion –The advantage of PLP over LP is that it allows effective suppression of speaker-dependent information through the choice of model order. –The linguistically relevant speaker-independent cues lie in the gross shape of the auditory spectrum. This gross shape can be characterized by the one or two spectral peaks of the 5th-order PLP model.

27 PLP and Human Hearing Introduction Formant Frequency Changes Sensitivity to Bandwidth Changes Sensitivity to Spectral Tilt Sensitivity to F0 Discussion

28 PLP and Human Hearing Introduction –The ear's sensitivity to changes in the first three formant frequencies is approximately constant in relative frequency. LP analysis is in conflict with this.

29 PLP and Human Hearing Formant Frequency Changes

30 PLP and Vowel Perception Introduction The effective second formant Spectral peak integration theory The significance of the bandwidth B2 Discussion
