Classification of place of articulation in unvoiced stops with spectro-temporal surface modeling. V. Karjigi, P. Rao, Dept. of Electrical Engineering, Indian Institute of Technology Bombay

Classification of place of articulation in unvoiced stops with spectro-temporal surface modeling
V. Karjigi, P. Rao, Dept. of Electrical Engineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
Received 8 December 2011; received in revised form 12 March 2012; accepted 23 April 2012; available online 1 June 2012
Chairman: Hung-Chi Yang. Presenter: Yue-Fong Guo. Advisor: Dr. Yeou-Jiunn Chen. Date: 2013.3.20

Outline
Introduction, MFCC, 2D-DCT, Polynomial surface, GMM, Results, Conclusion

Introduction
Automatic speech recognition (ASR) system: the goal is to convert the lexical content of human speech into computer-readable input. Speaker recognition, by contrast, attempts to identify or verify the speaker who produced the speech rather than the words it contains. Put simply, ASR lets a machine understand what a person says.

Introduction
An ASR system is built from:
Acoustic features: signal processing and feature extraction. The extraction and selection of acoustic features is a critical step in speech recognition. Mel-frequency cepstral coefficients (MFCCs) take the ear's sensitivity to different frequencies into account and are therefore especially suitable for speech recognition.
Acoustic model: a statistical speech model, here a Gaussian mixture model (GMM), which describes the feature distribution as a smooth combination of several Gaussian densities.

MFCC
Mel-frequency cepstral coefficients (MFCCs) take human perceptual sensitivity to different frequencies into consideration, and are therefore well suited to speech and speaker recognition; this is why they are in common use today.
Studies of human hearing show that when two tones of similar frequency are played simultaneously, a listener hears only one tone. The critical bandwidth is the bandwidth boundary at which this subjective percept changes abruptly: when the frequency difference between two tones is smaller than the critical bandwidth, the listener merges them into a single tone, an effect called masking. The mel scale is one way of measuring this critical bandwidth.
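The mel scale mentioned above can be written down directly. A minimal sketch using the common 2595·log10(1 + f/700) formula (this constant choice is one of several in use, not stated in the slides):

```python
import numpy as np

def hz_to_mel(f_hz):
    """Convert frequency in Hz to the mel scale."""
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse mapping from mel back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

By construction, 1000 Hz maps to approximately 1000 mel; above that, equal mel steps correspond to increasingly wide frequency steps, mirroring the widening critical bands of the ear.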

MFCC
Pre-emphasis: the speech signal s(n) is passed through a high-pass filter.
Frame blocking: the signal is divided into short overlapping frames.
Hamming windowing: each frame is multiplied by a Hamming window to preserve continuity at the first and last points of the frame.
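These three steps can be sketched as follows. The frame length (400 samples, i.e. 25 ms at 16 kHz), hop (160 samples) and pre-emphasis coefficient 0.97 are illustrative defaults, not values taken from the paper:

```python
import numpy as np

def preemphasize(s, alpha=0.97):
    """High-pass filter y[n] = s[n] - alpha * s[n-1]."""
    return np.append(s[0], s[1:] - alpha * s[:-1])

def frame_signal(s, frame_len=400, hop=160):
    """Slice the signal into overlapping frames and apply a Hamming window."""
    n_frames = 1 + (len(s) - frame_len) // hop
    # index matrix: one row of sample indices per frame
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return s[idx] * np.hamming(frame_len)
```

One second of 16 kHz audio then yields a (98, 400) array of windowed frames ready for the FFT stage.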

MFCC
Fast Fourier transform (FFT): converts the time-domain signal into the frequency domain.
Triangular bandpass filters: smooth the magnitude spectrum, flattening the harmonics to obtain the spectral envelope and emphasize the formants of the original speech.
Discrete cosine transform (DCT): the MFCC computation first uses the FFT to move the signal into the frequency domain, then passes the log energy spectrum through a bank of triangular filters distributed on the mel scale, and finally applies the DCT to the vector of filter outputs, keeping the first N coefficients. (PLP, by comparison, still computes LPC parameters with the Durbin recursion, but derives the autocorrelation from a DCT of the log energy spectrum of the auditory excitation.)
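A compact numpy-only sketch of this FFT → mel filterbank → log → DCT chain. The 26 filters, 512-point FFT, 16 kHz sampling rate and 13 retained coefficients are assumed typical values, not the paper's settings:

```python
import numpy as np

def mel_filterbank(n_filters=26, n_fft=512, fs=16000):
    """Triangular filters with center frequencies spaced uniformly on the mel scale."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = imel(np.linspace(mel(0.0), mel(fs / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * edges / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising slope
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling slope
    return fb

def mfcc_frame(frame, fb, n_ceps=13):
    """FFT magnitude -> mel filterbank -> log -> DCT-II, keep first n_ceps."""
    spec = np.abs(np.fft.rfft(frame, 512))
    logmel = np.log(fb @ spec + 1e-10)
    n = len(logmel)
    # DCT-II basis written out explicitly to avoid a scipy dependency
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None]
                   * (2 * np.arange(n)[None, :] + 1) / (2 * n))
    return basis @ logmel
```

Applying `mfcc_frame` to each windowed frame from the previous stage gives one 13-dimensional cepstral vector per frame.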

MFCC
Log energy: the energy within a frame is an important speech feature and is very easy to compute.
Delta cepstrum: in practical speech recognition we usually append differential cepstral parameters to capture how the cepstral parameters change over time.
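The delta cepstrum is usually computed with a regression over a few neighboring frames. A sketch of the standard ±N-frame formula (N = 2 here is an assumed, common choice, not specified in the slides):

```python
import numpy as np

def delta(ceps, N=2):
    """Regression-based delta coefficients over a +/-N frame window.
    ceps: (n_frames, n_ceps) array of static cepstral features."""
    padded = np.pad(ceps, ((N, N), (0, 0)), mode='edge')  # repeat edge frames
    denom = 2.0 * sum(n * n for n in range(1, N + 1))
    d = np.zeros_like(ceps)
    for t in range(ceps.shape[0]):
        for n in range(1, N + 1):
            d[t] += n * (padded[t + N + n] - padded[t + N - n])
    return d / denom
```

For a cepstral trajectory that grows by 1 per frame, the interior delta values come out as exactly 1, i.e. the local slope, which is what "change of the cepstral parameters over time" means concretely.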

2D-DCT
2D-DCT modeling: the two-dimensional discrete cosine transform; unlike the 1-D DCT used in MFCC, it is applied jointly along the time and frequency axes of a spectro-temporal patch.
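As an illustration of the 2-D extension, the separable DCT-II can be applied along both axes of a time-frequency patch, keeping a small low-order block of coefficients as the feature. The patch and block sizes below are hypothetical, not taken from the paper:

```python
import numpy as np

def dct2_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    B = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    B[0] /= np.sqrt(2.0)  # DC row scaling for orthonormality
    return B

def dct2d(patch, keep=(3, 5)):
    """2D-DCT of a time-frequency patch; keep a low-order block of
    keep[0] temporal x keep[1] spectral coefficients as the feature."""
    Bt = dct2_matrix(patch.shape[0])   # transform along time
    Bf = dct2_matrix(patch.shape[1])   # transform along frequency
    C = Bt @ patch @ Bf.T
    return C[:keep[0], :keep[1]].ravel()
```

A constant patch produces energy only in the (0, 0) coefficient, while spectro-temporal modulations spread energy into the higher-order terms, which is why the low-order block summarizes the coarse shape of the patch.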

Polynomial surface
Polynomial surface modeling.
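The slides give no details of the surface model, so the sketch below shows one plausible formulation: fitting z(t, f) = Σ_{i+j≤order} a_ij · t^i · f^j to a spectro-temporal patch by linear least squares and using the coefficients a_ij as features. The polynomial order and coordinate normalization are assumptions, not the paper's choices:

```python
import numpy as np

def fit_poly_surface(S, order=3):
    """Least-squares fit of a bivariate polynomial surface to a
    time-frequency patch S; returns the coefficient vector."""
    T, F = S.shape
    # normalized time/frequency coordinates on [-1, 1]
    t, f = np.meshgrid(np.linspace(-1, 1, T), np.linspace(-1, 1, F),
                       indexing='ij')
    # one design-matrix column per monomial t^i * f^j with i + j <= order
    cols = [(t ** i * f ** j).ravel()
            for i in range(order + 1) for j in range(order + 1 - i)]
    A = np.stack(cols, axis=1)
    coeffs, *_ = np.linalg.lstsq(A, S.ravel(), rcond=None)
    return coeffs
```

For an order-1 fit of a patch generated as 2 + 3t − f, the recovered coefficients are exactly (2, −1, 3) in the column order (constant, f, t), confirming the fit.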

GMM
The Gaussian mixture model (GMM) is an effective tool for data modeling and pattern classification. The speaker's acoustic characteristics are clustered, and each cluster is described by a Gaussian density distribution; in other words, the overall distribution is a smooth combination of several Gaussians.
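For classification, each class (here, each place of articulation) gets its own GMM, and a test token is assigned to the class whose model scores it highest. A minimal diagonal-covariance sketch (parameters are assumed already trained; EM fitting is omitted):

```python
import numpy as np

def gmm_loglik(X, weights, means, variances):
    """Total log-likelihood of frames X (n, d) under a diagonal-covariance GMM."""
    logps = []
    for w, mu, var in zip(weights, means, variances):
        ll = -0.5 * np.sum((X - mu) ** 2 / var + np.log(2 * np.pi * var), axis=1)
        logps.append(np.log(w) + ll)
    L = np.stack(logps)
    # log-sum-exp over components (numerically stable), then sum over frames
    m = L.max(axis=0)
    return np.sum(m + np.log(np.exp(L - m).sum(axis=0)))

def classify(X, class_gmms):
    """Pick the class whose GMM gives the highest total log-likelihood."""
    scores = {c: gmm_loglik(X, *params) for c, params in class_gmms.items()}
    return max(scores, key=scores.get)
```

With one model per place of articulation, this maximum-likelihood decision rule is the standard way a GMM acoustic model turns frame-level features into a class label.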

Databases
The method is evaluated on two distinct datasets: American English continuous speech as provided in the TIMIT database, and a Marathi words database created specifically for this purpose.

Results

Conclusion
A comparison with published results on the same task shows that the spectro-temporal feature systems tested in this work improve upon the best previously reported systems' classification accuracies on the specified datasets.

The End