Classification of place of articulation in unvoiced stops with spectro-temporal surface modeling V. Karjigi , P. Rao Dept. of Electrical Engineering,

Presentation transcript:

Classification of place of articulation in unvoiced stops with spectro-temporal surface modeling. V. Karjigi, P. Rao, Dept. of Electrical Engineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India. Received 8 December 2011; received in revised form 12 March 2012; accepted 23 April 2012; available online 1 June 2012. Chairman: Hung-Chi Yang. Presenter: Yue-Fong Guo. Advisor: Dr. Yeou-Jiunn Chen. Date: 2013.3.20

Outline: Introduction, MFCC, 2D-DCT, Polynomial surface

Outline: GMM, Results, Conclusion

Introduction Automatic speech recognition (ASR) system. The goal of ASR is to convert the lexical content of human speech into computer-readable input. Speaker recognition, by contrast, attempts to identify or verify the speaker who produced the speech rather than the words it contains. Put simply, ASR lets a machine understand what a person says.

Introduction Automatic speech recognition (ASR) system: components. Acoustic features: signal processing and feature extraction, here Mel frequency cepstral coefficients (MFCC). Extracting and selecting acoustic features is an important step in speech recognition. Acoustic model: a statistical model of speech, here the Gaussian mixture model (GMM), which blends several Gaussian components into one smooth model.

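MFCC Mel frequency cepstral coefficients (MFCC). MFCC takes the sensitivity of human hearing to different frequencies into account, which makes it well suited to speech and speaker recognition and explains why it is so widely used. Research on human hearing shows that when two tones of similar frequency sound at the same time, a listener hears only one tone. The critical bandwidth is the bandwidth boundary at which this subjective impression changes abruptly: when the frequency difference between two tones is smaller than the critical bandwidth, the listener hears them as one tone. This is called the masking effect. The Mel scale is one way of measuring this critical bandwidth.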
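The slide does not give a formula for the Mel scale. As a minimal sketch, assuming the widely used 2595 * log10(1 + f/700) convention (the function names `hz_to_mel` and `mel_to_hz` are illustrative, not from the presentation), the mapping between Hz and mels could look like this:

```python
import math

def hz_to_mel(f_hz):
    """Map a frequency in Hz to mels (assumed convention, not from the slides)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse mapping from mels back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# The scale is roughly linear at low frequencies and compressed at high ones,
# so equal steps in Hz cover fewer mels as frequency rises.
low_step = hz_to_mel(1000.0) - hz_to_mel(0.0)
high_step = hz_to_mel(8000.0) - hz_to_mel(7000.0)
```

Under this convention 1000 Hz lands very close to 1000 mel, and `high_step` is much smaller than `low_step`, which mirrors the coarser frequency resolution of hearing at high frequencies.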

MFCC Pre-emphasis: the speech signal s(n) is passed through a high-pass filter. Frame blocking: the signal is divided into frames. Hamming windowing: each frame is multiplied by a Hamming window to improve the continuity between the first and last points of the frame.
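The three front-end steps on this slide can be sketched in a few lines. This is only an illustration: the pre-emphasis coefficient 0.97, the frame length, the hop size and all function names are assumptions, since the slide does not state them.

```python
import math

def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter: y[n] = s[n] - alpha * s[n-1] (alpha is assumed)."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]

def frame_blocking(signal, frame_len, hop):
    """Split the signal into overlapping frames; an incomplete tail is dropped."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]

def hamming(n_points):
    """Hamming window: 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_points - 1)) for n in range(n_points)]

def window_frames(frames):
    """Multiply every frame sample-by-sample with a Hamming window."""
    if not frames:
        return []
    w = hamming(len(frames[0]))
    return [[x * wn for x, wn in zip(frame, w)] for frame in frames]

# Toy usage: a 10-sample constant signal, frames of 4 samples with a hop of 2.
emphasized = pre_emphasis([1.0] * 10)
frames = frame_blocking(list(range(10)), frame_len=4, hop=2)
windowed = window_frames(frames)
```

The window tapers both ends of each frame towards a small value, which is what reduces the discontinuity between the first and last samples.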

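MFCC Fast Fourier transform (FFT): converts the time-domain signal into the frequency domain. Triangular bandpass filters: smooth the magnitude spectrum so that the harmonics are flattened, which yields the spectral envelope and brings out the formants of the original speech. Discrete cosine transform (DCT). In summary, the MFCC computation first uses the FFT to convert the time-domain signal into the frequency domain, then applies a bank of triangular filters spaced on the Mel scale to the spectrum to obtain log filter-bank energies, and finally applies the DCT to the vector of filter outputs and keeps the first N coefficients. PLP still uses Durbin's method to compute the LPC parameters, but it also obtains the autocorrelation parameters by applying a DCT to the log energy spectrum of the auditory excitation.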
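A rough sketch of this part of the pipeline is given below, using only the standard library. Several things are assumptions rather than the presentation's method: a naive DFT stands in for the FFT, the Mel formula is the common 2595 * log10(1 + f/700) convention, the filter count and number of cepstral coefficients are arbitrary, and the function names are illustrative.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive O(N^2) DFT power spectrum; a real system would use an FFT."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n // 2 + 1)]

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the Mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge frequencies, expressed as fractional DFT bin positions
    edges = [mel_to_hz(top * i / (n_filters + 1)) * n_fft / sample_rate
             for i in range(n_filters + 2)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising slope
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc_from_frame(frame, sample_rate, n_filters=20, n_ceps=12):
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Log filter-bank energies; the small floor avoids log(0)
    log_e = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10)) for row in bank]
    # DCT-II of the log energies, keeping coefficients 1..n_ceps
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters) for j, e in enumerate(log_e))
            for c in range(1, n_ceps + 1)]

# Toy usage: a 64-sample frame of a 1000 Hz sine sampled at 8000 Hz.
frame = [math.sin(2.0 * math.pi * 1000.0 * t / 8000.0) for t in range(64)]
coeffs = mfcc_from_frame(frame, 8000, n_filters=10, n_ceps=5)
```

The triangular filters sum neighbouring spectral bins, which is what flattens individual harmonics and leaves the envelope; the DCT then decorrelates the log energies so that a handful of coefficients describe the envelope shape.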

MFCC Log energy: the energy within a frame (its loudness) is also an important feature of speech and is very easy to compute. Delta cepstrum: in practical speech recognition, delta cepstrum parameters are usually appended to show how the cepstral parameters change over time.
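Both features are simple to compute. The sketch below assumes the common regression-style delta over a window of plus or minus M frames, with the first and last frames repeated at the edges; the slide does not say which delta formula was used, and the names `log_energy` and `delta` are illustrative.

```python
import math

def log_energy(frame):
    """Log of the summed squared samples; the floor avoids log(0) on silence."""
    return math.log(max(sum(x * x for x in frame), 1e-10))

def delta(features, M=2):
    """Regression-based delta: sum_m m * (c[t+m] - c[t-m]) / (2 * sum_m m^2).

    `features` is a list of per-frame coefficient vectors. Frames beyond the
    edges are replaced by the first or last frame (an assumed edge policy).
    """
    T = len(features)
    denom = 2 * sum(m * m for m in range(1, M + 1))
    out = []
    for t in range(T):
        row = []
        for d in range(len(features[0])):
            num = sum(m * (features[min(t + m, T - 1)][d] - features[max(t - m, 0)][d])
                      for m in range(1, M + 1))
            row.append(num / denom)
        out.append(row)
    return out

# Toy usage: one coefficient that rises by 1 per frame has a delta of 1
# away from the edges.
ramp = [[float(t)] for t in range(6)]
ramp_delta = delta(ramp)
```

In a full front end the delta vector would be appended to the static coefficients of each frame, so that the classifier sees both the spectral shape and its rate of change.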

2D-DCT 2D-DCT modeling. The two-dimensional discrete cosine transform; explain how it differs from the one-dimensional case.
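One way to see the difference from the one-dimensional case is that the 2D transform is separable: a 1D DCT is applied along every row and then along every column. The sketch below assumes an unnormalised DCT-II and a small rectangular patch (for example, rows as frames and columns as frequency bands); neither detail is stated on the slide, and the function names are illustrative.

```python
import math

def dct_1d(x):
    """Unnormalised DCT-II of a sequence."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
            for k in range(n)]

def dct_2d(patch):
    """Separable 2D DCT-II: 1D DCT along each row, then along each column."""
    rows = [dct_1d(r) for r in patch]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    # Transpose back so the result has the same shape as the input patch
    return [list(r) for r in zip(*cols)]

# Toy usage: a constant 3 x 4 patch puts all of its energy in the [0][0] term.
patch = [[1.0] * 4 for _ in range(3)]
coeffs_2d = dct_2d(patch)
```

Where the 1D DCT summarises variation along frequency within a single frame, the 2D version also summarises variation across time, so low-order coefficients describe the smooth shape of the whole time-frequency patch.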

Polynomial surface Polynomial surface modeling

GMM Gaussian mixture model (GMM). The GMM serves as the acoustic model. It is an effective tool for data modeling and pattern classification. The speaker's acoustic characteristics are grouped into clusters, and each cluster is described by a Gaussian density. In other words, several Gaussian components are blended into one smooth model.
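To make the classification step concrete, the sketch below scores a feature vector against one GMM per class and picks the class with the highest likelihood. It assumes diagonal covariances and already-trained parameters; the toy models, the class labels "p" and "k", and the function names are hypothetical and not taken from the presentation.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_log_likelihood(x, weights, means, variances):
    """log sum_k w_k N(x; mu_k, diag(var_k)), computed with log-sum-exp."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def classify(x, class_models):
    """Return the label whose GMM gives x the highest log-likelihood."""
    return max(class_models, key=lambda label: gmm_log_likelihood(x, *class_models[label]))

# Hypothetical one-dimensional models: (weights, means, variances) per class.
toy_models = {
    "p": ([0.5, 0.5], [[0.0], [1.0]], [[1.0], [1.0]]),
    "k": ([1.0], [[5.0]], [[1.0]]),
}
label_near_zero = classify([0.2], toy_models)
label_near_five = classify([4.8], toy_models)
```

In practice the weights, means and variances would be estimated from training data (typically with the EM algorithm), and the feature vectors would be the spectro-temporal features described earlier rather than single numbers.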

Databases The systems were evaluated on two distinct datasets: American English continuous speech from the TIMIT database, and a Marathi words database created specially for this purpose.

Results

Conclusion A comparison with published results on the same task showed that the spectro-temporal feature systems tested in this work improve on the classification accuracy of the best previous systems on the specified datasets.

The End