Pitch Prediction From MFCC Vectors for Speech Reconstruction Xu Shao and Ben Milner School of Computing Sciences, University of East Anglia, UK Presented.

Presentation transcript:

Pitch Prediction From MFCC Vectors for Speech Reconstruction Xu Shao and Ben Milner School of Computing Sciences, University of East Anglia, UK Presented By Yi-Ting

Outline: Introduction; Speech Reconstruction (sinusoidal model); Pitch Prediction (GMM-based prediction, HMM-based prediction, voiced/unvoiced classification); Experimental Results; Conclusion

Introduction Speech can be reconstructed from MFCC vectors when pitch information is included. The aim of this work is to predict the pitch frequency from the MFCC vector. Several studies have indicated that a class-dependent correlation exists between the spectral envelope and pitch.

Speech Reconstruction Speech is reconstructed from MFCC vectors and pitch using the sinusoidal model. The sinusoidal model synthesises a speech signal, x(n), as a sum of sinusoids. An estimate of the spectral envelope can be calculated from an MFCC vector by zero padding and applying an inverse discrete cosine transform (IDCT).
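The zero-padding and IDCT step can be illustrated with a minimal NumPy sketch. It assumes the MFCCs were produced by an orthonormal DCT-II of log filterbank energies; the function names and the DCT normalisation are illustrative assumptions, not details taken from the slides, and the output is still on the mel filterbank axis rather than a linear frequency axis.

```python
import numpy as np

def dct2_ortho(log_energies):
    """Orthonormal DCT-II (assumed MFCC transform), used here for the round trip."""
    n = len(log_energies)
    k = np.arange(n)[:, None]                  # cepstral index
    m = np.arange(n)                           # filterbank channel index
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return scale * (np.cos(np.pi * (m + 0.5) * k / n) @ log_energies)

def mfcc_to_log_filterbank(mfcc, n_channels):
    """Zero-pad a truncated MFCC vector and invert the orthonormal DCT-II."""
    c = np.zeros(n_channels)
    c[: len(mfcc)] = mfcc                      # discarded coefficients become zeros
    k = np.arange(n_channels)                  # cepstral index
    m = np.arange(n_channels)[:, None]         # filterbank channel index
    scale = np.full(n_channels, np.sqrt(2.0 / n_channels))
    scale[0] = np.sqrt(1.0 / n_channels)
    basis = np.cos(np.pi * (m + 0.5) * k / n_channels)
    return basis @ (scale * c)                 # smoothed log filterbank estimate
```

Keeping fewer coefficients gives a smoother estimate; with only the zeroth coefficient the result collapses to the mean log energy across channels.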

Speech Reconstruction The result is a smoothed magnitude spectral estimate. Normalization must be applied to remove the effect of pre-emphasis and of the non-linear filterbank channels. The frequencies of the sinusoidal components can be estimated from the pitch frequency, and their amplitudes can be computed from the smoothed magnitude spectral estimate.
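A rough sketch of harmonic sinusoidal synthesis for one voiced frame follows. It assumes that the sinusoid frequencies are integer multiples of the pitch, that the phases are zero, and that the sampling rate is 8 kHz; the slides do not state the phase model or the sampling rate, and the `envelope` callable is a hypothetical stand-in for the smoothed magnitude spectral estimate.

```python
import numpy as np

def synthesise_frame(f0, envelope, fs=8000.0, n_samples=200):
    """Sum of harmonics of f0, with amplitudes sampled from an envelope.

    envelope: callable mapping frequency in Hz to a linear magnitude
    (hypothetical interface, assumed for this sketch).
    """
    n = np.arange(n_samples)
    harmonics = np.arange(1, int(np.floor(fs / 2 / f0)) + 1) * f0
    harmonics = harmonics[harmonics < fs / 2]  # keep strictly below Nyquist
    x = np.zeros(n_samples)
    for f_l in harmonics:
        # zero phase is an assumption of this sketch
        x += envelope(f_l) * np.cos(2 * np.pi * f_l * n / fs)
    return x
```

For example, `synthesise_frame(100.0, lambda f: 1.0)` sums 39 unit-amplitude harmonics of a 100 Hz pitch.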

Pitch Prediction These schemes are based on modeling the joint density of the MFCC vector, x, and the pitch frequency, f. From a set of training data, a series of augmented feature vectors, y, is extracted.

Pitch Prediction GMM-based prediction From the training set of augmented vectors, unsupervised clustering is implemented using the EM algorithm to produce a set of K clusters. Each of these clusters is represented by a Gaussian probability density function.
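The slide's equations are not preserved in this transcript. One conventional way to write a GMM over an augmented vector is given below; the symbols \alpha_k, \mu_k and \Sigma_k are assumed notation, not necessarily the authors'.

```latex
y = \begin{bmatrix} x \\ f \end{bmatrix}, \qquad
p(y) = \sum_{k=1}^{K} \alpha_k \, \mathcal{N}\!\left(y;\ \mu_k^{y},\ \Sigma_k^{yy}\right),
\qquad
\mu_k^{y} = \begin{bmatrix} \mu_k^{x} \\ \mu_k^{f} \end{bmatrix}, \quad
\Sigma_k^{yy} = \begin{bmatrix} \Sigma_k^{xx} & \Sigma_k^{xf} \\ \Sigma_k^{fx} & \Sigma_k^{ff} \end{bmatrix}
```

The off-diagonal blocks \Sigma_k^{xf} and \Sigma_k^{fx} carry the cluster-dependent correlation between the MFCC vector and the pitch.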

Pitch Prediction GMM-based prediction Using these cluster-based correlations, a prediction of the pitch frequency can be made from an input MFCC vector. The closest cluster, k, is identified first, and the pitch is then predicted from that cluster.
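The prediction equation itself is missing from the transcript. For a joint Gaussian over MFCC vector x and pitch f, the standard conditional-mean predictor takes the form below. The notation is assumed here: \alpha_k is the cluster weight, \mu_k^{x} and \mu_k^{f} are the MFCC and pitch parts of the cluster mean, and \Sigma_k^{xx}, \Sigma_k^{fx} are the corresponding covariance blocks. Choosing the closest cluster by maximum posterior probability is also an assumption.

```latex
k^{*} = \arg\max_{k}\ \alpha_k \, \mathcal{N}\!\left(x;\ \mu_k^{x},\ \Sigma_k^{xx}\right),
\qquad
\hat{f} = \mu_{k^{*}}^{f} + \Sigma_{k^{*}}^{fx} \left(\Sigma_{k^{*}}^{xx}\right)^{-1} \left(x - \mu_{k^{*}}^{x}\right)
```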

Pitch Prediction GMM-based prediction An alternative method combines the MAP pitch prediction from all K clusters in the GMM.
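A minimal NumPy sketch of both GMM-based predictors is given below, assuming full-covariance clusters over the augmented vector [x, f] and the standard conditional-mean form of the per-cluster prediction. The function and argument names are illustrative, and weighting the per-cluster predictions by the posterior probability of each cluster is an assumption about how the combination is done.

```python
import numpy as np

def log_gaussian(x, mean, cov):
    """Log density of a multivariate Gaussian at x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(mean) * np.log(2 * np.pi) + logdet + maha)

def predict_pitch(x, weights, means, covs, combine=True):
    """Predict pitch from MFCC vector x using a GMM over [x, f].

    combine=False: use only the most probable ("closest") cluster.
    combine=True:  posterior-weighted sum over all K clusters (assumed form).
    """
    d = len(x)
    log_post = np.empty(len(weights))
    per_cluster = np.empty(len(weights))
    for k, (w, mu, cov) in enumerate(zip(weights, means, covs)):
        mu_x, mu_f = mu[:d], mu[d]
        cov_xx, cov_fx = cov[:d, :d], cov[d, :d]
        log_post[k] = np.log(w) + log_gaussian(x, mu_x, cov_xx)
        # conditional mean of f given x within cluster k
        per_cluster[k] = mu_f + cov_fx @ np.linalg.solve(cov_xx, x - mu_x)
    if not combine:
        return float(per_cluster[np.argmax(log_post)])
    h = np.exp(log_post - log_post.max())      # unnormalised posteriors, stabilised
    h /= h.sum()
    return float(h @ per_cluster)
```

When one cluster dominates the posterior, the two variants give nearly the same prediction; they differ mainly for vectors that lie between clusters.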

Pitch Prediction HMM-based prediction To better model the inherent correlation of the feature vector stream, a series of HMMs is used.

Pitch Prediction HMM-based prediction The first stage of training involves the creation of a set of HMM-based speech models. The training data is aligned to the speech models using Viterbi decoding, and the vectors belonging to each state, s, of each model, w, are pooled. Unvoiced vectors are removed. Clustering is then applied to the pooled vectors within each voiced state using the EM algorithm.

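Pitch Prediction HMM-based prediction Prediction of the pitch: the model and state sequence is first determined from the set of models using Viterbi decoding. The pitch for each frame is then predicted from the clusters of the state to which that frame is aligned.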
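The state-alignment step can be sketched with a compact log-domain Viterbi decoder for a single HMM. This is a generic textbook implementation under assumed inputs (precomputed per-frame log emission scores), not the decoder used by the authors, and it omits the search over multiple speech models.

```python
import numpy as np

def viterbi(log_emission, log_trans, log_init):
    """Most likely state path for one HMM.

    log_emission: (T, S) log-likelihood of each frame under each state
    log_trans:    (S, S) log transition probabilities, indexed [from, to]
    log_init:     (S,)   log initial state probabilities
    """
    T, S = log_emission.shape
    delta = log_init + log_emission[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans    # (from, to)
        back[t] = np.argmax(scores, axis=0)    # best predecessor of each state
        delta = scores[back[t], np.arange(S)] + log_emission[t]
    path = np.empty(T, dtype=int)
    path[-1] = np.argmax(delta)
    for t in range(T - 1, 0, -1):              # trace the best path backwards
        path[t - 1] = back[t, path[t]]
    return path
```

Given the decoded path, frame t would be passed to the pitch predictor trained for state `path[t]`, with frames in states labelled unvoiced receiving no pitch.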

Pitch Prediction Voiced/unvoiced classification The resulting models are analysed in order to classify the MFCC stream into voiced or unvoiced speech. A voicing measure is calculated for each state.

Pitch Prediction Voiced/unvoiced classification Using the state occupancy measured from the training data, the voicing of each state is determined. The threshold was determined experimentally, with 0.2 found to be a reasonable value.
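The voicing formula is not preserved in the transcript. One plausible reading, sketched below, is that a state is labelled voiced when the fraction of its training vectors that were voiced exceeds the threshold. The ratio itself and the names `voiced_counts` and `occupancy` are assumptions; only the threshold value of 0.2 comes from the slide.

```python
import numpy as np

def voiced_states(voiced_counts, occupancy, threshold=0.2):
    """Label each HMM state voiced (True) or unvoiced (False).

    voiced_counts: number of voiced training vectors aligned to each state
    occupancy:     total number of training vectors aligned to each state
    The ratio test is an assumed form of the voicing measure.
    """
    voiced_counts = np.asarray(voiced_counts, dtype=float)
    occupancy = np.asarray(occupancy, dtype=float)
    ratio = np.divide(voiced_counts, occupancy,
                      out=np.zeros_like(voiced_counts),
                      where=occupancy > 0)     # unoccupied states count as unvoiced
    return ratio > threshold
```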

Experimental Results The experiments measure both the accuracy of pitch prediction and the quality of the resulting reconstructed speech. Data: ETSI Aurora database, 200 utterances for training and 90 for testing.

Experimental Results Two measures are reported: the pitch classification error and the RMS pitch error.
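The formulas for the two measures are missing from the transcript. The sketch below uses definitions that are common in pitch-tracking evaluation and may differ from the authors': classification error counts voicing mistakes plus gross pitch errors as a percentage of all frames, and RMS pitch error is the relative error over the remaining correctly voiced frames. The 20% gross-error tolerance and the convention that 0 marks an unvoiced frame are assumptions.

```python
import numpy as np

def pitch_errors(ref, pred, tolerance=0.2):
    """Return (classification error %, RMS relative pitch error %).

    ref, pred: per-frame pitch in Hz, with 0 marking an unvoiced frame.
    tolerance: relative error above which a voiced frame counts as a
               gross error (assumed value).
    """
    ref = np.asarray(ref, dtype=float)
    pred = np.asarray(pred, dtype=float)
    ref_voiced, pred_voiced = ref > 0, pred > 0
    both = ref_voiced & pred_voiced
    rel = np.zeros_like(ref)
    rel[both] = (pred[both] - ref[both]) / ref[both]
    voicing_errors = np.sum(ref_voiced != pred_voiced)
    gross = both & (np.abs(rel) > tolerance)
    classification = 100.0 * (voicing_errors + gross.sum()) / len(ref)
    fine = both & ~gross                       # frames used for the RMS measure
    rms = 100.0 * np.sqrt(np.mean(rel[fine] ** 2)) if fine.any() else 0.0
    return classification, rms
```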

Experimental Results

Increasing the number of clusters in each state of the HMM enables more detailed modeling of the joint distribution of MFCC and pitch. The significant majority of frame classification errors arise from incorrect voiced/unvoiced decisions, which occur in low-energy regions at the start and end of speech.

Conclusion Speech reconstructed from the predicted pitch, using a sinusoidal model, is almost indistinguishable from that reconstructed using the reference pitch.