Page 0 of 14: Dynamical Invariants of an Attractor and Potential Applications for Speech Data
Estimating Kolmogorov Entropy from Acoustic Attractors from a Recognition Perspective
Saurabh Prasad
Intelligent Electronic Systems, Human and Systems Engineering
Department of Electrical and Computer Engineering

Page 1 of 14: Estimating the Correlation Integral from a Time Series
The correlation sum of a system's attractor quantifies the average number of neighbors within a neighborhood of radius $\varepsilon$ along the trajectory:

$C(\varepsilon) = \lim_{N \to \infty} \frac{2}{N(N-1)} \sum_{i<j} \Theta\!\left(\varepsilon - \|\mathbf{x}_i - \mathbf{x}_j\|\right)$

where $\mathbf{x}_i$ is the i-th point on the trajectory, $\|\cdot\|$ is a valid norm, and $\Theta(\cdot)$ is the Heaviside unit step function (serving as a count function here). At a sufficient embedding dimension ($m > 2D + 1$), we have $C(\varepsilon) \sim \varepsilon^{D}$, where $D$ is the fractal (correlation) dimension of the attractor.
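A minimal NumPy sketch of these definitions, assuming a scalar time series, a Euclidean norm, and a time-delay embedding; the helper names `delay_embed` and `correlation_sum`, the toy signal, and all parameter values are our illustrative choices, not material from the slides.

```python
import numpy as np

def delay_embed(x, m, tau):
    """Time-delay embedding: row i is (x[i], x[i+tau], ..., x[i+(m-1)*tau])."""
    n = len(x) - (m - 1) * tau
    return np.column_stack([x[i * tau: i * tau + n] for i in range(m)])

def correlation_sum(points, eps):
    """Fraction of point pairs (i < j) closer than eps, i.e. the Heaviside count."""
    n = len(points)
    # pairwise Euclidean distances between trajectory points
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    iu = np.triu_indices(n, k=1)
    return 2.0 * np.count_nonzero(d[iu] < eps) / (n * (n - 1))

# toy example: a noisy sine stands in for the observed scalar series
t = np.linspace(0, 20 * np.pi, 1000)
x = np.sin(t) + 0.01 * np.random.randn(len(t))
pts = delay_embed(x, m=5, tau=4)
for eps in (0.05, 0.1, 0.2):
    print(eps, correlation_sum(pts, eps))
```

Plotting $\ln C(\varepsilon)$ against $\ln \varepsilon$ over such a grid of radii is the usual way to read off the correlation dimension from the slope of the scaling region.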

Page 2 of 14: Order-q Renyi Entropy and K2 Entropy
Divide the state space into disjoint boxes of size $\varepsilon$. If the evolution of the state space that generated the observable is sampled at time intervals $\tau$, the order-q Renyi entropy is

$K_q = -\lim_{\tau \to 0}\, \lim_{\varepsilon \to 0}\, \lim_{d \to \infty}\, \frac{1}{d\,\tau\,(q-1)} \ln \sum_{i_1,\ldots,i_d} p^{\,q}(i_1,\ldots,i_d)$

where $p(i_1,\ldots,i_d)$ represents the joint probability that $\mathbf{x}(t=\tau)$ lies in box $i_1$, $\mathbf{x}(t=2\tau)$ lies in box $i_2$, and so on. Numerically, the Kolmogorov entropy can be estimated as the second-order Renyi entropy (K2), which lower-bounds it.
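In practice K2 is often obtained from correlation sums at consecutive embedding dimensions, $K_2 \approx \frac{1}{\tau}\ln\frac{C_m(\varepsilon)}{C_{m+1}(\varepsilon)}$ for small $\varepsilon$ and large $m$ (the Grassberger-Procaccia route; the slides do not spell out which estimator produced the results that follow). A minimal sketch of that estimator, reusing `delay_embed` and `correlation_sum` from the previous sketch; the function name `estimate_k2` and the parameter values are ours.

```python
import numpy as np

def estimate_k2(x, m, tau, eps, dt=1.0):
    """K2 ~ (1/(tau*dt)) * ln(C_m(eps) / C_{m+1}(eps)) for small eps and large m.

    Uses delay_embed() and correlation_sum() from the previous sketch;
    dt is the sampling period of the series x, tau the delay in samples.
    """
    c_m = correlation_sum(delay_embed(x, m, tau), eps)
    c_m1 = correlation_sum(delay_embed(x, m + 1, tau), eps)
    if c_m == 0.0 or c_m1 == 0.0:
        return np.nan  # no neighbours at this radius; increase eps or the data length
    return np.log(c_m / c_m1) / (tau * dt)

# continuing the noisy-sine toy example from the previous sketch:
for m in (3, 5, 7):
    print(m, estimate_k2(x, m=m, tau=4, eps=0.2))
```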

Page 3 of 14: Second-Order Kolmogorov Entropy Estimation of Speech Data
Input: speech data sampled at 22.5 kHz – sustained phones (/aa/, /ae/, /eh/, /sh/, /z/, /f/, /m/, /n/).
Output: second-order Kolmogorov entropy (K2).
We wish to analyze:
– the presence or absence of chaos in a given time series;
– the discrimination characteristics of these estimates across attractors from different sound units (for classification).
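One way such a per-phone analysis could be wired up, as a sketch only: the frame length, hop, embedding parameters, and the white-noise stand-in for a sustained phone are our assumptions, and `estimate_k2` is the helper from the previous slide's sketch, not code from the original study.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (frame_len and hop in samples)."""
    starts = range(0, len(x) - frame_len + 1, hop)
    return np.stack([x[s:s + frame_len] for s in starts])

# Illustration only: white noise stands in for one second of a sustained phone.
fs = 22500
signal = np.random.randn(fs)
frames = frame_signal(signal, frame_len=int(0.025 * fs), hop=int(0.010 * fs))

# estimate_k2() is the sketch from the previous slide; eps must be tuned to the
# signal amplitude. Frame-level statistics of these estimates (mean, variance)
# are what feed the discrimination analysis on the later slides.
k2_per_frame = [estimate_k2(f, m=5, tau=4, eps=1.0, dt=1.0 / fs) for f in frames]
print(np.nanmean(k2_per_frame), np.nanstd(k2_per_frame))
```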

Page 4 of 14: The Analysis Setup
– The current analysis includes estimates of K2 at different embedding dimensions.
– Variation of the entropy estimates with the neighborhood radius ε was studied.
– Variation of the entropy estimates with the SNR of the signal was studied.
– The analysis was performed on 3 vowels, 2 nasals, and 2 fricatives.
– Results show that vowels and nasals have much smaller entropy than fricatives.
– K2 consistently decreases with embedding dimension for vowels and nasals, while for fricatives it consistently increases.

Page 5 of 14: The Analysis Setup (in progress / coming soon)
Data size (length of the time series):
– This is crucial for our purpose, since we wish to extract information from short time series (sample data from utterances).
Speaker variation:
– We wish to study variations in the Kolmogorov entropy of phone- or word-level attractors across different speakers, across different phones/words, and across different broad phone classes.

Page 6 of 14: Correlation Entropy vs. Embedding Dimension, for various values of ε (figure)

Page 7 of 14: Correlation Entropy vs. Embedding Dimension, for various values of ε (figure)

Page 8 of 14: Correlation Entropy vs. Embedding Dimension, for various values of ε (figure)

Page 9 of 14: Correlation Entropy vs. Embedding Dimension, for various SNRs (figure)

Page 10 of 14: Correlation Entropy vs. Embedding Dimension, for various data lengths (figure)

Page 11 of 14: Measuring Discrimination Information in K2-Based Features
The Kullback-Leibler (KL) divergence provides an information-theoretic distance measure between two statistical models:

$D(p_i \,\|\, p_j) = \int p_i(x) \ln \frac{p_i(x)}{p_j(x)}\, dx$

The average discriminating information between class i and class j combines the two likelihood-ratio terms (i vs. j and j vs. i) into a symmetric divergence:

$J(i,j) = D(p_i \,\|\, p_j) + D(p_j \,\|\, p_i)$

For normal densities $p_i = \mathcal{N}(\mu_i, \sigma_i^2)$ and $p_j = \mathcal{N}(\mu_j, \sigma_j^2)$, each term has a closed form:

$D(p_i \,\|\, p_j) = \ln\frac{\sigma_j}{\sigma_i} + \frac{\sigma_i^2 + (\mu_i - \mu_j)^2}{2\sigma_j^2} - \frac{1}{2}$
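A small numerical companion to the expressions above, assuming univariate Gaussian models of the per-frame K2 estimates; the function names and the toy means and variances are illustrative, not values from the slides.

```python
import numpy as np

def kl_gauss(mu_p, var_p, mu_q, var_q):
    """KL divergence D(N(mu_p, var_p) || N(mu_q, var_q)) for univariate Gaussians."""
    return 0.5 * (np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def symmetric_kl(samples_i, samples_j):
    """Average discriminating information J(i, j) between two sets of scalar K2 estimates."""
    mu_i, var_i = np.mean(samples_i), np.var(samples_i)
    mu_j, var_j = np.mean(samples_j), np.var(samples_j)
    return kl_gauss(mu_i, var_i, mu_j, var_j) + kl_gauss(mu_j, var_j, mu_i, var_i)

# hypothetical per-frame K2 estimates for a vowel and a fricative
k2_vowel = np.random.normal(50.0, 5.0, size=200)
k2_fricative = np.random.normal(120.0, 20.0, size=200)
print(symmetric_kl(k2_vowel, k2_fricative))
```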

Page 12 of 14: Measuring Discrimination Information in K2-Based Features – statistics of the entropy estimates over several frames, for various phones (table)

Page 13 of 14: Measuring Discrimination Information in K2-Based Features – KL-divergence measure between K2 features from various phonemes, for two speakers (table)

Page 14 of 14: Plans
– Finish studying the use of K2 entropy as a feature characterizing phone-level attractors; we will perform a similar analysis on Lyapunov exponent and correlation dimension estimates.
– Measure the speaker dependence of this invariant.
– Use this setup on a meaningful recognition task.
– Noise robustness, parameter tweaking, and integration of these features with MFCCs.
– Statistical modeling…