Structure-Based Speech Classification Using Nonlinear Embedding Techniques
Uchechukwu Ofoegbu
Advisor: Dr. Robert E. Yantorno
Committee: Dr. Saroj K. Biswas, Dr. Henry M. Sendaula
Speech Processing Laboratory, Temple University, May 5, 2004
Acknowledgment
- Dr. Robert Yantorno
- Dr. Saroj Biswas
- Dr. Henry Sendaula
- Speech Lab members
- Air Force Research Laboratory, Rome, NY
Overview
- Voiced and Unvoiced Speech
- Usable and Unusable Speech
- Nonlinearities in Speech
- Nonlinear Embedding
- Research Goal
- Proposed Research
Voiced and Unvoiced Speech
Voiced/Unvoiced Characteristics
Voiced:
- Quasi-periodic excitation
- Modulation by the vocal tract
- Production of vowels, voiced fricatives, and plosives
Unvoiced:
- No periodic vibration of the vocal cords
- Noise-like nature
- Production of unvoiced fricatives and plosives
Usable Speech
- Portions of co-channel speech that are still usable for applications such as speaker identification and speech recognition
- Arise where low-energy (unvoiced/silence) segments of one speaker overlap with high-energy (voiced) segments of the other
- Target-to-Interferer Ratio (TIR) > 20 dB
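As a hedged illustration (Python is assumed; this function is not part of the original slides), the frame-level TIR could be computed when the target and interferer signals are available separately, as they are when co-channel speech is constructed synthetically:

```python
import numpy as np

def target_to_interferer_ratio_db(target_frame, interferer_frame, eps=1e-12):
    """Frame-level Target-to-Interferer Ratio (TIR) in dB.

    Assumes the target and interferer signals are available separately,
    as when co-channel speech is created by mixing two recordings.
    """
    target_energy = np.sum(np.square(target_frame, dtype=float))
    interferer_energy = np.sum(np.square(interferer_frame, dtype=float))
    return 10.0 * np.log10((target_energy + eps) / (interferer_energy + eps))

# A frame would then be labeled usable when its TIR exceeds 20 dB:
# usable = target_to_interferer_ratio_db(t, i) > 20.0
```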
Nonlinearities in Speech
- Glottal waveform changes: shape varies with amplitude
- Physical observations: flow in the vocal tract is non-laminar; coupling exists between the vocal tract and the vocal folds
- When the glottis is open, prominent changes are observed in formant characteristics
Nonlinear Embedding
- A nonlinear system can be viewed as a point moving along some trajectory in an abstract state space
- The coordinates of the point are the independent degrees of freedom of the system
- The state space can be reconstructed from a scalar signal
Nonlinear Embedding (cont'd)
Takens' Method of Delays:
- A state-space representation topologically equivalent to the original state space of a system can be reconstructed from a single observable dimension
- Vectors in an m-dimensional state space are formed from time-delayed values of the signal
Nonlinear Embedding (cont'd)
Embedding vectors are formed as x(n) = [s(n), s(n + d), s(n + 2d), ..., s(n + (m - 1)d)], where m = embedding dimension and d = delay value.
Nonlinear Embedding (cont'd)
Delay value, d:
- Dependent on the sampling rate and signal properties
- Large enough that nonlinearities are taken into account by the reconstructed trajectory
- Small enough to retain reasonable time resolution
Nonlinear Embedding (cont'd)
Embedding dimension, m:
- Generation of voiced speech constitutes a low-dimensional system
- Generation of unvoiced speech constitutes a relatively high-dimensional system
- Using a low dimension (such as m = 3) is sufficient to reconstruct voiced speech, but not unvoiced speech
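A minimal sketch of the delay embedding (Python assumed; the frame length and the delay d = 10 are illustrative choices, not values taken from the slides):

```python
import numpy as np

def delay_embed(signal, m=3, d=10):
    """Reconstruct an m-dimensional state space from a scalar signal
    using Takens' method of delays.

    Each row is one embedded vector [s(n), s(n+d), ..., s(n+(m-1)d)].
    m (embedding dimension) and d (delay in samples) are illustrative
    values, not necessarily those used in the original work.
    """
    signal = np.asarray(signal, dtype=float)
    n_vectors = len(signal) - (m - 1) * d
    if n_vectors <= 0:
        raise ValueError("frame too short for this m and d")
    return np.column_stack([signal[i * d : i * d + n_vectors] for i in range(m)])

# Example: embed a 30 ms frame sampled at 8 kHz (240 samples)
# frame = speech[start:start + 240]
# embedded = delay_embed(frame, m=3, d=10)   # shape (220, 3)
```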
Embedded Voiced and Unvoiced Speech
Embedded Usable and Unusable Speech
Research Goal
Feature Extraction:
- Difference-Mean Comparison (DMC) Measure
  - Voiced/unvoiced classification
- Nodal Density Measure
  - Voiced/unvoiced classification
  - Usable/unusable classification
Difference-Mean Comparison (DMC) Measure
Voiced/Unvoiced Classification
Introduction
- The 3rd-order difference of the embedded signal is computed along the first non-singleton dimension
- The 1st-order difference of an N x N matrix X is given by D(i, j) = X(i + 1, j) - X(i, j)
- The number of 3rd-order difference values exceeding their mean is observed
Embedded Voiced and Unvoiced Speech
Difference-Mean Comparison Distribution (figure slides)
DMC-Based Decisions (figure slides)
Results (figure slides)
Nodal Density Measure
Voiced/Unvoiced Classification
Usable/Unusable Classification
Introduction
- The smallest cube enclosing the embedded signal is determined
- This cube is divided into N smaller cubes
- The edges of the smaller cubes are defined as nodes
- The number of nodes spanned by the signal is determined
- The ratio of the number of nodes spanned to the total number of nodes is defined as the nodal density
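A possible sketch of this computation in Python; the grid resolution n_per_axis and treating a node as "spanned" when at least one embedded point falls in its cell are assumptions filling in details the slide leaves out:

```python
import numpy as np

def nodal_density(embedded, n_per_axis=10):
    """Nodal density of an embedded frame.

    The bounding cube of the embedded trajectory is divided into
    n_per_axis**m smaller cubes; a node is counted as spanned when at
    least one embedded point falls in the corresponding cell.
    Returns nodes spanned / total nodes.
    """
    lo = embedded.min(axis=0)
    hi = embedded.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)          # avoid division by zero
    # Map each point to the index of the small cube containing it
    idx = np.floor((embedded - lo) / span * n_per_axis).astype(int)
    idx = np.clip(idx, 0, n_per_axis - 1)
    spanned = len({tuple(row) for row in idx})
    return spanned / float(n_per_axis ** embedded.shape[1])
```

Intuitively, the compact, repeating orbit of a voiced frame visits fewer cells than a noise-like unvoiced frame, so voiced frames tend toward lower nodal density.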
Voiced/Unvoiced Classification
Embedded Voiced and Unvoiced Speech Frames with Grids
Nodes Spanned by Embedded Voiced and Unvoiced Speech Frames
Nodal-Density Distribution (figure slides)
Filtering
- Moving-average filter
- Order M = 10
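A short sketch of the moving-average filter, assuming (based on the surrounding slides) that it smooths the per-frame nodal-density track; Python and the helper name are illustrative:

```python
import numpy as np

def moving_average(feature_track, M=10):
    """Smooth a per-frame feature track with an order-M moving-average filter."""
    kernel = np.ones(M) / M
    return np.convolve(feature_track, kernel, mode="same")

# Example: smooth the nodal-density values computed for successive frames
# smoothed = moving_average(np.array(nodal_density_per_frame), M=10)
```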
Nodal-Density Distributions after Filtering (figure slides)
Results (figure slides)
Proposed Research
Usable/Unusable Classification
Embedded Usable and Unusable Speech Frames with Grids
Nodes Spanned by Embedded Usable and Unusable Speech Frames
(Panel titles: Nodes Spanned by Embedded Co-channel Speech of 30 dB TIR; axis tick values omitted)
Preliminary Results
Summary (block diagram)
Speech → Nonlinear Embedding → Difference-Mean Comparison / Nodal Density → V/UV Classification / Usable/Unusable Classification
Future Proposed Research
- Determine the optimum filter for nodal density-based voiced/unvoiced classification
- Develop the nodal density measure for usable/unusable classification
- Investigate the presence of complementary information between the two features (DMC and nodal density) for voiced/unvoiced classification
- Perform decision-level fusion of both features
If you understood this presentation … please ask QUESTIONS !!!