
1 Speech Processing Laboratory, Temple University, May 5, 2004
Structure-Based Speech Classification Using Nonlinear Embedding Techniques
Uchechukwu Ofoegbu
Advisor: Dr. Robert E. Yantorno
Committee: Dr. Saroj K. Biswas, Dr. Henry M. Sendaula

2 Acknowledgment
- Dr. Robert Yantorno
- Dr. Saroj Biswas
- Dr. Henry Sendaula
- Speech Lab Members
- Air Force Research Laboratory, Rome, NY

3 Overview
- Voiced and Unvoiced Speech
- Usable and Unusable Speech
- Nonlinearities in Speech
- Nonlinear Embedding
- Research Goal
- Proposed Research

4 Voiced and Unvoiced Speech

5 Voiced/Unvoiced Characteristics
- Voiced
  - Quasi-periodic excitation
  - Modulation by vocal tract
  - Production of vowels, voiced fricatives and plosives
- Unvoiced
  - No periodic vibration of vocal cords
  - Noise-like nature
  - Production of unvoiced fricatives and plosives

6 Usable Speech
- Portions of co-channel speech that remain usable for applications such as speaker identification and speech recognition
- Low-energy (unvoiced/silence) segments overlap with high-energy (voiced) segments
- Target-to-Interferer Ratio (TIR) > 20 dB
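The TIR criterion above can be illustrated with a short sketch. It assumes the target and interferer frames are available separately (as they would be when co-channel test data is built by mixing two recordings) and that TIR is the frame energy ratio in dB; the names `tir_db` and `is_usable` are hypothetical and not taken from the slides.

```python
import numpy as np

def tir_db(target, interferer):
    """Target-to-interferer ratio of one frame, in dB (energy ratio).

    Assumption: TIR = 10*log10(E_target / E_interferer), computed per frame.
    """
    target = np.asarray(target, dtype=float)
    interferer = np.asarray(interferer, dtype=float)
    e_target = np.sum(target ** 2)
    e_interferer = np.sum(interferer ** 2)
    if e_interferer == 0.0:
        return float("inf")  # no interference in this frame
    return 10.0 * np.log10(e_target / e_interferer)

def is_usable(target, interferer, threshold_db=20.0):
    """Label a frame usable when its TIR exceeds the threshold (20 dB on the slide)."""
    return tir_db(target, interferer) > threshold_db
```

In practice the two speakers are not available separately, which is why a feature computed from the mixed signal alone is needed.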

7 Nonlinearities in Speech
- Glottal waveform changes
- Shape varies with amplitude
- Physical observations
- Flow in vocal tract is non-laminar
- Coupling between vocal tract and folds
- When glottis is open, prominent changes are observed in formant characteristics

8 Nonlinear Embedding
- Nonlinear Systems
  - Point moving along some trajectory in an abstract state space
  - Coordinates of the point are independent degrees of freedom of the system
  - State space can be reconstructed from a scalar signal

9 Nonlinear Embedding (cont’d)
- Takens’ Method of Delays
  - A state space representation topologically equivalent to the original state space of a system can be reconstructed from a single observable dimension
  - Vectors in m-dimensional state space are formed from time-delayed values of a signal

10 Nonlinear Embedding (cont’d)
- m = embedding dimension
- d = delay value
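The slide's own equation did not survive in this transcript. A minimal sketch of the method of delays follows, assuming the usual delay-vector form x_n = [s(n), s(n+d), ..., s(n+(m-1)d)]; the function name `delay_embed` is hypothetical.

```python
import numpy as np

def delay_embed(signal, m=3, d=1):
    """Build m-dimensional delay vectors from a scalar signal.

    Row n of the result is [s(n), s(n+d), ..., s(n+(m-1)d)].
    m is the embedding dimension and d the delay in samples.
    """
    s = np.asarray(signal, dtype=float)
    n_vectors = len(s) - (m - 1) * d
    if n_vectors <= 0:
        raise ValueError("signal too short for the requested m and d")
    # Each column is the signal shifted by a multiple of the delay.
    return np.column_stack([s[k * d : k * d + n_vectors] for k in range(m)])
```

For a speech frame, `delay_embed(frame, m=3, d=...)` gives the 3-D trajectory that the later slides plot; the delay used by the author is not stated in this transcript.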

11 Nonlinear Embedding (cont’d)
- Delay value, d:
  - Dependent on sampling rate and signal properties
  - Large enough that nonlinearities are taken into account by the reconstructed trajectory
  - Small enough to retain reasonable time resolution
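The slides do not say how d was chosen. One common heuristic, shown here purely as an illustration and not as the author's method, is to take the first lag at which the signal's autocorrelation drops to zero or below; `first_zero_crossing_delay` and `max_delay` are hypothetical names.

```python
import numpy as np

def first_zero_crossing_delay(signal, max_delay=100):
    """Return the first lag whose normalized autocorrelation is <= 0.

    A heuristic for picking the embedding delay; returns None if no such
    lag is found within max_delay or the signal is constant.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    energy = np.dot(x, x)
    if energy == 0.0:
        return None
    for lag in range(1, min(max_delay, len(x) - 1) + 1):
        r = np.dot(x[:-lag], x[lag:]) / energy
        if r <= 0.0:
            return lag
    return None
```

Because the lag is counted in samples, the result scales with the sampling rate, which matches the first point on the slide.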

12 Nonlinear Embedding (cont’d)
- Dimension, m:
  - Generation of voiced speech constitutes a low-dimensional system
  - Generation of unvoiced speech constitutes a relatively high-dimensional system
  - A low dimension (such as m = 3) is sufficient to reconstruct voiced speech but not unvoiced speech

14 Embedded Voiced and Unvoiced Speech

15 Embedded Usable and Unusable Speech

16 Research Goal
- Feature Extraction
- Difference-Mean Comparison (DMC) Measure
  - Voiced/unvoiced classification
- Nodal Density Measure
  - Voiced/unvoiced classification
  - Usable/unusable classification

17 Difference-Mean Comparison (DMC) Measure: Voiced/Unvoiced Classification

18 Introduction
- 3rd-order difference computed along the first non-singleton dimension
- 1st-order difference of an NxN matrix given by the row-to-row differences, X(i+1,:) - X(i,:)
- Length(3rd-order diff. > mean) observed, i.e. the count of 3rd-order difference values that exceed the mean
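A minimal sketch of the measure as described above. The slide does not say which mean is used for the comparison; this sketch assumes it is the mean of the 3rd-order difference values themselves, and the name `dmc_measure` is hypothetical.

```python
import numpy as np

def dmc_measure(embedded):
    """Count 3rd-order difference values that exceed their mean.

    `embedded` is an (N, m) array of delay vectors. The difference is
    taken down the rows (axis 0), the first non-singleton dimension.
    Assumption: the comparison mean is that of the differenced values.
    """
    X = np.asarray(embedded, dtype=float)
    diff3 = np.diff(X, n=3, axis=0)  # 3rd-order difference between rows
    return int(np.count_nonzero(diff3 > diff3.mean()))
```

The resulting count would then be compared against a threshold to label a frame voiced or unvoiced; the threshold value is not given in this transcript.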

19 Embedded Voiced and Unvoiced Speech

20 Difference-Mean Comparison Distribution

21 Difference-Mean Comparison Distribution

22 Difference-Mean Comparison Distribution

23 DMC-Based Decisions

24 DMC-Based Decisions

25 DMC-Based Decisions

26 DMC-Based Decisions

27 DMC-Based Decisions

28 DMC-Based Decisions

29 Results

30 Results (cont’d)

31 Nodal Density Measure: Voiced/Unvoiced Classification, Usable/Unusable Classification

32 Introduction
- Smallest cube which encloses the signal is determined
- This cube is divided into N smaller cubes
- Edges of the smaller cubes are defined as nodes
- Number of nodes spanned by the signal is determined
- Ratio of number of nodes spanned to total number of nodes is defined as nodal density
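The steps above can be sketched as follows. The sketch makes several assumptions not confirmed by the slides: nodes are taken to be the lattice points of a regular grid over the enclosing cube, a node counts as "spanned" when it is the nearest lattice point to at least one trajectory point, and `divisions` (cells per axis) is a hypothetical parameter.

```python
import numpy as np

def nodal_density(points, divisions=10):
    """Fraction of grid nodes visited by an embedded trajectory.

    `points` is an (N, m) array of delay vectors. The smallest axis-aligned
    cube enclosing them is split into `divisions` cells per axis, giving
    (divisions + 1) ** m nodes.
    """
    P = np.asarray(points, dtype=float)
    lower = P.min(axis=0)
    side = (P.max(axis=0) - lower).max()  # edge length of the enclosing cube
    total_nodes = (divisions + 1) ** P.shape[1]
    if side == 0.0:
        return 1.0 / total_nodes  # all points coincide on a single node
    # Snap every point to its nearest node, then count distinct nodes.
    idx = np.rint((P - lower) / side * divisions).astype(int)
    visited = np.unique(idx, axis=0).shape[0]
    return visited / total_nodes
```

Under these assumptions a trajectory confined to a thin orbit visits few nodes, while a noise-like cloud visits many, which is the contrast the measure relies on.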

33 Voiced/Unvoiced Classification

34 Embedded Voiced and Unvoiced Speech Frames with Grids

35 Nodes Spanned by Embedded Voiced and Unvoiced Speech Frames

36 Nodal-Density Distribution

37 Nodal-Density Distribution

38 Nodal-Density Distribution

39 Filtering
- Moving Average Filter
  - Order, M = 10
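A moving average filter of this kind might look like the sketch below. It assumes "order M = 10" means a 10-tap window with equal weights; the transcript does not say whether the filter is applied to the speech samples or to the frame-by-frame feature values, so the input here is just a generic 1-D sequence.

```python
import numpy as np

def moving_average(x, order=10):
    """Smooth a 1-D sequence with an equal-weight window of `order` taps.

    mode="same" keeps the output the same length as the input; values
    near the two ends are averaged against implicit zeros.
    """
    x = np.asarray(x, dtype=float)
    kernel = np.ones(order) / order
    return np.convolve(x, kernel, mode="same")
```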

40 Nodal-Density Distributions After Filtering

41 Nodal-Density Distributions After Filtering

42 Nodal-Density Distributions After Filtering

43 Results

44 Results (cont’d)

45 Proposed Research: Usable/Unusable Classification

46 Embedded Usable and Unusable Speech Frames with Grids

47 Nodes Spanned by Embedded Usable and Unusable Speech Frames
[Figure: two 3-D plots, each titled "Nodes Spanned by Embedded Co-channel Speech of 30dB TIR"]

48 Preliminary Results

49 Summary
[Diagram with blocks: Speech; Nonlinear Embedding; Difference-Mean Comparison; Nodal Density; Usable/Unusable Classification; V/UV Classification]

50 Future Proposed Research
- Determine optimum filter for nodal density-based voiced/unvoiced classification
- Develop nodal density measure for usable/unusable classification
- Investigate the presence of complementary information between the two features (DMC and nodal density) for voiced/unvoiced classification
- Perform decision-level fusion of both features

51 If you understood this presentation … please ask QUESTIONS!