Structure-Based Speech Classification Using Nonlinear Embedding Techniques
Uchechukwu Ofoegbu
Advisor: Dr. Robert E. Yantorno
Committee: Dr. Saroj K. Biswas, Dr. Henry M. Sendaula
Speech Processing Laboratory, Temple University, May 5, 2004
Acknowledgment
- Dr. Robert Yantorno
- Dr. Saroj Biswas
- Dr. Henry Sendaula
- Speech Lab members
- Air Force Research Laboratory, Rome, NY
Overview
- Voiced and Unvoiced Speech
- Usable and Unusable Speech
- Nonlinearities in Speech
- Nonlinear Embedding
- Research Goal
- Proposed Research
Voiced and Unvoiced Speech
Voiced/Unvoiced Characteristics
Voiced:
- Quasi-periodic excitation
- Modulation by the vocal tract
- Production of vowels, voiced fricatives, and plosives
Unvoiced:
- No periodic vibration of the vocal cords
- Noise-like nature
- Production of unvoiced fricatives and plosives
Usable Speech
- Portions of co-channel speech that are still usable for applications such as speaker identification and speech recognition
- Arise where low-energy (unvoiced/silence) segments of one speaker overlap with high-energy (voiced) segments of the other
- Target-to-Interferer Ratio (TIR) > 20 dB
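As a hedged illustration (Python is assumed; this function is not part of the original slides), the frame-level TIR could be computed when the target and interferer signals are available separately, as they are when co-channel speech is constructed synthetically:

```python
import numpy as np

def target_to_interferer_ratio_db(target_frame, interferer_frame, eps=1e-12):
    """Frame-level Target-to-Interferer Ratio (TIR) in dB.

    Assumes the target and interferer signals are available separately,
    as when co-channel speech is created by mixing two recordings.
    """
    target_energy = np.sum(np.square(target_frame, dtype=float))
    interferer_energy = np.sum(np.square(interferer_frame, dtype=float))
    return 10.0 * np.log10((target_energy + eps) / (interferer_energy + eps))

# A frame would then be labeled usable when its TIR exceeds 20 dB:
# usable = target_to_interferer_ratio_db(t, i) > 20.0
```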
Nonlinearities in Speech
- Glottal waveform changes: shape varies with amplitude
- Physical observations: flow in the vocal tract is non-laminar; coupling exists between the vocal tract and the vocal folds
- When the glottis is open, prominent changes are observed in formant characteristics
Nonlinear Embedding
- A nonlinear system can be viewed as a point moving along some trajectory in an abstract state space
- The coordinates of the point are the independent degrees of freedom of the system
- The state space can be reconstructed from a scalar signal
Nonlinear Embedding (cont'd)
Takens' Method of Delays:
- A state-space representation topologically equivalent to the original state space of a system can be reconstructed from a single observable dimension
- Vectors in an m-dimensional state space are formed from time-delayed values of the signal
Nonlinear Embedding (cont'd)
Embedding vectors are formed as x(n) = [s(n), s(n + d), s(n + 2d), ..., s(n + (m - 1)d)], where m = embedding dimension and d = delay value.
Nonlinear Embedding (cont'd)
Delay value, d:
- Dependent on the sampling rate and signal properties
- Large enough that nonlinearities are taken into account by the reconstructed trajectory
- Small enough to retain reasonable time resolution
Nonlinear Embedding (cont'd)
Embedding dimension, m:
- Generation of voiced speech constitutes a low-dimensional system
- Generation of unvoiced speech constitutes a relatively high-dimensional system
- Using a low dimension (such as m = 3) is sufficient to reconstruct voiced speech, but not unvoiced speech
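A minimal sketch of the delay embedding (Python assumed; the frame length and the delay d = 10 are illustrative choices, not values taken from the slides):

```python
import numpy as np

def delay_embed(signal, m=3, d=10):
    """Reconstruct an m-dimensional state space from a scalar signal
    using Takens' method of delays.

    Each row is one embedded vector [s(n), s(n+d), ..., s(n+(m-1)d)].
    m (embedding dimension) and d (delay in samples) are illustrative
    values, not necessarily those used in the original work.
    """
    signal = np.asarray(signal, dtype=float)
    n_vectors = len(signal) - (m - 1) * d
    if n_vectors <= 0:
        raise ValueError("frame too short for this m and d")
    return np.column_stack([signal[i * d : i * d + n_vectors] for i in range(m)])

# Example: embed a 30 ms frame sampled at 8 kHz (240 samples)
# frame = speech[start:start + 240]
# embedded = delay_embed(frame, m=3, d=10)   # shape (220, 3)
```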
Embedded Voiced and Unvoiced Speech
Embedded Usable and Unusable Speech
Research Goal
Feature Extraction:
- Difference-Mean Comparison (DMC) Measure
  - Voiced/unvoiced classification
- Nodal Density Measure
  - Voiced/unvoiced classification
  - Usable/unusable classification
Difference-Mean Comparison (DMC) Measure
Voiced/Unvoiced Classification
Introduction
- The 3rd-order difference of the embedded signal is computed along the first non-singleton dimension
- The 1st-order difference of an N x N matrix X is given by D(i, j) = X(i + 1, j) - X(i, j)
- The number of 3rd-order difference values exceeding their mean is observed
Embedded Voiced and Unvoiced Speech
Difference-Mean Comparison Distribution (figure slides)
DMC-Based Decisions (figure slides)
Results (figure slides)
Nodal Density Measure
Voiced/Unvoiced Classification
Usable/Unusable Classification
Introduction
- The smallest cube enclosing the embedded signal is determined
- This cube is divided into N smaller cubes
- The edges of the smaller cubes are defined as nodes
- The number of nodes spanned by the signal is determined
- The ratio of the number of nodes spanned to the total number of nodes is defined as the nodal density
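A possible sketch of this computation in Python; the grid resolution n_per_axis and treating a node as "spanned" when at least one embedded point falls in its cell are assumptions filling in details the slide leaves out:

```python
import numpy as np

def nodal_density(embedded, n_per_axis=10):
    """Nodal density of an embedded frame.

    The bounding cube of the embedded trajectory is divided into
    n_per_axis**m smaller cubes; a node is counted as spanned when at
    least one embedded point falls in the corresponding cell.
    Returns nodes spanned / total nodes.
    """
    lo = embedded.min(axis=0)
    hi = embedded.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)          # avoid division by zero
    # Map each point to the index of the small cube containing it
    idx = np.floor((embedded - lo) / span * n_per_axis).astype(int)
    idx = np.clip(idx, 0, n_per_axis - 1)
    spanned = len({tuple(row) for row in idx})
    return spanned / float(n_per_axis ** embedded.shape[1])
```

Intuitively, the compact, repeating orbit of a voiced frame visits fewer cells than a noise-like unvoiced frame, so voiced frames tend toward lower nodal density.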
Voiced/Unvoiced Classification
Embedded Voiced and Unvoiced Speech Frames with Grids
Nodes Spanned by Embedded Voiced and Unvoiced Speech Frames
Nodal-Density Distribution (figure slides)
Filtering
- Moving-average filter
- Order M = 10
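A short sketch of the moving-average filter, assuming (based on the surrounding slides) that it smooths the per-frame nodal-density track; Python and the helper name are illustrative:

```python
import numpy as np

def moving_average(feature_track, M=10):
    """Smooth a per-frame feature track with an order-M moving-average filter."""
    kernel = np.ones(M) / M
    return np.convolve(feature_track, kernel, mode="same")

# Example: smooth the nodal-density values computed for successive frames
# smoothed = moving_average(np.array(nodal_density_per_frame), M=10)
```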
Nodal-Density Distributions after Filtering (figure slides)
Results (figure slides)
Proposed Research
Usable/Unusable Classification
Embedded Usable and Unusable Speech Frames with Grids
Nodes Spanned by Embedded Usable and Unusable Speech Frames
(Panel titles: Nodes Spanned by Embedded Co-channel Speech of 30 dB TIR; axis tick values omitted)
Preliminary Results
Summary (block diagram)
Speech → Nonlinear Embedding → Difference-Mean Comparison / Nodal Density → V/UV Classification / Usable/Unusable Classification
Future Proposed Research
- Determine the optimum filter for nodal density-based voiced/unvoiced classification
- Develop the nodal density measure for usable/unusable classification
- Investigate the presence of complementary information between the two features (DMC and nodal density) for voiced/unvoiced classification
- Perform decision-level fusion of both features
If you understood this presentation … please ask QUESTIONS !!!