Structure-Based Speech Classification Using State-Space Embedding

1 Structure-Based Speech Classification Using State-Space Embedding
Uchechukwu Ofoegbu
Advisor: Dr. Robert E. Yantorno
Committee: Dr. Saroj K. Biswas, Dr. Henry M. Sendaula
Monday, November 29, 2004
Speaker notes: My thesis research is related to speaker identification (SID) system enhancement techniques using the SID-usable speech concept. Speech has been modeled in this research using the sinusoidal model. The idea of SID-usable speech was introduced by a previous master’s student, Ananth Iyer, who proposed a system in which the SID system itself is used as ground truth to identify what is usable and unusable to it.

2 Acknowledgment
Dr. Robert Yantorno
Dr. Saroj Biswas
Dr. Henry Sendaula
Speech Lab Members
Air Force Research Laboratory, Rome, NY

3 Overview
Usable Speech
Voiced Speech
State-Space Embedding
Research Goals
Usable Speech Detection
Voiced Speech Detection
Conclusion
Speaker notes: I first give a brief introduction to what ‘usable speech’ is and how the usable speech detection system is designed. Then I introduce my research goals. I talk about the TIR-usable speech measure that I have developed, then the SID-usable speech approach and some of the proposed SID enhancement techniques.

4 Usable Speech

5 TIR-Based Usable Speech

6 TIR-Based Results

7 Next-Generation Co-Channel Speech Processing System
[Block diagram; recoverable labels: Co-Channel Speech, Usable Speech Extraction, Usable Speech Segments, Speaker 1 Segments, Speaker 2 Segments, Sub-Unit Reconstruction, Speech from Speaker 1, Speech from Speaker 2]

8 Voiced Speech

9 Voiced/Unvoiced Characteristics
Voiced
  Quasi-periodic excitation
  Modulation by vocal tract
  Production of vowels, voiced fricatives and plosives
Unvoiced
  No periodic vibration of vocal cords
  Noise-like nature
  Production of unvoiced fricatives and plosives

10 State-Space Embedding

11 Nonlinearities in Speech
Glottal waveform changes
  Shape varies with amplitude
Physical observations
  Flow in vocal tract is non-laminar
  Coupling between vocal tract and folds
  When glottis is open, prominent changes are observed in formant characteristics

12 State-Space Embedding
Nonlinear Systems
  Point moving along some trajectory in an abstract state space
  Coordinates of the point are independent degrees of freedom of the system
  State space could be reconstructed from a scalar signal

13 State-Space Embedding
Takens’ Method of Delays
  A state-space representation topologically equivalent to the original state space of a system can be reconstructed from a single observable dimension
  Vectors in m-dimensional state space are formed from time-delayed values of a signal

14 State-Space Embedding (cont’d)
m = embedding dimension
d = delay value
Speaker notes: Talk about usable speech and the use of TIR as the ground truth for usable speech detection. Then talk about how the extracted usable speech is used to find the speaker’s identity with the speaker identification system.
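The method of delays can be illustrated with a minimal sketch. It assumes the common convention in which each m-dimensional vector is [s(n), s(n + d), ..., s(n + (m - 1)d)]; the slides do not show which sign convention for the delay was used, and the function name delay_embed is hypothetical.

```python
def delay_embed(signal, m=3, d=1):
    """Build m-dimensional delay vectors from a scalar signal.

    Each vector is (s[n], s[n + d], ..., s[n + (m - 1) * d]).
    The forward-delay convention is an assumption, not taken from the slides.
    """
    if m < 1 or d < 1:
        raise ValueError("m and d must be positive integers")
    span = (m - 1) * d  # samples between the first and last coordinate
    if len(signal) <= span:
        raise ValueError("signal too short for this m and d")
    return [tuple(signal[n + k * d] for k in range(m))
            for n in range(len(signal) - span)]


# Toy frame: a ramp makes the indexing easy to follow.
frame = [0, 1, 2, 3, 4, 5, 6, 7]
vectors = delay_embed(frame, m=3, d=2)
print(vectors)
```

With m = 3 the vectors can be plotted directly as a three-dimensional trajectory, which is the representation the later slides inspect.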

15 State-Space Embedding
Delay value, d:
  Dependent on sampling rate and signal properties
  Large enough such that nonlinearities are taken into account by the reconstructed trajectory
  Small enough to retain reasonable time resolution

16 State-Space Embedding
Dimension, m:
  Generation of voiced speech constitutes a low-dimensional system
  Generation of unvoiced speech constitutes a relatively high-dimensional system
  Using a low dimension (such as m = 3) sufficiently reconstructs voiced but not unvoiced speech

17 Recent Applications of State-Space Embedding in Speech Processing
Pitch detection
  Terez, D. E., “Robust Pitch Determination Using Nonlinear State-Space Embedding”, ICASSP, 2002.
Automatic speech segmentation using curvature
  Smolenski, B. Y., “A Filterless Approach to Processing Speech in Degraded Environments”, Dissertation Proposal, 2004.

18 Research Goals
[Diagram; recoverable labels: Speech, State-Space Embedding, Structure Observation; Structured: Usable and Voiced Speech; Unstructured: Unusable and Unvoiced Speech]

19 Usable Speech Detection

20 Usable and Unusable Speech

21 Embedded Voiced and Unvoiced Speech
Observable Difference
  Usable speech signal is less dense than unusable
Measure
  Nodal Density (ND) Measure

22 Nodal Density (ND) Measure

23 Nodal Density
Smallest cube which encloses the signal is determined
This cube is divided into N smaller cubes
Edges of the smaller cubes are defined as nodes
Number of nodes spanned by the signal is determined
Ratio of number of nodes spanned to total number of nodes is defined as nodal density
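The steps above can be sketched roughly as follows. This is one possible reading, not the author's implementation: it assumes a node is a corner of one of the small cubes and that a node counts as spanned when it is the nearest node to at least one trajectory point. The function name nodal_density and the parameter cells_per_side (so that N = cells_per_side cubed) are hypothetical.

```python
def nodal_density(points, cells_per_side=7):
    """Fraction of grid nodes touched by a 3-D trajectory.

    Assumptions (not confirmed by the slides): nodes are the corners of
    the small cubes, and a node is "spanned" when it is the nearest node
    to at least one trajectory point.
    """
    if not points:
        raise ValueError("points must be non-empty")
    lows = [min(p[axis] for p in points) for axis in range(3)]
    highs = [max(p[axis] for p in points) for axis in range(3)]
    # Side length of the smallest axis-aligned cube enclosing the trajectory.
    side = max(h - l for h, l in zip(highs, lows))
    if side == 0:
        side = 1.0  # degenerate case: all points coincide
    cell = side / cells_per_side
    # Snap every point to its nearest node and keep the distinct nodes.
    spanned = {
        tuple(round((p[axis] - lows[axis]) / cell) for axis in range(3))
        for p in points
    }
    total_nodes = (cells_per_side + 1) ** 3
    return len(spanned) / total_nodes


# A thin diagonal trajectory touches few nodes; a space-filling one touches all.
line = [(i, i, i) for i in range(8)]
cloud = [(x, y, z) for x in range(8) for y in range(8) for z in range(8)]
print(nodal_density(line), nodal_density(cloud))
```

A structured trajectory that keeps retracing the same path touches few nodes, whereas a trajectory that wanders through the cube touches many, which is the density difference the measure is meant to capture.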

24 Embedded Usable and Unusable Speech Frames with Grids

25 Nodes Spanned by Embedded Usable and Unusable Speech Frames
[Figure: Nodes Spanned by Embedded Co-channel Speech of 30 dB TIR]

26 ND Distribution

27 Usable Speech Detection Procedure
[Flow diagram; recoverable blocks: Target + Interferer, Voiced Speech Extractor, Framing, Nonlinear Embedding, Cubing (N = 7^3 = 343), Compute Nodal Density; outputs: Usable, Unusable]

28 ND-Based Usable Speech Detection Results

29 Result Comparison

30 Voiced Speech Detection

31 Voiced and Unvoiced Speech

32 Embedded Voiced and Unvoiced Speech (cont’d)
Observable Differences
  Rate of change of unvoiced signal is faster than that of voiced
  Voiced signal is less dense than unvoiced
Measures
  Difference-Mean Comparison (DMC) Measure
  Nodal Density (ND) Measure

33 Difference-Mean Comparison (DMC) Measure

34 Difference-Mean Comparison
3rd-order difference computation along first non-singleton dimension
1st-order difference of NxN matrix given by [equation not recovered]
Length(3rd-order diff. > mean) observed
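A small sketch of the difference computation may help. The n-th order difference along the first dimension is the first difference (each row minus the previous row) applied n times. The counting step below assumes that "mean" refers to the mean of the 3rd-order difference values themselves; the slides do not say which mean is used, so treat dmc_count as an illustration only. Both function names are hypothetical.

```python
def diff(rows, order=1):
    """order-th difference along the first dimension of a list of rows."""
    for _ in range(order):
        # First difference: each row minus the row before it.
        rows = [[b - a for a, b in zip(prev, nxt)]
                for prev, nxt in zip(rows, rows[1:])]
    return rows


def dmc_count(rows):
    """Count entries of the 3rd-order difference that exceed its mean.

    Assumption: the mean is taken over the difference values themselves.
    """
    d3 = diff(rows, order=3)
    values = [v for row in d3 for v in row]
    if not values:
        raise ValueError("need at least 4 rows")
    mean = sum(values) / len(values)
    return sum(1 for v in values if v > mean)


# The 3rd-order difference of n**3 is the constant 3! = 6.
cubic = [[n ** 3] for n in range(6)]
print(diff(cubic, order=3))

# A rapidly alternating sequence produces large, sign-changing differences.
jagged = [[0], [1], [0], [5], [0], [1], [0]]
print(diff(jagged, order=3), dmc_count(jagged))
```

In practice the rows would be the delay vectors of one speech frame, so the difference is taken between successive points of the embedded trajectory.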

35 DMC Procedure
[Flow diagram; recoverable blocks: Speech, Lowpass Filtering, State Space Embedding, 3rd Order Difference Computation, Mean of First Dimension, Comparison; outputs: Voiced (< Threshold), Unvoiced (> Threshold)]

36 DMC Results

37 Result Comparison

38 Nodal Density (ND) Measure

39 Nodal Density
Smallest cube which encloses the signal is determined
This cube is divided into N smaller cubes
Edges of the smaller cubes are defined as nodes
Number of nodes spanned by the signal is determined
Ratio of number of nodes spanned to total number of nodes is defined as nodal density

40 Embedded Voiced and Unvoiced Speech Frames with Grids

41 Nodes Spanned by Embedded Voiced and Unvoiced Speech Frames

42 Nodal Density Procedure
[Flow diagram; recoverable blocks: Speech, Lowpass Filtering, State Space Embedding, Estimation of Largest Cube Spanned, Cubing (N = 1000), Computation of Nodal Density; outputs: Voiced (< Threshold), Unvoiced (> Threshold)]

43 Nodal Density Results

44 Result Comparison

45 Comparison of ND and DMC Measures

46 Fusion of Voiced Speech Detection Measures

47 Why fusion?
Different features can provide complementary information.
Different classifiers can produce different decisions.
The best classifier can produce an error that an inferior classifier correctly identifies.

48 Levels of Fusion
Data level fusion
Feature level fusion
Decision level fusion

49 Mutual Information
p(c,y) = joint probability mass function of C and Y
p(c) and p(y) = marginal probability mass functions
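The slide's equation did not survive extraction. The standard definition that uses these symbols is I(C;Y) = sum over c and y of p(c,y) log[p(c,y) / (p(c) p(y))]. The sketch below computes it from a joint probability mass function; the base-2 logarithm and the dictionary layout are assumptions, since the slides state neither.

```python
import math


def mutual_information(joint):
    """Mutual information in bits from a joint pmf {(c, y): probability}.

    The base-2 logarithm is an assumption; the slides do not give the base.
    """
    p_c, p_y = {}, {}
    # Marginals are obtained by summing the joint pmf over the other variable.
    for (c, y), p in joint.items():
        p_c[c] = p_c.get(c, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    # Zero-probability cells contribute nothing and are skipped.
    return sum(p * math.log2(p / (p_c[c] * p_y[y]))
               for (c, y), p in joint.items() if p > 0)


# Independent variables share no information; identical ones share one bit.
independent = {(c, y): 0.25 for c in (0, 1) for y in (0, 1)}
identical = {(0, 0): 0.5, (1, 1): 0.5}
print(mutual_information(independent), mutual_information(identical))
```

In the fusion setting, a feature whose mutual information with an already-used feature is low is more likely to contribute complementary evidence.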

50 Mutual Information
[Table; column labels: E, ZC, FR, RE, DMC, ND; values in extracted order: 0.18, 0.31, 0.05, 1.21, 0.20, 0.22, 0.28, 0.10, 0.09, 1.78; row and column assignment not recoverable]

51 Result Comparison - DMC

52 Result Comparison - ND

53 Summary
Usable Speech Detection
  Nonlinear reconstruction of co-channel speech enhances discrimination between usable and unusable speech.
  Nodal density measure outperforms existing TIR-based usable speech detection measures.
Voiced Speech Detection
  Two structure-based measures have been developed, which show an improvement over traditional measures in voiced speech detection under high-noise conditions.
  Fusion of voiced speech detection measures further increases voiced speech detection accuracy.

54 Further Research
Usable Speech Detection
  Evaluate performance of usable speech detection with noisy co-channel speech.
  Fuse the ND measure with existing usable speech detection measures such as APPC and SAPVR.
Voiced Speech Detection
  Employ more advanced fusion techniques such as independent component analysis.
  Further enhance voiced speech detection under very high noise conditions by performing adaptive filtering of noisy signals.

55 Publications
U. Ofoegbu, B. Smolenski and R. Yantorno, “Structure-Based Voiced/Usable Speech Detection Using State-Space Embedding”, IEEE International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), 2004.
B. Smolenski, U. Ofoegbu and R. Yantorno, “Nonlinear State Space Embedding Features and Their Application to Robust Speech Segmentation”, IEEE International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), 2004.

56 Please feel FREE to ask QUESTIONS !!!
Puzzled? Perplexed?? Baffled??? Mystified????

57 EXTRA SLIDES

58 Experimental Set-Up
41 Speech utterances (TIMIT Database)
nCr = n!/(r!(n-r)!) = 861
Scaled and combined at 0dB TIR
Broken down into frames of 256 samples
Voiced frames extracted
Training – 430 co-channel combinations
Testing – 861 co-channel combinations
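The combination count can be checked with a short calculation. This sketch assumes r = 2, that is, co-channel combinations formed from pairs of utterances, which the slide implies but does not state. Under that assumption a total of 861 corresponds to n = 42, while n = 41 gives 820.

```python
import math


def combinations(n, r=2):
    """Number of ways to choose r items from n: n! / (r! (n - r)!)."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))


# r = 2 (pairs of utterances) is an assumption about the slide's set-up.
print(combinations(41), combinations(42))
```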

59 DMC Distributions

60 DMC Distributions with Filtering

61 Experimental Set-Up
25 Speech utterances (TIMIT Database)
12 male files and 13 female files
Lowpass filter used as pre-processing block
Each file broken down into frames of 128 samples each

62 Results

63 Results

64 Result Comparison

65 ND Distributions with Filtering

66 DMC Distributions with Filtering

67 Varying N

68 Experimental Set-Up
25 Speech utterances (TIMIT Database)
12 male files and 13 female files
Lowpass filter used as pre-processing block
Each file broken down into frames of 128 samples each

69 Results

70 Results

71 Result Comparison

72 Comparison of ND and DMC Measures

73 Fusion
New measures fused with residual energy (RE) measure.
Decision-level fusion performed
  If ((measure1 < threshold1) & (measure2 < threshold2))
    Speech frame = voiced
  Else
    Speech frame != voiced
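The decision rule above is a logical AND of two threshold tests, and can be written as a runnable sketch. The function name and the example numbers are illustrative only; the slides give no threshold values.

```python
def fuse_decisions(measure1, threshold1, measure2, threshold2):
    """Decision-level AND fusion: voiced only if both measures are below
    their thresholds. Returns True for voiced, False otherwise."""
    return measure1 < threshold1 and measure2 < threshold2


# Illustrative values only; thresholds are not given in the slides.
print(fuse_decisions(0.2, 0.5, 0.1, 0.3))  # both below threshold
print(fuse_decisions(0.2, 0.5, 0.4, 0.3))  # second measure too high
```

Requiring agreement between the two measures means a frame is labeled voiced only when neither measure objects, so a false voiced decision by one measure can be vetoed by the other.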

74 Summary
[Diagram; recoverable labels: Speech, State-Space Embedding, Difference-Mean Comparison, Nodal Density, Usable Speech Detection, Voiced Speech Detection]

