Pitch Prediction From MFCC Vectors for Speech Reconstruction Xu Shao and Ben Milner School of Computing Sciences, University of East Anglia, UK Presented.

1 Pitch Prediction From MFCC Vectors for Speech Reconstruction Xu Shao and Ben Milner School of Computing Sciences, University of East Anglia, UK Presented By Yi-Ting

2 Outline: Introduction; Speech Reconstruction (Sinusoidal model); Pitch Prediction (GMM-based prediction, HMM-based prediction, Voiced / unvoiced classification); Experimental Results; Conclusion

3 Introduction Speech can be reconstructed from MFCC vectors through the inclusion of pitch information. The aim of this work is to predict the pitch frequency from the MFCC vector. Several studies have indicated that a class-dependent correlation exists between the spectral envelope and pitch.

4 Speech Reconstruction Speech is reconstructed from MFCC vectors and pitch using the sinusoidal model. The sinusoidal model synthesises a speech signal, x(n). An estimate of the spectral envelope can be calculated from an MFCC vector by zero padding and applying an inverse discrete cosine transform (IDCT).
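The zero-padding and IDCT step can be illustrated with a minimal NumPy sketch. It assumes the MFCCs were produced by an orthonormal DCT-II of log filterbank energies; the function names `dct_matrix` and `envelope_from_mfcc`, and the sizes used in the usage note, are illustrative choices that do not come from the slides.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis (assumed convention): row k, column m."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    basis[0] *= np.sqrt(1.0 / n)
    basis[1:] *= np.sqrt(2.0 / n)
    return basis

def envelope_from_mfcc(mfcc, n_channels):
    """Zero-pad a truncated MFCC vector and invert the DCT.

    Returns a smoothed filterbank-domain magnitude estimate
    (exponential of the recovered log energies).
    """
    padded = np.zeros(n_channels)
    padded[:len(mfcc)] = mfcc
    # The basis is orthonormal, so its transpose is the inverse transform.
    log_env = dct_matrix(n_channels).T @ padded
    return np.exp(log_env)
```

For example, `envelope_from_mfcc(mfcc[:13], 23)` would give a 23-point smoothed estimate from 13 coefficients; discarding the higher coefficients is what produces the smoothing.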

5 Speech Reconstruction The result is a smoothed magnitude spectral estimate. Normalization must be applied to remove the effects of pre-emphasis and the non-linear filterbank channels. The frequencies of the sinusoidal components can be estimated from the pitch frequency, and their amplitudes can be computed from the smoothed magnitude spectral estimate.
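A rough sketch of sinusoidal synthesis for one frame is given below. It rests on several assumptions that are not stated in the slides: the sinusoid frequencies are taken as integer multiples of the pitch, all phases are set to zero, and the envelope is supplied on a uniform frequency grid from 0 to half the sampling rate. The function name `synthesise_frame` is hypothetical.

```python
import numpy as np

def synthesise_frame(f0, envelope, fs, n_samples):
    """Sum of harmonics of f0 with amplitudes read from the envelope."""
    n_harm = int((fs / 2) // f0)                  # harmonics up to Nyquist
    harmonics = f0 * np.arange(1, n_harm + 1)     # assumed: multiples of pitch
    grid = np.linspace(0.0, fs / 2, len(envelope))
    amps = np.interp(harmonics, grid, envelope)   # sample the envelope
    n = np.arange(n_samples)
    phases = 2 * np.pi * np.outer(harmonics, n) / fs  # zero initial phase
    return (amps[:, None] * np.cos(phases)).sum(axis=0)
```

A practical system would also need a phase model and overlap-add between frames, which this sketch leaves out.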

6 Pitch Prediction These schemes are based on modeling the joint density of the MFCC vector, x, and pitch frequency, f. From a set of training data, a series of augmented feature vectors, y, is extracted.

7 Pitch Prediction GMM-based prediction From the training set of augmented vectors, unsupervised clustering is implemented using the EM algorithm to produce a set of K clusters. Each of these clusters is represented by a Gaussian probability density function.
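The equations for this slide are not preserved in the transcript. A standard way to write a joint-density GMM over the augmented vector is sketched below; the symbols for the cluster weight, mean and covariance, and the block partition into MFCC and pitch parts, are assumed notation and not taken from the slides.

```latex
y = \begin{bmatrix} x \\ f \end{bmatrix}, \qquad
p(y) = \sum_{k=1}^{K} \alpha_k \, \mathcal{N}\!\left(y;\, \mu_k, \Sigma_k\right), \qquad
\mu_k = \begin{bmatrix} \mu_k^{x} \\ \mu_k^{f} \end{bmatrix}, \qquad
\Sigma_k = \begin{bmatrix} \Sigma_k^{xx} & \Sigma_k^{xf} \\ \Sigma_k^{fx} & \Sigma_k^{ff} \end{bmatrix}
```

The off-diagonal block of each covariance captures the correlation between the MFCC vector and the pitch within that cluster.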

8 Pitch Prediction GMM-based prediction Using these cluster-based correlations, a prediction of the pitch frequency can be made from an input MFCC vector. The closest cluster, k, to the input vector is found first, and the pitch is then predicted from that cluster.
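One plausible reading of the closest-cluster prediction is sketched here. It assumes that "closest" means the cluster with the highest weighted likelihood of the MFCC part, and that the pitch estimate is the conditional mean of a joint Gaussian. The function names and the layout of `means` and `covs` (MFCC dimensions first, pitch last) are assumptions made for illustration.

```python
import numpy as np

def log_gauss(x, mean, cov):
    """Log density of a multivariate Gaussian at x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(mean) * np.log(2 * np.pi) + logdet + maha)

def predict_pitch_closest(x, weights, means, covs):
    """Pick the most likely cluster for x, then predict pitch from it."""
    d = len(x)
    scores = [np.log(w) + log_gauss(x, m[:d], c[:d, :d])
              for w, m, c in zip(weights, means, covs)]
    k = int(np.argmax(scores))
    m, c = means[k], covs[k]
    # Conditional mean of pitch given x under the joint Gaussian of cluster k.
    f_hat = m[d] + c[d, :d] @ np.linalg.solve(c[:d, :d], x - m[:d])
    return k, float(f_hat)
```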

9 Pitch Prediction GMM-based prediction An alternative method combines the MAP pitch prediction from all K clusters in the GMM.
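The combined prediction can be sketched as a posterior-weighted sum of the per-cluster estimates. This is a minimal illustration under the assumption that each cluster's estimate is the conditional mean of its joint Gaussian and that the weights are the posterior cluster probabilities given the MFCC vector; the names and data layout (MFCC dimensions first, pitch last) are hypothetical.

```python
import numpy as np

def log_gauss(x, mean, cov):
    """Log density of a multivariate Gaussian at x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(mean) * np.log(2 * np.pi) + logdet + maha)

def predict_pitch_weighted(x, weights, means, covs):
    """Weight every cluster's pitch estimate by its posterior probability."""
    d = len(x)
    log_post = np.array([np.log(w) + log_gauss(x, m[:d], c[:d, :d])
                         for w, m, c in zip(weights, means, covs)])
    post = np.exp(log_post - log_post.max())   # stabilised softmax
    post /= post.sum()
    preds = np.array([m[d] + c[d, :d] @ np.linalg.solve(c[:d, :d], x - m[:d])
                      for m, c in zip(means, covs)])
    return float(post @ preds)
```

Compared with using a single cluster, the weighted form changes smoothly as the input moves between clusters.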

10 Pitch Prediction HMM-based prediction To better model the inherent correlation of the feature vector stream, a series of HMMs is used.

11 Pitch Prediction HMM-based prediction The first stage of training involves the creation of a set of HMM-based speech models. The training data is aligned to the speech models using Viterbi decoding, and the vectors belonging to each state, s, of each model, w, are pooled. (Unvoiced vectors are removed.) Clustering is applied to the pooled vectors within each voiced state using the EM algorithm.

12 Pitch Prediction HMM-based prediction Prediction of the pitch: the model and state sequence are first determined from the set of models using Viterbi decoding.
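For reference, a generic Viterbi decoder over a single HMM is sketched below. It is a textbook version working on log probabilities, not the decoder used in the presented work; the argument names are illustrative, and a real system would decode over a network of word or phone models.

```python
import numpy as np

def viterbi(log_init, log_trans, log_obs):
    """Most likely state path.

    log_init:  (S,)   log initial state probabilities
    log_trans: (S, S) log transition probabilities, row = from, column = to
    log_obs:   (T, S) log observation likelihood per frame and state
    """
    T, S = log_obs.shape
    delta = log_init + log_obs[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = delta[:, None] + log_trans     # cand[i, j]: arrive in j from i
        back[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + log_obs[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):             # trace the back-pointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

Once the state for each frame is known, the pitch for that frame can be predicted from the clusters trained for that state.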

13 Pitch Prediction Voiced / unvoiced classification The resulting models are analysed to classify the MFCC stream into voiced or unvoiced speech. Voicing was then calculated.

14 Pitch Prediction Voiced / unvoiced classification Using the state occupancy, measured from the training data, the voicing is determined. The threshold has been determined experimentally, with a reasonable value being 0.2.
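The voicing formula itself is missing from the transcript. The sketch below shows one possible thresholding rule, assuming the voicing of a state is the fraction of its training frames that were voiced; this definition, the direction of the comparison, and the function name are all assumptions, and only the threshold value of 0.2 comes from the slide.

```python
def state_is_voiced(n_voiced, n_total, threshold=0.2):
    """Label a state as voiced if its voiced-frame ratio exceeds the threshold.

    n_voiced and n_total are assumed to be frame counts gathered from the
    Viterbi alignment of the training data.
    """
    if n_total == 0:
        return False          # no training evidence: treat as unvoiced
    return n_voiced / n_total > threshold
```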

15 Experimental Results Both the accuracy of pitch prediction and the resultant reconstructed speech quality are measured. ETSI Aurora database: 200 utterances for training and 90 for testing.

16 Experimental Results Two measures are used: the pitch classification error and the RMS pitch error.
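The formulas for the two measures are not preserved in the transcript. The sketch below gives simplified versions under stated assumptions: a pitch value of 0 marks an unvoiced frame, the classification error counts only voiced/unvoiced mismatches, and the RMS error is taken over frames that are voiced in both tracks. The definitions used in the original work may differ.

```python
import numpy as np

def pitch_errors(f_ref, f_pred):
    """Return (voicing mismatch rate, RMS pitch error in Hz)."""
    f_ref = np.asarray(f_ref, dtype=float)
    f_pred = np.asarray(f_pred, dtype=float)
    ref_voiced = f_ref > 0
    pred_voiced = f_pred > 0
    class_err = float(np.mean(ref_voiced != pred_voiced))
    both = ref_voiced & pred_voiced
    if not both.any():
        return class_err, 0.0
    rms = float(np.sqrt(np.mean((f_ref[both] - f_pred[both]) ** 2)))
    return class_err, rms
```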

17 Experimental Results

18 Increasing the number of clusters in each state of the HMM enables more detailed modeling of the joint distribution of MFCC and pitch. The significant majority of frame classification errors arise from incorrect voiced/unvoiced decisions, which occur in low-energy regions at the start and end of speech.

19 Experimental Results

21 Conclusion Speech reconstructed from the predicted pitch, using a sinusoidal model, is almost indistinguishable from that reconstructed using the reference pitch.

