PATTERN COMPARISON TECHNIQUES
Test Pattern:
Reference Pattern:
4.2 SPEECH (ENDPOINT) DETECTION
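4.3 DISTORTION MEASURES - MATHEMATICAL CONSIDERATIONS
x and y: two feature vectors defined on a vector space X.
The properties of a metric or distance function d:
1) Positive definiteness: $0 \le d(x, y) < \infty$ for $x, y \in X$, and $d(x, y) = 0$ if and only if $x = y$
2) Symmetry: $d(x, y) = d(y, x)$ for $x, y \in X$
3) Triangle inequality: $d(x, y) \le d(x, z) + d(y, z)$ for $x, y, z \in X$
A distance function is called invariant if $d(x + z, y + z) = d(x, y)$.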
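The properties above can be probed numerically. The sketch below uses the Euclidean distance as a stand-in metric and a hypothetical helper, `satisfies_metric_properties`, that tests every triple drawn from a finite sample set; such a test can refute the properties but can never prove them.

```python
import itertools
import math
import random

def euclidean(x, y):
    """Euclidean (L2) distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def satisfies_metric_properties(d, samples, tol=1e-9):
    """Hypothetical helper: empirically test positive definiteness, symmetry, the
    triangle inequality and invariance d(x + z, y + z) = d(x, y) on all sample triples."""
    for x, y, z in itertools.product(samples, repeat=3):
        dxy = d(x, y)
        if dxy < 0 or (x == y) != (abs(dxy) <= tol):
            return False                      # positive definiteness
        if abs(dxy - d(y, x)) > tol:
            return False                      # symmetry
        if dxy > d(x, z) + d(y, z) + tol:
            return False                      # triangle inequality
        xz = [a + c for a, c in zip(x, z)]
        yz = [b + c for b, c in zip(y, z)]
        if abs(d(xz, yz) - dxy) > tol:
            return False                      # translation invariance
    return True

random.seed(0)
vectors = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(5)]
print(satisfies_metric_properties(euclidean, vectors))   # True
```

A squared Euclidean distance, by contrast, fails the triangle inequality on the collinear points 0, 1, 2 (4 > 1 + 1), so it is a distortion measure but not a metric.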
PERCEPTUAL CONSIDERATIONS
Spectral changes that do not fundamentally change the perceived sound include:
PERCEPTUAL CONSIDERATIONS
Spectral changes that lead to phonetically different sounds include:
PERCEPTUAL CONSIDERATIONS
Just-discriminable change: known as the JND (just-noticeable difference), the DL (difference limen), or the differential threshold.
4.4 DISTORTION MEASURES - PERCEPTUAL CONSIDERATIONS
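Spectral Distortion Measures
Spectral density: $S(\omega)$
Fourier coefficients of the spectral density, i.e. the autocorrelation function:
$$S(\omega) = \sum_{n=-\infty}^{\infty} r(n)\, e^{-jn\omega}, \qquad r(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} S(\omega)\, e^{jn\omega}\, d\omega$$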
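Spectral Distortion Measures
Short-term autocorrelation of an N-sample frame x(n):
$$r(m) = \sum_{n=0}^{N-1-m} x(n)\, x(n+m), \qquad r(-m) = r(m)$$
Then $|X(e^{j\omega})|^2 = \sum_{m=-(N-1)}^{N-1} r(m)\, e^{-jm\omega}$ is an energy spectral density.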
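A small sketch of this relationship, assuming an unnormalized r(m) (no 1/N factor) and a real-valued frame; the frame values are arbitrary. Because r(m) is even, the transform reduces to a cosine sum.

```python
import cmath
import math

def short_term_autocorr(x, max_lag):
    """r(m) = sum_n x(n) x(n + m) for m = 0..max_lag (no 1/N normalization)."""
    N = len(x)
    return [sum(x[n] * x[n + m] for n in range(N - m)) for m in range(max_lag + 1)]

def esd_from_autocorr(r, omega):
    """|X(e^{jw})|^2 = r(0) + 2 sum_{m>=1} r(m) cos(w m), using r(-m) = r(m)."""
    return r[0] + 2 * sum(r[m] * math.cos(omega * m) for m in range(1, len(r)))

def esd_direct(x, omega):
    """|X(e^{jw})|^2 computed from the DTFT of the frame itself."""
    X = sum(xn * cmath.exp(-1j * omega * n) for n, xn in enumerate(x))
    return abs(X) ** 2

frame = [0.3, -1.2, 0.8, 0.5, -0.4, 0.9]
r = short_term_autocorr(frame, len(frame) - 1)   # all N - 1 lags are needed for equality
print(abs(esd_from_autocorr(r, 0.7) - esd_direct(frame, 0.7)) < 1e-9)   # True
```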
Spectral Distortion Measures
Autocorrelation matrices: $R_p = [\,r(|i-j|)\,]$, $i, j = 0, 1, \ldots, p$, a symmetric Toeplitz matrix.
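Spectral Distortion Measures
If $\sigma/A(z)$, with $A(z) = \sum_{i=0}^{p} a_i z^{-i}$ and $a_0 = 1$, is the all-pole model for the speech spectrum, the residual energy resulting from “inverse filtering” the input signal with the all-zero filter A(z) is:
$$\alpha = \frac{1}{2\pi}\int_{-\pi}^{\pi} |A(e^{j\omega})|^2\, S(\omega)\, d\omega = \sum_{i=0}^{p}\sum_{j=0}^{p} a_i\, a_j\, r(|i-j|)$$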
Spectral Distortion Measures
Important properties of all-pole modeling:
The recursive minimization relationship:
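One standard way to carry out the order-by-order minimization is the Levinson-Durbin recursion, whose minimum residual energy obeys $E_i = (1 - k_i^2) E_{i-1}$. The sketch below assumes $A(z) = \sum_i a_i z^{-i}$ with $a_0 = 1$ and the sign convention $a_i^{(i)} = k_i$; other texts flip the sign of $a_i$ or $k_i$, so it may not match the relationship on this slide term for term.

```python
def levinson_durbin(r, p):
    """Order-by-order solution of the normal equations from autocorrelation lags r[0..p].

    Assumed convention: A(z) = sum_i a[i] z^-i with a[0] = 1 and a_i^{(i)} = k_i.
    Returns (a, k, errors): order-p coefficients, reflection coefficients k_1..k_p,
    and errors[i], the minimum residual energy of the order-i model."""
    a = [1.0]
    k = []
    errors = [float(r[0])]
    for i in range(1, p + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        ki = -acc / errors[-1]
        ext = a + [0.0]
        a = [ext[j] + ki * ext[i - j] for j in range(i + 1)]   # step-up update
        k.append(ki)
        errors.append((1.0 - ki * ki) * errors[-1])            # E_i = (1 - k_i^2) E_{i-1}
    return a, k, errors

a, k, errors = levinson_durbin([1.0, 0.5, 0.25], 2)
print(a, errors)   # [1.0, -0.5, 0.0] [1.0, 0.75, 0.75]
```

For these first-order-like lags the second reflection coefficient is zero, so the residual energy stops decreasing after order 1.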
LOG SPECTRAL DISTANCE
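As an illustration, the sketch below assumes the dB form of the log spectral difference, $V(\omega) = 10\log_{10} S(\omega) - 10\log_{10} S'(\omega)$, and the $L_p$ distance $d_p = \left(\frac{1}{2\pi}\int_{-\pi}^{\pi}|V(\omega)|^p\, d\omega\right)^{1/p}$, approximating the integral by an average over a uniform frequency grid. The `allpole` helper and its coefficients are hypothetical.

```python
import math

def log_spectral_distance(S, S_ref, p=2, n_points=512):
    """L_p log spectral distance in dB between power spectra given as callables on
    [-pi, pi); (1/2pi) int |V|^p dw is approximated by a grid average."""
    total = 0.0
    for i in range(n_points):
        w = -math.pi + 2 * math.pi * i / n_points
        v = 10 * math.log10(S(w)) - 10 * math.log10(S_ref(w))   # V(w) in dB
        total += abs(v) ** p
    return (total / n_points) ** (1.0 / p)

def allpole(a, gain=1.0):
    """Hypothetical helper: gain / |A(e^{jw})|^2 for A(z) = sum a[i] z^-i."""
    def S(w):
        re = sum(ai * math.cos(w * i) for i, ai in enumerate(a))
        im = -sum(ai * math.sin(w * i) for i, ai in enumerate(a))
        return gain / (re * re + im * im)
    return S

S1 = allpole([1.0, -0.9, 0.64])
print(log_spectral_distance(S1, S1))                   # 0.0
print(log_spectral_distance(S1, lambda w: 4 * S1(w)))  # ≈ 6.02: a gain of 4 is about 6.02 dB
```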
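CEPSTRAL DISTANCES
The complex cepstrum of a signal is defined as the inverse Fourier transform of the logarithm of the signal spectrum. For a power spectrum $S(\omega)$, which is real and symmetric about $\omega = 0$, the Fourier series of $\log S(\omega)$ is
$$\log S(\omega) = \sum_{n=-\infty}^{\infty} c_n\, e^{-jn\omega}$$
where $c_n = c_{-n}$ are real and are referred to as the cepstral coefficients. By Parseval's theorem, the $L_2$ distance between two log spectra is
$$d_2^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi}\left|\log S(\omega) - \log S'(\omega)\right|^2 d\omega = \sum_{n=-\infty}^{\infty}(c_n - c_n')^2$$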
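CEPSTRAL DISTANCES
Truncated cepstral distance (the first L coefficients, excluding the gain term $c_0$):
$$d_c^2(L) = \sum_{n=1}^{L}(c_n - c_n')^2$$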
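The sketch below computes cepstral coefficients of a power spectrum numerically and then the truncated distance. It assumes a one-pole spectrum $1/|1 - a e^{-j\omega}|^2$ as test input because its cepstrum is known in closed form ($c_0 = 0$, $c_n = a^n/n$), which makes the result easy to check; the grid size and L = 12 are illustrative choices.

```python
import math

def cepstrum_from_spectrum(S, n_coeffs, n_points=256):
    """Approximate c_n = (1/2pi) int log S(w) e^{jnw} dw for n = 0..n_coeffs-1 by an
    n_points-point sum; for a real, even S the c_n are real with c_{-n} = c_n,
    so only the cosine part of the sum is needed."""
    logs = [math.log(S(2 * math.pi * k / n_points)) for k in range(n_points)]
    return [sum(v * math.cos(2 * math.pi * n * k / n_points) for k, v in enumerate(logs)) / n_points
            for n in range(n_coeffs)]

def truncated_cepstral_distance(c, c_ref, L):
    """d_c^2(L) = sum_{n=1}^{L} (c_n - c'_n)^2; c_0, the log-gain term, is excluded."""
    return sum((c[n] - c_ref[n]) ** 2 for n in range(1, L + 1))

def one_pole(a):
    """Hypothetical test spectrum 1 / |1 - a e^{-jw}|^2 (cepstrum c_0 = 0, c_n = a^n / n)."""
    return lambda w: 1.0 / (1.0 - 2.0 * a * math.cos(w) + a * a)

c1 = cepstrum_from_spectrum(one_pole(0.5), 13)
c2 = cepstrum_from_spectrum(one_pole(0.3), 13)
print(truncated_cepstral_distance(c1, c2, 12))
```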
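Weighted Cepstral Distances and Liftering
It can be shown that, under certain regularity conditions, the cepstral coefficients other than $c_0$ have:
1) zero means;
2) variances essentially inversely proportional to the square of the coefficient index: $\mathrm{Var}(c_n) \propto 1/n^2$.
If we normalize the cepstral distance by the inverse of the variance, each term is weighted by $n^2$:
$$\sum_{n=1}^{L} n^2 (c_n - c_n')^2 = \sum_{n=1}^{L}(n c_n - n c_n')^2$$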
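A sketch of the $n^2$-weighted distance, assuming one-pole cepstra $c_n = a^n/n$ as hypothetical inputs. For these spectra the weighting exactly cancels the $1/n$ decay, since $n c_n = a^n$, so the higher-index terms carry much more weight than in the plain distance.

```python
def index_weighted_cepstral_distance(c, c_ref, L):
    """sum_{n=1}^{L} n^2 (c_n - c'_n)^2: each squared difference is divided by an
    assumed variance proportional to 1/n^2, i.e. multiplied by n^2."""
    return sum((n * (c[n] - c_ref[n])) ** 2 for n in range(1, L + 1))

def one_pole_cepstrum(a, L):
    """Hypothetical inputs: cepstrum of 1 / |1 - a e^{-jw}|^2, c_0 = 0 and c_n = a^n / n."""
    return [0.0] + [a ** n / n for n in range(1, L + 1)]

L = 12
c1, c2 = one_pole_cepstrum(0.9, L), one_pole_cepstrum(0.7, L)
plain = sum((c1[n] - c2[n]) ** 2 for n in range(1, L + 1))
weighted = index_weighted_cepstral_distance(c1, c2, L)
print(plain, weighted)   # the n^2 weighting boosts the higher-index terms
```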
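Weighted Cepstral Distances and Liftering
Differentiating both sides of the Fourier series equation of the log spectrum with respect to $\omega$:
$$\frac{\partial \log S(\omega)}{\partial \omega} = \sum_{n=-\infty}^{\infty}(-jn)\, c_n\, e^{-jn\omega}$$
By Parseval's theorem, $\sum_n n^2 (c_n - c_n')^2$ is therefore an $L_2$ distance based upon the differences between the spectral slopes:
$$\sum_{n=-\infty}^{\infty} n^2 (c_n - c_n')^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi}\left|\frac{\partial \log S(\omega)}{\partial \omega} - \frac{\partial \log S'(\omega)}{\partial \omega}\right|^2 d\omega$$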
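The spectral-slope interpretation can be checked numerically. The sketch assumes a real, even log spectrum written as $\log S(\omega) = c_0 + 2\sum_{n\ge1} c_n\cos(n\omega)$, so a cepstral difference `dc` (hypothetical values) defines the slope difference $-2\sum_n n\,\Delta c_n \sin(n\omega)$. Because $c_{-n} = c_n$, the sum over all n from $-L$ to $L$ equals twice the sum over positive n.

```python
import math

def slope_distance(dc, n_points=64):
    """(1/2pi) int |d/dw [log S(w) - log S'(w)]|^2 dw as a grid average, where
    dc[n] = c_n - c'_n (n = 0..L) and log S = c_0 + 2 sum c_n cos(n w).
    The grid average is exact here because n_points > 2L."""
    total = 0.0
    for k in range(n_points):
        w = 2 * math.pi * k / n_points
        slope = -2.0 * sum(n * dc[n] * math.sin(n * w) for n in range(1, len(dc)))
        total += slope ** 2
    return total / n_points

dc = [0.3, 0.2, -0.1, 0.05, 0.02]                                    # hypothetical, n = 0..4
cepstral_side = 2 * sum((n * dc[n]) ** 2 for n in range(1, len(dc)))  # sum over n = -4..4
print(abs(slope_distance(dc) - cepstral_side) < 1e-9)                 # True
```

Note that the $c_0$ difference (an overall gain change) has no effect on the slope distance.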
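Cepstral Weighting or Liftering Procedure
$$\hat{c}_n = w(n)\, c_n, \qquad w(n) = 1 + h\sin\left(\frac{\pi n}{L}\right), \quad n = 1, 2, \ldots, L$$
h is usually chosen as L/2, and L is typically 10 to 16.
A useful form of weighted cepstral distance:
$$d_{2W}^2(L) = \sum_{n=1}^{L}\left[w(n)\, c_n - w(n)\, c_n'\right]^2$$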
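A sketch of the bandpass lifter and the liftered distance, with h = L/2 as suggested above; L = 12 is an illustrative value within the typical range.

```python
import math

def bandpass_lifter(L, h=None):
    """Lifter weights w(n) = 1 + h sin(pi n / L) for n = 1..L; h defaults to L / 2."""
    if h is None:
        h = L / 2
    return [1.0 + h * math.sin(math.pi * n / L) for n in range(1, L + 1)]

def liftered_cepstral_distance(c, c_ref, L):
    """sum_{n=1}^{L} [w(n) c_n - w(n) c'_n]^2; c[0] (c_0) is not used."""
    w = bandpass_lifter(L)
    return sum((w[n - 1] * (c[n] - c_ref[n])) ** 2 for n in range(1, L + 1))

weights = bandpass_lifter(12)
print(max(weights))   # 7.0, reached at n = L/2 = 6
```

The lifter de-emphasizes both the low-index coefficients (dominated by overall spectral tilt) and the high-index ones (more sensitive to noise), peaking in the middle of the range.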
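Likelihood Distortions
Previously defined: the Itakura-Saito distortion measure
$$d_{IS}(S, S') = \frac{1}{2\pi}\int_{-\pi}^{\pi}\left[\frac{S(\omega)}{S'(\omega)} - \log\frac{S(\omega)}{S'(\omega)} - 1\right] d\omega = \frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{S(\omega)}{S'(\omega)}\, d\omega - \log\frac{\sigma_\infty^2}{\sigma_\infty'^2} - 1$$
where $\sigma_\infty^2$ and $\sigma_\infty'^2$ are the one-step prediction errors of $S(\omega)$ and $S'(\omega)$, defined as:
$$\log\sigma_\infty^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi}\log S(\omega)\, d\omega, \qquad \log\sigma_\infty'^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi}\log S'(\omega)\, d\omega$$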
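A numerical sketch of the Itakura-Saito distortion, approximating each integral by an average over a uniform frequency grid. The one-pole test spectrum is an assumption, chosen because its one-step prediction error is exactly 1. A constant gain g between the two spectra gives $1/g + \log g - 1$, about 0.193 for g = 2.

```python
import math

N_POINTS = 512
GRID = [-math.pi + 2 * math.pi * k / N_POINTS for k in range(N_POINTS)]

def itakura_saito(S, S_ref):
    """d_IS(S, S') = (1/2pi) int [S/S' - log(S/S') - 1] dw, averaged over GRID."""
    total = 0.0
    for w in GRID:
        ratio = S(w) / S_ref(w)
        total += ratio - math.log(ratio) - 1.0
    return total / len(GRID)

def one_step_prediction_error(S):
    """sigma_inf^2 = exp((1/2pi) int log S(w) dw), averaged over GRID."""
    return math.exp(sum(math.log(S(w)) for w in GRID) / len(GRID))

S = lambda w: 1.0 / (1.0 - 1.8 * math.cos(w) + 0.81)   # hypothetical 1/|1 - 0.9 e^{-jw}|^2
print(one_step_prediction_error(S))                     # close to 1.0 for this spectrum
print(itakura_saito(S, lambda w: 2.0 * S(w)))           # ≈ 0.193 = 0.5 + log(2) - 1
```

The measure is not symmetric: swapping the arguments in the gain example gives $g - \log g - 1 \approx 0.307$.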
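Likelihood Distortions
The residual energy can be easily evaluated from the autocorrelation r(i) of the input and the autocorrelation $r_a(i)$ of the inverse-filter coefficients $a_0 = 1, a_1, \ldots, a_p$:
$$\alpha = r(0)\, r_a(0) + 2\sum_{i=1}^{p} r(i)\, r_a(i), \qquad r_a(i) = \sum_{j=0}^{p-i} a_j\, a_{j+i}$$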
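A sketch of the residual-energy computation $\alpha = r(0) r_a(0) + 2\sum_{i=1}^{p} r(i) r_a(i)$, checked against the equivalent quadratic form $\sum_{i,j} a_i a_j r(|i-j|)$ over the Toeplitz autocorrelation matrix. The lags and coefficients are hypothetical.

```python
def residual_energy(r, a):
    """alpha = r(0) r_a(0) + 2 sum_{i=1}^{p} r(i) r_a(i), r_a(i) = sum_{j=0}^{p-i} a_j a_{j+i}."""
    p = len(a) - 1
    r_a = [sum(a[j] * a[j + i] for j in range(p - i + 1)) for i in range(p + 1)]
    return r[0] * r_a[0] + 2 * sum(r[i] * r_a[i] for i in range(1, p + 1))

def residual_energy_quadratic(r, a):
    """The same quantity as the quadratic form a^T R a with R[i][j] = r(|i - j|)."""
    p = len(a) - 1
    return sum(a[i] * a[j] * r[abs(i - j)] for i in range(p + 1) for j in range(p + 1))

r = [2.0, 1.2, 0.5]    # hypothetical autocorrelation lags r(0), r(1), r(2)
a = [1.0, -0.7, 0.2]   # hypothetical inverse-filter coefficients, a_0 = 1
print(residual_energy(r, a), residual_energy_quadratic(r, a))
```

The lag-product form needs only about $p$ multiplications once $r_a(i)$ is known, which is why it is convenient when the same reference filter is compared against many frames.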
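Likelihood Distortions
By replacing $S(\omega)$ by its optimal p-th order LPC model spectrum $\sigma_p^2/|A_p(e^{j\omega})|^2$, where $\sigma_p^2$ is the minimum p-th order residual energy, and comparing it with the model $\sigma^2/|A(e^{j\omega})|^2$ (the LPC model matches the first p+1 autocorrelation lags of S, so the first term of $d_{IS}$ becomes $\alpha/\sigma^2$):
$$d_{IS}\left(\frac{\sigma_p^2}{|A_p(e^{j\omega})|^2}, \frac{\sigma^2}{|A(e^{j\omega})|^2}\right) = \frac{\alpha}{\sigma^2} - \log\frac{\sigma_p^2}{\sigma^2} - 1$$
If we set $\sigma^2$ to match the residual energy α:
$$d_{IS} = \log\frac{\alpha}{\sigma_p^2} = d_I$$
which is often referred to as the Itakura distortion measure.
Likelihood Distortions
Another way to write the Itakura distortion measure is:
$$d_I = \log\left[\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{|A(e^{j\omega})|^2}{|A_p(e^{j\omega})|^2}\, d\omega\right]$$
Another gain-independent distortion measure is called the Likelihood Ratio (LR) distortion:
$$d_{LR} = \frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{|A(e^{j\omega})|^2}{|A_p(e^{j\omega})|^2}\, d\omega - 1 = \frac{\alpha}{\sigma_p^2} - 1$$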
Likelihood Distortions
The Itakura distortion is the logarithm of one plus the LR distortion, and $\log(1 + x) \approx x$ for small x. That is, when the distortion is small, the Itakura distortion measure is not very different from the LR distortion measure.
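A sketch of the gain-matched Itakura and LR distortions, assuming the forms $d_I = \log(\alpha/\sigma_p^2)$ and $d_{LR} = \alpha/\sigma_p^2 - 1$, where α is the residual energy of the test frame filtered by the reference inverse filter and $\sigma_p^2$ is the test frame's minimum p-th order residual energy. The test autocorrelation [1, 0.5, 0.25] is hypothetical; its order-2 optimum, [1, -0.5, 0] with $\sigma_p^2 = 0.75$, follows from the Levinson-Durbin recursion.

```python
import math

def itakura_and_lr(r, a_ref, sigma_p2):
    """Return (d_I, d_LR) for a test frame with autocorrelation r[0..p] and minimum
    p-th order residual energy sigma_p2, against a reference inverse filter a_ref."""
    p = len(a_ref) - 1
    alpha = sum(a_ref[i] * a_ref[j] * r[abs(i - j)] for i in range(p + 1) for j in range(p + 1))
    ratio = alpha / sigma_p2            # >= 1, equal to 1 when a_ref is the optimum filter
    return math.log(ratio), ratio - 1.0

r = [1.0, 0.5, 0.25]
d_i, d_lr = itakura_and_lr(r, [1.0, -0.45, 0.02], 0.75)
print(d_i, d_lr)   # both small and close to each other, since log(1 + x) ≈ x
```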
Likelihood Distortions
Consider the Itakura-Saito distortion between the input and output of a linear system H(z).
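For an output spectrum $|H(e^{j\omega})|^2 S(\omega)$, the ratio inside the Itakura-Saito integral is $1/|H(e^{j\omega})|^2$, so the distortion depends only on the system and not on the input spectrum. The sketch below checks this numerically; the filter $H(z) = 1 + 0.4z^{-1}$ and both input spectra are hypothetical.

```python
import math

def itakura_saito(S, S_ref, n_points=512):
    """(1/2pi) int [S/S' - log(S/S') - 1] dw, approximated on a uniform grid."""
    total = 0.0
    for k in range(n_points):
        w = -math.pi + 2 * math.pi * k / n_points
        ratio = S(w) / S_ref(w)
        total += ratio - math.log(ratio) - 1.0
    return total / n_points

H2 = lambda w: 1.0 + 0.8 * math.cos(w) + 0.16             # |H(e^{jw})|^2, H(z) = 1 + 0.4 z^-1
inputs = [lambda w: 1.0,                                   # white input
          lambda w: 1.0 / (1.0 - 1.8 * math.cos(w) + 0.81)]  # strongly low-pass input
values = [itakura_saito(S_in, lambda w, S_in=S_in: H2(w) * S_in(w)) for S_in in inputs]
print(values)   # the same value for both inputs, up to rounding
```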
Variations of Likelihood Distortions
Symmetric distortion measures:
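Variations of Likelihood Distortions
COSH distortion:
$$d_{COSH}(S, S') = \frac{1}{2}\left[d_{IS}(S, S') + d_{IS}(S', S)\right] = \frac{1}{2\pi}\int_{-\pi}^{\pi}\left[\cosh V(\omega) - 1\right] d\omega, \qquad V(\omega) = \log S(\omega) - \log S'(\omega)$$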
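A numerical check that the hyperbolic-cosine form equals the symmetrized Itakura-Saito distortion, on a uniform frequency grid with hypothetical one-pole spectra. Per frequency, $\frac{1}{2}[(e^V - V - 1) + (e^{-V} + V - 1)] = \cosh V - 1$.

```python
import math

GRID = [-math.pi + 2 * math.pi * k / 512 for k in range(512)]   # uniform frequency grid

def itakura_saito(S, S_ref):
    """(1/2pi) int [S/S' - log(S/S') - 1] dw, approximated on GRID."""
    return sum(S(w) / S_ref(w) - math.log(S(w) / S_ref(w)) - 1.0 for w in GRID) / len(GRID)

def cosh_distortion(S, S_ref):
    """(1/2pi) int [cosh(V) - 1] dw with V = log S - log S', approximated on GRID."""
    return sum(math.cosh(math.log(S(w)) - math.log(S_ref(w))) - 1.0 for w in GRID) / len(GRID)

S1 = lambda w: 1.0 / (1.0 - 1.8 * math.cos(w) + 0.81)   # hypothetical one-pole spectrum, a = 0.9
S2 = lambda w: 1.0 / (1.0 - 1.0 * math.cos(w) + 0.25)   # hypothetical one-pole spectrum, a = 0.5
symmetrized = 0.5 * (itakura_saito(S1, S2) + itakura_saito(S2, S1))
print(cosh_distortion(S1, S2), symmetrized)             # equal up to rounding
```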
Spectral Distortion Using a Warped Frequency Scale
Psychophysical studies have shown that human perception of the frequency content of sounds does not follow a linear scale. This research has led to the idea of defining the subjective pitch of pure tones. For each tone with an actual frequency f, measured in Hz, a subjective pitch is measured on a scale called the “mel” scale. As a reference point, the pitch of a 1-kHz tone, 40 dB above the perceptual hearing threshold, is defined as 1000 mels.
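The reference point can be checked against a widely used analytic approximation of the mel scale, $m = 2595\log_{10}(1 + f/700)$. This formula is an assumption here, not taken from the text above.

```python
import math

def hz_to_mel(f):
    """Assumed analytic approximation of the mel scale: 2595 log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

print(round(hz_to_mel(1000.0)))   # 1000, matching the 1 kHz = 1000 mels reference point
```

The curve is close to linear below about 1 kHz and close to logarithmic above it.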
Examples of Critical Bandwidth
Warped cepstral distance:
Here b is the frequency in Barks, $S(\theta(b))$ is the spectrum on a Bark scale, and B is the Nyquist frequency in Barks.
Spectral Distortion Using a Warped Frequency Scale
The warping function is defined by:
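For experimenting with Bark warping, the sketch below uses the Zwicker and Terhardt (1980) analytic approximations to the critical-band rate and the critical bandwidth. These are assumptions, not necessarily the warping function used above; the Nyquist frequency in Barks, B, would be `hz_to_bark(fs / 2)` for a sampling rate `fs`.

```python
import math

def hz_to_bark(f):
    """Zwicker & Terhardt (1980) approximation of the critical-band rate in Barks."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)

def critical_bandwidth(f):
    """Zwicker & Terhardt (1980) approximation of the critical bandwidth in Hz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f / 1000.0) ** 2) ** 0.69

for f in (100.0, 1000.0, 4000.0):
    print(f, round(hz_to_bark(f), 2), round(critical_bandwidth(f), 1))
```

The critical bandwidth stays near 100 Hz at low frequencies and grows roughly in proportion to frequency above about 1 kHz.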
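Spectral Distortion Using a Warped Frequency Scale
Mel-frequency cepstrum:
$$c_n^{mel} = \sum_{k=1}^{K}\left(\log S_k\right)\cos\left[n\left(k - \frac{1}{2}\right)\frac{\pi}{K}\right], \qquad n = 1, 2, \ldots, L$$
where $S_k$ is the output power of the k-th of the K triangular filters.
Mel-frequency cepstral distance:
$$d_{mel}^2 = \sum_{n=1}^{L}\left(c_n^{mel} - c_n'^{mel}\right)^2$$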
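A sketch of the mel-cepstrum and its distance, starting from filter output powers $S_k$ (hypothetical values; computing the triangular filterbank itself is not shown). Because $c_0$ is excluded and the cosine basis sums to zero over k for $1 \le n < K$, a common gain applied to all filter outputs leaves $c_1, \ldots, c_L$ unchanged.

```python
import math

def mel_cepstrum(filter_powers, L):
    """c_n = sum_{k=1}^{K} log(S_k) cos[n (k - 1/2) pi / K] for n = 1..L, where
    filter_powers[k - 1] = S_k is the output power of the k-th triangular filter."""
    K = len(filter_powers)
    logs = [math.log(s) for s in filter_powers]
    return [sum(logs[k - 1] * math.cos(n * (k - 0.5) * math.pi / K) for k in range(1, K + 1))
            for n in range(1, L + 1)]

def mel_cepstral_distance(c, c_ref):
    """Squared Euclidean distance sum_n (c_n - c'_n)^2 between mel-cepstral vectors."""
    return sum((x - y) ** 2 for x, y in zip(c, c_ref))

powers = [1.0 + 0.5 * math.sin(0.4 * k) ** 2 for k in range(20)]   # hypothetical, K = 20
louder = [10.0 * s for s in powers]                                 # same shape, 10x the power
print(mel_cepstral_distance(mel_cepstrum(powers, 12), mel_cepstrum(louder, 12)))  # close to 0
```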
Alternative Spectral Representations and Distortion Measures
Alternative Spectral Representations and Distortion Measures
Wave reflection occurs at each sectional boundary, with reflection coefficients denoted by $k_i$, $i = 1, 2, \ldots, p$.
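A sketch relating the reflection coefficients to the inverse filter A(z) through the step-up and step-down recursions. It assumes $A(z) = \sum_{i=0}^{p} a_i z^{-i}$ with $a_0 = 1$ and the sign convention $a_i^{(i)} = k_i$; texts that define $k_i$ with the opposite sign negate every value.

```python
def step_up(k):
    """A(z) coefficients [1, a_1, ..., a_p] from k_1..k_p via a_i[j] = a_{i-1}[j]
    + k_i a_{i-1}[i - j] and a_i[i] = k_i (assumed sign convention)."""
    a = [1.0]
    for ki in k:
        ext = a + [0.0]
        a = [ext[j] + ki * ext[len(ext) - 1 - j] for j in range(len(ext))]
    return a

def step_down(a):
    """Recover k_1..k_p from [1, a_1, ..., a_p] by running the recursion backwards:
    k_i = a_i[i], a_{i-1}[j] = (a_i[j] - k_i a_i[i - j]) / (1 - k_i^2).
    Raises ValueError if some |k_i| >= 1, i.e. A(z) is not minimum phase."""
    a = [float(x) for x in a]
    p = len(a) - 1
    k = [0.0] * p
    for i in range(p, 0, -1):
        ki = a[i]
        if abs(ki) >= 1.0:
            raise ValueError("|k_%d| >= 1: A(z) is not minimum phase" % i)
        k[i - 1] = ki
        a = [1.0] + [(a[j] - ki * a[i - j]) / (1.0 - ki * ki) for j in range(1, i)]
    return k

a = step_up([0.5, -0.3, 0.2])
print(a, step_down(a))   # the round trip returns [0.5, -0.3, 0.2] up to rounding
```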
Alternative Spectral Representations and Distortion Measures
Another possible parametric representation of the all-pole spectrum is the set of line spectral frequencies (LSFs), defined as the roots of the following two polynomials based upon the inverse filter A(z):
$$P(z) = A(z) + z^{-(p+1)}A(z^{-1}), \qquad Q(z) = A(z) - z^{-(p+1)}A(z^{-1})$$
These two polynomials are equivalent to artificially augmenting the p-section nonuniform acoustic tube with an extra section that is either completely closed (area = 0) or completely open (area = ∞). Because of their particular structure, LSF parameters possess properties similar to those of the formant frequencies and bandwidths.
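A sketch of LSF extraction that avoids polynomial root finding. On the unit circle, with $g(\omega) = e^{j\omega(p+1)/2}A(e^{j\omega})$ and real coefficients, $e^{j\omega(p+1)/2}P(e^{j\omega}) = 2\,\mathrm{Re}\{g(\omega)\}$ and $e^{j\omega(p+1)/2}Q(e^{j\omega}) = 2j\,\mathrm{Im}\{g(\omega)\}$, so the LSFs are the zero crossings of Re g and Im g in $(0, \pi)$, located here on a grid and refined by bisection. The grid-plus-bisection approach and the coefficient convention $a_0 = 1$ are assumptions of this sketch; the trivial roots at $\omega = 0$ and $\omega = \pi$ are excluded.

```python
import cmath
import math

def line_spectral_frequencies(a, n_grid=4096):
    """LSFs in radians, (0, pi), of A(z) = sum a[i] z^-i with a[0] = 1: the zero
    crossings of Re g (roots of P) and Im g (roots of Q), g(w) = e^{jw(p+1)/2} A(e^{jw})."""
    p = len(a) - 1

    def g(w):
        A = sum(ai * cmath.exp(-1j * w * i) for i, ai in enumerate(a))
        return cmath.exp(0.5j * w * (p + 1)) * A

    def bisect(f, lo, hi):
        for _ in range(60):                       # shrink the bracketing interval
            mid = 0.5 * (lo + hi)
            if (f(lo) > 0) == (f(mid) > 0):
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    grid = [math.pi * i / n_grid for i in range(1, n_grid)]   # open interval (0, pi)
    roots = []
    for f in (lambda w: g(w).real, lambda w: g(w).imag):
        for lo, hi in zip(grid, grid[1:]):
            if (f(lo) > 0) != (f(hi) > 0):
                roots.append(bisect(f, lo, hi))
    return sorted(roots)

print(line_spectral_frequencies([1.0, -0.9, 0.64]))   # two LSFs for p = 2
```

For p = 1, $P(z) = 1 + 2a_1 z^{-1} + z^{-2}$, so the single LSF is $\arccos(-a_1)$; with $a_1 = -0.5$ that is $\pi/3$.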
Alternative Spectral Representations and Distortion Measures
Weighted slope metric proposed by Klatt:
Summary of Spectral Distortion Measures
Table columns: Distortion Measure | Notation | Expression | Computation
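INCORPORATION OF SPECTRAL DYNAMIC FEATURES INTO THE DISTORTION MEASURE
A first-order differential (log) spectrum is defined by:
$$\frac{\partial \log S(\omega, t)}{\partial t} = \sum_{n=-\infty}^{\infty}\frac{\partial c_n(t)}{\partial t}\, e^{-jn\omega}$$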
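INCORPORATION OF SPECTRAL DYNAMIC FEATURES INTO THE DISTORTION MEASURE
Fit the cepstral trajectory $c_n(t)$ over a window $-M \le t \le M$ by a second-order polynomial $h_1 + h_2 t + h_3 t^2$, with fitting error
$$E = \sum_{t=-M}^{M}\left[c_n(t) - \left(h_1 + h_2 t + h_3 t^2\right)\right]^2$$
Choose $h_1$, $h_2$, $h_3$ such that E is minimized. Differentiating E with respect to $h_1$, $h_2$, and $h_3$ and setting the results to zero gives 3 equations (the odd-power sums $\sum_t t$ and $\sum_t t^3$ vanish over the symmetric window):
$$\sum_{t=-M}^{M} c_n(t) = (2M+1)\, h_1 + h_3\sum_{t=-M}^{M} t^2$$
$$\sum_{t=-M}^{M} t\, c_n(t) = h_2\sum_{t=-M}^{M} t^2$$
$$\sum_{t=-M}^{M} t^2\, c_n(t) = h_1\sum_{t=-M}^{M} t^2 + h_3\sum_{t=-M}^{M} t^4$$
INCORPORATION OF SPECTRAL DYNAMIC FEATURES INTO THE DISTORTION MEASURE
The solutions to these equations are, with $T_M = \sum_{t=-M}^{M} t^2$:
$$h_2 = \frac{\sum_t t\, c_n(t)}{T_M}, \qquad h_3 = \frac{T_M\sum_t c_n(t) - (2M+1)\sum_t t^2 c_n(t)}{T_M^2 - (2M+1)\sum_t t^4}, \qquad h_1 = \frac{1}{2M+1}\left[\sum_t c_n(t) - h_3 T_M\right]$$
INCORPORATION OF SPECTRAL DYNAMIC FEATURES INTO THE DISTORTION MEASURE
The first and second time derivatives of $c_n$ at the window center (t = 0) can be obtained by differentiating the fitting curve, giving
$$\frac{\partial c_n(t)}{\partial t} = h_2, \qquad \frac{\partial^2 c_n(t)}{\partial t^2} = 2h_3$$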
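The closed-form solution can be coded directly and checked on a trajectory that is exactly quadratic, where the least-squares fit must recover the coefficients. The window half-length M = 2 and the trajectory values are illustrative.

```python
def fit_quadratic_trajectory(c, M):
    """Least-squares fit of h1 + h2 t + h3 t^2 to c[0..2M], where c[t + M] = c_n(t)
    for t = -M..M, using the closed-form solution; returns (h1, h2, h3)."""
    if len(c) != 2 * M + 1:
        raise ValueError("expected 2M + 1 frames")
    ts = range(-M, M + 1)
    n = 2 * M + 1
    T2 = sum(t ** 2 for t in ts)
    T4 = sum(t ** 4 for t in ts)
    s0 = sum(c)
    s1 = sum(t * ct for t, ct in zip(ts, c))
    s2 = sum(t * t * ct for t, ct in zip(ts, c))
    h2 = s1 / T2
    h3 = (T2 * s0 - n * s2) / (T2 ** 2 - n * T4)
    h1 = (s0 - h3 * T2) / n
    return h1, h2, h3

def trajectory_derivatives(c, M):
    """First and second time derivatives of c_n(t) at t = 0: (h2, 2 h3)."""
    _, h2, h3 = fit_quadratic_trajectory(c, M)
    return h2, 2.0 * h3

traj = [0.3 - 0.2 * t + 0.05 * t * t for t in range(-2, 3)]   # exactly quadratic
print(trajectory_derivatives(traj, 2))                         # approximately (-0.2, 0.1)
```

In practice the same fit is applied independently to every cepstral coefficient n, frame by frame.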
INCORPORATION OF SPECTRAL DYNAMIC FEATURES INTO THE DISTORTION MEASURE
A differential spectral distance:
A second differential spectral distance:
INCORPORATION OF SPECTRAL DYNAMIC FEATURES INTO THE DISTORTION MEASURE
Cepstral weighting or liftering by differentiating.
Combining the first and second differential spectral distances with the cepstral distance results in:
INCORPORATION OF SPECTRAL DYNAMIC FEATURES INTO THE DISTORTION MEASURE
A weighted differential cepstral distance:
INCORPORATION OF SPECTRAL DYNAMIC FEATURES INTO THE DISTORTION MEASURE
Taking the L2 distance:
Other operators can be added to produce a combined representation of the spectrum and the differential spectra. As an example:
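As a minimal sketch of combining static and differential terms, the function below adds the squared cepstral distance and the squared first- and second-differential cepstral distances with weights `gamma1` and `gamma2`. The weights, the plain unweighted sums, and the example vectors are hypothetical choices, not the specific combination given in the text.

```python
def squared_distance(u, v):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))

def combined_distance(c, c_ref, dc, dc_ref, ddc, ddc_ref, gamma1=1.0, gamma2=1.0):
    """Hypothetical combined L2 distance: cepstral term plus first- and second-
    differential cepstral terms weighted by gamma1 and gamma2."""
    return (squared_distance(c, c_ref)
            + gamma1 * squared_distance(dc, dc_ref)
            + gamma2 * squared_distance(ddc, ddc_ref))

frame = ([0.5, -0.2], [0.05, 0.01], [0.0, -0.02])   # hypothetical (c, delta c, delta^2 c)
ref = ([0.4, -0.1], [0.00, 0.01], [0.01, -0.02])
print(combined_distance(frame[0], ref[0], frame[1], ref[1], frame[2], ref[2],
                        gamma1=2.0, gamma2=4.0))
```

Larger `gamma1` and `gamma2` make the comparison more sensitive to how the spectrum is changing over time rather than to its instantaneous shape.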