Published by Suzan Franklin. Modified over 9 years ago.
1
Time-Domain Methods for Speech Processing 虞台文
2
Contents
– Introduction
– Time-Dependent Processing of Speech
– Short-Time Energy and Average Magnitude
– Short-Time Average Zero-Crossing Rate
– Speech vs. Silence Discrimination Using Energy and Zero-Crossing
– The Short-Time Autocorrelation Function
– The Short-Time Average Magnitude Difference Function
3
Time-Domain Methods for Speech Processing Introduction
4
Speech Processing Methods
Time-Domain Method:
– Involves the waveform of the speech signal directly.
Frequency-Domain Method:
– Involves some form of spectrum representation.
5
Time-Domain Measurements
Average zero-crossing rate, energy, and the autocorrelation function.
Very simple to implement.
Provide a useful basis for estimating important features of the speech signal, e.g.,
– Voiced/unvoiced classification
– Pitch estimation
6
Time-Domain Methods for Speech Processing Time-Dependent Processing of Speech
7
Time-Dependent Nature of Speech
[Figure: waveform of the utterance "This is a test."]
8
Time-Dependent Nature of Speech
9
Short-Time Behavior of Speech
Assumption
– The properties of the speech signal change slowly with time.
Analysis Frames
– Short segments of the speech signal.
– Usually overlap one another.
10
Time-Dependent Analyses
Analyzing each frame may produce either a single number or a set of numbers, e.g.,
– Energy (a single number)
– Vocal tract parameters (a set of numbers)
Applied frame by frame, this produces a new time-dependent sequence.
11
General Form
$Q_n = \sum_{m=-\infty}^{\infty} T[x(m)]\, w(n-m)$
n: Frame index
x(m): Speech signal
T[ ]: A linear or nonlinear transformation
w(m): A window function (finite or infinite)
12
General Form
$Q_n$ is a sequence of local weighted average values of the sequence T[x(m)].
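The general form can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `short_time` is hypothetical, the window is taken to be finite and causal (stored as `window[0..N-1]`), and samples outside the recorded signal are simply skipped.

```python
def short_time(x, transform, window, n):
    """Q_n = sum over m of T[x(m)] * w(n - m), for a finite causal window.

    `window[j]` holds w(j) for j = 0..N-1 (an assumption of this sketch).
    """
    total = 0.0
    for j, w_j in enumerate(window):   # j = n - m, so m = n - j
        m = n - j
        if 0 <= m < len(x):            # samples outside the record are skipped
            total += transform(x[m]) * w_j
    return total


x = [1.0, -2.0, 3.0, -4.0]
rect = [1.0, 1.0]                      # rectangular window, N = 2

# T[ ] = squaring gives a short-time energy; T[ ] = abs gives a magnitude.
energy_at_3 = short_time(x, lambda v: v * v, rect, 3)
magnitude_at_3 = short_time(x, abs, rect, 3)
```

Choosing a different `transform` or `window` changes which short-time measurement is produced without changing the structure of the computation.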
13
Example
Energy: $E = \sum_{m=-\infty}^{\infty} x^2(m)$
Short-Time Energy: $E_n = \sum_{m=n-N+1}^{n} x^2(m)$
14
Example Short-Time Energy
15
Short-Time Energy Example
16
General Short-Time-Analysis Scheme
x(n) → T[ ] → T[x(n)] → Linear Filter w(n) → Q_n
The linear filter is a lowpass filter, depending on the choice of window.
17
Time-Domain Methods for Speech Processing Short-Time Energy and Average Magnitude
18
Applications
– Silence Detection
– Segmentation
– Lip Sync
– …
19
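Short-Time Energy
$E_n = \sum_{m=-\infty}^{\infty} [x(m)\, w(n-m)]^2 = \sum_{m=-\infty}^{\infty} x^2(m)\, h(n-m)$, where $h(n) = w^2(n)$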
20
Short-Time Average Magnitude
$M_n = \sum_{m=-\infty}^{\infty} |x(m)|\, w(n-m)$
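A minimal sketch of the two measurements, assuming a finite causal window stored as `w[0..N-1]` and the usual definitions $E_n = \sum_m [x(m) w(n-m)]^2$ and $M_n = \sum_m |x(m)| w(n-m)$. The function names are hypothetical. The second energy function checks that windowing and then squaring is the same as filtering $x^2$ with $h(n) = w^2(n)$.

```python
def short_time_energy(x, w, n):
    """E_n: window the signal, then square and sum."""
    return sum((x[n - j] * w[j]) ** 2
               for j in range(len(w)) if 0 <= n - j < len(x))


def short_time_energy_filtered(x, w, n):
    """E_n again: filter x^2 with h(n) = w^2(n)."""
    h = [v * v for v in w]
    return sum(x[n - j] ** 2 * h[j]
               for j in range(len(w)) if 0 <= n - j < len(x))


def short_time_magnitude(x, w, n):
    """M_n: filter |x| with the window itself."""
    return sum(abs(x[n - j]) * w[j]
               for j in range(len(w)) if 0 <= n - j < len(x))


x = [1.0, -2.0, 3.0, -4.0]
w = [1.0, 0.5]                                     # toy two-point tapered window

e_direct = short_time_energy(x, w, 3)              # (-4*1)^2 + (3*0.5)^2
e_filtered = short_time_energy_filtered(x, w, 3)   # 16*1 + 9*0.25
m_3 = short_time_magnitude(x, w, 3)                # 4*1 + 3*0.5
```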
21
Block Diagram Representation
x(n) → [ ]² → x²(n) → h(n) → E_n
x(n) → | | → |x(n)| → w(n) → M_n
22
Block Diagram Representation
What is the effect of the window?
23
The Effects of Windows
– Window length
– Window function
24
Rectangular Window
25
Rectangular Window
[Figure: frequency response of the rectangular window for N = 8, with the mainlobe width and peak sidelobe marked]
26
Rectangular Window
What is this?
– Discuss the effect of window duration.
– Discuss the effect of mainlobe width and sidelobe peak.
[Figure: frequency response of the rectangular window for N = 8, with the mainlobe width and peak sidelobe marked]
27
Commonly Used Windows
– Rectangular
– Bartlett
– Hanning
– Hamming
– Blackman
28
Commonly Used Windows
– Rectangular
– Bartlett (Triangular)
– Hanning
– Hamming
– Blackman
29
Commonly Used Windows
– Rectangular (least mainlobe width)
– Bartlett
– Hanning
– Hamming
– Blackman
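The five windows can be generated from their standard textbook definitions. The sketch below assumes the common symmetric forms over n = 0..N-1 with N ≥ 2; the function name `make_window` and the lowercase window names are hypothetical choices for this example.

```python
import math


def make_window(name, N):
    """Return N samples of a symmetric window (standard textbook forms)."""
    last = N - 1
    samples = []
    for n in range(N):
        phase = 2.0 * math.pi * n / last
        if name == "rectangular":
            v = 1.0
        elif name == "bartlett":       # triangular, zero at both ends
            v = 1.0 - abs(2.0 * n / last - 1.0)
        elif name == "hanning":
            v = 0.5 - 0.5 * math.cos(phase)
        elif name == "hamming":
            v = 0.54 - 0.46 * math.cos(phase)
        elif name == "blackman":
            v = 0.42 - 0.5 * math.cos(phase) + 0.08 * math.cos(2.0 * phase)
        else:
            raise ValueError("unknown window: " + name)
        samples.append(v)
    return samples


hamming9 = make_window("hamming", 9)
bartlett9 = make_window("bartlett", 9)
blackman9 = make_window("blackman", 9)
```

All of the tapered windows peak at the center and fall toward the edges; the Hamming window stops at 0.08 rather than reaching zero.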
30
Examples: Short-Time Energy
[Figure panels: Rectangular Window; Hamming Window]
31
Examples: Average Magnitude
[Figure panels: Rectangular Window; Hamming Window]
32
The Effects of Window Length
Increasing the window length N decreases the bandwidth.
If N is too small, e.g., less than one pitch period, $E_n$ and $M_n$ will fluctuate very rapidly.
If N is too large, e.g., on the order of several pitch periods, $E_n$ and $M_n$ will change very slowly.
33
The Choice of Window Length
No single value of N is entirely satisfactory.
This is because the duration of a pitch period varies from about 2 ms for a high-pitched female or child voice up to 25 ms for a very low-pitched male voice.
34
Sampling Rate
The bandwidth of both $E_n$ and $M_n$ is just that of the lowpass filter.
So, they need not be sampled as frequently as the speech signal.
For example
– Frame size = 20 ms
– Sample period = 10 ms
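To make the 20 ms / 10 ms example concrete, the sketch below computes one energy value per frame with a rectangular window. The 8 kHz speech sampling rate and the function name `frame_energies` are assumptions for illustration only.

```python
def frame_energies(x, fs, frame_ms=20, hop_ms=10):
    """Short-time energy (rectangular window), one value per hop."""
    frame_len = int(fs * frame_ms / 1000)   # samples per analysis frame
    hop = int(fs * hop_ms / 1000)           # samples between frame starts
    energies = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        energies.append(sum(v * v for v in frame))
    return energies


fs = 8000                      # assumed speech sampling rate in Hz
x = [1.0] * fs                 # one second of a constant test signal
energies = frame_energies(x, fs)
```

One second of input sampled 8000 times yields only about 100 energy values, because the energy contour is sampled once per 10 ms while each frame still spans 20 ms.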
35
Main Applications of $E_n$ and $M_n$
– To provide the basis for distinguishing voiced speech segments from unvoiced segments.
– Silence detection.
36
Differences Between $E_n$ and $M_n$
$E_n$:
– Emphasizes large sample-to-sample variations in x(n).
$M_n$:
– The dynamic range (max/min) is approximately the square root of that of $E_n$.
– The differences in level between voiced and unvoiced regions are not as pronounced as with $E_n$.
37
FIR and IIR
All the windows discussed so far are FIR filters.
Each of them is a lowpass filter.
The window can also be an IIR filter.
38
IIR Example
Recursive formulas for:
– Short-Time Energy
– Short-Time Average Magnitude
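The slide's own recursive formulas did not survive extraction, so the following is only a sketch under an explicit assumption: an exponential window $h(n) = a^n$ for $n \ge 0$ with $0 < a < 1$. For that window the short-time energy and average magnitude satisfy $E_n = a E_{n-1} + x^2(n)$ and $M_n = a M_{n-1} + |x(n)|$. The original slide may use a different window or include a one-sample delay.

```python
def recursive_energy(x, a):
    """E_n = a * E_{n-1} + x(n)^2, assuming h(n) = a^n for n >= 0."""
    e, out = 0.0, []
    for v in x:
        e = a * e + v * v
        out.append(e)
    return out


def recursive_magnitude(x, a):
    """M_n = a * M_{n-1} + |x(n)|, under the same assumed window."""
    m, out = 0.0, []
    for v in x:
        m = a * m + abs(v)
        out.append(m)
    return out


energy_track = recursive_energy([1.0, 1.0, 1.0], 0.5)
magnitude_track = recursive_magnitude([-2.0, 2.0], 0.5)
```

Each output needs one multiply and one add per sample, regardless of how long the effective window is.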
39
Time-Domain Methods for Speech Processing Short-Time Average Zero-Crossing Rate
40
Voiced and Unvoiced Signals
Voiced: Th/i/s
Unvoiced: Thi/s/
41
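The Short-Time Average Zero-Crossing Rate
$Z_n = \sum_{m=-\infty}^{\infty} |\mathrm{sgn}[x(m)] - \mathrm{sgn}[x(m-1)]|\, w(n-m)$
x(n) → sgn[ ] → First Difference → | | → Lowpass Filter → Z_n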
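A minimal sketch of the zero-crossing measurement, assuming the usual definition $Z_n = \sum_m |\mathrm{sgn}[x(m)] - \mathrm{sgn}[x(m-1)]|\, w(n-m)$ with a rectangular window $w(n) = 1/(2N)$ for $0 \le n \le N-1$, and sgn defined as +1 for non-negative samples and -1 otherwise. The function names are hypothetical.

```python
def sgn(v):
    """+1 for non-negative samples, -1 otherwise."""
    return 1 if v >= 0 else -1


def zero_crossing_rate(x, n, N):
    """Z_n over the N samples ending at n, rectangular window of height 1/(2N)."""
    total = 0
    for m in range(max(n - N + 1, 1), n + 1):
        total += abs(sgn(x[m]) - sgn(x[m - 1]))   # 2 per sign change, else 0
    return total / (2.0 * N)


noisy = [1, -1, 1, -1, 1, -1, 1, -1]      # sign flips at every sample
smooth = [1, 1, 1, 1, -1, -1, -1, -1]     # a single sign flip

z_noisy = zero_crossing_rate(noisy, 7, 4)
z_smooth = zero_crossing_rate(smooth, 7, 4)
```

With this scaling the result is the average number of zero crossings per sample, so a signal that changes sign at every sample gives 1.0.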
42
Distribution of Zero-Crossings
43
Example
44
Time-Domain Methods for Speech Processing Speech vs. Silence Discrimination Using Energy and Zero-Crossing
45
Speech vs. Silence Discrimination
Locating the beginning and end of a speech utterance in the presence of background noise.
Applications:
– Segmentation of isolated words
– Automatic speech recognition
– Saving bandwidth in speech transmission
46
Examples: In some cases, we can locate the beginning and end of a speech utterance using energy alone.
47
Examples: In other cases, we can locate the beginning and end of a speech utterance using zero-crossing rate alone.
48
Examples: Sometimes, we cannot do it using one criterion alone.
[Figure label: actual beginning]
49
Difficulties
In general, it is difficult to locate the boundaries in the following cases:
– Weak fricatives (/f/, /th/, /h/) at the beginning or end.
– Weak plosive bursts (/p/, /t/, /k/) at the beginning or end.
– Nasals at the end.
– Voiced fricatives which become devoiced at the end of words.
– Trailing off of vowel sounds at the end of an utterance.
50
Rabiner and Sambur
A 10 ms frame is used, and the measurements are computed 100 times per second.
The algorithm assumes that the first 100 ms of the interval contains no speech.
The means and standard deviations of the average magnitude and zero-crossing rate over this interval are computed to characterize the background noise.
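The noise-characterization step described above can be sketched as follows. This covers only the statistics of the assumed-silent first 100 ms, not the threshold logic of the full algorithm. The function names, the 10 kHz sampling rate used in the example, the use of non-overlapping frames, and the choice of population standard deviation are all assumptions of this sketch.

```python
import statistics


def frame_features(x, frame_len):
    """Per-frame (summed magnitude, zero-crossing count), non-overlapping frames."""
    feats = []
    for start in range(0, len(x) - frame_len + 1, frame_len):
        frame = x[start:start + frame_len]
        magnitude = sum(abs(v) for v in frame)
        # Crossings are counted inside the frame only, not across its edges.
        crossings = sum(1 for a, b in zip(frame, frame[1:])
                        if (a >= 0) != (b >= 0))
        feats.append((magnitude, crossings))
    return feats


def noise_statistics(x, fs, frame_ms=10, silence_ms=100):
    """Mean and standard deviation of both features over the leading silence."""
    frame_len = fs * frame_ms // 1000
    n_frames = silence_ms // frame_ms
    feats = frame_features(x, frame_len)[:n_frames]
    mags = [f[0] for f in feats]
    zcs = [f[1] for f in feats]
    return {
        "mag_mean": statistics.mean(mags),
        "mag_std": statistics.pstdev(mags),   # population form, an assumption
        "zc_mean": statistics.mean(zcs),
        "zc_std": statistics.pstdev(zcs),
    }


fs = 10000                                    # assumed sampling rate in Hz
background = [0.1, -0.1] * 500                # 100 ms of synthetic "noise"
stats = noise_statistics(background, fs)
```

These four numbers are what a detector would then use to set its energy and zero-crossing thresholds.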
51
The Algorithm
52
[Figure: search steps 1, 2, and 3; the search extends over no more than 25 frames]
53
Examples
55
Time-Domain Methods for Speech Processing The Short-Time Autocorrelation Function
56
Autocorrelation Functions
$\phi(k) = \sum_{m=-\infty}^{\infty} x(m)\, x(m+k)$
57
Properties
1. Even: $\phi(k) = \phi(-k)$.
2. $\phi(k) \le \phi(0)$ for all k.
3. $\phi(0)$ is equal to the energy of x(m).
58
Properties
4. If x(m) has period P, i.e., x(m) = x(m+P), then $\phi(k) = \phi(k+P)$.
59
Properties
4. If x(m) has period P, i.e., x(m) = x(m+P), then $\phi(k) = \phi(k+P)$.
This motivates us to use autocorrelation for pitch detection.
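The periodicity property suggests a simple period estimator: compute the autocorrelation over a range of candidate lags and pick the lag with the largest value. The sketch below works on a finite record, so the peaks at multiples of the period shrink slightly with lag. The function names and the test pattern are hypothetical.

```python
def autocorrelation(x, k):
    """phi(k) = sum over m of x(m) * x(m + k), over the available samples."""
    return sum(x[m] * x[m + k] for m in range(len(x) - k))


def estimate_period(x, min_lag, max_lag):
    """Lag in [min_lag, max_lag] with the largest autocorrelation."""
    return max(range(min_lag, max_lag + 1),
               key=lambda k: autocorrelation(x, k))


pattern = [0, 3, 1, -2, 4, -1, -3, 2]   # one period, P = 8
x = pattern * 20                        # 160 samples of a periodic signal

energy = autocorrelation(x, 0)          # property 3: phi(0) is the energy
period = estimate_period(x, 2, 12)
```

The search range must exclude lag 0, where the autocorrelation is always largest, and should cover the pitch periods expected for the speaker.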
60
Short-Time Version
$R_n(k) = \sum_{m=-\infty}^{\infty} x(m)\, w(n-m)\, x(m+k)\, w(n-k-m)$
61
Property x(mk)w(n+km)x(mk)w(n+km) k x(m)w(nm)x(m)w(nm) x(m+k)w(n k m) k R n (k) R n ( k)
62
Property
$R_n(k) = \sum_{m=-\infty}^{\infty} y_k(m)\, h_k(n-m)$, where $y_k(m) = x(m)\, x(m-k)$ and $h_k(n) = w(n)\, w(n+k)$
63
[Figure: the product sequence y_k(m) weighted by h_k(n−m)]
64
x(n) → [delay $z^{-k}$] → × (multiply by x(n)) → h_k(n) → R_n(k)
65
Another Formulation
66
A noncausal formulation
67
Examples
[Figure: voiced and unvoiced speech, N = 401; panels: Rectangular Window, Hamming Window]
68
Examples
Less data is involved for larger lag k.
[Figure panels: N = 401, N = 251, N = 125]
69
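Modified Short-Time Autocorrelation Function
Original Version: $R_n(k) = \sum_{m=0}^{N-1-k} [x(n+m)\, w'(m)]\, [x(n+m+k)\, w'(k+m)]$, where $w'(m) = w(-m)$
Modified Version: $\hat{R}_n(k) = \sum_{m=0}^{N-1} x(n+m)\, x(n+m+k)$, $0 \le k \le K$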
70
Modified Short-Time Autocorrelation Function
K: maximum lag
71
Modified Short-Time Autocorrelation Function
K: maximum lag
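A sketch of the modified function, assuming the common rectangular-window form $\hat{R}_n(k) = \sum_{m=0}^{N-1} x(n+m)\, x(n+m+k)$ for $0 \le k \le K$. The second segment extends K samples past the first, so every lag uses the same N products. The function name is hypothetical.

```python
def modified_autocorrelation(x, n, N, K):
    """R-hat_n(k) for k = 0..K; needs N + K samples starting at n."""
    if n < 0 or n + N + K > len(x):
        raise ValueError("need N + K samples starting at n")
    return [sum(x[n + m] * x[n + m + k] for m in range(N))
            for k in range(K + 1)]


x = [1, 0, -1, 0] * 5                    # periodic test signal, P = 4
r_hat = modified_autocorrelation(x, 0, 8, 8)
```

For a periodic input the peaks at lags P and 2P are as tall as the value at lag 0, since no lag loses data. Strictly speaking this is a cross-correlation between two different segments, so it is not guaranteed to be even in k.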
72
Examples
[Figure: voiced and unvoiced speech, N = 401; panels: Rectangular Window, Modified Version. The results are similar.]
73
Examples
[Figure: Rectangular Window vs. Modified Version for N = 401, N = 251, N = 125]
74
Time-Domain Methods for Speech Processing The Short-Time Average Magnitude Difference Function
75
The AMDF
If x(n) is periodic with period P, then the difference $d(n) = x(n) - x(n-k)$ is zero for $k = 0, \pm P, \pm 2P, \ldots$
Computationally more efficient than autocorrelation.
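A minimal sketch of the average magnitude difference function, assuming rectangular windows so that $\gamma_n(k) = \sum_{m=0}^{N-1} |x(n+m) - x(n+m-k)|$. The function name and test signal are hypothetical. It uses only subtractions and absolute values, with no multiplications, which is the source of the computational advantage.

```python
def amdf(x, n, N, K):
    """gamma_n(k) for k = 0..K; needs K samples of history before n."""
    if n - K < 0 or n + N > len(x):
        raise ValueError("need K samples before n and N samples from n")
    return [sum(abs(x[n + m] - x[n + m - k]) for m in range(N))
            for k in range(K + 1)]


x = [0, 3, 1, -2] * 6                    # periodic test signal, P = 4
gamma = amdf(x, 8, 8, 8)

# The period shows up as the first lag (other than 0) where the AMDF dips.
period = min(range(1, 9), key=lambda k: gamma[k])
```

Whereas the autocorrelation peaks at multiples of the period, the AMDF has nulls there, so a pitch detector looks for minima instead of maxima.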
76
Example
[Figure panels: voiced; unvoiced]
77
Exercise
Record a piece of your own speech and perform voiced/unvoiced segmentation.
Design an efficient algorithm to compute the autocorrelation.