
1 Time-Domain Methods for Speech Processing 虞台文

2 Contents Introduction Time-Dependent Processing of Speech Short-Time Energy and Average Magnitude Short-Time Average Zero Crossing Rate Speech vs. Silence Discrimination Using Energy and Zero-Crossing The Short-Time Autocorrelation Function The Short-Time Average Magnitude Difference Function

3 Time-Domain Methods for Speech Processing Introduction

4 Speech Processing Methods Time-Domain Methods: – Operate directly on the waveform of the speech signal. Frequency-Domain Methods: – Operate on some form of spectrum representation.

5 Time-Domain Measurements Average zero-crossing rate, energy, and the autocorrelation function. Very simple to implement. Provide a useful basis for estimating important features of the speech signal, e.g., – Voiced/unvoiced classification – Pitch estimation

6 Time-Domain Methods for Speech Processing Time-Dependent Processing of Speech

7 Time-Dependent Nature of Speech (figure: waveform of the utterance "This is a test.")

8 Time-Dependent Nature of Speech

9 Short-Time Behavior of Speech Assumption – The properties of the speech signal change slowly with time. Analysis Frames – Short segments of the speech signal. – Usually overlap one another.

10 Time-Dependent Analyses Analyzing each frame may produce either a single number, or a set of numbers, e.g., – Energy (a single number) – Vocal tract parameters (a set of numbers) This will produce a new time-dependent sequence.

11 General Form: $Q_n = \sum_{m=-\infty}^{\infty} T[x(m)]\,w(n-m)$, where n: Frame index; x(m): Speech signal; T[ ]: A linear or nonlinear transformation; w(m): A window function (finite or infinite).

12 General Form $Q_n$ is a sequence of local weighted average values of the sequence T[x(m)].
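The general form can be sketched in a few lines of Python, assuming the usual convolution form $Q_n = \sum_m T[x(m)]\,w(n-m)$ with a causal, finite window. The function name `short_time_analysis` and the toy signal are illustrative, not taken from the slides.

```python
import numpy as np

def short_time_analysis(x, transform, window):
    """Q_n = sum_m T[x(m)] * w(n - m), evaluated at every sample index n."""
    t = transform(np.asarray(x, dtype=float))
    # Convolving T[x] with w implements the weighted local average.
    return np.convolve(t, window)[: len(t)]

x = np.array([1.0, -2.0, 3.0, -4.0])
w = np.ones(2)                      # rectangular window, N = 2 (assumed)
energy = short_time_analysis(x, np.square, w)
```

Choosing T[ ] as squaring gives a short-time energy; choosing the absolute value gives an average magnitude.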

13 Example Energy: $E = \sum_{m=-\infty}^{\infty} x^2(m)$. Short-Time Energy: $E_n = \sum_{m=n-N+1}^{n} x^2(m)$.

14 Example Short-Time Energy

15 Short-Time Energy Example

16 General Short-Time-Analysis Scheme: x(n) → T[ ] → Linear Filter → $Q_n$. The linear filter is a lowpass filter, depending on the choice of window.

17 Time-Domain Methods for Speech Processing Short-Time Energy and Average Magnitude

18 Applications Silence Detection Segmentation Lip Sync …

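19 Short-Time Energy: $E_n = \sum_{m=-\infty}^{\infty} [x(m)\,w(n-m)]^2 = \sum_{m=-\infty}^{\infty} x^2(m)\,h(n-m)$, where $h(n) = w^2(n)$.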

20 Short-Time Average Magnitude: $M_n = \sum_{m=-\infty}^{\infty} |x(m)|\,w(n-m)$.
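A minimal sketch of both measurements, assuming the definitions $E_n = \sum_m [x(m)\,w(n-m)]^2$ and $M_n = \sum_m |x(m)|\,w(n-m)$ with a causal, finite window. The function names and test values are illustrative.

```python
import numpy as np

def short_time_energy(x, w):
    # E_n = sum_m x^2(m) h(n - m), with h(n) = w(n)^2
    return np.convolve(np.square(x), np.square(w))[: len(x)]

def short_time_magnitude(x, w):
    # M_n = sum_m |x(m)| w(n - m)
    return np.convolve(np.abs(x), w)[: len(x)]

x = np.array([3.0, -4.0, 0.0, 1.0])
w = np.ones(2)                      # rectangular window, N = 2 (assumed)
E = short_time_energy(x, w)
M = short_time_magnitude(x, w)
```

Passing `np.hamming(N)` instead of `np.ones(N)` swaps the rectangular window for a Hamming window without changing the rest of the code.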

21 Block Diagram Representation: x(n) → [ ]² → x²(n) → h(n) → $E_n$; x(n) → | | → |x(n)| → w(n) → $M_n$

22 Block Diagram Representation: x(n) → [ ]² → x²(n) → h(n) → $E_n$; x(n) → | | → |x(n)| → w(n) → $M_n$. What is the effect of windows?

23 The Effects of Windows Window length Window function

24 Rectangular Window

25 Rectangular Window (figure, N = 8: mainlobe width and peak sidelobe)

26 Rectangular Window (figure, N = 8: mainlobe width and peak sidelobe) What is this? Discuss the effect of window duration. Discuss the effect of mainlobe width and sidelobe peak.

27 Commonly Used Windows Rectangular Blackman Hanning Bartlett Hamming

28 Commonly Used Windows Rectangular Bartlett (Triangular) Hanning Hamming Blackman

29 Commonly Used Windows Rectangular Bartlett Hanning Hamming Blackman Least mainlobe width
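To compare windows numerically, the sketch below estimates the mainlobe half-width and the peak sidelobe level from a zero-padded FFT. The window length N = 51, the FFT size, and the helper name `window_spectrum_stats` are assumptions made for illustration.

```python
import numpy as np

def window_spectrum_stats(w, nfft=8192):
    """Return (mainlobe half-width in rad/sample, peak sidelobe level in dB)."""
    mag = np.abs(np.fft.rfft(w, nfft))
    mag_db = 20 * np.log10(mag / mag[0] + 1e-12)
    # The first local minimum of the magnitude marks the mainlobe edge.
    first_null = int(np.argmax(np.diff(mag) > 0))
    half_width = 2 * np.pi * first_null / nfft
    return half_width, mag_db[first_null:].max()

N = 51
rect = window_spectrum_stats(np.ones(N))
hamm = window_spectrum_stats(np.hamming(N))
```

For equal length, the rectangular window should show the narrower mainlobe and the Hamming window the lower sidelobes, which is the trade-off the window slides ask about.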

30 Examples: Short-Time Energy (figure: rectangular window vs. Hamming window)

31 Examples: Average Magnitude (figure: rectangular window vs. Hamming window)

32 The Effects of Window Length Increasing the window length N decreases the bandwidth. If N is too small, e.g., less than one pitch period, $E_n$ and $M_n$ will fluctuate very rapidly. If N is too large, e.g., on the order of several pitch periods, $E_n$ and $M_n$ will change very slowly.

33 The Choice of Window Length No single value of N is entirely satisfactory. This is because the duration of a pitch period varies from about 2 ms for a high-pitched female or a child, up to 25 ms for a very low-pitched male.

34 Sampling Rate The bandwidth of both $E_n$ and $M_n$ is just that of the lowpass filter. So, they need not be sampled as frequently as the speech signal. For example – Frame size = 20 ms – Sample period = 10 ms
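The frame-based computation can be sketched as follows: one energy value per 10 ms hop from a 20 ms Hamming-windowed frame. The 8 kHz sampling rate, the test tone, and the name `frame_energy` are assumptions for illustration.

```python
import numpy as np

def frame_energy(x, fs, frame_ms=20.0, hop_ms=10.0):
    """Short-time energy with a Hamming window, one value per hop."""
    n = int(round(fs * frame_ms / 1000))
    hop = int(round(fs * hop_ms / 1000))
    w = np.hamming(n)
    starts = range(0, len(x) - n + 1, hop)
    return np.array([np.sum((x[s:s + n] * w) ** 2) for s in starts])

fs = 8000                                          # assumed sampling rate (Hz)
x = np.sin(2 * np.pi * 200 * np.arange(fs) / fs)   # 1 s test tone
E = frame_energy(x, fs)
```

One second of signal sampled 8000 times yields only 99 energy values here, which is the saving the slide refers to.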

35 Main Applications of $E_n$ and $M_n$ To provide the basis for distinguishing voiced speech segments from unvoiced segments. Silence detection.

36 Differences Between $E_n$ and $M_n$ $E_n$ emphasizes large sample-to-sample variations in x(n) because of the squaring. For $M_n$, the dynamic range (max/min) is approximately the square root of that of $E_n$. With $M_n$, the differences in level between voiced and unvoiced regions are not as pronounced as with $E_n$.

37 FIR and IIR All the windows that we discussed are FIR filters. Each of them is a lowpass filter. The window can also be an IIR filter.

38 IIR Example Recursive formulas: Short-Time Energy: Short-Time Average magnitude:
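The recursive formulas themselves did not survive in this transcript. As a hedged sketch, the code below assumes an exponential window $h(n) = a^n$ for $n \ge 0$, which gives the one-pole recursions $E_n = a\,E_{n-1} + x^2(n)$ and $M_n = a\,M_{n-1} + |x(n)|$. The decay factor `a` and the function names are assumptions, and the slide's version may differ (for example by a one-sample delay).

```python
import numpy as np

def recursive_energy(x, a=0.9):
    """E_n = a * E_{n-1} + x(n)^2 (assumed exponential window h(n) = a**n)."""
    e = np.zeros(len(x))
    prev = 0.0
    for n, sample in enumerate(x):
        prev = a * prev + sample * sample
        e[n] = prev
    return e

def recursive_magnitude(x, a=0.9):
    """M_n = a * M_{n-1} + |x(n)| (same assumed window)."""
    m = np.zeros(len(x))
    prev = 0.0
    for n, sample in enumerate(x):
        prev = a * prev + abs(sample)
        m[n] = prev
    return m

x = np.array([1.0, 0.0, 0.0, -2.0])
E = recursive_energy(x, a=0.5)
M = recursive_magnitude(x, a=0.5)
```

Each output sample costs one multiply and one add regardless of the effective window length, which is the appeal of the IIR form.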

39 Time-Domain Methods for Speech Processing Short-Time Average Zero-Crossing Rate

40 Voiced and Unvoiced Signals (figure: the voiced segment th/i/s and the unvoiced segment thi/s/)

41 The Short-Time Average Zero-Crossing Rate: x(n) → First Difference → | | → Lowpass Filter → $Z_n$
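A sketch of the zero-crossing measurement, assuming the common definition $Z_n = \sum_m |\mathrm{sgn}[x(m)] - \mathrm{sgn}[x(m-1)]|\,w(n-m)$ with a rectangular window $w(n) = 1/(2N)$. Taking the sign before the first difference and the window length are assumptions, since the slide's equation is not in the transcript.

```python
import numpy as np

def zero_crossing_rate(x, n=100):
    """Average zero crossings per sample over the last n samples."""
    s = np.where(np.asarray(x) >= 0, 1, -1)
    # |first difference of the sign| is 2 at each sign change, else 0.
    crossings = np.abs(np.diff(s, prepend=s[0]))
    w = np.ones(n) / (2 * n)          # rectangular lowpass filter
    return np.convolve(crossings, w)[: len(s)]

alternating = np.array([1.0, -1.0] * 5)
z = zero_crossing_rate(alternating, n=4)
```

A signal that changes sign at every sample reaches the maximum rate of 1.0, and a constant signal gives 0.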

42 Distribution of Zero-Crossings

43 Example

44 Time-Domain Methods for Speech Processing Speech vs. Silence Discrimination Using Energy and Zero-Crossing

45 Speech vs. Silence Discrimination Locating the beginning and end of a speech utterance in an environment with background noise. Applications: – Segmentation of isolated words – Automatic speech recognition – Saving bandwidth in speech transmission

46 Examples: In some cases, we can locate the beginning and end of a speech utterance using energy alone.

47 Examples: In other cases, we can locate the beginning and end of a speech utterance using zero-crossing rate alone.

48 Examples: Sometimes, we cannot do it using one criterion alone. Actual beginning

49 Difficulties In general, it is difficult to locate the boundaries if we encounter the following cases: – Weak fricatives (/f/, /th/, /h/) at the beginning or end. – Weak plosive bursts (/p/, /t/, /k/) at the beginning or end. – Nasals at the end. – Voiced fricatives which become devoiced at the end of words. – Trailing off of vowel sounds at the end of an utterance.

50 Rabiner and Sambur A 10 msec frame is used, with measurements computed 100 times/sec. The algorithm assumes that the first 100 msec of the interval contains no speech. The means and standard deviations of the average magnitude and zero-crossing rate of this interval are computed to characterize the background noise.
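The noise-characterization step can be sketched as below. This covers only the statistics of the leading 100 msec; the thresholds and search rules of the full algorithm are not reproduced. The function name, the use of non-overlapping frames, and the test signal are assumptions.

```python
import numpy as np

def noise_statistics(x, fs, frame_ms=10.0, silence_ms=100.0):
    """Mean and std of per-frame average magnitude and zero-crossing count
    over the leading interval that is assumed to contain no speech."""
    n = int(round(fs * frame_ms / 1000))
    lead = np.asarray(x[: int(round(fs * silence_ms / 1000))], dtype=float)
    frames = lead[: len(lead) // n * n].reshape(-1, n)
    mag = np.abs(frames).mean(axis=1)
    signs = np.where(frames >= 0, 1, -1)
    zc = (np.diff(signs, axis=1) != 0).sum(axis=1)
    return (mag.mean(), mag.std()), (zc.mean(), zc.std())

fs = 1000                                   # assumed sampling rate (Hz)
x = np.tile([0.5, -0.5], 100)               # synthetic "noise" for illustration
mag_stats, zc_stats = noise_statistics(x, fs)
```

Decision thresholds for the endpoint search would then be derived from these means and standard deviations.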

51 The Algorithm

52 The Algorithm (figure: steps 1, 2, 3; annotation: no more than 25 frames)

53 Examples

54

55 Time-Domain Methods for Speech Processing The Short-Time Autocorrelation Function

56 Autocorrelation Function: $\phi(k) = \sum_{m=-\infty}^{\infty} x(m)\,x(m+k)$ (figure: x(m) and x(m+k), shifted by lag k)

57 Properties 1. Even: $\phi(k) = \phi(-k)$. 2. $\phi(k) \le \phi(0)$ for all k. 3. $\phi(0)$ is equal to the energy of x(m).

58 Properties 4. If x(m) has period P, i.e., x(m) = x(m+P), then $\phi(k) = \phi(k+P)$.

59 Properties 4. If x(m) has period P, i.e., x(m) = x(m+P), then $\phi(k) = \phi(k+P)$. This motivates us to use autocorrelation for pitch detection.
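As a sketch of how the periodicity property leads to pitch detection, the code below computes the autocorrelation of a finite segment and picks the lag of its largest peak within a search range. The test tone, the lag range, and the function names are assumptions. Real speech would need more care, such as a voicing decision and peak validation.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """phi(k) = sum_m x(m) x(m + k) for k = 0..max_lag (finite-length x)."""
    x = np.asarray(x, dtype=float)
    return np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(max_lag + 1)])

def estimate_period(x, min_lag, max_lag):
    """Lag of the largest autocorrelation value in [min_lag, max_lag]."""
    phi = autocorrelation(x, max_lag)
    return min_lag + int(np.argmax(phi[min_lag:]))

fs = 8000                                            # assumed sampling rate (Hz)
x = np.sin(2 * np.pi * 100 * np.arange(800) / fs)    # period P = 80 samples
period = estimate_period(x, min_lag=20, max_lag=120)
```

Dividing the sampling rate by the estimated period gives the pitch frequency, here `fs / period`.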

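60 Short-Time Version: $R_n(k) = \sum_{m=-\infty}^{\infty} x(m)\,w(n-m)\,x(m+k)\,w(n-k-m)$

61 Property: $R_n(k) = R_n(-k)$, where $R_n(-k) = \sum_{m=-\infty}^{\infty} x(m)\,w(n-m)\,x(m-k)\,w(n+k-m)$

62 Property: $R_n(k) = \sum_{m=-\infty}^{\infty} y_k(m)\,h_k(n-m)$, with $y_k(m) = x(m)\,x(m-k)$ and $h_k(n) = w(n)\,w(n+k)$

63 Property (continued): $y_k(m)$, $h_k(n-m)$

64 Block diagram: x(n) is multiplied by its delayed copy ($z^{-k}$), and the product is filtered by $h_k(n)$ to give $R_n(k)$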

65 Another Formulation

66 A noncausal formulation

67 Examples (figure, N = 401: rectangular window vs. Hamming window; voiced and unvoiced speech)

68 Examples Less data is involved for larger lags k. (figure: N = 401, N = 251, N = 125)
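The shrinking overlap can be seen in a sketch of the short-time autocorrelation of a single N-sample frame, assuming the finite-sum form $R_n(k) = \sum_{m=0}^{N-1-k} [x(n+m)\,w(m)]\,[x(n+m+k)\,w(m+k)]$. The function name and the all-ones test signal are illustrative.

```python
import numpy as np

def short_time_autocorrelation(x, n, N, max_lag, window=np.ones):
    """R_n(k) for k = 0..max_lag from one windowed frame of N samples."""
    seg = np.asarray(x[n : n + N], dtype=float) * window(N)
    # Only N - k products are available at lag k, so R_n(k) tapers with k.
    return np.array([np.dot(seg[: N - k], seg[k:]) for k in range(max_lag + 1)])

x = np.ones(20)
R = short_time_autocorrelation(x, n=0, N=10, max_lag=3)
```

For a constant signal the values fall off as N - k, which is the taper visible for the shorter frames.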

69 Modified Short-Time Autocorrelation Function Original Version: Modified Version:

70 Modified Short-Time Autocorrelation Function (K: maximum lag)

71 Modified Short-Time Autocorrelation Function (K: maximum lag)
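The modified version's equation is missing from this transcript. The sketch below assumes the common textbook form $\hat{R}_n(k) = \sum_{m=0}^{N-1} x(n+m)\,x(n+m+k)$ for $0 \le k \le K$, in which the second factor is drawn from a frame extended by K samples so that every lag uses N products. The function name is illustrative.

```python
import numpy as np

def modified_autocorrelation(x, n, N, max_lag):
    """Assumed form: sum_{m=0}^{N-1} x(n+m) x(n+m+k), 0 <= k <= K = max_lag."""
    x = np.asarray(x, dtype=float)
    if n + N + max_lag > len(x):
        raise ValueError("need N + K samples starting at n")
    first = x[n : n + N]
    # The second segment slides over N + K samples, so no lag loses data.
    return np.array([np.dot(first, x[n + k : n + k + N])
                     for k in range(max_lag + 1)])

x = np.ones(20)
R_mod = modified_autocorrelation(x, n=0, N=10, max_lag=3)
```

Unlike the windowed version, a constant signal gives the same value at every lag. The result is strictly a cross-correlation of two different segments, so it is no longer guaranteed to be even in k.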

72 Examples (figure, N = 401: rectangular window vs. modified version; voiced and unvoiced speech; the results are similar)

73 Examples (figure: rectangular window vs. modified version; N = 401, N = 251, N = 125)

74 Time-Domain Methods for Speech Processing The Short-Time Average Magnitude Difference Function

75 The AMDF If x(n) is periodic with period P, then the difference x(n) − x(n−k) is zero for k = 0, ±P, ±2P, … Computationally more efficient than autocorrelation.
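A minimal AMDF sketch, assuming rectangular windows and the form $\gamma_n(k) = \sum_{m=0}^{N-1} |x(n+m) - x(n+m+k)|$. The forward shift, the test tone, and the function name are assumptions. The period shows up as a deep null rather than a peak.

```python
import numpy as np

def amdf(x, n, N, max_lag):
    """Sum of |x(n+m) - x(n+m+k)| over m = 0..N-1, for k = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    seg = x[n : n + N]
    # Only subtractions and absolute values are needed, no multiplications.
    return np.array([np.abs(seg - x[n + k : n + k + N]).sum()
                     for k in range(max_lag + 1)])

fs = 8000                                            # assumed sampling rate (Hz)
x = np.sin(2 * np.pi * 100 * np.arange(800) / fs)    # period P = 80 samples
d = amdf(x, n=0, N=400, max_lag=120)
period = 20 + int(np.argmin(d[20:]))
```

The search starts at lag 20 to skip the trivial null at k = 0.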

76 Example (figure: voiced and unvoiced speech)

77 Exercise Record a sample of your own speech and perform voiced/unvoiced segmentation. Design an efficient algorithm to compute the autocorrelation.

