Time-Domain Methods for Speech Processing 虞台文. Contents Introduction Time-Dependent Processing of Speech Short-Time Energy and Average Magnitude Short-Time.

Time-Domain Methods for Speech Processing 虞台文

Contents
– Introduction
– Time-Dependent Processing of Speech
– Short-Time Energy and Average Magnitude
– Short-Time Average Zero-Crossing Rate
– Speech vs. Silence Discrimination Using Energy and Zero-Crossing
– The Short-Time Autocorrelation Function
– The Short-Time Average Magnitude Difference Function

Time-Domain Methods for Speech Processing Introduction

Speech Processing Methods
Time-domain methods
– Operate directly on the waveform of the speech signal.
Frequency-domain methods
– Involve some form of spectral representation.

Time-Domain Measurements
Examples: the average zero-crossing rate, the energy, and the autocorrelation function.
– Very simple to implement.
– Provide a useful basis for estimating important features of the speech signal, e.g.,
  – voiced/unvoiced classification
  – pitch estimation

Time-Domain Methods for Speech Processing Time-Dependent Processing of Speech

Time-Dependent Nature of Speech
[Figure: waveform of the utterance "This is a test."]

Time-Dependent Nature of Speech

Short-Time Behavior of Speech
Assumption
– The properties of the speech signal change slowly with time.
Analysis frames
– Short segments of the speech signal.
– Usually overlap one another.
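A minimal sketch of splitting a signal into overlapping analysis frames, assuming NumPy and a fixed frame length and hop, both in samples. The name `make_frames` is hypothetical, and dropping the incomplete tail frame is only one possible convention.

```python
import numpy as np

def make_frames(x, frame_len, hop):
    """Split a 1-D signal into overlapping analysis frames.

    Frame i covers x[i*hop : i*hop + frame_len]. Samples at the end that do
    not fill a whole frame are dropped (one possible convention).
    """
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row i holds the sample indices of frame i.
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return x[idx]

# 10 samples, frames of 4 samples, hop of 2: consecutive frames overlap by 50%.
frames = make_frames(np.arange(10), frame_len=4, hop=2)
```

With a hop smaller than the frame length, every sample appears in more than one frame, which is the usual overlapping arrangement.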

Time-Dependent Analyses
Analyzing each frame may produce either a single number or a set of numbers, e.g.,
– energy (a single number)
– vocal tract parameters (a set of numbers)
Repeating this for every frame produces a new time-dependent sequence.

General Form
$$Q_n = \sum_{m=-\infty}^{\infty} T[x(m)]\, w(n-m)$$
– n: frame index
– x(m): speech signal
– T[ ]: a linear or nonlinear transformation
– w(m): a window function (finite or infinite in length)

General Form
Q_n is a sequence of local weighted average values of the sequence T[x(m)].
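Because the general form is a convolution of T[x(m)] with w(m), it can be sketched with `np.convolve`. This assumes x(m) = 0 for m < 0 and a causal, finite-length window; `short_time_analysis` is a hypothetical helper name.

```python
import numpy as np

def short_time_analysis(x, transform, w):
    """Q_n = sum_m T[x(m)] w(n - m), for n = 0 .. len(x) - 1.

    Assumes x(m) = 0 for m < 0 and a causal, finite-length window w.
    np.convolve(a, v)[n] computes exactly sum_m a[m] v[n - m].
    """
    t = transform(np.asarray(x, dtype=float))
    return np.convolve(t, w)[: len(t)]

x = np.array([1.0, -2.0, 3.0, -4.0])
w = np.ones(2)                       # rectangular window, N = 2
q = short_time_analysis(x, np.square, w)   # T[x] = x^2, so q is a short-time energy
# q == [1, 5, 13, 25]
```

Swapping `np.square` for `np.abs` gives the average-magnitude variant with the same machinery.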

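Example
Energy:
$$E = \sum_{m=-\infty}^{\infty} x^2(m)$$
Short-time energy (T[x] = x², rectangular window of length N):
$$E_n = \sum_{m=n-N+1}^{n} x^2(m)$$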

Example Short-Time Energy

Short-Time Energy Example

General Short-Time-Analysis Scheme
[Figure: x(n) → T[ ] → linear filter → Q_n; the linear filter is a lowpass filter, depending on the choice of window.]

Time-Domain Methods for Speech Processing Short-Time Energy and Average Magnitude

Applications
– Silence detection
– Segmentation
– Lip sync
– …

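Short-Time Energy
$$E_n = \sum_{m=-\infty}^{\infty} [x(m)\,w(n-m)]^2 = \sum_{m=-\infty}^{\infty} x^2(m)\,h(n-m), \qquad h(n) = w^2(n)$$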

Short-Time Average Magnitude
$$M_n = \sum_{m=-\infty}^{\infty} |x(m)|\,w(n-m)$$

Block Diagram Representation
– x(n) → [ ]² → x²(n) → h(n) → E_n
– x(n) → | | → |x(n)| → w(n) → M_n
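The two branches of the block diagram can be sketched in NumPy as below. This assumes h(n) = w²(n), the usual choice that makes E_n the energy of the windowed segment x(m)w(n−m), and treats x(m) as zero before the first sample; the function names are hypothetical.

```python
import numpy as np

def short_time_energy(x, w):
    # Upper branch: x(n) -> [ ]^2 -> h(n) -> E_n, with h(n) = w(n)**2 (assumed).
    x = np.asarray(x, dtype=float)
    h = np.asarray(w, dtype=float) ** 2
    return np.convolve(x ** 2, h)[: len(x)]

def short_time_avg_magnitude(x, w):
    # Lower branch: x(n) -> | | -> w(n) -> M_n.
    x = np.asarray(x, dtype=float)
    return np.convolve(np.abs(x), w)[: len(x)]

x = np.array([0.5, -1.0, 2.0, -0.5, 0.0, 1.0])
w = np.hamming(4)                    # any finite window works here
E = short_time_energy(x, w)
M = short_time_avg_magnitude(x, w)
```

Filtering x²(n) with h(n) = w²(n) gives the same numbers as squaring each windowed sample x(m)w(n−m) and summing, which is why the diagram can use a single linear filter after the nonlinearity.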

What is the effect of the windows?

The Effects of Windows
– Window length
– Window function

Rectangular Window

Rectangular Window
[Figure: magnitude response of the rectangular window for N = 8, with the mainlobe width and the peak sidelobe marked.]

Rectangular Window
– What is this?
– Discuss the effect of window duration.
– Discuss the effect of mainlobe width and peak sidelobe.
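To make the two marked quantities concrete, a sketch that evaluates the rectangular window's magnitude response on a dense FFT grid for N = 8 (the value in the figure). It locates the first null to get the null-to-null mainlobe width and reads the peak sidelobe level in dB; the grid size of 8192 is an arbitrary choice.

```python
import numpy as np

N = 8                      # window length, as in the slide's figure
nfft = 8192                # dense frequency grid for the DTFT
W = np.abs(np.fft.rfft(np.ones(N), nfft))          # |W(e^{jw})| for 0 <= w <= pi
omega = np.linspace(0.0, np.pi, len(W))
W_db = 20 * np.log10(W / W[0] + 1e-12)             # normalized to 0 dB at w = 0

# The mainlobe ends at the first null (theory: w = 2*pi/N); find it as the
# first point where |W| stops decreasing.
first_null = int(np.argmax(np.diff(W) > 0))
mainlobe_width = 2 * omega[first_null]             # null-to-null width, 4*pi/N
peak_sidelobe_db = W_db[first_null:].max()         # roughly -13 dB for this window
```

Doubling N halves the mainlobe width (better frequency resolution, i.e., a narrower lowpass), while the peak sidelobe level of the rectangular window stays near -13 dB regardless of N.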

Commonly Used Windows
[Figure: the rectangular, Bartlett, Hanning, Hamming, and Blackman windows.]

Commonly Used Windows
– Rectangular
– Bartlett (triangular)
– Hanning
– Hamming
– Blackman
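For reference, a sketch of these five windows written out explicitly for 0 ≤ n ≤ N−1, assuming the common textbook definitions (the slide's own formulas are not recoverable). NumPy already provides `np.bartlett`, `np.hanning`, `np.hamming`, and `np.blackman`; the explicit forms are shown only to make the cosine coefficients visible.

```python
import numpy as np

def windows(N):
    """Common windows of length N, defined for 0 <= n <= N-1 (N >= 2)."""
    n = np.arange(N)
    c = 2 * np.pi * n / (N - 1)
    return {
        "rectangular": np.ones(N),
        "bartlett": 1 - np.abs(2 * n / (N - 1) - 1),      # triangle peaking at the center
        "hanning": 0.5 - 0.5 * np.cos(c),
        "hamming": 0.54 - 0.46 * np.cos(c),
        "blackman": 0.42 - 0.5 * np.cos(c) + 0.08 * np.cos(2 * c),
    }

wins = windows(9)
```

The tapered windows trade a wider mainlobe for much lower sidelobes than the rectangular window.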

Commonly Used Windows
[Figure: magnitude responses of the rectangular, Bartlett, Hanning, Hamming, and Blackman windows; the rectangular window has the least mainlobe width.]

Examples: Short-Time Energy
[Figure: short-time energy computed with a rectangular window and with a Hamming window.]

Examples: Average Magnitude
[Figure: short-time average magnitude computed with a rectangular window and with a Hamming window.]

The Effects of Window Length
– Increasing the window length N decreases the bandwidth of the window's lowpass response.
– If N is too small, e.g., less than one pitch period, E_n and M_n will fluctuate very rapidly.
– If N is too large, e.g., on the order of several pitch periods, E_n and M_n will change very slowly.

The Choice of Window Length
No single value of N is entirely satisfactory, because the duration of a pitch period varies from about 2 ms for a high-pitched female or a child up to 25 ms for a very low-pitched male.

Sampling Rate
The bandwidth of both E_n and M_n is just that of the lowpass filter, so they need not be sampled as frequently as the speech signal. For example:
– Frame size = 20 ms
– Sampling period (frame shift) = 10 ms
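A sketch of evaluating E_n only once per sampling period, assuming a speech sampling rate of 8 kHz (not given on the slide) and a rectangular window. With a 20 ms frame and a 10 ms sampling period, that is a 160-sample window advanced 80 samples at a time, so E_n is computed about 100 times per second instead of 8000. `decimated_energy` is a hypothetical name.

```python
import numpy as np

def decimated_energy(x, fs_hz, frame_ms=20.0, hop_ms=10.0):
    """Short-time energy (rectangular window) evaluated only every hop_ms."""
    frame_len = int(round(fs_hz * frame_ms / 1000))   # samples per frame
    hop = int(round(fs_hz * hop_ms / 1000))           # samples between evaluations
    x = np.asarray(x, dtype=float)
    starts = range(0, len(x) - frame_len + 1, hop)
    E = np.array([np.sum(x[s:s + frame_len] ** 2) for s in starts])
    return frame_len, hop, E

fs = 8000                       # assumed sampling rate
x = np.ones(fs)                 # 1 s of a constant signal, just to count frames
frame_len, hop, E = decimated_energy(x, fs)
```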

Main Applications of E_n and M_n
– Providing the basis for distinguishing voiced speech segments from unvoiced segments.
– Silence detection.

Differences between E_n and M_n
– E_n emphasizes large sample-to-sample variations in x(n), because the sample values enter the computation squared.
– The dynamic range (max/min) of M_n is approximately the square root of that of E_n.
– As a result, the difference in level between voiced and unvoiced regions is less pronounced in M_n than in E_n.
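The square-root relation can be checked with a pure gain change: scaling x(n) by g multiplies E_n by g² but M_n only by g. The sketch assumes a rectangular window of 200 samples and a random test signal, both arbitrary choices; for real speech the relation between the two dynamic ranges is only approximate.

```python
import numpy as np

N = 200                                  # rectangular window length (assumed)
w = np.ones(N)

def short_time_energy(x):
    # E_n for every n where the window lies fully inside x
    return np.convolve(np.asarray(x) ** 2, w)[N - 1:len(x)]

def short_time_avg_magnitude(x):
    return np.convolve(np.abs(x), w)[N - 1:len(x)]

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
g = 10.0                                 # a 20 dB change in signal level
ratio_E = short_time_energy(g * x) / short_time_energy(x)                # g**2
ratio_M = short_time_avg_magnitude(g * x) / short_time_avg_magnitude(x)  # g
```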

FIR and IIR
All the windows discussed so far are FIR filters, and each of them acts as a lowpass filter. The window can also be an IIR filter.

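IIR Example
Recursive formulas, e.g., with the window chosen as a^n for n ≥ 0 and 0 otherwise (0 < a < 1):
– Short-time energy: $$E_n = a\,E_{n-1} + x^2(n)$$
– Short-time average magnitude: $$M_n = a\,M_{n-1} + |x(n)|$$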
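A sketch of the recursive (IIR) computation, assuming a one-pole window a^n for n ≥ 0 with 0 < a < 1. In that case E_n = a E_{n−1} + x²(n) and M_n = a M_{n−1} + |x(n)|, which need one multiply-add per sample instead of a full window sum. The value a = 0.9 and the function name are illustrative assumptions.

```python
import numpy as np

def recursive_energy_and_magnitude(x, a):
    """E_n = a E_{n-1} + x(n)^2 and M_n = a M_{n-1} + |x(n)|.

    Equivalent to filtering x^2 and |x| with the IIR window a**n, n >= 0
    (assumed, 0 < a < 1), starting from zero initial state.
    """
    E = np.zeros(len(x))
    M = np.zeros(len(x))
    e_prev = m_prev = 0.0
    for n, v in enumerate(x):
        e_prev = a * e_prev + v * v
        m_prev = a * m_prev + abs(v)
        E[n], M[n] = e_prev, m_prev
    return E, M

x = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
a = 0.9
E, M = recursive_energy_and_magnitude(x, a)

# Cross-check against the direct FIR view with h(n) = a**n (long enough here).
h = a ** np.arange(len(x))
fir_E = np.convolve(x ** 2, h)[: len(x)]
fir_M = np.convolve(np.abs(x), h)[: len(x)]
```

The parameter a sets the effective window length: values closer to 1 average over a longer stretch of the signal, analogous to increasing N.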

Time-Domain Methods for Speech Processing Short-Time Average Zero-Crossing Rate

Voiced and Unvoiced Signals
[Figure: waveforms of the voiced segment /i/ (Th/i/s) and the unvoiced segment /s/ (Thi/s/) in "This".]

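The Short-Time Average Zero-Crossing Rate
$$Z_n = \sum_{m=-\infty}^{\infty} \big|\operatorname{sgn}[x(m)] - \operatorname{sgn}[x(m-1)]\big|\, w(n-m)$$
[Figure: block diagram x(n) → sgn[ ] → first difference → | | → lowpass filter → Z_n.]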
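A sketch following the block diagram (sign, first difference, magnitude, lowpass filter). It assumes sgn(0) = +1 and a rectangular lowpass window w(n) = 1/(2N) for 0 ≤ n ≤ N−1, so that each crossing contributes 1/N and Z_n comes out in crossings per sample. The test signal is a 100 Hz sinusoid sampled at 8 kHz, which crosses zero 200 times per second.

```python
import numpy as np

def zero_crossing_rate(x, N):
    """Z_n = sum_m |sgn x(m) - sgn x(m-1)| w(n - m), w(n) = 1/(2N), 0 <= n < N.

    Each crossing contributes |+-2| / (2N) = 1/N, so Z_n is crossings per sample.
    """
    x = np.asarray(x, dtype=float)
    s = np.where(x >= 0, 1.0, -1.0)                 # sgn[ ], with sgn(0) = +1
    d = np.abs(np.diff(s, prepend=s[0]))            # first difference, then | |
    w = np.full(N, 1.0 / (2 * N))                   # rectangular lowpass window
    return np.convolve(d, w)[: len(x)]

fs = 8000
t = np.arange(fs) / fs
z = zero_crossing_rate(np.sin(2 * np.pi * 100 * t + 0.1), N=400)
# 200 crossings/s divided by 8000 samples/s = 0.025 crossings per sample
```

Multiplying Z_n by the sampling rate converts it to crossings per second; a high value suggests unvoiced (noise-like) speech, a low value voiced speech.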

Distribution of Zero-Crossings

Example

Time-Domain Methods for Speech Processing Speech vs. Silence Discrimination Using Energy and Zero-Crossing

Speech vs. Silence Discrimination
Locating the beginning and end of a speech utterance in an environment with background noise.
Applications:
– Segmentation of isolated words
– Automatic speech recognition
– Saving bandwidth in speech transmission

Examples: In some cases, we can locate the beginning and end of a speech utterance using energy alone.

Examples: In other cases, we can locate the beginning and end of a speech utterance using zero-crossing rate alone.

Examples: Sometimes we cannot do it using one criterion alone.
[Figure: the actual beginning of the utterance is marked.]

Difficulties In general, it is difficult to locate the boundaries if we encounter the following cases: – Weak fricatives (/f/, /th/, /h/) at the beginning or end. – Weak plosive bursts (/p/, /t/, /k/) at the beginning or end. – Nasals at the end. – Voiced fricatives which become devoiced at the end of words. – Trailing off of vowel sounds at the end of an utterance.

Rabiner and Sambur
A 10 ms frame is used, computed at a rate of 100 frames per second. The algorithm assumes that the first 100 ms of the interval contain no speech. The means and standard deviations of the average magnitude and the zero-crossing rate over this interval are computed to characterize the background noise.
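A sketch of the background-noise characterization step: split the first 100 ms into ten 10 ms frames and compute the mean and standard deviation of the per-frame magnitude and zero-crossing count. Using the unnormalized sum of |x| as the per-frame magnitude and `background_stats` as the name are assumptions for illustration.

```python
import numpy as np

def background_stats(x, fs, frame_ms=10.0, silence_ms=100.0):
    """Mean and std of per-frame magnitude and zero-crossing count over the
    leading interval assumed to contain no speech (first 100 ms by default)."""
    frame_len = int(round(fs * frame_ms / 1000))
    n_frames = int(round(silence_ms / frame_ms))          # 10 frames of 10 ms
    if len(x) < frame_len * n_frames:
        raise ValueError("signal shorter than the assumed silence interval")
    frames = np.asarray(x[: frame_len * n_frames], dtype=float)
    frames = frames.reshape(n_frames, frame_len)
    mag = np.sum(np.abs(frames), axis=1)                  # magnitude per frame
    signs = np.where(frames >= 0, 1, -1)
    zc = np.sum(np.abs(np.diff(signs, axis=1)), axis=1) / 2   # crossings per frame
    return (mag.mean(), mag.std()), (zc.mean(), zc.std())

(mag_mean, mag_std), (zc_mean, zc_std) = background_stats(np.full(2000, 0.5), fs=10000)
```

These statistics are what the thresholds in the next step are derived from, so a noisier recording automatically raises them.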

The Algorithm

[Figure: steps 1, 2, and 3 of the algorithm, with a span of no more than 25 frames marked.]
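The slides do not spell out the steps in text, so the following is only a simplified sketch of a two-stage search in the spirit of the figure, not a faithful reproduction of Rabiner and Sambur's algorithm. It first finds a start point from per-frame energy (or magnitude) using a lower threshold `itl` and an upper threshold `itu`. It then searches back at most 25 frames and moves the start earlier if at least 3 frames exceed a zero-crossing threshold `izct`. The thresholds, the count of 3, and the function names are assumptions supplied by the caller.

```python
import numpy as np

def find_start(energy, zcr, itl, itu, izct, max_back=25, min_count=3):
    """Hypothetical two-threshold start-point search; returns a frame index or None."""
    above_itu = np.nonzero(energy > itu)[0]
    if len(above_itu) == 0:
        return None                      # never clearly speech
    n1 = int(above_itu[0])
    # Back up to where the energy first rose above the lower threshold ITL.
    while n1 > 0 and energy[n1 - 1] > itl:
        n1 -= 1
    # Look back at most max_back frames for high zero-crossing rates
    # (e.g. a weak fricative before a vowel).
    lo = max(0, n1 - max_back)
    hits = [m for m in range(lo, n1) if zcr[m] > izct]
    if len(hits) >= min_count:
        return hits[0]                   # earliest frame exceeding the ZCR threshold
    return n1

def find_end(energy, zcr, itl, itu, izct, **kw):
    """Same search run on the time-reversed frame sequence."""
    s = find_start(energy[::-1], zcr[::-1], itl, itu, izct, **kw)
    return None if s is None else len(energy) - 1 - s

energy = np.array([1.0] * 10 + [3, 6, 10, 10, 6, 3] + [1.0] * 5)
zcr = np.zeros(len(energy))
zcr[[5, 6, 8]] = 40                      # three high-ZCR frames before the vowel
start = find_start(energy, zcr, itl=2, itu=5, izct=25)
```

Running the same procedure on the reversed sequence handles the end point symmetrically, which keeps the sketch short.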

Examples

Time-Domain Methods for Speech Processing The Short-Time Autocorrelation Function

Autocorrelation Functions
$$\phi(k) = \sum_{m=-\infty}^{\infty} x(m)\,x(m+k)$$
[Figure: x(m) and its shifted copy x(m+k), offset by the lag k.]

Properties
1. Even: $\phi(k) = \phi(-k)$.
2. $|\phi(k)| \le \phi(0)$ for all k.
3. $\phi(0)$ is equal to the energy of x(m).
[Figure: x(m) and its shifted copy x(m+k), offset by the lag k.]

Properties
4. If x(m) has period P, i.e., x(m) = x(m+P), then $\phi(k) = \phi(k+P)$: the autocorrelation is periodic with the same period (for such a power signal, $\phi(k)$ is defined as a time average rather than a sum).
This motivates us to use autocorrelation for pitch detection.
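A sketch of how property 4 turns into a pitch estimate: compute the autocorrelation of a finite segment and pick the lag of the largest value within a plausible pitch range. The test signal is an idealized impulse train with period P = 50 samples, and the lag range 20 to 120 is an arbitrary assumption; real pitch detectors add preprocessing and voicing decisions.

```python
import numpy as np

def autocorr(x, max_lag):
    """phi(k) = sum_m x(m) x(m + k) for k = 0 .. max_lag (finite segment)."""
    x = np.asarray(x, dtype=float)
    return np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(max_lag + 1)])

def estimate_period(x, min_lag, max_lag):
    """Lag of the largest autocorrelation value in [min_lag, max_lag]."""
    r = autocorr(x, max_lag)
    return min_lag + int(np.argmax(r[min_lag:]))

P = 50
x = np.tile(np.r_[1.0, np.zeros(P - 1)], 8)    # impulse train with period P
period = estimate_period(x, min_lag=20, max_lag=120)
```

Excluding small lags matters because property 2 guarantees the largest value sits at k = 0, which carries no pitch information.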

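Short-Time Version
$$R_n(k) = \sum_{m=-\infty}^{\infty} [x(m)\,w(n-m)]\,[x(m+k)\,w(n-k-m)]$$
[Figure: the windowed segment x(m)w(n−m) and its shifted copy x(m+k)w(n−k−m) at lag k.]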

Property
$$R_n(k) = R_n(-k)$$
[Figure: the windowed segment x(m)w(n−m) together with x(m+k)w(n−k−m) and x(m−k)w(n+k−m); shifting by k in either direction gives the same sum.]

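Property
$$R_n(k) = \sum_{m=-\infty}^{\infty} y_k(m)\,h_k(n-m), \qquad y_k(m) = x(m)\,x(m-k), \quad h_k(n) = w(n)\,w(n+k)$$
That is, R_n(k) is the output at time n of the filter h_k(n) driven by y_k(n).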


zkzk zkzk hk(n)hk(n) hk(n)hk(n) x(n)x(n) Rn(k)Rn(k)

Another Formulation

A noncausal formulation

Examples
[Figure: short-time autocorrelation of voiced and unvoiced speech, computed with a rectangular window and with a Hamming window, N = 401.]

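Examples
Less data is involved for larger lags k: with a window of length N, only N − k products contribute to R_n(k), so its values taper off as k grows.
[Figure: short-time autocorrelation for N = 401, 251, and 125.]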

Modified Short-Time Autocorrelation Function Original Version: Modified Version:

Modified Short-Time Autocorrelation Function
[Figure: the segments used by the modified version; K denotes the maximum lag.]
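A sketch contrasting the two versions, assuming rectangular windows throughout, since the slide's exact window choices are not recoverable. The original uses only the N − k products that fall inside one length-N window at lag k. The modified version correlates N samples against an extended segment of N + K samples, so every lag up to the maximum lag K uses N products. The function names are hypothetical; N = 401 follows the example slides.

```python
import numpy as np

def st_autocorr(x, n, N, K):
    # Rectangular window of length N starting at n: only N - k products at lag k.
    seg = np.asarray(x[n:n + N], dtype=float)
    return np.array([np.dot(seg[: N - k], seg[k:]) for k in range(K + 1)])

def modified_st_autocorr(x, n, N, K):
    # N samples x(n .. n+N-1) against N + K samples x(n .. n+N+K-1):
    # every lag k uses exactly N products, so there is no taper with k.
    x = np.asarray(x, dtype=float)
    a = x[n:n + N]
    b = x[n:n + N + K]
    return np.array([np.dot(a, b[k:k + N]) for k in range(K + 1)])

x = np.ones(700)                                     # constant signal shows the taper
R = st_autocorr(x, 0, N=401, K=200)                  # R[k] = 401 - k
R_mod = modified_st_autocorr(x, 0, N=401, K=200)     # R_mod[k] = 401
```

The price of the modified version is that it needs K extra samples beyond the frame and is no longer guaranteed to be even in k.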

Examples
[Figure: voiced and unvoiced speech, rectangular window, N = 401: the original and the modified versions give similar results.]

Examples
[Figure: modified version with a rectangular window for N = 401, 251, and 125.]

Time-Domain Methods for Speech Processing The Short-Time Average Magnitude Difference Function

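The AMDF
If x(n) is periodic with period P, then
$$d(n) = x(n) - x(n-k) = 0 \quad \text{for } k = 0, \pm P, \pm 2P, \ldots$$
The AMDF is computationally cheaper than autocorrelation: it needs only subtractions and absolute values, no multiplications.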
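A sketch of the AMDF for one segment, assuming rectangular windows and the forward-difference form γ_n(k) = Σ_{m=0}^{N−1} |x(n+m+k) − x(n+m)| (the slide's exact definition is not shown). For a periodic segment it dips to zero at multiples of the period, so the pitch can be read off as the lag of the deepest dip beyond a minimum lag. The minimum lag of 20 samples and the two-harmonic test signal with period 40 are arbitrary choices.

```python
import numpy as np

def amdf(x, n, N, K):
    """gamma_n(k) = sum_{m=0}^{N-1} |x(n+m+k) - x(n+m)| for k = 0 .. K."""
    x = np.asarray(x, dtype=float)
    a = x[n:n + N]
    return np.array([np.sum(np.abs(x[n + k:n + k + N] - a)) for k in range(K + 1)])

P = 40
m = np.arange(P)
one_period = np.sin(2 * np.pi * m / P) + 0.5 * np.sin(4 * np.pi * m / P)
x = np.tile(one_period, 10)                 # exactly periodic, period 40

g = amdf(x, 0, N=160, K=100)
period = 20 + int(np.argmin(g[20:]))        # deepest dip beyond a minimum lag
```

Where autocorrelation looks for a peak, the AMDF looks for a valley, and each term costs one subtraction and one absolute value.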

Example
[Figure: AMDF of voiced and unvoiced speech.]

Exercise
– Record a piece of your own speech and perform voiced/unvoiced segmentation.
– Design an efficient algorithm to compute the autocorrelation.
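One possible direction for the second exercise, offered as a sketch rather than the intended solution: compute the autocorrelation through the FFT, which costs O(L log L) instead of the O(L·K) of direct summation. Zero-padding to at least 2L − 1 points keeps the circular correlation from wrapping around. `autocorr_fft` is a hypothetical name.

```python
import numpy as np

def autocorr_fft(x, max_lag):
    """phi(k) = sum_m x(m) x(m+k), k = 0 .. max_lag, via the FFT in O(L log L)."""
    x = np.asarray(x, dtype=float)
    L = len(x)
    nfft = 1 << (2 * L - 1).bit_length()   # power of two >= 2L - 1: no wrap-around
    X = np.fft.rfft(x, nfft)
    r = np.fft.irfft(X * np.conj(X), nfft) # inverse FFT of |X|^2
    return r[: max_lag + 1]

rng = np.random.default_rng(1)
x = rng.standard_normal(257)
r = autocorr_fft(x, 50)
```

For a short maximum lag the direct sum can still be cheaper; the FFT route pays off when many lags are needed, as in pitch detection over a wide range.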
