Time-Domain Methods for Speech Processing 虞台文

Contents: Introduction; Time-Dependent Processing of Speech; Short-Time Energy and Average Magnitude; Short-Time Average Zero-Crossing Rate; Speech vs. Silence Discrimination Using Energy and Zero-Crossing; The Short-Time Autocorrelation Function; The Short-Time Average Magnitude Difference Function

Time-Domain Methods for Speech Processing: Introduction

Speech Processing Methods. Time-domain methods: operate directly on the waveform of the speech signal. Frequency-domain methods: operate on some form of spectral representation.

Time-Domain Measurements: average zero-crossing rate, energy, and the autocorrelation function. They are very simple to implement and provide a useful basis for estimating important features of the speech signal, e.g., voiced/unvoiced classification and pitch estimation.

Time-Domain Methods for Speech Processing: Time-Dependent Processing of Speech

Time-Dependent Nature of Speech. Example utterance: "This is a test."

Short-Time Behavior of Speech. Assumption: the properties of the speech signal change slowly with time. Analysis frames: short segments of the speech signal, which usually overlap one another.

Time-Dependent Analyses. Analyzing each frame may produce either a single number (e.g., energy) or a set of numbers (e.g., vocal tract parameters). Applied frame by frame, this produces a new time-dependent sequence.

General Form: $Q_n = \sum_{m=-\infty}^{\infty} T[x(m)]\,w(n-m)$, where n is the frame index, x(m) is the speech signal, T[ ] is a linear or nonlinear transformation, and w(m) is a window function (finite or infinite).

$Q_n$ is a sequence of local weighted average values of the sequence $T[x(m)]$.
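The general form can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `short_time`, the `hop` parameter, and the toy signal are not from the slides, the window is taken to be finite and causal (w(0), ..., w(N-1)), and samples before the start of the signal are treated as zero.

```python
def short_time(x, transform, window, hop=1):
    """Q_n = sum_m T[x(m)] * w(n - m), evaluated every `hop` samples.

    `window` holds w(0), ..., w(N-1); samples before the start of x
    are treated as zero (an assumption of this sketch).
    """
    q = []
    for n in range(0, len(x), hop):
        total = 0.0
        for j, w in enumerate(window):  # j = n - m, so m = n - j
            m = n - j
            if m >= 0:
                total += transform(x[m]) * w
        q.append(total)
    return q


x = [1.0, -2.0, 3.0, -4.0]
rect = [1.0, 1.0]  # rectangular window, N = 2

energy = short_time(x, lambda v: v * v, rect)  # T[x] = x^2
magnitude = short_time(x, abs, rect)           # T[x] = |x|
```

Choosing a different `transform` or `window` gives the other short-time measurements without changing the loop.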

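Example: Energy. The total energy of the signal is $E = \sum_{m=-\infty}^{\infty} x^2(m)$.

Example: Short-Time Energy. $E_n = \sum_{m=n-N+1}^{n} x^2(m)$, i.e., $T[x(m)] = x^2(m)$ with a rectangular window of length N.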

General Short-Time Analysis Scheme: x(n) → T[ ] → linear filter → $Q_n$. Depending on the choice of window, the linear filter is a lowpass filter.

Time-Domain Methods for Speech Processing: Short-Time Energy and Average Magnitude

Applications: silence detection, segmentation, lip sync, …

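Short-Time Energy: $E_n = \sum_{m=-\infty}^{\infty} [x(m)\,w(n-m)]^2 = \sum_{m=-\infty}^{\infty} x^2(m)\,h(n-m)$, where $h(n) = w^2(n)$.

Short-Time Average Magnitude: $M_n = \sum_{m=-\infty}^{\infty} |x(m)|\,w(n-m)$.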

Block Diagram Representation: x(n) → [ ]² → x²(n) → h(n) → $E_n$, and x(n) → | | → |x(n)| → w(n) → $M_n$.

What is the effect of the window?
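The two branches of the block diagram can be sketched as a nonlinearity followed by an FIR filter. The sketch assumes the usual relation h(n) = w²(n) between the two filters and a Hamming window of length N = 100; the function names and the test signal (a loud sinusoid followed by the same sinusoid at one tenth the amplitude) are made up for illustration.

```python
import math


def hamming(N):
    """Hamming window w(0), ..., w(N-1)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def fir_filter(v, h):
    """Causal FIR filtering: y(n) = sum_j h(j) * v(n - j)."""
    return [sum(h[j] * v[n - j] for j in range(len(h)) if n - j >= 0)
            for n in range(len(v))]


def short_time_energy(x, w):
    h = [wj * wj for wj in w]  # assumed relation h(n) = w^2(n)
    return fir_filter([v * v for v in x], h)


def short_time_magnitude(x, w):
    return fir_filter([abs(v) for v in x], w)


# Loud segment (amplitude 1.0) followed by a quiet one (amplitude 0.1).
x = [(1.0 if n < 200 else 0.1) * math.sin(2 * math.pi * n / 20) for n in range(400)]
w = hamming(100)

E = short_time_energy(x, w)
M = short_time_magnitude(x, w)

energy_ratio = E[199] / E[399]     # loud vs. quiet, measured by E_n
magnitude_ratio = M[199] / M[399]  # loud vs. quiet, measured by M_n
```

For this signal the energy ratio comes out as the square of the magnitude ratio, which is why level differences are less pronounced in the average magnitude than in the energy.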

The Effects of Windows: window length and window function.

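Rectangular Window: $w(n) = 1$ for $0 \le n \le N-1$ and 0 otherwise, with magnitude response $|W(e^{j\omega})| = |\sin(\omega N/2)/\sin(\omega/2)|$.

[Figure: frequency response of the rectangular window for N = 8, annotated with the mainlobe width and the peak sidelobe.]

Questions: What is this? Discuss the effect of window duration. Discuss the effect of mainlobe width and sidelobe peak.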
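The quantities annotated on the figure can be checked numerically. The sketch below evaluates the discrete-time Fourier transform of a rectangular window by direct summation for N = 8 (the value on the slide); the function name and the search grid are illustrative choices, not part of the original material.

```python
import cmath
import math


def dtft_magnitude(w, omega):
    """|W(e^{j omega})| for a finite window w(0), ..., w(N-1)."""
    return abs(sum(wn * cmath.exp(-1j * omega * n) for n, wn in enumerate(w)))


N = 8
rect = [1.0] * N

dc_gain = dtft_magnitude(rect, 0.0)  # height of the mainlobe peak
first_null = 2 * math.pi / N         # first zero of sin(wN/2)/sin(w/2)
mainlobe_width = 2 * first_null      # null-to-null width, 4*pi/N

# The first sidelobe lies between the first and second nulls.
grid = [first_null + i * first_null / 1000 for i in range(1001)]
peak_sidelobe = max(dtft_magnitude(rect, om) for om in grid)
peak_sidelobe_db = 20 * math.log10(peak_sidelobe / dc_gain)
```

Doubling N halves `mainlobe_width` but leaves the relative sidelobe level at roughly -13 dB, which is the trade-off the discussion questions point at.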

Commonly Used Windows: rectangular, Bartlett (triangular), Hanning, Hamming, and Blackman. Of these, the rectangular window has the least mainlobe width.
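A rough numerical comparison of the five windows is sketched below. The window formulas are the standard symmetric definitions; the length N = 51, the frequency grid, and the way the mainlobe edge is located (first local minimum of the magnitude response) are assumptions made for this illustration.

```python
import cmath
import math


def make_windows(N):
    """The five windows named on the slide, each of length N."""
    def c(n, j):
        return math.cos(2 * math.pi * j * n / (N - 1))
    r = range(N)
    return {
        "rectangular": [1.0 for n in r],
        "bartlett": [1.0 - abs(2.0 * n / (N - 1) - 1.0) for n in r],
        "hanning": [0.5 - 0.5 * c(n, 1) for n in r],
        "hamming": [0.54 - 0.46 * c(n, 1) for n in r],
        "blackman": [0.42 - 0.5 * c(n, 1) + 0.08 * c(n, 2) for n in r],
    }


def magnitude_response(w, num_points):
    """|W(e^{j omega})| on a uniform grid from 0 to pi."""
    step = math.pi / num_points
    return [abs(sum(wn * cmath.exp(-1j * i * step * n) for n, wn in enumerate(w)))
            for i in range(num_points + 1)]


def mainlobe_and_sidelobe(w, num_points=1024):
    """Null-to-null mainlobe width (rad) and peak sidelobe level (dB)."""
    mag = magnitude_response(w, num_points)
    i = 1
    while not (mag[i] < mag[i - 1] and mag[i] <= mag[i + 1]):
        i += 1  # walk down the mainlobe to its first local minimum
    width = 2 * i * math.pi / num_points
    sidelobe_db = 20 * math.log10(max(mag[i:]) / mag[0])
    return width, sidelobe_db


N = 51
results = {name: mainlobe_and_sidelobe(w) for name, w in make_windows(N).items()}
```

The result illustrates the usual trade-off: the rectangular window has the narrowest mainlobe but the highest sidelobes, while the tapered windows pay for lower sidelobes with a wider mainlobe.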

Examples: Short-Time Energy. [Figure: results with a rectangular window and with a Hamming window.]

Examples: Average Magnitude. [Figure: results with a rectangular window and with a Hamming window.]

The Effects of Window Length. Increasing the window length N decreases the bandwidth. If N is too small, e.g., less than one pitch period, $E_n$ and $M_n$ will fluctuate very rapidly. If N is too large, e.g., on the order of several pitch periods, $E_n$ and $M_n$ will change very slowly.

The Choice of Window Length. No single value of N is entirely satisfactory, because the duration of a pitch period varies from about 2 ms for a high-pitched female or child voice up to 25 ms for a very low-pitched male voice.

Sampling Rate. The bandwidth of both $E_n$ and $M_n$ is just that of the lowpass filter, so they need not be sampled as frequently as the speech signal. For example: frame size = 20 ms, sample period = 10 ms.
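As a small worked example, the frame size and sample period can be converted into sample counts. The 8 kHz speech sampling rate used here is an assumption chosen for illustration; the slide gives only the 20 ms and 10 ms figures.

```python
def frame_params(fs_hz, frame_ms, hop_ms):
    """Convert frame size and sample period from milliseconds to samples."""
    frame_len = round(fs_hz * frame_ms / 1000)  # samples per frame
    hop_len = round(fs_hz * hop_ms / 1000)      # samples between frames
    frames_per_second = fs_hz / hop_len         # rate at which E_n, M_n are sampled
    return frame_len, hop_len, frames_per_second


# Assumed speech sampling rate: 8000 Hz.
frame_len, hop_len, rate = frame_params(8000, 20, 10)
overlap = frame_len - hop_len  # samples shared by consecutive frames
```

With these numbers consecutive frames overlap by half, and the measurements are produced 100 times per second instead of 8000.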

Main Applications of $E_n$ and $M_n$: providing the basis for distinguishing voiced speech segments from unvoiced segments, and silence detection.

Differences Between $E_n$ and $M_n$. $E_n$: squaring emphasizes large sample-to-sample variations in x(n). $M_n$: the dynamic range (max/min) is approximately the square root of that of $E_n$, so the differences in level between voiced and unvoiced regions are not as pronounced as with $E_n$.

FIR and IIR. All the windows discussed so far are FIR filters, and each of them is a lowpass filter. The window can also be an IIR filter.

IIR Example. With an infinite-duration window, the short-time energy and the short-time average magnitude can be computed with recursive formulas.
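One way such a recursion can look is sketched below, assuming the exponential window h(n) = aⁿ for n ≥ 0 with 0 < a < 1. The window choice, the function names, and the resulting update rules E_n = a·E_{n-1} + x²(n) and M_n = a·M_{n-1} + |x(n)| are assumptions of this sketch and may differ in detail from the formulas on the original slide.

```python
def recursive_energy(x, a):
    """E_n = a * E_{n-1} + x(n)^2, i.e. x^2 filtered by h(n) = a^n, n >= 0."""
    e, out = 0.0, []
    for v in x:
        e = a * e + v * v
        out.append(e)
    return out


def recursive_magnitude(x, a):
    """M_n = a * M_{n-1} + |x(n)|, i.e. |x| filtered by the same window."""
    m, out = 0.0, []
    for v in x:
        m = a * m + abs(v)
        out.append(m)
    return out


def direct_energy(x, a):
    """Same quantity from the convolution sum, for comparison."""
    return [sum(x[m] ** 2 * a ** (n - m) for m in range(n + 1))
            for n in range(len(x))]


x = [1.0, -2.0, 3.0]
E = recursive_energy(x, 0.5)
M = recursive_magnitude(x, 0.5)
```

The recursion needs one multiplication and one addition per sample regardless of the effective window length, which is the practical attraction of the IIR form.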

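Time-Domain Methods for Speech Processing: Short-Time Average Zero-Crossing Rate

Voiced and Unvoiced Signals. [Figure: the voiced segment /i/ and the unvoiced segment /s/ of the word "this".]

The Short-Time Average Zero-Crossing Rate: $Z_n = \sum_{m=-\infty}^{\infty} |\mathrm{sgn}[x(m)] - \mathrm{sgn}[x(m-1)]|\,w(n-m)$, where $\mathrm{sgn}[x] = 1$ for $x \ge 0$ and $-1$ otherwise, and $w(n) = 1/(2N)$ for $0 \le n \le N-1$ (zero elsewhere). Block diagram: x(n) → sgn[ ] → first difference → | | → lowpass filter → $Z_n$.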
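The chain in the block diagram (sign, first difference, absolute value, averaging filter) can be sketched as follows. The sketch assumes the common convention sgn[x] = 1 for x ≥ 0 and -1 otherwise, and a rectangular averaging window w(n) = 1/(2N); the function names and test signals are illustrative.

```python
def sgn(v):
    """Sign function with sgn(0) = +1 (assumed convention)."""
    return 1 if v >= 0 else -1


def zero_crossing_rate(x, N):
    """Z_n = sum_m |sgn x(m) - sgn x(m-1)| * w(n-m), with w = 1/(2N) over N samples."""
    # |first difference of the sign sequence|: 2 at each crossing, 0 elsewhere.
    d = [0] + [abs(sgn(x[m]) - sgn(x[m - 1])) for m in range(1, len(x))]
    return [sum(d[max(0, n - N + 1): n + 1]) / (2 * N) for n in range(len(x))]


fast = [1.0 if n % 2 == 0 else -1.0 for n in range(20)]  # sign flips every sample
slow = [1.0] * 10 + [-1.0] * 10                          # one sign change in total

z_fast = zero_crossing_rate(fast, N=10)
z_slow = zero_crossing_rate(slow, N=10)
```

With this normalization Z_n is the average number of zero crossings per sample inside the window, so it ranges from 0 to 1.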

Distribution of Zero-Crossings

Example

Time-Domain Methods for Speech Processing: Speech vs. Silence Discrimination Using Energy and Zero-Crossing

Speech vs. Silence Discrimination: locating the beginning and end of a speech utterance against a background of noise. Applications: segmentation of isolated words; automatic speech recognition; saving bandwidth in speech transmission.

Examples: In some cases, we can locate the beginning and end of a speech utterance using energy alone.

Examples: In other cases, we can locate the beginning and end of a speech utterance using zero-crossing rate alone.

Examples: Sometimes, we cannot do it using one criterion alone. [Figure annotation: actual beginning.]

Difficulties In general, it is difficult to locate the boundaries if we encounter the following cases: – Weak fricatives (/f/, /th/, /h/) at the beginning or end. – Weak plosive bursts (/p/, /t/, /k/) at the beginning or end. – Nasals at the end. – Voiced fricatives which become devoiced at the end of words. – Trailing off of vowel sounds at the end of an utterance.

Rabiner and Sambur. A 10 ms frame is used, with measurements computed 100 times per second. The algorithm assumes that the first 100 ms of the interval contains no speech. The means and standard deviations of the average magnitude and zero-crossing rate over this interval are computed to characterize the background noise.
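The noise-characterization step can be sketched as below. The statistics follow the description above (first 100 ms, i.e., the first 10 frames at 100 frames per second). The detection rule that follows, "mean plus three standard deviations", is only a placeholder assumption to show how the statistics might be used; it is not the set of thresholds used in the published algorithm, and all names and data are made up.

```python
import statistics


def noise_statistics(values, frames_per_second=100, silence_ms=100):
    """Mean and standard deviation of a per-frame measurement over the
    initial interval that is assumed to contain no speech."""
    k = frames_per_second * silence_ms // 1000  # 10 frames for the defaults
    head = values[:k]
    return statistics.mean(head), statistics.pstdev(head)


def first_frame_above(values, threshold, start):
    """Index of the first frame at or after `start` exceeding `threshold`."""
    for i in range(start, len(values)):
        if values[i] > threshold:
            return i
    return None


# Made-up per-frame average magnitudes: 10 noise frames, then speech onset.
magnitude = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0, 1.2, 0.8, 1.1, 0.9,
             1.1, 1.3, 2.0, 5.0, 9.0]

mag_mean, mag_std = noise_statistics(magnitude)
threshold = mag_mean + 3 * mag_std  # hypothetical rule, for illustration only
onset = first_frame_above(magnitude, threshold, start=10)
```

The same `noise_statistics` call applies unchanged to a per-frame zero-crossing-rate sequence.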

The Algorithm

[Figure: the algorithm illustrated in three steps (1, 2, 3); annotation: no more than 25 frames.]

Examples

Time-Domain Methods for Speech Processing: The Short-Time Autocorrelation Function

Autocorrelation Functions: $\phi(k) = \sum_{m=-\infty}^{\infty} x(m)\,x(m+k)$, the sum of products of x(m) with its copy shifted by lag k.

Properties: 1. Even: $\phi(k) = \phi(-k)$. 2. $\phi(k) \le \phi(0)$ for all k. 3. $\phi(0)$ is equal to the energy of x(m).

Properties: 4. If x(m) has period P, i.e., x(m) = x(m+P), then $\phi(k) = \phi(k+P)$. This motivates us to use autocorrelation for pitch detection.
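These properties can be checked on a synthetic periodic signal. The sketch uses a finite-length sum (so the peak at lag P is slightly lower than the one at lag 0), a made-up two-harmonic signal with period P = 25, and an arbitrary search range of 10 to 40 samples for the pitch lag; none of these choices come from the slides.

```python
import math


def autocorrelation(x, k):
    """phi(k) = sum_m x(m) * x(m + k) over the available samples."""
    k = abs(k)  # phi is even, so only |k| matters
    return sum(x[m] * x[m + k] for m in range(len(x) - k))


P = 25  # period in samples (illustrative)
x = [math.sin(2 * math.pi * m / P) + 0.5 * math.sin(4 * math.pi * m / P)
     for m in range(200)]

phi = [autocorrelation(x, k) for k in range(61)]

# Pitch estimate: the lag with the largest autocorrelation in the search range.
pitch_lag = max(range(10, 41), key=lambda k: phi[k])
```

Because the signal repeats every P samples, the autocorrelation peaks again at lag P, and the location of that peak is the pitch-period estimate.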

Short-Time Version: $R_n(k) = \sum_{m=-\infty}^{\infty} x(m)\,w(n-m)\,x(m+k)\,w(n-k-m)$.

Property: $R_n(k) = R_n(-k)$, where $R_n(-k) = \sum_{m=-\infty}^{\infty} x(m)\,w(n-m)\,x(m-k)\,w(n+k-m)$.

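Property: $R_n(k) = \sum_{m=-\infty}^{\infty} y_k(m)\,h_k(n-m)$, where $y_k(m) = x(m)\,x(m-k)$ and $h_k(n) = w(n)\,w(n+k)$.

Block diagram: x(n) is multiplied by its copy delayed by k samples ($z^{-k}$), and the product is filtered by $h_k(n)$ to give $R_n(k)$.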

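Another Formulation: $R_n(k) = \sum_{m=-\infty}^{\infty} [x(n+m)\,w'(m)]\,[x(n+m+k)\,w'(k+m)]$, where $w'(n) = w(-n)$.

A noncausal formulation: if $w'(n)$ is nonzero only for $0 \le n \le N-1$, then $R_n(k) = \sum_{m=0}^{N-1-k} [x(n+m)\,w'(m)]\,[x(n+m+k)\,w'(k+m)]$.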

Examples. [Figure: short-time autocorrelation of voiced and unvoiced speech, rectangular window vs. Hamming window, N = 401.]

Examples. Less data will be involved for larger lag k. [Figure: results for N = 401, N = 251, and N = 125.]

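Modified Short-Time Autocorrelation Function. Original version: $R_n(k) = \sum_{m=0}^{N-1-k} [x(n+m)\,w'(m)]\,[x(n+m+k)\,w'(k+m)]$, where $w'(m)$ is nonzero only for $0 \le m \le N-1$. Modified version (rectangular windows): $\hat{R}_n(k) = \sum_{m=0}^{N-1} x(n+m)\,x(n+m+k)$ for $0 \le k \le K$, where K is the maximum lag. The sum always has N terms, so N + K samples are involved.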
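The difference between the two versions shows up clearly on a periodic signal. The sketch below assumes rectangular windows, a frame starting at sample n, and a sinusoid with a period of 25 samples; the function names and numbers are illustrative.

```python
import math


def short_time_autocorr(x, n, N, k):
    """Original version: only N - k products fall inside the N-sample window."""
    return sum(x[n + m] * x[n + m + k] for m in range(N - k))


def modified_autocorr(x, n, N, k):
    """Modified version: always N products, reading up to N + k samples."""
    return sum(x[n + m] * x[n + m + k] for m in range(N))


P, N = 25, 100
x = [math.sin(2 * math.pi * m / P) for m in range(200)]

r0 = short_time_autocorr(x, 0, N, 0)      # same for both versions at k = 0
r_orig = short_time_autocorr(x, 0, N, P)  # 75 products: peak shrinks with lag
r_mod = modified_autocorr(x, 0, N, P)     # 100 products: peak keeps full height
```

In the original version the peak at the pitch lag is attenuated because fewer samples contribute as k grows; the modified version avoids that taper at the cost of needing K extra samples beyond the frame.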

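Examples. [Figure: voiced and unvoiced speech, N = 401; the rectangular-window version and the modified version give similar results.]

Examples. [Figure: rectangular-window version vs. modified version for N = 401, N = 251, and N = 125.]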

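Time-Domain Methods for Speech Processing: The Short-Time Average Magnitude Difference Function

The AMDF. If x(n) is periodic with period P, then $d(n) = x(n) - x(n-k)$ is zero for $k = 0, \pm P, \pm 2P, \ldots$. The short-time average magnitude difference function, $\gamma_n(k) = \sum_{m} |x(n+m)\,w_1(m) - x(n+m-k)\,w_2(m-k)|$, therefore dips sharply at those lags. It is computationally more efficient than autocorrelation because it requires no multiplications.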
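A minimal AMDF sketch follows, assuming rectangular windows, the difference x(n+m) - x(n+m-k), and a frame that starts far enough into the signal that n - k never becomes negative. The test signal (period 25 samples) and the lag search range are made up for illustration.

```python
import math


def amdf(x, n, N, k):
    """Average magnitude difference at lag k for the N-sample frame starting at n.

    Uses only subtractions and absolute values; requires n >= k.
    """
    return sum(abs(x[n + m] - x[n + m - k]) for m in range(N))


P = 25  # period in samples (illustrative)
x = [math.sin(2 * math.pi * m / P) + 0.5 * math.sin(4 * math.pi * m / P)
     for m in range(200)]

n, N = 60, 100
gamma = {k: amdf(x, n, N, k) for k in range(1, 61)}

# Pitch estimate: the lag with the deepest dip in the search range.
pitch_lag = min(range(10, 41), key=lambda k: gamma[k])
```

Where the autocorrelation shows a peak at the pitch period, the AMDF shows a null, so pitch detection becomes a search for the minimum.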

Example. [Figure: AMDF of voiced and unvoiced speech.]

Exercise: Record a piece of your own speech and perform voiced/unvoiced segmentation on it. Design an efficient algorithm to compute the autocorrelation.