Speech Parametrisation

Presentation transcript:

1 Speech Parametrisation
Compact encoding of information in speech
Accentuates important info
–Attempts to eliminate irrelevant information
Accentuates stable info
–Attempts to eliminate factors which tend to vary most across utterances (and speakers)

2 Frames
Parameterise on a frame-by-frame basis
Choose frame length, over which speech remains reasonably stationary
Overlap frames, e.g. 40ms frames, 10ms frame shift
[Figure labels: 40ms, 20ms]
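
The framing step above can be sketched as follows. This is a minimal illustration, not the presenter's code: the function name `frame_signal`, the 16 kHz sampling rate, and the choice to drop a trailing partial frame are all assumptions made for the example. The 40ms frame length and 10ms shift are the values quoted on the slide.

```python
def frame_signal(samples, sample_rate, frame_ms=40, shift_ms=10):
    """Split samples into overlapping frames.

    A trailing partial frame is dropped (an assumption; padding is another option).
    """
    frame_len = int(sample_rate * frame_ms / 1000)  # samples per frame
    shift = int(sample_rate * shift_ms / 1000)      # samples between frame starts
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += shift
    return frames


# One second of placeholder samples at an assumed 16 kHz sampling rate.
signal = [0.0] * 16000
frames = frame_signal(signal, 16000)
print(len(frames), len(frames[0]))  # 97 frames of 640 samples each
```

With a 10ms shift and 40ms frames, each sample (away from the edges) is covered by four frames, which is the overlap the slide refers to.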

3 Crude Parametrisation
Time domain
Use short-term energy (STE)
Sequentially segment the speech signal into frames
Calculate STE for each frame
STE: [equation not captured in the transcript], where n refers to the nth sample
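
The STE equation itself did not survive in the transcript. A common definition, assumed here, is the sum of squared sample values over the frame, E = Σ s(n)², with n running over the samples of that frame. The slide may have used a normalised variant (for example dividing by the frame length). A minimal sketch under that assumption:

```python
def short_term_energy(frame):
    """Sum of squared sample values over one frame (unnormalised; assumed definition)."""
    return sum(s * s for s in frame)


frame = [0.5, -0.5, 0.25, -0.25]
print(short_term_energy(frame))  # 0.625
```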

5 Why not use waveform samples?
How many samples in a frame?
–The more numbers the more computation
How can we measure similarity?
Use what we know about speech…
–Spectrum!

6 Crude Parametrisation
Frequency related
Use zero-crossing rate (ZCR)
Calculate ZCR for each frame: [equation and its "where" definitions not captured in the transcript]
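
The ZCR equation is missing from the transcript. One widely used form, assumed here, counts sign changes between adjacent samples as ½ Σ |sgn(s(n)) − sgn(s(n−1))|, where sgn is +1 for non-negative samples and −1 otherwise. The sketch below returns a count per frame; dividing by the frame length or duration to obtain a true rate is a further design choice the slides do not specify.

```python
def sign(x):
    """+1 for non-negative samples, -1 otherwise (assumed convention)."""
    return 1 if x >= 0 else -1


def zero_crossing_count(frame):
    """Number of sign changes between adjacent samples in one frame."""
    return sum(
        abs(sign(frame[n]) - sign(frame[n - 1])) for n in range(1, len(frame))
    ) / 2


frame = [1.0, -1.0, 1.0, 1.0, -1.0]
print(zero_crossing_count(frame))  # 3.0
```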

8 Multidimensionality
We can combine multiple features into a feature vector
Let’s combine STE and ZCR and measure the magnitude of each feature vector
More complex multidimensional feature vectors are generally used in ASR
[Figure labels: STE, ZCR, 2-dimensional Feature Vector]
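
Combining the two crude features into a 2-dimensional vector and taking its magnitude might look like the sketch below. The helper names are hypothetical, STE is assumed to be an unnormalised sum of squares, ZCR is assumed to be a raw sign-change count, and the magnitude is taken as the Euclidean norm.

```python
import math


def feature_vector(frame):
    """Return (STE, ZCR) for one frame, using the assumed definitions above."""
    ste = sum(s * s for s in frame)
    zcr = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return (ste, zcr)


def magnitude(vec):
    """Euclidean length of a feature vector."""
    return math.sqrt(sum(v * v for v in vec))


frame = [1.0, -1.0, 1.0, -1.0]
vec = feature_vector(frame)
print(vec, magnitude(vec))  # (4.0, 3) 5.0
```

Because STE and ZCR live on very different numeric ranges, one feature can dominate the magnitude unless the features are scaled first.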

10 Parametrisation: Sophistication
We need something more representative of the information in the speech and less prone to variation
The spectral slices we have been viewing to date in Praat are actually LPC (Linear Predictive Coding) spectra
LPC attempts to remove the effects of phonation
–Leaves us with a correlate of vocal tract (VT) configuration

11 Spectral Feature Extraction
Extract compact set of spectral parameters (features) for each frame
Frames usually overlapping

12 DFT spectra vs LPC spectra
DFT (Discrete Fourier Transform)
–Technique ubiquitous in DSP for spectral analysis
–fft function in MATLAB demo > Numerics > Fast Fourier Transform
–Demo function dftdemo_sinusoid_sig
LPC
–Mathematical encoding of signals
–Based on modelling speech as a series of sums of exponentially decaying sinusoids
–Source-filter decomposition
–Typical example of how spectral information can be compressed
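
To make the DFT side concrete, here is a small sketch in the spirit of a sinusoid demo. It is not the `dftdemo_sinusoid_sig` function named on the slide, whose contents are not shown. It uses a direct O(N²) DFT for readability rather than an FFT, and the 8 kHz sampling rate, 64-sample length, and 1 kHz tone are assumed values chosen so the tone falls exactly on a DFT bin.

```python
import cmath
import math


def dft(x):
    """Direct DFT: X[k] = sum over n of x[n] * exp(-2j*pi*k*n/N)."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]


fs = 8000   # assumed sampling rate in Hz
N = 64      # number of samples analysed
f0 = 1000   # sinusoid frequency in Hz; lands exactly on bin 8 (bin spacing fs/N = 125 Hz)

x = [math.sin(2 * math.pi * f0 * n / fs) for n in range(N)]
mags = [abs(X) for X in dft(x)[: N // 2]]  # keep the non-redundant half
peak_bin = max(range(N // 2), key=lambda k: mags[k])
peak_hz = peak_bin * fs / N
print(peak_bin, peak_hz)  # 8 1000.0
```

The single dominant bin reflects that the DFT describes the signal directly. LPC instead fits a small set of model coefficients, which is where the compression comes from.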

13 Preprocessing Speech for Spectral Estimation
1. Choose frequency resolution
–Time/Frequency trade off
–Parametrisation frame length
2. Pre-emphasise
–Flattens spectrum which reduces spectral dynamic range which eases estimation
3. Apply window function in time domain
–Tapers frame boundary values towards zero
–Gives better picture of spectrum
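
Steps 2 and 3 can be sketched as below. Later slides mention MATLAB's `diff` and `hamming`; the Python here is an assumed equivalent rather than the presenter's code. `preemphasise` with `alpha=1.0` is a plain first difference (like `diff`, it returns one sample fewer), and `hamming` uses the standard symmetric 0.54/0.46 coefficients.

```python
import math


def preemphasise(frame, alpha=1.0):
    """y[n] = x[n] - alpha * x[n-1]; alpha=1.0 is a plain first difference."""
    return [frame[n] - alpha * frame[n - 1] for n in range(1, len(frame))]


def hamming(N):
    """Symmetric Hamming window; end values are 0.08 rather than exactly zero."""
    if N == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


frame = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0]
pre = preemphasise(frame)                      # [1.0, 2.0, 3.0, 4.0, 5.0]
window = hamming(len(pre))
windowed = [s * w for s, w in zip(pre, window)]
print(pre)
print([round(w, 3) for w in window])
```

In practice a pre-emphasis coefficient slightly below 1 (values around 0.95 to 0.97 are common) is often used instead of a pure difference; that is a general convention, not something the slides state.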

14 DFT Spectrum /u/

15 Frame Length:{5,40,200}ms

16 Freq. Resolution for {5,40,200}ms
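
As a worked example of the time/frequency trade-off: for a frame of duration T, the spacing between DFT bins is fs/N = 1/T, independent of the sampling rate. The sketch below evaluates this for the three frame lengths on the slide. The function name is hypothetical, and windowing broadens the effective resolution beyond this bin spacing.

```python
def bin_spacing_hz(frame_ms):
    """DFT bin spacing in Hz for a frame of the given duration in milliseconds."""
    return 1000.0 / frame_ms


spacings = {ms: bin_spacing_hz(ms) for ms in (5, 40, 200)}
print(spacings)  # {5: 200.0, 40: 25.0, 200: 5.0}
```

So a 5ms frame gives good time localisation but only 200 Hz spacing, while a 200ms frame gives 5 Hz spacing at the cost of averaging over a long stretch of speech.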

17 Preemphasis: using diff

18 Preemphasis

19 Windowing: using hamming

20 Windowing: Spectral Effect

21 LPC Spectrum: using lpc

22 LPC
Linear Predictive Coding
Rule of thumb for order
–(kHz of Sampling Frequency) + (2 to 4)
–In previous figure, order 14 was used
LP Coefficients can be easily transformed to centre frequencies and bandwidths of peaks in spectrum
MATLAB lpc
–1st coefficient returned is always 1, so omit it
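
A minimal sketch of LPC analysis by the autocorrelation method with the Levinson-Durbin recursion follows. It is written in plain Python for illustration and is not a reimplementation of MATLAB's `lpc`; the function names, the 10 kHz sampling rate in the order example, and the synthetic test signal are all assumptions. The coefficient convention is A(z) = 1 + a[1]z⁻¹ + … + a[p]z⁻ᵖ, so a[0] is always 1, matching the slide's remark.

```python
def rule_of_thumb_order(sample_rate_hz, extra=4):
    """LPC order = (sampling frequency in kHz) + extra, with extra in 2..4."""
    return int(sample_rate_hz / 1000) + extra


def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of a frame."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for predictor polynomial a (a[0] == 1) and the final prediction error."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                     # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


print(rule_of_thumb_order(10000))          # 14 for an assumed 10 kHz signal

# Synthetic decaying signal x[n] = 0.9**n, i.e. a single real pole at 0.9.
x = [0.9 ** n for n in range(200)]
a, err = levinson_durbin(autocorr(x, 2), 2)
print([round(c, 4) for c in a])            # close to [1.0, -0.9, 0.0]
```

The recovered a[1] of about −0.9 corresponds to the pole of the generating filter, which illustrates how a handful of coefficients can stand in for the spectral envelope.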

23 Cepstrally Smoothed Spectrum

24 MFCCs
Mel Frequency Cepstral Coefficients
–Encodes/compresses spectral info in approx. 12 coefficients
–Weights areas of perceptual importance more heavily
–Will use them in HTK
–Other parameterisations possible
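
The perceptual weighting comes from spacing the analysis filters evenly on the mel scale instead of in Hz. The slides give no formula, so the sketch below assumes one widely used variant, mel = 2595·log10(1 + f/700). The helper names and the 0 to 4000 Hz range are illustrative only. It shows just the frequency warping, not the full MFCC pipeline (filterbank energies, log, and cosine transform).

```python
import math


def hz_to_mel(f_hz):
    """Convert Hz to mel using the assumed 2595*log10(1 + f/700) variant."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_spaced_centres(low_hz, high_hz, count):
    """Centre frequencies (Hz) of `count` filters spaced evenly in mel."""
    lo, hi = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (hi - lo) / (count + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(count)]


centres = mel_spaced_centres(0.0, 4000.0, 12)
gaps = [b - a for a, b in zip(centres, centres[1:])]
print([round(c) for c in centres])
```

The gaps between successive centres grow with frequency, so low frequencies, where hearing resolves finer detail, receive more filters per Hz.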