1 Speech Parametrisation
Compact encoding of information in speech
Accentuates important info
– Attempts to eliminate irrelevant information
Accentuates stable info
– Attempts to eliminate factors which tend to vary most across utterances (and speakers)

2 Frames
[Figure: overlapping frames, labelled 40ms and 20ms]
Parameterise on a frame-by-frame basis
Choose a frame length over which speech remains reasonably stationary
Overlap frames, e.g. 40ms frames with a 10ms frame shift
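The framing scheme above can be sketched as follows (a minimal illustration; the function name, the 8 kHz sample rate, and the dummy signal are assumptions, not from the slides):

```python
# Sketch: segment a signal into overlapping frames
# (40 ms frames with a 10 ms shift, as in the slide's example).

def frame_signal(samples, sample_rate, frame_ms=40, shift_ms=10):
    """Split `samples` into overlapping frames of `frame_ms`, advancing by `shift_ms`."""
    frame_len = int(sample_rate * frame_ms / 1000)
    shift = int(sample_rate * shift_ms / 1000)
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += shift
    return frames

# One second of a dummy signal at an assumed 8 kHz sample rate:
frames = frame_signal([0.0] * 8000, sample_rate=8000)
# → 97 frames of 320 samples each, shifted by 80 samples
```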

3 Crude Parametrisation: Time Domain
Use short-term energy (STE)
Sequentially segment the speech signal into frames
Calculate STE for each frame
STE: E = Σ s(n)², summed over the samples in the frame, where n refers to the nth sample
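The STE of a single frame can be computed directly from this definition (a minimal sketch; the function name and sample values are illustrative):

```python
# Sketch: short-term energy of one frame, E = sum over n of s(n)^2.

def short_term_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(s * s for s in frame)

energy = short_term_energy([0.5, -0.5, 0.25, 0.0])
# → 0.5625
```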

4

5 Why not use waveform samples?
How many samples are in a frame?
– The more numbers, the more computation
How can we measure similarity?
Use what we know about speech…
– F0?
– Spectrum?
– Formants?

6 Crude Parametrisation: Frequency Related
Use zero-crossing rate (ZCR)
Calculate ZCR for each frame:
Z = (1/2) Σ |sgn(s(n)) − sgn(s(n−1))|
where sgn(x) = 1 if x ≥ 0, and −1 otherwise
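The ZCR formula above counts sign changes between consecutive samples; a minimal sketch (function names and the example frame are illustrative):

```python
def sgn(x):
    # Sign convention from the ZCR definition: +1 for x >= 0, -1 otherwise.
    return 1 if x >= 0 else -1

def zero_crossing_rate(frame):
    """Z = (1/2) * sum of |sgn(s(n)) - sgn(s(n-1))| over the frame."""
    return sum(abs(sgn(frame[n]) - sgn(frame[n - 1]))
               for n in range(1, len(frame))) // 2

zcr = zero_crossing_rate([0.3, -0.2, -0.1, 0.4, 0.5])
# → 2 (two sign changes in the frame)
```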

7

8 Multidimensionality
We can combine multiple features into a feature vector
Let's combine STE and ZCR and measure the magnitude of each feature vector
More complex multidimensional feature vectors are generally used in ASR
[Figure: 2-dimensional feature vector with STE and ZCR axes]
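Combining the two crude features into a 2-dimensional vector per frame, and taking its Euclidean magnitude, might look like this (a sketch under the slide's STE/ZCR definitions; names and the example frame are made up):

```python
import math

def feature_vector(frame):
    """Return a 2-D (STE, ZCR) feature vector for one frame."""
    ste = sum(s * s for s in frame)
    sgn = lambda x: 1 if x >= 0 else -1
    zcr = sum(abs(sgn(frame[n]) - sgn(frame[n - 1]))
              for n in range(1, len(frame))) // 2
    return (ste, zcr)

def magnitude(vec):
    """Euclidean magnitude of a feature vector."""
    return math.sqrt(sum(v * v for v in vec))

vec = feature_vector([0.3, -0.2, -0.1, 0.4, 0.5])
# vec = (STE ≈ 0.55, ZCR = 2); magnitude(vec) combines both dimensions
```

In practice the two dimensions have very different scales, which is one reason real ASR front ends normalise features before combining them.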

9

10 Sophistication
We need something more representative of the information in the speech and less prone to variation… e.g. spectra
The spectral slices we have been viewing to date in Praat are actually LPC (Linear Predictive Coding) spectra
LPC attempts to remove the effects of phonation
– Leaves us with a correlate of VT (vocal tract) configuration
More later…
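As a rough sketch of the idea behind LPC (not what Praat actually implements, which adds windowing, pre-emphasis, and other details): the autocorrelation method with the Levinson-Durbin recursion finds coefficients that predict each sample from the previous p samples. All names here are illustrative:

```python
import math

def autocorrelation(frame, max_lag):
    # r[k] = sum over n of s(n) * s(n - k), for lags 0..max_lag
    return [sum(frame[n] * frame[n - k] for n in range(k, len(frame)))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations for predictor coefficients a[1..order]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err  # coefficients and residual prediction error

# A pure sinusoid is almost perfectly predicted by a 2nd-order model:
# s(n) ≈ 2cos(w)*s(n-1) - s(n-2), so the residual error is tiny.
frame = [math.sin(0.3 * n) for n in range(200)]
r = autocorrelation(frame, 2)
coeffs, err = levinson_durbin(r, 2)
```

The small residual error for the sinusoid illustrates the "remove phonation" idea: whatever the predictor captures (the resonant, vocal-tract-like part) is separated from what it cannot predict (the excitation).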