Spectral envelope analysis of TIMIT corpus using LP, WLSP, and MVDR Steve Vest Matlab implementation of methods by Tien-Hsiang Lo.

Overview
- Methods
  - WLSP
  - MVDR
- TIMIT corpus
- Measurements

Analysis methods
- LP: Linear Prediction using the autocorrelation method
- WLSP: Weighted-sum Line Spectrum Pairs
- MVDR: Minimum Variance Distortionless Response
- MVDR of WLSP: MVDR applied to the WLSP coefficients
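The original Matlab code is not shown. The following is a minimal Python sketch of the LP autocorrelation method that the other three methods build on. It assumes a single analysis frame and uses a symmetric Hamming window, an unnormalized (biased) autocorrelation, and the Levinson-Durbin recursion. Function names are illustrative, and the sign convention A(z) = 1 + a1 z^-1 + ... is an assumption.

```python
import math


def hamming(n_len):
    """Symmetric Hamming window of length n_len."""
    if n_len == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_len - 1))
            for n in range(n_len)]


def autocorr(x, max_lag):
    """Biased autocorrelation r[0..max_lag] (no 1/len normalization)."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations by Levinson-Durbin recursion.

    Returns (a, err): a = [1, a1, ..., a_order] for
    A(z) = 1 + a1 z^-1 + ... + a_order z^-order, and err is the
    final prediction error power.
    """
    a = [1.0]
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err                       # reflection coefficient
        a_ext = a + [0.0]
        a = [a_ext[i] + k * a_ext[m - i] for i in range(m + 1)]
        err *= 1.0 - k * k
    return a, err


def lp_analysis(frame, order):
    """LP of one frame: Hamming window, autocorrelation, Levinson-Durbin."""
    w = hamming(len(frame))
    xw = [s * wn for s, wn in zip(frame, w)]
    return levinson_durbin(autocorr(xw, order), order)
```

For example, the autocorrelation r = [1, 0.9, 0.81] of a first-order process gives a ≈ [1, -0.9, 0] and a prediction error power of ≈ 0.19.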

WLSP
Purpose: increase the spectral dynamics between peaks and valleys of the spectral envelope
- Maximizes the difference between peak and valley amplitudes
- Uses autocorrelation values beyond lag N to obtain better accuracy
When applied to speech coding:
- Improves the quality of the decoded speech
- Attenuates the quantization noise level in the valleys

WLSP Algorithm
1. Apply a Hamming window to the signal
2. Calculate the LP coefficients of order N-1
3. Using the LP coefficients, calculate the LSP polynomials
   $p = \hat{a} + \hat{a}^{R}, \qquad q = \hat{a} - \hat{a}^{R}$
   where p and q are the symmetric and antisymmetric LSP polynomials, $\hat{a}$ is the zero-extended vector of LP coefficients, and $\hat{a}^{R}$ is the reversal of $\hat{a}$.
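Step 3 reduces to a few lines. The sketch below assumes the LP vector is given as [1, a1, ..., a_{N-1}]. It zero-extends this vector to length N+1 and forms the sum with its reversal (the symmetric polynomial p) and the difference (the antisymmetric polynomial q). The function name is illustrative.

```python
def lsp_polynomials(a):
    """Symmetric (p) and antisymmetric (q) LSP polynomials.

    a: LP coefficients [1, a1, ..., a_{N-1}] of an order N-1 predictor.
    Returns two coefficient lists of length N+1.
    """
    a_hat = list(a) + [0.0]      # zero-extended vector of LP coefficients
    a_rev = a_hat[::-1]          # reversal of a_hat
    p = [x + y for x, y in zip(a_hat, a_rev)]   # p = a_hat + a_hat^R
    q = [x - y for x, y in zip(a_hat, a_rev)]   # q = a_hat - a_hat^R
    return p, q
```

For a = [1, -0.9], this gives p = [1, -1.8, 1] and q = [1, 0, -1]. Averaging p and q always recovers the zero-extended LP vector.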

WLSP Algorithm
4. Calculate the WLSP polynomial
5. Choose the weighting parameter λ to minimize the error between the autocorrelation of the speech and the autocorrelation of the WLSP all-pole filter's impulse response:
   - the autocorrelations match for n = 1, ..., N
   - λ minimizes the sum of squared errors (SSE) for n = N+1, ..., N+1+L
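The slides do not show the WLSP polynomial, so this sketch makes several assumptions. It takes the WLSP polynomial to be c = λp + (1 - λ)q, which reduces to the zero-extended LP vector at λ = 0.5. It finds λ by a plain grid search over (0, 1). It compares normalized autocorrelations, dividing each by its lag-0 value so the model gain drops out.

Only the SSE part of step 5 is implemented; no check is made that lags 1..N match. All names are hypothetical, and the grid search is a stand-in for whatever search the original code used.

```python
import math


def levinson_durbin(r, order):
    """Autocorrelation-method LP: returns ([1, a1, ..., a_order], error power)."""
    a, err = [1.0], r[0]
    for m in range(1, order + 1):
        k = -(r[m] + sum(a[i] * r[m - i] for i in range(1, m))) / err
        a_ext = a + [0.0]
        a = [a_ext[i] + k * a_ext[m - i] for i in range(m + 1)]
        err *= 1.0 - k * k
    return a, err


def impulse_response(c, length):
    """First `length` samples of the impulse response of 1 / C(z)."""
    h = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0
        for i in range(1, min(n, len(c) - 1) + 1):
            acc -= c[i] * h[n - i]
        h.append(acc / c[0])
    return h


def autocorr(x, max_lag):
    return [sum(x[n] * x[n + k] for n in range(len(x) - k))
            for k in range(max_lag + 1)]


def wlsp_search(r, N, L, grid=99, ir_len=1024):
    """Hypothetical grid search for lambda.

    r must hold speech autocorrelations for lags 0 .. N+1+L.
    Returns (lam, c), with c the WLSP coefficients (length N+1).
    """
    a, _ = levinson_durbin(r, N - 1)            # order N-1 LP
    a_hat = a + [0.0]                           # zero-extended
    a_rev = a_hat[::-1]
    p = [x + y for x, y in zip(a_hat, a_rev)]   # symmetric LSP polynomial
    q = [x - y for x, y in zip(a_hat, a_rev)]   # antisymmetric LSP polynomial
    best = None
    for j in range(1, grid + 1):
        lam = j / (grid + 1)                    # lambda strictly inside (0, 1)
        # Assumed WLSP form: weighted sum of the two LSP polynomials.
        c = [lam * pv + (1 - lam) * qv for pv, qv in zip(p, q)]
        rh = autocorr(impulse_response(c, ir_len), N + 1 + L)
        sse = 0.0
        for n in range(N + 1, N + 2 + L):       # lags n = N+1 .. N+1+L
            d = r[n] / r[0] - rh[n] / rh[0]     # normalized autocorrelations
            sse += d * d
        if not math.isfinite(sse):              # skip responses that blew up
            continue
        if best is None or sse < best[0]:
            best = (sse, lam, c)
    return best[1], best[2]
```

Consider the autocorrelation of a first-order process, r[k] = 0.9^k. Here the zero-extended LP model already fits every lag, so the search should settle on λ = 0.5. Each λ costs one impulse-response and autocorrelation pass, so a finer grid or a one-dimensional minimizer trades accuracy against run time.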

WLSP vs. LP

MVDR
- Estimates the power at each frequency by applying a specially designed FIR filter
- Distortionless constraint: the FIR filter minimizes the total output power while preserving unity gain at the frequency being estimated
- Solving for the distortionless filter is a constrained optimization problem
- A more robust modeling method than LP, yet it can be computed directly from the LP coefficients
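Written out, the constrained optimization is the standard MVDR formulation below. The notation is assumed here rather than taken from the slides: R is the (M+1)×(M+1) autocorrelation matrix of the speech frame, h is the FIR filter, and e(ω) is the complex-exponential vector at the estimating frequency. A Lagrange multiplier gives the closed-form filter and output power.

```latex
\min_{\mathbf{h}} \; \mathbf{h}^{H}\mathbf{R}\,\mathbf{h}
\quad \text{subject to} \quad \mathbf{e}^{H}(\omega)\,\mathbf{h} = 1,
\qquad
\mathbf{e}(\omega) = \left[1,\; e^{j\omega},\; \dots,\; e^{jM\omega}\right]^{T}

\mathbf{h}_{\mathrm{opt}} = \frac{\mathbf{R}^{-1}\mathbf{e}(\omega)}{\mathbf{e}^{H}(\omega)\,\mathbf{R}^{-1}\mathbf{e}(\omega)},
\qquad
P_{\mathrm{MV}}(\omega) = \mathbf{h}_{\mathrm{opt}}^{H}\mathbf{R}\,\mathbf{h}_{\mathrm{opt}}
= \frac{1}{\mathbf{e}^{H}(\omega)\,\mathbf{R}^{-1}\mathbf{e}(\omega)}
```

The constraint fixes the filter's gain at ω to 1, so minimizing the total output power suppresses contributions from all other frequencies.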

MVDR Algorithm
1. Calculate the LP coefficients a_k
2. Calculate the MVDR coefficients μ_k
Note that the MVDR coefficients are symmetric and that there are 2N+1 of them (k = -N, ..., N)
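Step 2 has a closed form, as we understand it from the Musicus (1985) fast MVDR method that the Murthi reference builds on. With a_0 = 1, LP order N, and prediction error power P_e:

- $\mu(k) = \frac{1}{P_e}\sum_{i=0}^{N-k}(N+1-k-2i)\,a_i\,a_{i+k}^{*}$ for k = 0, ..., N
- $\mu(-k) = \mu^{*}(k)$
- $P_{\mathrm{MV}}(\omega) = 1 / \sum_{k=-N}^{N}\mu(k)e^{-j\omega k}$

The Python sketch below assumes real-valued speech. It also includes a slower cross-check, `mvdr_reference`, which sums the inverse LP spectra of orders 0..N. All function names are hypothetical.

```python
import cmath
import math


def levinson_durbin(r, order):
    """Autocorrelation-method LP: returns ([1, a1, ..., a_order], error power)."""
    a, err = [1.0], r[0]
    for m in range(1, order + 1):
        k = -(r[m] + sum(a[i] * r[m - i] for i in range(1, m))) / err
        a_ext = a + [0.0]
        a = [a_ext[i] + k * a_ext[m - i] for i in range(m + 1)]
        err *= 1.0 - k * k
    return a, err


def mvdr_coefficients(a, pe):
    """MVDR coefficients mu(0..M) from order-M LP coefficients (real data).

    Since mu(-k) = mu(k) for real a, the full symmetric set has 2M+1 values.
    """
    M = len(a) - 1
    return [sum((M + 1 - k - 2 * i) * a[i] * a[i + k] for i in range(M - k + 1)) / pe
            for k in range(M + 1)]


def mvdr_spectrum(mu, omega):
    """P_MV(omega) = 1 / sum_{k=-M}^{M} mu(k) e^{-j omega k}, using mu(-k) = mu(k)."""
    return 1.0 / (mu[0] + 2.0 * sum(mu[k] * math.cos(omega * k)
                                    for k in range(1, len(mu))))


def mvdr_reference(r, M, omega):
    """Cross-check: 1 / P_MV = sum over LP orders m = 0..M of |A_m(e^{jw})|^2 / P_m."""
    total = 0.0
    for m in range(M + 1):
        a_m, p_m = levinson_durbin(r, m)
        A = sum(c * cmath.exp(-1j * omega * i) for i, c in enumerate(a_m))
        total += abs(A) ** 2 / p_m
    return 1.0 / total
```

The closed form needs only the final-order LP coefficients. This is why the slide can describe MVDR as computable from LP.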

MVDR vs. LP

MVDR of WLSP
Just an exercise out of curiosity:
- Performs WLSP
- Performs MVDR using the coefficients from WLSP instead of LP
Resulting conclusion: it's a bad idea…

MVDR of WLSP vs. MVDR

TIMIT corpus
“The TIMIT corpus of read speech has been designed to provide speech data for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems.”
- Large collection of speech samples from 8 dialect regions of the USA
- Samples are phonetically labeled

TIMIT regions
- Region 1: New England
- Region 2: Northern
- Region 3: North Midland
- Region 4: South Midland
- Region 5: Southern
- Region 6: New York City
- Region 7: Western
- Region 8: Army Brat (moved around)
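Grouping by region and sex can be driven by the file paths. This assumes the usual TIMIT layout `<usage>/DR<n>/<sex letter><speaker id>/<sentence file>`, where the speaker directory starts with M or F; the path case varies between distributions. The helper name is hypothetical, and POSIX-style separators are assumed.

```python
from pathlib import PurePosixPath

# Dialect-region numbers as listed on the slide (directories DR1..DR8).
TIMIT_REGIONS = {
    1: "New England", 2: "Northern", 3: "North Midland", 4: "South Midland",
    5: "Southern", 6: "New York City", 7: "Western", 8: "Army Brat (moved around)",
}


def speaker_info(path):
    """Return (region number, region name, sex) parsed from a TIMIT file path.

    Assumes .../DR<n>/<M|F><speaker id>/<file>; matching is case-insensitive.
    """
    parts = PurePosixPath(path).parts
    dr = next(p for p in parts if p.upper().startswith("DR") and p[2:].isdigit())
    region = int(dr[2:])
    speaker_dir = parts[parts.index(dr) + 1]
    sex = {"M": "male", "F": "female"}[speaker_dir[0].upper()]
    return region, TIMIT_REGIONS[region], sex
```

For example, `speaker_info("TIMIT/TRAIN/DR1/FCJF0/SA1.WAV")` yields region 1 (New England) and a female speaker.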

Analyzed Vowels
iy (beet), ih (bit), eh (bet), ey (bait), ae (bat), aa (bott), aw (bout), ay (bite), ah (but), ao (bought), oy (boy), ow (boat), uh (book), uw (boot), ux (toot), er (bird), ax (about), ix (debit), axr (butter), ax-h (suspect)
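The vowel segments and their neighbouring phones can be read from TIMIT's phonetic label (.PHN) files. Each line holds a start sample, an end sample, and a phone label, with samples at 16 kHz. The sketch below keeps only the vowels listed above and records the preceding and following phone as the trineme. The helper names and the sample text are hypothetical.

```python
# Vowel labels from the slide's list (TIMIT phone codes).
VOWELS = {"iy", "ih", "eh", "ey", "ae", "aa", "aw", "ay", "ah", "ao", "oy",
          "ow", "uh", "uw", "ux", "er", "ax", "ix", "axr", "ax-h"}


def parse_phn(text):
    """Parse .PHN text into (start_sample, end_sample, label) tuples."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            segments.append((int(fields[0]), int(fields[1]), fields[2]))
    return segments


def vowel_trinemes(segments):
    """List (previous phone, vowel, next phone, start, end) for each analyzed vowel."""
    out = []
    for i, (start, end, label) in enumerate(segments):
        if label in VOWELS:
            prev_label = segments[i - 1][2] if i > 0 else None
            next_label = segments[i + 1][2] if i + 1 < len(segments) else None
            out.append((prev_label, label, next_label, start, end))
    return out
```

Dividing the sample boundaries by 16000 converts them to seconds, for example to cut out the frame passed to LP, WLSP, or MVDR analysis.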

Collected Data
- First three formants: frequency [Hz], amplitude [dB]
- Valleys after the formants: frequency [Hz], delta [dB] (difference between formant amplitude and valley amplitude)
- Collected from the entire training set of the TIMIT corpus
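The slides do not say how peaks and valleys were located. The sketch below shows one simple possibility: evaluate the all-pole envelope 10·log10(P_e / |A(e^{jω})|²) from 0 to fs/2, then take the first three local maxima as formants. The valley after each formant is the minimum before the next local maximum, or before the end of the band. The peak-picking rule and the function names are assumptions.

```python
import cmath
import math


def lp_envelope_db(a, pe, fs, n_bins=257):
    """All-pole envelope 10*log10(pe / |A(e^{jw})|^2) on n_bins points from 0 to fs/2."""
    freqs, env = [], []
    for k in range(n_bins):
        w = math.pi * k / (n_bins - 1)
        A = sum(c * cmath.exp(-1j * w * i) for i, c in enumerate(a))
        freqs.append(w * fs / (2 * math.pi))
        env.append(10.0 * math.log10(pe / abs(A) ** 2))
    return freqs, env


def formants_and_valleys(env_db, freqs, n_formants=3):
    """Hypothetical peak picking: formant = local maximum,
    valley = minimum between it and the next local maximum (or band edge)."""
    peaks = [i for i in range(1, len(env_db) - 1)
             if env_db[i - 1] < env_db[i] >= env_db[i + 1]]
    rows = []
    for j, pk in enumerate(peaks[:n_formants]):
        stop = peaks[j + 1] if j + 1 < len(peaks) else len(env_db) - 1
        v = min(range(pk, stop + 1), key=lambda i: env_db[i])
        rows.append({"F_hz": freqs[pk], "F_db": env_db[pk],
                     "V_hz": freqs[v], "delta_db": env_db[pk] - env_db[v]})
    return rows
```

The same peak picker works on an envelope from any of the methods, for example an MVDR spectrum converted to dB. This keeps the formant and delta measurements comparable across methods.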

Collected Data
Data organized by:
- Vowel
- Region
- Sex
- Spectral approximation method
- Trineme (phonemes preceding and following the vowel)
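One way to keep these categories queryable is to key every measurement on all five of them at once. Averages can then be filtered afterwards instead of being stored as many separate tables. The record layout, field names, and helpers below are hypothetical, and the numbers in the test are placeholders, not TIMIT results.

```python
from collections import defaultdict, namedtuple

# One key field per slide category; the trineme is stored as the
# preceding and following phones.
Key = namedtuple("Key", "vowel region sex method prev_phone next_phone")


def aggregate(records):
    """Element-wise mean of the measurement vectors that share a Key."""
    sums, counts = {}, defaultdict(int)
    for key, values in records:
        acc = sums.setdefault(key, [0.0] * len(values))
        for i, v in enumerate(values):
            acc[i] += v
        counts[key] += 1
    return {k: [s / counts[k] for s in acc] for k, acc in sums.items()}


def select(means, **fixed):
    """Keep only the groups whose Key fields equal the given values."""
    return {k: v for k, v in means.items()
            if all(getattr(k, f) == val for f, val in fixed.items())}
```

For example, `select(means, vowel="iy", method="LP")` returns every region, sex, and trineme group for that vowel and method.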

Collected Data
Filter orders (N = 22):
- LP: N = 22
- WLSP: M = N + 1 = 23
- MVDR: M = 2(2N) + 1 = 89
- MVDR of WLSP: M = 2(2N) + 1 = 89
WLSP data is erroneous: the Hamming window was not applied, which has a noticeable impact on the results
MVDR of WLSP needs to be excluded
MVDR order is too high

General Observations
Formant locations vary greatly:
- between different speakers
- between different trinemes
… Hz for F1, … Hz for F2, … Hz for F3

Work still to be done
- Optimize methods, e.g. the WLSP search method for λ (analysis of the data took over 5 hours)
- Determine the best filter order for each method
- Reorganize data storage for easier analysis (it is very difficult to sort through 100,000 sets of data averages)
- Determine the exact statistics to be taken
- Perform the analysis of the TIMIT data again

Sources
Murthi, Manohar N. “All-Pole Modeling of Speech Based on the Minimum Variance Distortionless Response Spectrum.” IEEE Transactions on Speech and Audio Processing, Vol. 8, No. 3, May 2000.
Backstrom, Tom. “All-Pole Modeling Technique Based on Weighted Sum of LSP Polynomials.” IEEE Signal Processing Letters, Vol. 10, No. 6, June 2003.