Frequency Domain Perceptual Linear Prediction (FDPLP)

with Marios Athineos, Dan Ellis, Sriram Ganapathy and Samuel Thomas

[Figure: cosine transform of the signal along the frequency axis]

The FDPLP technique estimates models of the temporal trajectories of spectral energy in frequency sub-bands. It does so by computing a linear predictor on the cosine transform of the signal, instead of on the signal itself as in conventional linear prediction. Windowing the cosine transform at a given position ensures that the all-pole LP model fits only the part of the signal within the frequency span given by the window width. The technique yields coefficients of the same nature as the MFCCs usually used in speaker ID, so it can directly replace them in most existing systems. However, when the gains of the FDLP models are excluded, the technique is more robust to linear distortions and reverberation.

FDLP also provides a decomposition into AM and FM components and a straightforward way to alleviate the effects of linear distortions and reverberation.
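The core FDLP step (linear prediction on a windowed cosine transform rather than on the signal) can be sketched in Python with NumPy and SciPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes an orthonormal DCT-II (`scipy.fft.dct`), a rectangular window over DCT indices (smoother windows are also possible), and the autocorrelation method for LP; the helper names `lpc_autocorrelation` and `fdlp_envelope` and the test signal are hypothetical. Because the all-pole model is fitted along the DCT index, its "power spectrum" runs over time, so evaluating it gives a smooth temporal envelope of the selected sub-band.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz


def lpc_autocorrelation(seq, order):
    """All-pole predictor A(z) = 1 + a1 z^-1 + ... (autocorrelation method) and its error power."""
    n = len(seq)
    r = np.correlate(seq, seq, mode="full")[n - 1 : n + order]  # autocorrelation, lags 0..order
    a = solve_toeplitz(r[:order], -r[1 : order + 1])  # Toeplitz normal equations R a = -r
    a = np.concatenate(([1.0], a))
    gain = r[0] + np.dot(a[1:], r[1 : order + 1])  # minimum prediction-error power
    return a, gain


def fdlp_envelope(signal, band, order=20):
    """Hypothetical FDLP sketch: LP on one DCT sub-band; the model spectrum is a temporal envelope."""
    c = dct(signal, type=2, norm="ortho")
    lo, hi = band
    seg = c[lo:hi]  # assumption: rectangular window over DCT indices
    a, gain = lpc_autocorrelation(seg, order)
    n = len(signal)
    # Evaluate gain / |A(e^{jw})|^2 for w in [0, pi); w maps linearly onto time 0..n-1
    response = np.fft.rfft(a, 2 * n)[:n]
    return gain / np.abs(response) ** 2


# Usage: a 1 kHz tone whose amplitude jumps from 0.1 to 1.0 halfway through (hypothetical test signal)
fs = 8000
t = np.arange(fs) / fs
x = np.where(t < 0.5, 0.1, 1.0) * np.cos(2 * np.pi * 1000 * t)
# DCT-II index k corresponds to frequency k * fs / (2 * len(x)), so 1 kHz sits near k = 2000
env = fdlp_envelope(x, band=(1800, 2200), order=20)
ratio = np.median(env[4000:]) / np.median(env[:4000])
print(f"envelope ratio (second half / first half): {ratio:.1f}")
```

For this test tone the envelope in the second half should be much larger than in the first half, mirroring the amplitude step (the true squared-amplitude ratio is 100); the LP order controls how sharply the model can follow such changes.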

Digit recognition accuracy [%], ICSI Meeting Room Digit Corpus:

         clean   reverberated
PLP      99.7    71.6
FDPLP    99.2    87.0

Improvements on real reverberation are similar (IEEE Signal Processing Letters, 2008).

[Figure: reverberant speech, FDLP model with gain included vs. gain excluded]

Phoneme recognition accuracy [%]:

              TIMIT   HTIMIT (telephone speech)
PLP-MRASTA    67.6    47.8
FDPLP         68.1    53.5
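Accuracy gains are easier to compare across tasks as relative error-rate reductions. The short Python sketch below only recomputes numbers from the slide above; the helper name `relative_error_reduction` is hypothetical, and the metric itself is a common convention rather than something the slides report.

```python
def relative_error_reduction(acc_baseline, acc_new):
    """Relative reduction (%) of the error rate when going from acc_baseline to acc_new (both in %)."""
    err_baseline = 100.0 - acc_baseline
    err_new = 100.0 - acc_new
    return 100.0 * (err_baseline - err_new) / err_baseline


# Accuracy pairs (baseline, FDPLP) copied from the slide above
results = {
    "digits, clean (PLP vs FDPLP)": (99.7, 99.2),
    "digits, reverberated (PLP vs FDPLP)": (71.6, 87.0),
    "phonemes, TIMIT (PLP-MRASTA vs FDPLP)": (67.6, 68.1),
    "phonemes, HTIMIT (PLP-MRASTA vs FDPLP)": (47.8, 53.5),
}

reductions = {name: relative_error_reduction(*acc) for name, acc in results.items()}
for name, value in reductions.items():
    print(f"{name}: {value:+.1f}% relative error reduction")
```

On reverberated digits the error rate falls from 28.4% to 13.0%, roughly a 54% relative reduction, while on clean digits it rises slightly, from 0.3% to 0.8%.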

FDLP decomposition of the signal: AM component (temporal envelope) and FM component (carrier).
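The AM/FM split can be illustrated on a narrow-band signal. In FDLP the AM component comes from the all-pole model, which approximates the squared Hilbert envelope of the sub-band; the Python sketch below swaps in the direct Hilbert transform (`scipy.signal.hilbert`) so the decomposition itself is easy to check. The helper name `am_fm_decompose` and the test signal are hypothetical.

```python
import numpy as np
from scipy.signal import hilbert


def am_fm_decompose(band_signal, eps=1e-12):
    """Split a narrow-band signal into AM (Hilbert envelope) and FM (unit-amplitude carrier)."""
    analytic = hilbert(band_signal)  # x + j * H{x}
    am = np.abs(analytic)  # temporal envelope (AM component)
    # Carrier cos(phase); eps guards against division by a zero envelope
    fm = np.real(analytic) / np.maximum(am, eps)
    return am, fm


# Usage: a 1 kHz carrier with a 4 Hz amplitude modulation (hypothetical test signal)
fs = 8000
t = np.arange(fs) / fs
true_am = 1.0 + 0.5 * np.cos(2 * np.pi * 4 * t)
x = true_am * np.cos(2 * np.pi * 1000 * t)
am, fm = am_fm_decompose(x)
```

The product `am * fm` reproduces the signal, `am` recovers the 4 Hz modulation, and `fm` stays within [-1, 1], which is the sense in which the envelope and the carrier separate.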

Model without its gain component

Reverberant speech is the clean signal convolved with the long impulse response of the room. Convolution turns into addition in the log spectral domain, as long as most of the room impulse response fits into the analysis window. Ignoring the FDLP model gain therefore makes the representation invariant to reverberation.

[Figure: model without its gain component, 3 s window vs. 30 s window]
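Why dropping the gain helps can be checked numerically under one simplifying assumption: within a narrow sub-band, the room response is treated as a constant scale factor, so reverberation multiplies the sub-band by a scalar. The LP coefficients are unchanged by such scaling and only the model gain changes, so the log envelope shifts by an additive constant that disappears once the gain is discarded. The Python sketch below uses the hypothetical helper `fdlp_log_envelope`, a white-noise stand-in for speech, and a hypothetical gain of 3.0; real reverberation only approximately satisfies the constant-gain assumption.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz


def fdlp_log_envelope(signal, band, order=12, n_points=512, keep_gain=True):
    """Log temporal envelope of one DCT sub-band from an all-pole (FDLP-style) model."""
    c = dct(signal, type=2, norm="ortho")[band[0]:band[1]]
    n = len(c)
    r = np.correlate(c, c, mode="full")[n - 1 : n + order]  # autocorrelation, lags 0..order
    a = np.concatenate(([1.0], solve_toeplitz(r[:order], -r[1 : order + 1])))
    gain = (r[0] + np.dot(a[1:], r[1 : order + 1])) if keep_gain else 1.0
    response = np.fft.rfft(a, 2 * n_points)[:n_points]
    # log(gain / |A|^2): the gain enters only as an additive constant
    return np.log(gain) - 2.0 * np.log(np.abs(response))


rng = np.random.default_rng(0)
clean = rng.standard_normal(4000)  # white-noise stand-in for a speech signal
band = (400, 800)
# Simplifying assumption: within this sub-band the room acts as a constant gain of 3.0
reverberant = 3.0 * clean

with_gain_diff = fdlp_log_envelope(reverberant, band) - fdlp_log_envelope(clean, band)
no_gain_diff = (fdlp_log_envelope(reverberant, band, keep_gain=False)
                - fdlp_log_envelope(clean, band, keep_gain=False))
```

With the gain kept, the two log envelopes differ everywhere by the constant 2 ln 3; with the gain dropped, they coincide.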