Improved ASR in noise using harmonic decomposition


Improved ASR in noise using harmonic decomposition
Outline: Introduction; Pitch-Scaled Harmonic Filter; Recognition Experiments; Results; Conclusion.
[Title-slide figure: production of /z/, with its periodic and aperiodic contributions.]

Motivation & Aims Most speech sounds are predominantly voiced or unvoiced. What happens when the two components are “mixed”? Voiced and unvoiced components have different natures: unvoiced: aperiodic signal from turbulence-noise sources voiced: quasi-periodic signal from vocal-fold vibration Why not extract their features separately? Do the two contributions contain complementary information? Human speech recognition still performs well in noise. How? Does it take advantage of harmonic properties? Introduction

Voiced and unvoiced parts of a speech signal (Introduction)
[Figure: production of /z/, decomposed into its periodic and aperiodic contributions.]

Automatic Speech Recognition (Introduction)
An ASR system maps a speech signal to speech labels in two stages: a front end followed by pattern recognition.
- Feature extraction: conversion of the speech signal into a sequence of parameter vectors.
- Dynamic programming: matching of observation sequences to models of known utterances.
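The pattern-recognition stage is not spelt out in the transcript. As an illustration of the dynamic-programming match, here is a minimal log-domain Viterbi decoder in Python; the function name, array layout and use of NumPy are assumptions of this sketch, not the project's recogniser (a Gaussian-mixture HMM system would wrap this in model likelihood computation).

    import numpy as np

    def viterbi(log_obs, log_trans, log_init):
        # log_obs:   (T, S) log-likelihood of each frame under each state
        # log_trans: (S, S) log transition probabilities
        # log_init:  (S,)   log initial-state probabilities
        T, S = log_obs.shape
        delta = log_init + log_obs[0]            # best score ending in each state
        back = np.zeros((T, S), dtype=int)       # backpointers
        for t in range(1, T):
            scores = delta[:, None] + log_trans  # scores[i, j]: best path entering j from i
            back[t] = np.argmax(scores, axis=0)
            delta = scores[back[t], np.arange(S)] + log_obs[t]
        path = [int(np.argmax(delta))]           # trace back the best state sequence
        for t in range(T - 1, 0, -1):
            path.append(int(back[t, path[-1]]))
        return path[::-1]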

PSHF block diagram (PSHF)
[Block diagram: the input waveform s(n) and a raw pitch estimate f0,raw feed a pitch-optimisation stage, which outputs the optimised pitch f0,opt and window length N_opt; a window w(n) and the harmonic decomposition then yield the periodic estimate v̂(n) (windowed v̂_w(n)) and, by subtraction from s(n), the aperiodic estimate û(n) (windowed û_w(n)).]
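To make the decomposition concrete, here is a minimal sketch of the pitch-scaled idea in Python: window exactly b pitch periods (the PSHF uses b = 4), so that the harmonics of f0 fall on DFT bins at integer multiples of b; those bins give the periodic estimate and the remainder is aperiodic. The function name and the plain bin-copy estimator are this sketch's assumptions, not the authors' implementation, which also optimises the pitch first (see the later slide) and reassembles full streams by overlap-add.

    import numpy as np

    def pshf_frame(frame, periods=4):
        # `frame` must hold exactly `periods` pitch periods, so voicing
        # harmonics sit on DFT bins at integer multiples of `periods`
        n = len(frame)
        spec = np.fft.rfft(frame)
        harmonic = np.zeros_like(spec)
        harmonic[periods::periods] = spec[periods::periods]  # harmonic bins only (DC excluded)
        v_hat = np.fft.irfft(harmonic, n=n)  # periodic estimate, cf. v̂(n) in the diagram
        u_hat = frame - v_hat                # aperiodic estimate, û(n) = s(n) - v̂(n)
        return v_hat, u_hat

Applied frame by frame across the utterance, this yields the two synchronous waveform streams that the rest of the front end consumes.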

Decomposition example: waveforms (PSHF)
[Figure: original waveform, periodic part, aperiodic part.]

Decomposition example: spectrograms (PSHF)
[Figure: spectrograms of the original signal, periodic part, aperiodic part.]

Decomposition example: MFCC spectra (PSHF)
[Figure: MFCC spectra of the original signal, periodic part, aperiodic part.]

Parameterisations (Method)
Each parameterisation maps a waveform to feature vectors:
- BASE: MFCC + Δ + Δ² of the original waveform (39 coefficients).
- SPLIT: the PSHF splits the waveform in two; MFCC + Δ + Δ² of each stream, concatenated (78 coefficients).
- PCA78, PCA39, PCA26, PCA13: PCA applied to the concatenated SPLIT features, keeping 78, 39, 26 or 13 coefficients respectively.
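As an illustration of these parameterisations, here is a sketch using librosa for the cepstra and scikit-learn for the decorrelation; both libraries and all names are choices of this sketch, not the front end used in the experiments (which produced HTK-style MFCC+D+A features).

    import numpy as np
    import librosa                            # assumed feature library
    from sklearn.decomposition import PCA     # assumed PCA implementation

    def stream_features(y, sr=8000, n_mfcc=13):
        # 13 MFCCs plus deltas and accelerations: 39 coefficients per frame
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
        d1 = librosa.feature.delta(mfcc)
        d2 = librosa.feature.delta(mfcc, order=2)
        return np.vstack([mfcc, d1, d2])      # shape (39, n_frames)

    # BASE: features of the original waveform s(n)
    # base = stream_features(s)

    # SPLIT: concatenate the periodic and aperiodic PSHF streams (78 dims)
    # split = np.vstack([stream_features(v_hat), stream_features(u_hat)])

    # PCA39: decorrelate the 78-dim vectors and keep 39 components
    # pca39 = PCA(n_components=39).fit(split.T).transform(split.T)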

Speech database: Aurora 2.0 (Method)
- TIDigits database at 8 kHz, filtered with the G.712 channel.
- Connected English digit strings from male and female speakers.

Description of the experiments (Method)
- Baseline experiment [base]: standard parameterisation of the original waveforms (i.e. MFCC+D+A).
- Split experiments [split]: adjustment of the stream weights (voiced vs. unvoiced).
- PCA experiments [pca26, pca78, pca13, pca39]: decorrelation of the feature vectors and reduction of the number of coefficients.

Split experiments results (Results)
[Figures: split-experiment results, shown over three consecutive slides.]

Summary of results (Results)
[Figure: summary of results.]

Conclusions
- The PSHF module split Aurora's speech waveforms into two synchronous streams (periodic and aperiodic).
- Used separately, each stream gave slightly degraded accuracy; used together, accuracy was substantially increased in noisy conditions.
- Periodic speech segments provide robustness to noise.

Further work
- Apply Linear Discriminant Analysis (LDA) to the two-stream feature vector.
- Evaluate the performance of this front end on a more general task, such as phoneme recognition.
- Test the technique for speaker recognition.

COLUMBO PROJECT: Harmonic Decomposition applied to ASR
David M. Moreno [1], Philip J.B. Jackson [2], Javier Hernando [1], Martin J. Russell [3]
Personal/P.Jackson/Columbo/

Pitch optimisation: vowel /u/
[Figure: cost function, and a spectrum derived from a 268-point DFT.]
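The cost function itself did not survive the transcript; as a hedged sketch of the idea, the window length N can be searched around a raw pitch estimate for the value that minimises the aperiodic residual energy. The normalised residual used here is this sketch's stand-in for the slide's cost function, and pshf_frame() is the sketch from the block-diagram slide.

    import numpy as np

    def optimise_window(signal, start, n_raw, periods=4, search=0.2):
        # scan window lengths within +/- `search` of the raw estimate and keep
        # the one minimising normalised aperiodic energy (an assumed cost)
        best_n, best_cost = n_raw, float("inf")
        for n in range(int(n_raw * (1 - search)), int(n_raw * (1 + search)) + 1):
            frame = signal[start:start + n]
            _, u_hat = pshf_frame(frame, periods)
            cost = np.sum(u_hat ** 2) / np.sum(frame ** 2)
            if cost < best_cost:
                best_n, best_cost = n, cost
        return best_n    # N_opt; f0_opt = periods * fs / N_opt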

Harmonic Decomposition: vowel /u/

Word accuracy results (%)

Observation probability, with stream weights
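The equation did not survive the transcript. Assuming the standard multi-stream HMM formulation (as in HTK), the observation probability for state j weights each stream's Gaussian-mixture likelihood by an exponent:

    b_j(\mathbf{o}_t) = \prod_{s=1}^{S} \Bigl[ \sum_{m=1}^{M_s} c_{jsm}\, \mathcal{N}(\mathbf{o}_{st};\, \boldsymbol{\mu}_{jsm}, \boldsymbol{\Sigma}_{jsm}) \Bigr]^{\gamma_s}

with S = 2 streams here (periodic and aperiodic) and stream weights \gamma_s, the quantities adjusted in the split experiments.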