Philip Jackson, Boon-Hooi Lo and Martin Russell
Electronic, Electrical and Computer Engineering
Models of speech dynamics for ASR, using intermediate linear representations

Abstract INTRODUCTION

Speech dynamics into ASR
– dynamics of speech production to constrain recognizer:
  – noisy environments
  – conversational speech
  – speaker adaptation
– efficient, complete and trainable models:
  – for recognition
  – for analysis
  – for synthesis
INTRODUCTION

Articulatory trajectories from West (2000) INTRODUCTION

Articulatory-trajectory model INTRODUCTION

Articulatory-trajectory model [diagram: finite-state level (source), intermediate level, surface level (dependent)]
INTRODUCTION

Multi-level Segmental HMM
– segmental finite-state process
– intermediate “articulatory” layer:
  – linear trajectories
– mapping required:
  – linear transformation
  – radial basis function network
INTRODUCTION

Linear-trajectory model [diagram: segmental HMM (states 1–5) → intermediate layer → articulatory-to-acoustic mapping → acoustic layer]
INTRODUCTION
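The layered structure can be sketched numerically. The following toy example (all names, dimensions and parameter values are illustrative assumptions, not the slides' settings) generates a linear trajectory in the intermediate layer, maps it through a linear articulatory-to-acoustic transform, and scores the acoustic frames with a Gaussian model:

```python
import numpy as np

# Toy sketch of the layered model: linear trajectory in the intermediate
# layer -> linear articulatory-to-acoustic mapping -> Gaussian scoring of
# acoustic frames. All values are illustrative assumptions.
rng = np.random.default_rng(2)

tau = 8                                  # segment duration (frames)
c = np.array([1.0, -0.5, 0.3])           # trajectory midpoint (assumed)
m = np.array([0.05, 0.0, -0.03])         # trajectory slope per frame (assumed)
t = np.arange(1, tau + 1)
F = c + np.outer(t - (tau + 1) / 2, m)   # intermediate trajectory, tau x 3

W = rng.standard_normal((3, 3))          # articulatory-to-acoustic mapping (toy)
Sigma = 0.1 * np.eye(3)                  # acoustic covariance (toy)

mu = F @ W.T                             # acoustic-layer frame means
Y = mu + rng.multivariate_normal(np.zeros(3), Sigma, size=tau)

def gauss_logpdf(x, mean, cov):
    """Log-density of a multivariate Gaussian."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet
                   + diff @ np.linalg.solve(cov, diff))

# Segment log-probability: frames are conditionally independent given the
# trajectory, so the segment score is a sum of per-frame log-densities.
loglik = sum(gauss_logpdf(Y[i], mu[i], Sigma) for i in range(tau))
```

The same scoring routine, run over competing segment hypotheses, is what a segmental decoder would compare.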

Linear-trajectory equations
Defined as [equation], where [equation].
Segment probability: [equation]
THEORY
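The equations on this slide were images and did not survive transcription. As a hedged reconstruction, a standard linear-trajectory segmental-HMM formulation (the symbols c, m, F and Σ are my labels, not necessarily the slides') would read:

```latex
% Linear trajectory in the intermediate layer, for a segment of duration \tau:
f_t = c + m\left(t - \tfrac{\tau + 1}{2}\right), \qquad t = 1, \dots, \tau,
% where c is the trajectory midpoint and m its slope.
%
% Segment probability, with articulatory-to-acoustic mapping F and a
% Gaussian frame model with covariance \Sigma:
p(y_1, \dots, y_\tau \mid c, m, \tau)
  = \prod_{t=1}^{\tau} \mathcal{N}\!\left(y_t \mid F(f_t), \Sigma\right).
```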

Linear mapping
Objective function [equation], with matched sequences [equation] and [equation].
THEORY
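Minimising a least-squares objective over a linear mapping has a closed-form solution; a minimal numerical sketch (names, dimensions and the synthetic data are illustrative assumptions) is:

```python
import numpy as np

# Sketch: estimate a linear articulatory-to-acoustic mapping W by least
# squares from matched frame sequences. Names are illustrative, not the
# slides' notation.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))              # intermediate-layer frames
W_true = rng.standard_normal((3, 13))          # hidden mapping to 13 MFCCs
Y = X @ W_true + 0.01 * rng.standard_normal((500, 13))  # matched acoustic frames

# Objective: minimise ||Y - X W||^2 over W; lstsq gives the closed form.
W, residuals, rank, _ = np.linalg.lstsq(X, Y, rcond=None)
```

With matched training sequences the estimate recovers the underlying transform up to the noise level.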

Trajectory parameters
Utterance probability [equation], and [equation], for the optimal (ML) state sequence.
THEORY

Non-linear (RBF) mapping [diagram: formant trajectories → RBF network → acoustic layer]
THEORY

Trajectory parameters
With the RBF, the least-squares solution is sought by gradient descent: [equation]
THEORY
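A minimal sketch of fitting an RBF mapping by gradient descent on the least-squares objective, under assumed toy data, fixed centres and a single Gaussian width (none of which are specified on the slide):

```python
import numpy as np

# Sketch: radial-basis-function (RBF) mapping trained by gradient descent
# on a least-squares objective. Centres, width and data are assumptions.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (200, 2))            # intermediate-layer frames (toy)
Y = np.sin(3 * X[:, :1]) + X[:, 1:]         # target acoustic frames (toy)

centres = rng.uniform(-1, 1, (10, 2))       # fixed Gaussian basis centres
width = 0.5

def basis(X):
    """Gaussian basis activations, one column per centre."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * width ** 2))

H = basis(X)
W = np.zeros((len(centres), Y.shape[1]))    # output weights to learn

# Safe step size: inverse Lipschitz constant of the quadratic objective.
lr = 1.0 / np.linalg.eigvalsh(2 * H.T @ H / len(X)).max()

mse_start = float(((Y - H @ W) ** 2).mean())
for _ in range(2000):
    grad = -2 * H.T @ (Y - H @ W) / len(X)  # gradient of mean ||Y - HW||^2
    W -= lr * grad
mse_end = float(((Y - H @ W) ** 2).mean())
```

Because the objective is quadratic in the output weights, descent is guaranteed to reduce the error; in practice the slides' RBF would also tune centres and widths.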

Tests on TIMIT
– N. American English, at 8 kHz
– MFCC13 acoustic features (incl. zeroth)
a) F1-3: formants F1, F2 and F3, estimated by Holmes formant tracker
b) F1-3+BE5: five band energies added
c) PFS12: synthesiser control parameters
METHOD

TIMIT baseline performance
– Constant-trajectory SHMM (ID_0)
– Linear-trajectory SHMM (ID_1)
RESULTS

Performance across feature sets RESULTS

Phone categorisation

Set  No.  Description
A    1    all data
B    2    silence; speech
C    6    linguistic categories: silence/stop; vowel; liquid; nasal; fricative; affricate
D    10   as Deng and Ma (2000): silence; vowel; liquid; nasal; UV fric; /s,ch/; V fric; /z,jh/; UV stop; V stop
E    10   discrete articulatory regions
F    49   silence; individual phones
METHOD

Discrete articulatory regions

Region  Features                      Description
0       -voice                        Silence, non-speech
1       +voice, VT open               Vowel, glide
2       +voice, VT part.              Liquid, approximant
3       +voice, VT closed, +velum     Nasal
4       +voice, VT closed             Voiced plosive (closure)
5       -voice, VT closed             Voiceless plosive (closure)
6       +voice, VT open, +plosion     Voiced plosive (release)
7       -voice, VT open, +plosion     Voiceless plosive (release)
8       +voice, VT part., +fric/asp   Voiced fricative
9       -voice, VT part., +fric/asp   Voiceless fricative
METHOD

Performance across groupings RESULTS

Results across groupings RESULTS

Tests on MOCHA
– S. British English, at 16 kHz
– MFCC13 acoustic features (incl. zeroth)
– articulatory x- & y-coordinates from 7 EMA coils
– PCA9+Lx: first nine articulatory modes plus the laryngograph log energy
METHOD

MOCHA baseline performance RESULTS

Performance across mappings RESULTS

Model visualisation [figure panels: original acoustic data; constant-trajectory model; linear-trajectory model, grouping (F), feature set (c) PFS12]
DISCUSSION

Conclusions
– Theory of Multi-level Segmental HMMs
– Benefits of linear trajectories
– Results show near-optimal performance with linear mappings
– Progress towards unified models of the speech production process

What next?
– unsupervised (embedded) training, to derive pseudo-articulatory representations
– implement non-linear mapping (i.e., RBF)
– include biphone language model, and segment duration models
SUMMARY