Phoneme Alignment based on Discriminative Learning, Shai Shalev-Shwartz (The Hebrew University, Jerusalem), joint work with Joseph Keshet.

Slide 1: Phoneme Alignment based on Discriminative Learning. Shai Shalev-Shwartz, The Hebrew University, Jerusalem. Joint work with Joseph Keshet (Hebrew University), Yoram Singer (Google), and Dan Chazan (IBM).

Slide 2: The Alignment Problem. Given the waveform of an utterance, its text ("Have a test"), and its phonetic transcription /hh ae v ey tcl t eh s tcl t/, locate each phoneme in the waveform.

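Slide 3: The Alignment Problem Setting. Acoustic representation: $\bar{x} = (x_1, \ldots, x_T)$, a sequence of acoustic frames. Phonetic representation: $\bar{p} = (p_1, \ldots, p_k)$, e.g. /hh ae v ey tcl t eh s tcl t/. Alignment function: $f(\bar{x}, \bar{p}) = \bar{s} = (s_1, \ldots, s_k)$, where $s_i$ is the start-time of phoneme $p_i$ in $\bar{x}$.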

Slide 4: Acoustic Representation: short-time Fourier transform.
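A minimal sketch of a short-time Fourier transform front end, assuming a 1-D list of samples, a Hamming window, and illustrative parameter values (400-sample frames with a 160-sample hop, i.e. 25 ms / 10 ms at 16 kHz). None of these values come from the slides, and the function name `stft_magnitudes` is hypothetical. The experiments later use MFCC features, which are computed from spectra like these.

```python
import cmath
import math


def stft_magnitudes(signal, frame_len=400, hop=160):
    """Magnitude short-time Fourier transform of a 1-D signal.

    Splits the signal into overlapping frames of frame_len samples,
    advancing by hop samples, applies a Hamming window, and returns the
    magnitudes of the non-negative-frequency DFT bins of each frame.
    Default sizes are illustrative assumptions, not values from the slides.
    """
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        # Naive O(N^2) DFT for clarity; a real front end would use an FFT.
        spectrum = []
        for k in range(frame_len // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames
```

Each returned list plays the role of one frame $x_t$ of the acoustic representation $\bar{x}$.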

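Slide 5: Comparing Alignments. A cost $\gamma(\bar{s}, \bar{s}')$ measures how far a predicted alignment $\bar{s}'$ is from the true alignment $\bar{s}$.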

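Slide 6: $\epsilon$-Insensitive Cost. Start times that fall inside the $\epsilon$-insensitivity region around the true start time are not penalized: $\gamma(\bar{s}, \bar{s}') = \frac{1}{|\bar{s}|} \left| \{ i : |s_i - s'_i| > \epsilon \} \right|$.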
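The $\epsilon$-insensitive cost can be computed directly from two lists of start times. A sketch, assuming start times are frame indices and $\epsilon$ is given in the same units; `eps_insensitive_cost` is a hypothetical name.

```python
def eps_insensitive_cost(s_true, s_pred, eps):
    """Fraction of phoneme start times predicted more than eps away
    from the true start times (the epsilon-insensitive cost)."""
    if len(s_true) != len(s_pred):
        raise ValueError("alignments must have the same number of phonemes")
    misses = sum(1 for a, b in zip(s_true, s_pred) if abs(a - b) > eps)
    return misses / len(s_true)
```

For example, with $\epsilon = 2$ the alignments (0, 10, 20, 30) and (0, 11, 25, 30) differ by more than $\epsilon$ only at the third phoneme, giving a cost of 1/4.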

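Slide 7: A Discriminative Learning Approach. Training set: $S = \{(\bar{x}_1, \bar{p}_1, \bar{s}_1), \ldots, (\bar{x}_m, \bar{p}_m, \bar{s}_m)\}$. A learning algorithm receives $S$ and selects, from a hypothesis class, an alignment function $f: (\bar{x}, \bar{p}) \mapsto \bar{s}$.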

Slide 8: Outline of Solution
1. Define the hypothesis class, which constitutes the template of our alignment function:
   a. Map each possible alignment into a vector in an abstract vector space.
   b. Devise a projection in the vector space that ranks alignments according to their quality.
2. Derive a simple online learning algorithm.
3. Convert the online algorithm to a batch procedure with some formal guarantees.

Slide 9: Feature "Primitives" for Alignment. A feature primitive $\phi_j(\bar{x}, \bar{p}, \bar{s})$ assesses the quality of a suggested alignment $\bar{s}$, given the acoustic representation $\bar{x}$ and the phonetic representation $\bar{p}$.

Slide 10: Feature Primitive I: cumulative spectral change across the boundaries.

Slide 11: Feature Primitive I (cont.): cumulative spectral change across the boundaries.
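One way to quantify cumulative spectral change across boundaries, sketched under a definition the slides do not spell out: at each internal boundary $s_i$ ($i \ge 2$), take the Euclidean distance between the frames $j$ steps before and after it, and sum over boundaries. Frames are assumed to be lists of floats, and frame indices that fall outside the utterance are clipped (also an assumption); `spectral_change` is a hypothetical name.

```python
import math


def spectral_change(x, s, j):
    """Sum over internal boundaries s[1:] of the Euclidean distance between
    the frame j steps before and the frame j steps after each boundary.

    Assumed definition for illustration; out-of-range indices are clipped.
    """
    T = len(x)
    total = 0.0
    for b in s[1:]:
        before = x[max(b - j, 0)]
        after = x[min(b + j, T - 1)]
        total += math.dist(before, after)
    return total
```

A boundary placed where the spectrum actually changes yields a large value, while a boundary inside a stationary region yields a value near zero; several values of $j$ can serve as separate primitives.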

Slide 12: Feature Primitive II: cumulative confidence in the phoneme sequence. First learn a static frame-based phoneme classifier (Dekel, Keshet & Singer, 2004); its output $g(x_t, p)$ is the confidence that phoneme $p$ was uttered at frame $t$. The feature accumulates these confidences along the suggested alignment.
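A sketch of the cumulative confidence primitive, assuming it simply sums $g(x_t, p_i)$ over the frames assigned to phoneme $p_i$: phoneme $i$ occupies frames $s_i$ through $s_{i+1} - 1$, and the last phoneme runs to the final frame. The classifier is passed in as a callable standing in for the learned frame-based classifier; any normalization the original feature may use is not modeled, and `cumulative_confidence` is a hypothetical name.

```python
def cumulative_confidence(x, p, s, g):
    """Sum of frame-classifier confidences g(x_t, p_i) over the frames
    that the alignment s assigns to each phoneme p_i.

    Assumption: phoneme i spans frames [s[i], s[i+1]); the last phoneme
    spans [s[-1], len(x)). No per-phoneme normalization is applied.
    """
    T = len(x)
    ends = list(s[1:]) + [T]
    return sum(g(x[t], phone)
               for phone, start, end in zip(p, s, ends)
               for t in range(start, end))
```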

Slide 13: Feature Primitive III: phoneme duration model, based on $\mu(p)$, the average length of phoneme $p$, and $\sigma(p)$, the standard deviation of the length of phoneme $p$.
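A sketch of a duration primitive, assuming each phoneme's length in frames is scored by a Gaussian log-density with the per-phoneme mean $\mu(p)$ and standard deviation $\sigma(p)$, and that the scores are summed over the utterance. The Gaussian form is an assumption; $\mu$ and $\sigma$ would be estimated from training alignments. `duration_log_likelihood` is a hypothetical name.

```python
import math


def duration_log_likelihood(p, s, T, mu, sigma):
    """Sum over phonemes of the Gaussian log-density of each phoneme's
    duration (in frames) under per-phoneme mean mu[p] and std sigma[p].

    Assumption: phoneme i spans [s[i], s[i+1]); the last one ends at T.
    """
    ends = list(s[1:]) + [T]
    total = 0.0
    for phone, start, end in zip(p, s, ends):
        length = end - start
        m, sd = mu[phone], sigma[phone]
        total += -0.5 * ((length - m) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))
    return total
```

Alignments that give phonemes implausibly short or long durations receive a lower value.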

Slide 14: Feature Primitive IV: speaking rate ("dynamics"). The spectrogram of an utterance changes with the rate of articulation (Pickett, 1980).
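A sketch of a speaking-rate primitive under an assumed definition: the rate of phoneme $p_i$ is its length divided by $\mu(p_i)$, and the feature sums the squared changes in rate between consecutive phonemes, so alignments implying abrupt tempo changes get larger values. Both the ratio and the squared difference are illustrative choices, not taken from the slides; `rate_change` is a hypothetical name.

```python
def rate_change(p, s, T, mu):
    """Sum of squared changes in relative speaking rate between
    consecutive phonemes (assumed definition, for illustration).

    The rate of phoneme i is its length in frames divided by mu[p_i];
    phoneme i spans [s[i], s[i+1]) and the last one ends at T.
    """
    ends = list(s[1:]) + [T]
    rates = [(end - start) / mu[phone] for phone, start, end in zip(p, s, ends)]
    return sum((r1 - r0) ** 2 for r0, r1 in zip(rates, rates[1:]))
```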

Slide 15: Feature Functions for Alignment. Stacking the primitives gives a feature function $\phi(\bar{x}, \bar{p}, \bar{s}) = (\phi_1, \ldots, \phi_n)$ that maps every possible alignment (correct, slightly incorrect, or grossly incorrect) into a vector space.

Slide 16: Main Solution Principle. Find a linear projection $w$ that ranks alignments according to their quality: the correct alignment should project highest, slightly incorrect alignments next, and grossly incorrect alignments lowest.
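The ranking principle amounts to ordering candidate alignments by their projection $w \cdot \phi$. A minimal sketch, assuming the candidates are enumerated explicitly and a hypothetical `feature_map(s)` returns $\phi(\bar{x}, \bar{p}, \bar{s})$ for a fixed utterance as a list of floats:

```python
def projection(w, phi):
    """Linear projection w . phi of a feature vector."""
    return sum(wi * fi for wi, fi in zip(w, phi))


def rank_alignments(w, candidates, feature_map):
    """Order candidate alignments from highest to lowest projection."""
    return sorted(candidates, key=lambda s: projection(w, feature_map(s)), reverse=True)


def predict_alignment(w, candidates, feature_map):
    """The prediction of the hypothesis f_w: the top-ranked candidate."""
    return max(candidates, key=lambda s: projection(w, feature_map(s)))
```

Enumerating all alignments explicitly is only feasible for toy cases; in practice the argmax is computed by a search procedure rather than by listing candidates.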

Slide 17: Main Solution Principle (cont.): example of a low-confidence projection of the correct, slightly incorrect, and grossly incorrect alignments.

Slide 18: Main Solution Principle (cont.): example of an incorrect projection of the correct, slightly incorrect, and grossly incorrect alignments.

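Slide 19: Online Learning Algorithm. Hypothesis class: $f_w(\bar{x}, \bar{p}) = \arg\max_{\bar{s}} w \cdot \phi(\bar{x}, \bar{p}, \bar{s})$. Goal: keep the cumulative cost $\sum_i \gamma(\bar{s}_i, f_{w_i}(\bar{x}_i, \bar{p}_i))$ small.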

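Slide 20: Online Learning. For $i = 1, 2, \ldots, m$:
1. Receive an instance $(\bar{x}_i, \bar{p}_i)$.
2. Predict $\bar{s}'_i = \arg\max_{\bar{s}} w_i \cdot \phi(\bar{x}_i, \bar{p}_i, \bar{s})$.
3. Receive the true alignment $\bar{s}_i$ and pay the cost $\gamma(\bar{s}_i, \bar{s}'_i)$.
4. If the update condition holds, update $w_i$ to obtain $w_{i+1}$.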
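A sketch of the online loop, assuming a finite list of candidate alignments per instance (so the argmax can be taken by brute force) and swapping in a passive-aggressive-style update: when the cost is positive, $w$ moves by the smallest step along $\phi(\bar{s}_i) - \phi(\bar{s}'_i)$ that makes the margin at least $\sqrt{\gamma(\bar{s}_i, \bar{s}'_i)}$. That update rule is an assumption standing in for the one on the slide; `online_align_learning`, `feature_map`, `candidates`, and `cost` are hypothetical names.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def online_align_learning(examples, feature_map, candidates, cost, dim):
    """One online pass over examples of the form (x, p, s_true).

    Returns the weight vectors w_1, ..., w_{m+1} and the cumulative cost.
    The update below is an assumed passive-aggressive-style step.
    """
    w = [0.0] * dim
    history = [list(w)]
    cumulative_cost = 0.0
    for x, p, s_true in examples:
        # Predict: the candidate alignment ranked highest by w . phi.
        s_pred = max(candidates(x, p), key=lambda s: dot(w, feature_map(x, p, s)))
        # Receive the true alignment and pay the cost.
        loss = cost(s_true, s_pred)
        cumulative_cost += loss
        if loss > 0:
            delta = [a - b for a, b in zip(feature_map(x, p, s_true),
                                           feature_map(x, p, s_pred))]
            norm_sq = dot(delta, delta)
            # Assumed margin requirement: w . delta >= sqrt(cost).
            violation = math.sqrt(loss) - dot(w, delta)
            if norm_sq > 0 and violation > 0:
                # Smallest step along delta that satisfies the requirement.
                tau = violation / norm_sq
                w = [wi + tau * di for wi, di in zip(w, delta)]
        history.append(list(w))
    return history, cumulative_cost
```

The returned list of weight vectors is what the online-to-batch conversion selects from.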

Slide 21: Converting from Online to Batch. Run the online algorithm on the training set to generate $w_1, \ldots, w_M$. A small online error implies that there exists $w \in \{w_1, \ldots, w_M\}$ whose generalization error is low (Cesa-Bianchi et al.). Choose the $w \in \{w_1, \ldots, w_M\}$ that minimizes the error on a fresh validation set.
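A sketch of the selection step, assuming `weight_vectors` is the list produced by the online pass, `predict(w, x, p)` returns the predicted alignment under $w$, and the validation set is a non-empty list of $(\bar{x}, \bar{p}, \bar{s})$ triples. Using the average validation cost as the error is an assumption; all names are hypothetical.

```python
def select_on_validation(weight_vectors, validation, predict, cost):
    """Return the weight vector with the lowest average cost on a
    held-out validation set of (x, p, s_true) triples."""
    def avg_cost(w):
        total = sum(cost(s_true, predict(w, x, p)) for x, p, s_true in validation)
        return total / len(validation)
    return min(weight_vectors, key=avg_cost)
```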

Slide 22: Algorithmic Aspects. Running time: if the inference $\arg\max_{\bar{s}} w \cdot \phi(\bar{x}, \bar{p}, \bar{s})$ can be performed in polynomial time (e.g. by dynamic programming), then the entire algorithm runs in polynomial time as well. Worst-case analysis for online learning: for any competitor $u$, the cumulative cost of the online algorithm is bounded in terms of the loss of $u$. Generalization error: the online-to-batch conversion guarantees that low online error implies low generalization error.
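When the score $w \cdot \phi$ decomposes into a sum of per-phoneme segment scores, the argmax over alignments can be found by dynamic programming in $O(kT^2)$ time for $k$ phonemes and $T$ frames. The sketch below assumes such a decomposition through a hypothetical `seg_score(i, start, end)` callable; primitives that couple neighboring phonemes (such as a rate-change term) would need a larger DP state.

```python
def dp_align(T, k, seg_score):
    """Best start times 0 = s_1 < s_2 < ... < s_k < T under an additive score.

    seg_score(i, start, end) scores phoneme i occupying frames [start, end).
    Returns (best_score, starts). Runs in O(k * T^2) time.
    """
    if k > T:
        raise ValueError("need at least one frame per phoneme")
    NEG = float("-inf")
    # best[i][e]: best score for placing phonemes 0..i-1 in frames [0, e).
    best = [[NEG] * (T + 1) for _ in range(k + 1)]
    back = [[0] * (T + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for i in range(1, k + 1):
        for e in range(i, T + 1):
            for b in range(i - 1, e):
                if best[i - 1][b] == NEG:
                    continue
                cand = best[i - 1][b] + seg_score(i - 1, b, e)
                if cand > best[i][e]:
                    best[i][e] = cand
                    back[i][e] = b
    # Recover the start times by following back-pointers from (k, T).
    starts, e = [], T
    for i in range(k, 0, -1):
        b = back[i][e]
        starts.append(b)
        e = b
    return best[k][T], starts[::-1]
```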

Slide 23: Experiments
- Corpus: TIMIT
- Phoneme representation: 48 phonemes (Lee & Hon, 1989)
- Acoustic representation: MFCC + Δ + ΔΔ (ETSI standard)
- TIMIT training set: 500 utterances for training a frame classifier, 3096 utterances for learning the alignment function, and 100 utterances for validation

Slide 24: Alternative Approaches
- Brugnara, Falavigna & Omologo, "Automatic segmentation and labeling of speech based on Hidden Markov Models", 1993.
- Hosom, "Automatic phoneme alignment based on acoustic-phonetic modeling".
- Toledano, Gomez & Grande, "Automatic Phoneme Alignment", 2003.

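Slide 25: Results. Comparison with Brugnara, Falavigna & Omologo, "Automatic segmentation and labeling of speech based on Hidden Markov Models", Speech Communication, 12 (1993). Each tolerance column gives the percentage of phoneme boundaries predicted within t of the true boundary.

| Method | Training size | Test set | t < 10 ms | t < 20 ms | t < 30 ms | t < 40 ms |
|---|---|---|---|---|---|---|
| Discrim. Algo. | 650 or … | core | 79.7% | 92.1% | 96.2% | 98.1% |
| Brugnara et al. | | core | 75.3% | 88.9% | 94.4% | 97.1% |
| Discrim. Algo. | 650 or … | entire | 80.0% | 92.3% | 96.4% | 98.2% |
| Brugnara et al. | | entire | 74.6% | 88.8% | 94.1% | 96.8% |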
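A sketch of the accuracy measure behind the table, assuming each entry is the percentage of phoneme start times predicted within the stated tolerance of the true start time, with times in milliseconds and a strict comparison matching the column headers. In an evaluation the counts would be pooled over all boundaries in the test set; `within_tolerance` is a hypothetical name.

```python
def within_tolerance(s_true, s_pred, tol_ms):
    """Percentage of predicted start times (in ms) that lie strictly
    within tol_ms of the corresponding true start times."""
    hits = sum(1 for a, b in zip(s_true, s_pred) if abs(a - b) < tol_ms)
    return 100.0 * hits / len(s_true)
```

For example, true start times (0, 100, 250, 400) ms and predictions (0, 105, 280, 395) ms give 75% at a 10 ms tolerance, since only the third boundary is off by more than 10 ms.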

Slide 26: Current and Future Work. Discriminative learning methods for:
- Whole phoneme sequence classification: 64% (ours) vs. 59% (HMM, IDIAP Torch3); results without normalization of silences etc.
- Small-vocabulary continuous speech recognition
- Segmentation of utterances by speaker
- Full online learning setting: real-time adaptation to speaker/environment changes