An Efficient Online Algorithm for Hierarchical Phoneme Classification

Presentation transcript:

An Efficient Online Algorithm for Hierarchical Phoneme Classification
Joseph Keshet, joint work with Ofer Dekel and Yoram Singer
The Hebrew University, Israel
MLMI '04, Martigny, Switzerland

Motivation
Phonetic transcription of DECEMBER.
Gross error: d ix CH eh m bcl b er
Minor errors: d AE s eh m bcl b er; d ix s eh NASAL bcl b er
Phoneme recognition is the task of assigning a phoneme to each speech frame. Typical phoneme recognizers make two kinds of errors: gross errors and minor errors. By a gross error we mean predicting a phoneme that is acoustically far from the true phoneme. We prefer minor errors, that is, we tolerate the prediction of a phoneme that is acoustically close to the true phoneme, and we even prefer to predict the phoneme group rather than make a gross error. We believe a smart language model should handle these minor errors.

Hierarchical Classification
Goal: spoken phoneme recognition.
[Tree diagram of the phoneme hierarchy: the root PHONEMES splits into Sonorants, Silences, and Obstruents; Sonorants split into Nasals (n, m, ng), Liquids (l, y, w, r), and Vowels, with the vowels grouped into Front, Center, and Back; Obstruents split into Affricates (jh, ch), Plosives, and Fricatives; the individual phonemes are the leaves.]
Phonetic theory embeds the set of phonemes of Western languages in a hierarchy in which the phonemes are the leaves of the tree and the phoneme groups are the internal vertices. The topic of this work is the algorithmic design and implementation of hierarchical phoneme recognition, in which we tolerate minor errors but avoid gross errors.

Metric Over Phonetic Tree
A given hierarchy induces a metric over the set of phonemes: the tree distance. With every pair of phonemes or phoneme groups we associate an integer, the tree distance between them.

Metric Over Phonetic Tree
For example, suppose that /a/ is a phoneme and /b/ is a phoneme group; then the distance between them is 4, since there are 4 edges on the path between them. We denote this distance by γ.

Metric Over Phonemes
Metric semantics: γ(a, b) is the severity of predicting phoneme group b instead of the correct phoneme a.
Our high-level goal: tolerate minor errors, namely sibling errors and under-confident predictions (predicting a parent), but avoid major errors. γ captures this notion of severity and goes hand in hand with that goal.
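As a concrete illustration (my own sketch, not part of the slides), the tree distance γ(a, b) can be computed from a child-to-parent map. The hierarchy fragment and vertex names below are hypothetical and only loosely follow the tree on the earlier slide:

```python
# Hypothetical fragment of the phoneme hierarchy, written as a child -> parent map.
PARENT = {
    "sonorants": "phonemes", "obstruents": "phonemes", "silences": "phonemes",
    "nasals": "sonorants", "liquids": "sonorants", "vowels": "sonorants",
    "plosives": "obstruents", "fricatives": "obstruents", "affricates": "obstruents",
    "n": "nasals", "m": "nasals", "ng": "nasals",
    "b": "plosives", "d": "plosives",
    "s": "fricatives", "sh": "fricatives",
}

def path_to_root(v):
    """List of vertices from v up to the root of the hierarchy."""
    path = [v]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def tree_distance(a, b):
    """gamma(a, b): the number of edges on the unique path between a and b."""
    depth_in_a = {v: i for i, v in enumerate(path_to_root(a))}
    for j, v in enumerate(path_to_root(b)):
        if v in depth_in_a:          # first common ancestor of a and b
            return depth_in_a[v] + j
    raise ValueError("vertices are not in the same tree")

print(tree_distance("n", "m"))           # 2: n -> nasals -> m
print(tree_distance("n", "obstruents"))  # 4: a phoneme-to-group distance, as in the slide's example
```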

Hierarchical Classifier
Associate a prototype W_v with each phoneme and phoneme group v (W_0, ..., W_10 in the illustration, with W_0 at the root). The score of a phoneme is the inner product of its prototype with the acoustic vector, and the classification rule predicts the phoneme whose score is highest. k denotes the number of phonemes, i.e. the number of classes; in the illustration k equals 10.

Hierarchical Classifier
Goal: keep the prototype W_v of each vertex close to the prototype of its parent. Define w_v as the difference between W_v and the prototype of its parent (w_0, ..., w_10 in the illustration); the goal then becomes keeping each w_v small.
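The formulas themselves did not survive the transcript, so the following is only a sketch of the scoring and classification rule as the notes describe it, assuming (as the decomposition above suggests) that W_v is the sum of the w_u along the path from the root down to v. It reuses path_to_root from the tree-distance sketch; the function names are my own:

```python
import numpy as np

# w[v] is the small prototype vector attached to vertex v;
# PARENT and path_to_root are as in the tree-distance sketch above.

def score(w, x, v):
    """Score of phoneme v: <W_v, x>, computed as the sum of <w_u, x>
    over the vertices u on the path from the root down to v."""
    return sum(np.dot(w[u], x) for u in path_to_root(v))

def classify(w, x, phonemes):
    """Classification rule: predict the phoneme with the highest score."""
    return max(phonemes, key=lambda v: score(w, x, v))
```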

Online Learning
For each round: receive an acoustic vector; predict a phoneme; receive the correct phoneme; suffer a tree-based penalty; apply the update rule to obtain the new prototypes.
We would like to learn the values of the set of prototypes {w}. Online learning takes place in rounds, and in each round the steps above are performed. Our goal in the process is to suffer a small cumulative tree error, so all we need to determine is an update rule that keeps the cumulative tree error small.
Goal: suffer a small cumulative tree error.
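Written as code, the protocol is just a loop. This is a schematic sketch of my own: it reuses classify and tree_distance from the earlier sketches, and update stands in for the rule described on the next slide:

```python
def online_learn(stream, w, phonemes):
    """One pass of the online protocol: predict, observe the truth, suffer the penalty, update."""
    cumulative_tree_error = 0
    for x_t, y_t in stream:                             # acoustic vector and, afterwards, the correct phoneme
        y_hat = classify(w, x_t, phonemes)              # predict a phoneme
        cumulative_tree_error += tree_distance(y_t, y_hat)  # suffer the tree-based penalty
        w = update(w, x_t, y_t, y_hat)                  # apply the update rule to obtain new prototypes
    return w, cumulative_tree_error                     # goal: keep this cumulative tree error small
```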

Tree Loss
The cumulative tree error is a combinatorial quantity and is difficult to minimize directly. Instead we upper bound the tree error by the tree loss ℓ, a hinge-style loss. The cumulative tree loss upper bounds the cumulative tree error, so we minimize the cumulative tree loss.

Online Update
The update is obtained by solving a quadratic optimization problem on each round. Although the update may look odd and complicated, it is very reasonable. The update is local: only the prototypes of the vertices along the path from the correct phoneme to the predicted phoneme are updated (w_0, ..., w_10 in the illustration).
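The closed-form solution of the per-round optimization is not preserved in the transcript, so the sketch below is only a perceptron-style stand-in that illustrates the locality of the update: ancestors shared by the correct and predicted phonemes cancel and are left untouched, and the scalar tau is a placeholder for the step size the quadratic program would produce.

```python
def update(w, x, y, y_hat, tau=1.0):
    """Schematic local update: only vertices on the path between the correct
    phoneme y and the predicted phoneme y_hat are touched."""
    if y == y_hat:
        return w                         # simplified sketch: leave the prototypes unchanged
    on_true = set(path_to_root(y))       # root-to-truth path
    on_pred = set(path_to_root(y_hat))   # root-to-prediction path
    for u in on_true - on_pred:          # vertices leading only toward the correct phoneme
        w[u] = w[u] + tau * x
    for u in on_pred - on_true:          # vertices leading only toward the wrong prediction
        w[u] = w[u] - tau * x
    return w
```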

Loss Bound
Theorem: for any sequence of examples satisfying the theorem's conditions, the cumulative tree loss is bounded by a fixed quantity. Moreover, the cumulative tree loss is an upper bound on the cumulative tree error. This means that no matter how many examples we see, we make only a bounded number of mistakes.

Extension: Kernels
An important and natural extension of the algorithm is kernel functions, which enable it to work non-linearly. Since each update merely adds or subtracts a scaled version of an acoustic vector, the final prototype can be written as a linear combination of acoustic vectors; we call these vectors support vectors, or support patterns. Plugging this into the classification rule, classification reduces to a set of inner products between the new acoustic vector and the support patterns. A kernel function performs these inner products in a high-dimensional feature space and therefore enables the algorithm to work non-linearly.
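A minimal sketch of what the kernelized scoring rule looks like under this view, assuming each vertex keeps a list of (coefficient, support pattern) pairs accumulated by the updates; the data layout and names are my own, not the authors' code:

```python
import numpy as np

def rbf_kernel(x1, x2, gamma=1.0):
    """RBF kernel, the kernel used in the phoneme experiments."""
    diff = x1 - x2
    return np.exp(-gamma * np.dot(diff, diff))

def kernel_score(alphas, x, v, kernel=rbf_kernel):
    """Score of phoneme v in the kernelized classifier: each prototype on the
    root-to-v path is an implicit linear combination of its support patterns,
    so every inner product <w_u, x> becomes a weighted sum of kernel values."""
    return sum(
        coef * kernel(x_i, x)
        for u in path_to_root(v)
        for coef, x_i in alphas[u]   # (coefficient, support pattern) pairs stored for vertex u
    )
```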

Experiments
Synthetic data: symmetric tree of depth 4 with fan-out 3, 121 labels; the prototypes form an orthogonal set, perturbed by Gaussian noise; 100 training instances and 50 test instances per label.
Phoneme recognition: subset of the TIMIT corpus; 55 phonemes and phoneme groups; MFCC+∆+∆∆ front end with a concatenation of 5 frames; RBF kernel; 2000 training vectors and 500 test vectors per phoneme.
We conducted experiments both on synthetic data and on phoneme recognition.

Experiments
We compared our algorithm with two naïve baselines: a multiclass classifier that ignores the hierarchy, and a greedy approach that solves a separate multiclass problem at every node with at least two children.

Results

                                 Averaged Tree Error    Multiclass Error
    Synthetic data (tree)               0.05                   5
    Synthetic data (multiclass)         0.11                   8.6
    Synthetic data (greedy)             0.52                  34.9
    Phonemes (tree)                     1.3                   40.6
    Phonemes (multiclass)               1.41                  41.8
    Phonemes (greedy)                   2.48                  58.2

The table compares the three training schemes on the two data sets, reporting both the averaged tree error and the multiclass error; our algorithm outperforms the other approaches on both measures. The phoneme recognition error is relatively high because we did not use all of the TIMIT corpus; this work is still in progress.

Results
[Plot: difference between the error rates of the tree algorithm and the multiclass (MC) algorithm, shown separately for gross errors and minor errors on the synthetic data and on the phonemes; negative values favor the tree algorithm.]
There is a trade-off with the simple multiclass classifier: the tree algorithm makes fewer gross errors at the cost of more minor errors, which was our motivation in the first place.

Tree vs. Multiclass Online Learning
Similarity between the prototypes learned by multiclass training and by tree training, illustrating the difference between the online learning of the multiclass algorithm and of the tree algorithm. In the visualization, a red edge connects two vertices whose prototypes are close to each other, namely whose distance is below some threshold. Tree training keeps the prototypes of related phonemes close together; the multiclass algorithm fails to do so.

Thanks!