An Efficient Online Algorithm for Hierarchical Phoneme Classification


1 An Efficient Online Algorithm for Hierarchical Phoneme Classification
Joseph Keshet
Joint work with Ofer Dekel and Yoram Singer
The Hebrew University, Israel
MLMI '04, Martigny, Switzerland

2 Motivation
Phonetic transcription of DECEMBER
Gross errors: d ix CH eh m bcl b er
Minor errors: d AE s eh m bcl b er; d ix s eh NASAL bcl b er
Phoneme recognition is the task of assigning phonemes to speech frames. Typical phoneme recognizers make two types of errors: gross errors and minor errors. By gross errors we mean predicting a phoneme that is acoustically far from the true phoneme, e.g., … We, however, prefer minor errors; that is, we tolerate the prediction of a phoneme that is acoustically close to the true phoneme, e.g., … We even prefer to predict the phoneme group rather than make a gross error, e.g., … We believe that a smart language model should handle those minor errors.
Large Margin Hierarchical Classification Joseph Keshet, The Hebrew University

3 Hierarchical Classification
Goal: spoken phoneme recognition
PHONEMES
  Sonorants
    Nasals: n m ng
    Liquids: l y w r
    Vowels (Front, Center, Back): oy aa iy ow ao ih uh er ey uw aw eh ay ae
  Obstruents
    Plosives: b g d k p t
    Fricatives: f v sh s th dh zh z
    Affricates: jh ch
  Silences
The phonetic theory of spoken speech embeds the set of phonemes of Western languages in a hierarchy, in which the phonemes are the leaves of the tree and the phoneme groups are the internal vertices, e.g., … The topic of this work is the algorithmic design and implementation of hierarchical phoneme recognition, in which we tolerate minor errors but avoid gross errors.

4 Metric Over Phonetic Tree
A given hierarchy induces a metric over the set of phonemes: the tree distance.
With any pair of phonemes or phoneme groups we associate an integer, which is the tree distance between them.

5 Metric Over Phonetic Tree
A given hierarchy induces a metric over the set of phonemes: the tree distance.
For example, suppose that /a/ is a phoneme and /b/ is a phoneme group. The distance between them is 4, since there are 4 edges on the path between them. We denote this distance by γ.
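The edge-counting definition of the tree distance can be sketched in a few lines. This is a minimal illustration, assuming the hierarchy is stored as a child-to-parent map; the names `PARENT`, `path_to_root`, and `tree_distance` and the small toy hierarchy are hypothetical and not taken from the talk.

```python
def path_to_root(node, parent):
    """Return the vertices from `node` up to the root, inclusive."""
    path = [node]
    while parent[node] is not None:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(a, b, parent):
    """Number of edges on the unique path between vertices a and b."""
    steps_from_a = {v: i for i, v in enumerate(path_to_root(a, parent))}
    for steps_from_b, v in enumerate(path_to_root(b, parent)):
        if v in steps_from_a:  # first common ancestor of a and b
            return steps_from_a[v] + steps_from_b
    raise ValueError("vertices are not in the same tree")


# Hypothetical toy hierarchy (a fragment, not the full tree from the talk).
PARENT = {
    "PHONEMES": None,
    "Sonorants": "PHONEMES",
    "Obstruents": "PHONEMES",
    "Nasals": "Sonorants",
    "Plosives": "Obstruents",
    "m": "Nasals",
    "n": "Nasals",
    "b": "Plosives",
}

print(tree_distance("m", "n", PARENT))           # siblings
print(tree_distance("m", "Nasals", PARENT))      # a phoneme and its parent group
print(tree_distance("m", "Obstruents", PARENT))  # a phoneme and a distant group
```

In this toy tree a sibling confusion costs 2, predicting the parent group costs 1, and predicting a group on the other side of the root costs 4, which mirrors the ordering of minor and gross errors described in the talk.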

6 Metric Over Phonemes
Metric semantics: γ(a,b) is the severity of predicting phoneme group “b” instead of the correct phoneme “a”.
Our high-level goal:
Tolerate minor errors: sibling errors and under-confident predictions (predicting a parent)
…but avoid major errors
γ captures the severity of predicting phoneme group /b/ instead of the phoneme /a/. This notion comes along with our goal …

7 Hierarchical Classifier
Assume … and associate a prototype with each phoneme.
Score of a phoneme: …
Classification rule: …
Prototypes in the example tree: W0, W1, …, W10.
K is the number of phonemes, that is, the number of classes; here it equals 10.

8 Hierarchical Classifier
Goal: maintain … “close” to …
Define …
Goal: maintain … small
Per-vertex vectors in the example tree: w0, w1, …, w10.
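One plausible reading of the two slides above can be sketched as follows. The sketch assumes (this is not stated in the transcript) that each vertex v carries a small vector w[v], that the prototype of a vertex is the sum of the w vectors on its path from the root, that the score is the inner product of the prototype with the acoustic vector, and that the classifier predicts the highest-scoring label. All names and the toy tree are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def path_to_root(node, parent):
    """Return the vertices from `node` up to the root, inclusive."""
    path = [node]
    while parent[node] is not None:
        node = parent[node]
        path.append(node)
    return path


def prototype(node, parent, w):
    """Assumed prototype W[node]: sum of w[u] over the path root..node."""
    dim = len(next(iter(w.values())))
    total = [0.0] * dim
    for u in path_to_root(node, parent):
        total = [t + wi for t, wi in zip(total, w[u])]
    return total


def predict(x, parent, w, labels):
    """Return the label whose prototype has the largest inner product with x."""
    return max(labels, key=lambda v: dot(prototype(v, parent, w), x))


# Hypothetical toy tree with two groups (A, B) and three leaves.
parent = {"root": None, "A": "root", "B": "root",
          "a1": "A", "a2": "A", "b1": "B"}
w = {"root": [0.0, 0.0], "A": [1.0, 0.0], "B": [0.0, 1.0],
     "a1": [0.5, 0.0], "a2": [-0.5, 0.2], "b1": [0.0, 0.5]}

print(predict([1.0, 0.0], parent, w, ["a1", "a2", "b1"]))
print(predict([0.0, 1.0], parent, w, ["a1", "a2", "b1"]))
```

Under this assumption, keeping each w[v] small keeps a vertex's prototype close to its parent's prototype, so siblings end up with similar prototypes.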

9 Online Learning
For …:
Receive an acoustic vector …
Predict a phoneme …
Receive the correct phoneme …
Suffer a tree-based penalty …
Apply the update rule to obtain …
Goal: suffer a small cumulative tree error
We would like to learn the values of the set of {w}'s. Online learning takes place in rounds. In each round… Our goal in this process is to suffer a small cumulative tree error. To achieve this goal, we only need to determine the update rule that minimizes the cumulative tree error.

10 Tree Loss
The cumulative tree error is difficult to minimize directly.
Instead, upper bound … by …, where …, also known as the hinge loss.
The cumulative tree error is a combinatorial quantity, and thus it is difficult to minimize directly. Instead, we upper bound the tree error by the tree loss, denoted here by \ell. We call \ell the loss, and the cumulative loss upper-bounds the cumulative tree error. We minimize the cumulative tree loss.
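To make the upper-bound argument concrete, here is one hinge-type loss with the stated property. The exact form, in particular the square-root margin, is an assumption and is not recoverable from the slide; $W^{v}$ denotes the prototype of vertex $v$, $x$ the acoustic vector, $y$ the correct phoneme, $\hat{y}$ the predicted one, and $\gamma$ the tree distance.

```latex
\[
  \ell \;=\; \Bigl[\, W^{\hat{y}} \cdot x \;-\; W^{y} \cdot x \;+\; \sqrt{\gamma(y,\hat{y})} \,\Bigr]_{+},
  \qquad [z]_{+} = \max\{0, z\}.
\]
% Since \hat{y} attains the maximal score, W^{\hat{y}} \cdot x - W^{y} \cdot x \ge 0, hence
\[
  \ell \;\ge\; \sqrt{\gamma(y,\hat{y})}
  \quad\Longrightarrow\quad
  \gamma(y,\hat{y}) \;\le\; \ell^{2},
  \qquad\text{and therefore}\qquad
  \sum_{t} \gamma(y_t,\hat{y}_t) \;\le\; \sum_{t} \ell_t^{2}.
\]
```

Under this assumed form it is the cumulative squared loss that bounds the cumulative tree error, and the loss is zero whenever the prediction is correct.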

11 Online Update
Local update: only nodes along the path from … to … are updated.
The update is the result of solving a quadratic optimization problem on each round. Although the update seems odd and complicated, it is very reasonable.
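A local update of this kind might look like the sketch below. It is an illustration under stated assumptions, not the update from the slide: it assumes the prototype of a vertex is the sum of per-vertex vectors w[u] along its path from the root, a hinge-type loss with margin sqrt(γ), and a step size τ = loss / (γ · ||x||²). Vertices on the correct path only are moved toward x, vertices on the predicted path only are moved away, and shared ancestors are untouched. All names are hypothetical, and x is assumed to be nonzero.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def path_to_root(node, parent):
    path = [node]
    while parent[node] is not None:
        node = parent[node]
        path.append(node)
    return path


def score(v, x, parent, w):
    """Assumed score: sum of w[u] . x over the path from v to the root."""
    return sum(dot(w[u], x) for u in path_to_root(v, parent))


def online_round(x, y, parent, w, labels):
    """One round: predict, compute the loss, update w in place."""
    y_hat = max(labels, key=lambda v: score(v, x, parent, w))
    p_true = set(path_to_root(y, parent))
    p_pred = set(path_to_root(y_hat, parent))
    gamma = len(p_true ^ p_pred)  # edges between y and y_hat
    if gamma == 0:                # correct prediction: no update
        return y_hat, 0.0
    loss = max(0.0, score(y_hat, x, parent, w) - score(y, x, parent, w)
               + math.sqrt(gamma))
    tau = loss / (gamma * dot(x, x))  # assumed step size; x must be nonzero
    for v in p_true - p_pred:         # only on the correct path: add tau * x
        w[v] = [wi + tau * xi for wi, xi in zip(w[v], x)]
    for v in p_pred - p_true:         # only on the predicted path: subtract
        w[v] = [wi - tau * xi for wi, xi in zip(w[v], x)]
    return y_hat, loss


# Hypothetical toy tree, all vectors initialised to zero.
parent = {"root": None, "A": "root", "B": "root",
          "a1": "A", "a2": "A", "b1": "B"}
w = {v: [0.0, 0.0] for v in parent}
labels = ["a1", "a2", "b1"]

first = online_round([1.0, 0.0], "b1", parent, w, labels)
second = online_round([1.0, 0.0], "b1", parent, w, labels)
print(first, second)
```

With this step size, each of the γ touched vertices changes the score gap by τ · ||x||², so the gap closes by exactly the loss and the same example is classified correctly on the next round.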

12 Loss Bound
Theorem: … sequence of examples … satisfies …
Then …, where … and …
We briefly state a bound on the tree loss… Moreover, the cumulative tree loss is an upper bound on the cumulative tree error. This means that no matter how many examples we have, we will make only a bounded number of mistakes.

13 Extension: Kernels
Since …
Note that …
Therefore …
An important and natural extension of the algorithm is the use of kernel functions, which enable us to work non-linearly. Since the update is merely an addition or a subtraction of a scaled version of the acoustic vector, we can write the final w as a linear comb… We call the x_i support vectors or support patterns. Plugging this into the classification rule, we see that the classification rule is a set of inner products between … The kernel function performs the inner product in a high-dimensional space and therefore enables us to work non-linearly.
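The kernel argument can be illustrated with a short sketch. It assumes that a vector w is stored implicitly as a list of (coefficient, support vector) pairs, so that every inner product w · x becomes a weighted sum of kernel evaluations. The names `kernel_score`, `rbf_kernel`, `linear_kernel` and the parameter `sigma` are hypothetical.

```python
import math


def linear_kernel(u, v):
    """Plain inner product; recovers the original linear classifier."""
    return sum(ui * vi for ui, vi in zip(u, v))


def rbf_kernel(u, v, sigma=1.0):
    """Gaussian RBF kernel exp(-||u - v||^2 / (2 sigma^2))."""
    sq_dist = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_score(support, x, kernel):
    """Evaluate sum_i alpha_i * K(x_i, x) for support = [(alpha_i, x_i), ...]."""
    return sum(alpha * kernel(x_i, x) for alpha, x_i in support)


# Two support patterns with their accumulated coefficients (toy values).
support = [(0.5, [1.0, 0.0]), (-0.25, [0.0, 2.0])]
x = [2.0, 3.0]

# With the linear kernel the result equals w . x for the explicit
# w = 0.5 * [1, 0] - 0.25 * [0, 2] = [0.5, -0.5].
explicit_w = [0.5, -0.5]
print(kernel_score(support, x, linear_kernel), linear_kernel(explicit_w, x))

# Swapping in the RBF kernel changes nothing else in the classifier.
print(kernel_score(support, x, rbf_kernel))
```

The design point is that the classifier never needs w explicitly: storing the support patterns and their coefficients is enough to evaluate any score.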

14 Experiments
Synthetic data:
Symmetric tree of depth 4, fan-out 3, 121 labels
Prototypes: orthogonal set in … with Gaussian noise
100 train instances and 50 test instances per label
Phoneme recognition:
Subset of the TIMIT corpus
55 phonemes and phoneme groups
MFCC+∆+∆∆ front-end, concatenation of 5 frames
RBF kernel
2000 train vectors and 500 test vectors per phoneme
We turn now to the experiments. We conducted experiments on synthetic data.
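The label count for the synthetic tree can be checked directly: a symmetric tree of depth 4 with fan-out 3 has 1 + 3 + 9 + 27 + 81 = 121 vertices. The sketch below builds such a tree as a child-to-parent map; the function name and the integer vertex ids are hypothetical.

```python
def symmetric_tree(depth, fan_out):
    """Build a complete tree; return (child -> parent map, list of leaves)."""
    parent = {0: None}  # vertex 0 is the root
    frontier = [0]
    next_id = 1
    for _level in range(depth):
        new_frontier = []
        for node in frontier:
            for _child in range(fan_out):
                parent[next_id] = node
                new_frontier.append(next_id)
                next_id += 1
        frontier = new_frontier
    return parent, frontier


parent, leaves = symmetric_tree(depth=4, fan_out=3)
print(len(parent), len(leaves))
```

The count of 121 matches the slide only if every vertex, internal or leaf, counts as a label.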

15 Experiments
Multiclass: ignore the hierarchy
Greedy approach: solve a multiclass problem at each node with at least 2 children
We compared our algorithm to 2 naïve approaches. The greedy approach does tie between problems.

16 Results
                               Averaged Tree Error   Multiclass Error
Synthetic data (tree)          0.05                  5
Synthetic data (multiclass)    0.11                  8.6
Synthetic data (greedy)        0.52                  34.9
Phonemes (tree)                1.3                   40.6
Phonemes (multiclass)          1.41                  41.8
Phonemes (greedy)              2.48                  58.2
Here we present the results. The table compares the results on the 2 data sets we used. We compare both the averaged tree error and the multiclass (MC) error. Note that our algorithm outperforms all other algorithms. Note that the phoneme recognition error is relatively high since we didn't use all of the TIMIT corpus. This work is still in progress.
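The two measures in the table can be computed as sketched below. This assumes the averaged tree error is the mean tree distance between correct and predicted labels and the multiclass error is the fraction of wrong predictions; whether the talk reports the latter as a percentage is not stated. The function names and the toy distance table are hypothetical.

```python
def averaged_tree_error(pairs, distance):
    """Mean tree distance over (correct, predicted) label pairs."""
    return sum(distance(y, y_hat) for y, y_hat in pairs) / len(pairs)


def multiclass_error(pairs):
    """Fraction of pairs whose predicted label differs from the correct one."""
    return sum(1 for y, y_hat in pairs if y != y_hat) / len(pairs)


# Toy distances: m and n are siblings, b lies in a different branch.
TOY_DISTANCES = {
    frozenset(("m", "n")): 2,
    frozenset(("m", "b")): 6,
    frozenset(("n", "b")): 6,
}


def toy_distance(a, b):
    return 0 if a == b else TOY_DISTANCES[frozenset((a, b))]


pairs = [("m", "m"), ("m", "n"), ("m", "b"), ("n", "n")]
print(averaged_tree_error(pairs, toy_distance))
print(multiclass_error(pairs))
```

Two classifiers with the same multiclass error can differ widely in averaged tree error, since a sibling confusion and a cross-branch confusion each count as one multiclass mistake.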

17 Results
Figure: difference between the tree error rates of the tree algorithm and the multiclass (MC) algorithm. Panels: Synthetic data, Phonemes. Axis: Tree err - MC err (negative is good). Annotations: gross errors, minor errors.
This was our motivation in the first place. Trade-off with simple multiclass.

18 Tree vs. Multiclass Online Learning
Similarity between the prototypes in multiclass and tree training
This slide shows the difference between the online learning of the multiclass algorithm and that of the tree algorithm. There is a red edge between 2 vertices if they are close to each other, namely, if the distance between them is less than some threshold. The multiclass algorithm fails to do so.

19 Thanks!

