Dual-domain Hierarchical Classification of Phonetic Time Series Hossein Hamooni, Abdullah Mueen University of New Mexico Department of Computer Science.


1 Dual-domain Hierarchical Classification of Phonetic Time Series Hossein Hamooni, Abdullah Mueen University of New Mexico Department of Computer Science

2 What is a Phoneme?
Phonemes are very small units of intelligible sound (usually shorter than 200 ms). The phonetic spelling of a word is the sequence of phonemes it comprises. Examples:
- coat ([kōt], /K OW T/)
- from ([frəm], /F R AH M/)
- impressive ([imˈpresiv], /IH M P R EH S IH V/)

3 Phoneme Classification
What is phoneme classification?
- Input: a short segment of an audio signal.
- Output: which phoneme it is.
Phoneme classification is a complex task:
- More than 100 classes (based on the International Phonetic Alphabet)
- Variation in speakers, dialects, accents, environmental noise, etc.
Phoneme classification can be used in:
- Robust speech recognition
- Accent/dialect detection
- Speech quality scoring

4 Related Work
Different methods for phoneme classification have been used in the literature:
- Hidden Markov model [Lee, 1989]
- Neural network [Schwarz, 2009]
- Deep belief network [Mohamed, 2012]
- Support vector machine [Salomon, 2001]
- Hierarchical methods [Dekel, 2005]
- Boltzmann machine [Mohamed, 2010]
Although the data mining community has shown that k-NN classifiers can work well on time series data, they have not yet been tried on phonemes.
[C. Lopes, F. Perdigao, 2011]

5 Our Dual-domain Approach
Time domain:
- k-NN using Dynamic Time Warping (DTW)
- Expensive
- Sped up by lower bounding techniques
Frequency domain:
- k-NN using Euclidean distance between Mel-frequency cepstral coefficients (MFCC)
- Fast
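As a rough illustration of the frequency-domain side, the sketch below runs a k-NN vote using Euclidean distance over feature vectors. It assumes the MFCC vectors have already been extracted and have equal length; the function names (`euclidean`, `knn_classify`) and the toy vectors are made up for illustration and are not taken from the authors' code. The slides do not say how the two domains are combined, so this shows only a single k-NN step; the time-domain side could pass a DTW function as `distance` instead.

```python
import math
from collections import Counter

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def knn_classify(query, train, k=1, distance=euclidean):
    """Return the majority label among the k training items nearest to query.

    train is a list of (features, label) pairs; distance is any function
    taking two feature sequences and returning a non-negative number.
    """
    nearest = sorted(train, key=lambda item: distance(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy stand-ins for MFCC vectors (hypothetical values, not real features).
train = [
    ([1.0, 0.0], "AH"),
    ([0.9, 0.1], "AH"),
    ([0.0, 1.0], "OW"),
]
predicted = knn_classify([0.8, 0.2], train, k=3)
```

Sorting the whole training set keeps the sketch short; a real system over hundreds of thousands of phonemes would more likely keep a running top-k.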

6 Real Example

7 Challenge
DTW is expensive (quadratic in both time and space), so we need a speed-up technique.
- Solution: lower bounding techniques
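To make the quadratic cost concrete, here is a minimal sketch of DTW by dynamic programming with a warping window w (a Sakoe-Chiba band). The squared point cost, the final square root, and the name `dtw_distance` are assumptions chosen for illustration; the slides do not state the exact cost function or normalization the authors use.

```python
import math

def dtw_distance(a, b, w):
    """DTW distance between numeric sequences a and b.

    Only cells with |i - j| <= w are filled, so w must be at least the
    length difference for a warping path to exist.
    """
    n, m = len(a), len(b)
    if w < abs(n - m):
        raise ValueError("window w must be at least the length difference")
    inf = math.inf
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]; the table itself
    # is the quadratic space cost mentioned on the slide.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return math.sqrt(cost[n][m])
```

With w equal to the longer length the band covers the whole table and the loop visits every one of the n * m cells; a smaller w trims the work but still leaves one such computation per training candidate, which is what a cheap lower bound helps avoid.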

8 DTW Lower bounding
Resampling to equal length doesn't always work!

9 DTW Lower bounding
- We use the prefix of the longer signal (Prefixed LB_Keogh).
- We show that Prefixed LB_Keogh is a lower bound if w > the difference between the lengths of the two signals.
- We set w = c * length of the longer signal.
- We ignore all pairs of signals that don't satisfy the above condition.
[Figure: speedup vs. training set size (x10^4)]
[Figure: accuracy (%) vs. window size (c%), annotated c = 30%]
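The bullets above can be sketched roughly as follows. This is one reading of Prefixed LB_Keogh, not the authors' implementation: the envelope is built around the shorter signal and compared against the equal-length prefix of the longer one, a formulation that stays at or below DTW constrained to the same window w when squared point costs are used. Which signal carries the envelope is not stated on the slide, so that choice, the function names, and the integer-percent form of c are all assumptions.

```python
import math

def window_size(x, y, c_percent=30):
    """w = c * length of the longer signal, with c given as a percentage."""
    return (c_percent * max(len(x), len(y))) // 100

def prefixed_lb_keogh(x, y, w):
    """Lower bound on window-constrained DTW between x and y.

    Returns None when w is not greater than the length difference,
    so the caller can skip that pair as the slide describes.
    """
    short, longer = (x, y) if len(x) <= len(y) else (y, x)
    if w <= len(longer) - len(short):
        return None
    m = len(short)
    prefix = longer[:m]  # only the prefix of the longer signal is used
    total = 0.0
    for i, value in enumerate(prefix):
        # Envelope of the shorter signal over indices i - w .. i + w.
        neighbourhood = short[max(0, i - w):min(m, i + w + 1)]
        upper, lower = max(neighbourhood), min(neighbourhood)
        if value > upper:
            total += (value - upper) ** 2
        elif value < lower:
            total += (lower - value) ** 2
    return math.sqrt(total)
```

In a nearest-neighbor search the bound is computed first, and the full DTW computation is skipped whenever the bound already exceeds the best distance found so far.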

10 Data Collection
- 370,000 phonemes are segmented from:
- Data is publicly available.

11 Phoneme Segmentation
The Penn Phonetics Lab Forced Aligner (p2fa):
- Takes a signal and a transcript
- Produces timing segmentations (word level and phoneme level)

12 Accuracy (All layers)
- 10-fold cross validation
- 100 random phonemes in each fold

13 Accented Phoneme Classification
- British vs. American accent
- Using Oxford test set
- 2-class classification problem
- No hierarchy
[Figure: accuracy vs. training set size (x 10^4) for MFCC and DTW]

14 Conclusion
- We present a dual-domain hierarchical method for phoneme classification.
- We generate a novel dataset of 370,000 phonemes.
- We achieve up to 73% accuracy for 39 classes.
- Our lower bounding technique gives up to a 3x speedup.

15 Thank You
Data and code available at: http://cs.unm.edu/~hamooni/papers/Dual_2014

