Dual-domain Hierarchical Classification of Phonetic Time Series Hossein Hamooni, Abdullah Mueen University of New Mexico Department of Computer Science.

Presentation transcript:

What is a Phoneme?
Phonemes are very small units of intelligible sound (usually less than 200 ms).
Phonetic spelling is the sequence of phonemes that a word comprises.
Examples:
- coat ([kōt] /K OW T/)
- from ([frəm] /F R AH M/)
- impressive ([imˈpresiv] /IH M P R EH S IH V/)
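As a small illustration of phonetic spelling, the three examples from this slide can be stored as a mapping from a word to its phoneme sequence. The helper name `spell` is hypothetical and the table holds only the slide's own examples; it is a sketch, not a pronunciation dictionary.

```python
# Phonetic spellings copied from the slide: word -> sequence of phonemes.
PHONETIC_SPELLING = {
    "coat": ["K", "OW", "T"],
    "from": ["F", "R", "AH", "M"],
    "impressive": ["IH", "M", "P", "R", "EH", "S", "IH", "V"],
}


def spell(word):
    """Return the phoneme sequence for a word (hypothetical helper)."""
    return PHONETIC_SPELLING[word.lower()]


print(" ".join(spell("Coat")))
```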

Phoneme Classification
What is phoneme classification?
- Input: a short segment of audio signal.
- Output: which phoneme it is.
Phoneme classification is a complex task:
- More than 100 classes (based on the International Phonetic Alphabet)
- Variation in speakers, dialects, accents, noise in the environment, etc.
Phoneme classification can be used in:
- Robust speech recognition
- Accent/dialect detection
- Speech quality scoring

Related Work
Several methods for phoneme classification have been used in the literature:
- Hidden Markov model [Lee, 1989]
- Neural network [Schwarz, 2009]
- Deep belief network [Mohamed, 2012]
- Support vector machine [Salomon, 2001]
- Hierarchical methods [Dekel, 2005]
- Boltzmann machine [Mohamed, 2010]
Although the data mining community has shown that k-NN classifiers can work well on time series data, they have not yet been tried on phonemes.
[C. Lopes, F. Perdigao, 2011]

Our Dual-domain Approach
Time domain:
- k-NN using Dynamic Time Warping (DTW)
- Expensive
- Sped up by lower bounding techniques
Frequency domain:
- k-NN using Euclidean distance between Mel-frequency cepstrum coefficients (MFCC)
- Fast
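The frequency-domain half of the approach can be sketched as a plain k-NN classifier over fixed-length feature vectors. This is a minimal sketch that assumes the MFCC vectors have already been computed elsewhere; the function names and the toy two-dimensional vectors are illustrative and do not come from the presentation.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(query, train, k=1, dist=euclidean):
    """Label a query by majority vote among its k nearest training items.

    train is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy stand-ins for MFCC vectors (assumed data, not real features).
train = [([1.0, 2.0], "AH"), ([1.1, 1.9], "AH"), ([5.0, 5.0], "OW")]
print(knn_classify([4.8, 5.1], train, k=1))
```

The `dist` parameter is left pluggable so that the same voting code could be reused with a time-domain distance such as DTW.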

Real Example

Challenge
DTW is expensive (quadratic in time and space complexity).
We need to apply a speedup technique.
- Solution: lower bounding techniques
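To make the cost concrete, here is a minimal sketch of DTW restricted to a warping window of half-width w, assuming one-dimensional signals, squared pointwise error, and a band defined by |i - j| <= w. These choices and the function name are assumptions for illustration; the presentation does not give the exact formulation.

```python
import math


def dtw_distance(x, y, w):
    """DTW distance between sequences x and y inside a band |i - j| <= w."""
    n, m = len(x), len(y)
    if w < abs(n - m):
        raise ValueError("window too narrow to align the two lengths")
    # cost[i][j] = best cumulative squared error aligning x[:i] with y[:j].
    # The full (n + 1) x (m + 1) table is what makes the space quadratic.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells inside the band are filled; the rest stay infinite.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # advance in x only
                                 cost[i][j - 1],      # advance in y only
                                 cost[i - 1][j - 1])  # advance in both
    return math.sqrt(cost[n][m])


print(dtw_distance([0, 1, 2], [0, 1, 1, 2], w=1))
```

The window already limits the number of cells visited per row, but every candidate pair still needs its own table, which is why a cheap lower bound that rules out pairs early is attractive.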

DTW Lower Bounding
Resampling to equal length doesn't always work!

DTW Lower Bounding
- We use the prefix of the longer signal (Prefixed LB_Keogh).
- We show that Prefixed LB_Keogh is a lower bound if w > the difference between the lengths of the two signals.
- We set w = c * length of the longer signal.
- We ignore all pairs of signals that don't satisfy the above condition.
[Figure residue: axes labeled Speedup (x), Training Set Size, Window Size (c%), Accuracy (%); annotation c = 30%]
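The bullets above can be sketched as follows. This is one plausible reading, not the authors' implementation: it assumes the envelope is built around the shorter signal and compared against the equal-length prefix of the longer one, with squared error and a band |i - j| <= w. The function name and the choice to return `None` for ignored pairs are hypothetical.

```python
import math


def prefixed_lb_keogh(a, b, c=0.3):
    """Sketch of a prefixed LB_Keogh bound for signals of unequal length.

    Returns None when the window condition from the slide fails,
    meaning the pair would be ignored.
    """
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    n = len(short)
    w = int(c * len(long_))           # w = c * length of the longer signal
    if w <= len(long_) - n:           # need w > difference between lengths
        return None
    prefix = long_[:n]                # prefix of the longer signal
    total = 0.0
    for i, value in enumerate(prefix):
        # Envelope of the shorter signal over positions within w of i.
        window = short[max(0, i - w):min(n, i + w + 1)]
        upper, lower = max(window), min(window)
        if value > upper:
            total += (value - upper) ** 2
        elif value < lower:
            total += (value - lower) ** 2
    return math.sqrt(total)


print(prefixed_lb_keogh([0.0] * 10, [3.0] + [0.0] * 11))
```

Under these assumptions each prefix sample must be matched, in any banded alignment, to a sample of the shorter signal that lies inside the envelope, so the sum cannot exceed the banded DTW cost. A nearest-neighbor search could then skip the full DTW computation whenever this value already exceeds the best distance found so far.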

Data Collection
- 370,000 phonemes are segmented from:
- Data is publicly available.

Phoneme Segmentation
The Penn Phonetics Lab Forced Aligner (p2fa):
- Takes a signal and a transcript
- Produces timing segmentations (word level and phoneme level)

Accuracy (All layers)
- 10-fold cross validation
- 100 random phonemes in each fold
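The evaluation protocol can be sketched as generic k-fold cross validation. This sketch assumes a `classify(item, train)` callable and plain index-based folds; it does not reproduce how the 100 random phonemes per fold were drawn, which the slide does not specify. All names here are illustrative.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Shuffle item indices and deal them into k disjoint folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(items, labels, classify, k=10, seed=0):
    """Mean accuracy over k folds; requires len(items) >= k.

    classify(item, train) must return a predicted label, where train
    is a list of (item, label) pairs excluding the held-out fold.
    """
    accuracies = []
    for held_out in k_fold_indices(len(items), k, seed):
        held = set(held_out)
        train = [(items[i], labels[i])
                 for i in range(len(items)) if i not in held]
        correct = sum(classify(items[i], train) == labels[i]
                      for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k


# Toy run with a classifier that always predicts the same label.
items = list(range(20))
labels = ["AH"] * 20
print(cross_validate(items, labels, lambda item, train: "AH"))
```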

Accented Phoneme Classification
- British vs. American accent
- Using Oxford test set
- 2-class classification problem
- No hierarchy
[Figure residue: accuracy against training set size, with series labeled MFCC and DTW]

Conclusion
- We present a dual-domain hierarchical method for phoneme classification.
- We generate a novel dataset of 370,000 phonemes.
- We achieve up to 73% accuracy for 39 classes.
- Our lower bounding technique gives us up to 3x speedup.

Thank You
Data and code available at: