Study of Word-Level Accent Classification and Gender Factors

Slides:



Advertisements
Similar presentations
Building an ASR using HTK CS4706
Advertisements

© Fraunhofer FKIE Corinna Harwardt Automatic Speaker Recognition in Military Environment.
Coarticulation Analysis of Dysarthric Speech Xiaochuan Niu, advised by Jan van Santen.
Fusion of HMM’s Likelihood and Viterbi Path for On-line Signature Verification Bao Ly Van - Sonia Garcia Salicetti - Bernadette Dorizzi Institut National.
A Text-Independent Speaker Recognition System
Pitch Prediction From MFCC Vectors for Speech Reconstruction Xu shao and Ben Milner School of Computing Sciences, University of East Anglia, UK Presented.
Using Motherese in Speech Recognition EE516 final project Steven Schimmel March 13, 2003.
Dual-domain Hierarchical Classification of Phonetic Time Series Hossein Hamooni, Abdullah Mueen University of New Mexico Department of Computer Science.
December 2006 Cairo University Faculty of Computers and Information HMM Based Speech Synthesis Presented by Ossama Abdel-Hamid Mohamed.
LYU0103 Speech Recognition Techniques for Digital Video Library Supervisor : Prof Michael R. Lyu Students: Gao Zheng Hong Lei Mo.
Communications & Multimedia Signal Processing Analysis of the Effects of Train noise on Recognition Rate using Formants and MFCC Esfandiar Zavarehei Department.
Speaker Adaptation for Vowel Classification
Language and Speaker Identification using Gaussian Mixture Model Prepare by Jacky Chau The Chinese University of Hong Kong 18th September, 2002.
Speech Recognition in Noise
HIWIRE Progress Report Trento, January 2007 Presenter: Prof. Alex Potamianos Technical University of Crete Presenter: Prof. Alex Potamianos Technical University.
Optimal Adaptation for Statistical Classifiers Xiao Li.
LYU0103 Speech Recognition Techniques for Digital Video Library Supervisor : Prof Michael R. Lyu Students: Gao Zheng Hong Lei Mo.
Finding a single voice in music Christine Smit April 26, 2007.
Learning and Recognizing Activities in Streams of Video Dinesh Govindaraju.
A PRESENTATION BY SHAMALEE DESHPANDE
Authors: Anastasis Kounoudes, Anixi Antonakoudi, Vasilis Kekatos
9.0 Speaker Variabilities: Adaption and Recognition References: of Huang 2. “ Maximum A Posteriori Estimation for Multivariate Gaussian Mixture.
May 20, 2006SRIV2006, Toulouse, France1 Acoustic Modeling of Accented English Speech for Large-Vocabulary Speech Recognition ATR Spoken Language Communication.
Acoustic and Linguistic Characterization of Spontaneous Speech Masanobu Nakamura, Koji Iwano, and Sadaoki Furui Department of Computer Science Tokyo Institute.
Soft Margin Estimation for Speech Recognition Main Reference: Jinyu Li, " SOFT MARGIN ESTIMATION FOR AUTOMATIC SPEECH RECOGNITION," PhD thesis, Georgia.
Overview of NIT HMM-based speech synthesis system for Blizzard Challenge 2011 Kei Hashimoto, Shinji Takaki, Keiichiro Oura, and Keiichi Tokuda Nagoya.
Age and Gender Classification using Modulation Cepstrum Jitendra Ajmera (presented by Christian Müller) Speaker Odyssey 2008.
9 th Conference on Telecommunications – Conftele 2013 Castelo Branco, Portugal, May 8-10, 2013 Sara Candeias 1 Dirce Celorico 1 Jorge Proença 1 Arlindo.
Automatic detection of microchiroptera echolocation calls from field recordings using machine learning algorithms Mark D. Skowronski and John G. Harris.
VBS Documentation and Implementation The full standard initiative is located at Quick description Standard manual.
Modeling speech signals and recognizing a speaker.
International Conference on Intelligent and Advanced Systems 2007 Chee-Ming Ting Sh-Hussain Salleh Tian-Swee Tan A. K. Ariff. Jain-De,Lee.
A brief overview of Speech Recognition and Spoken Language Processing Advanced NLP Guest Lecture August 31 Andrew Rosenberg.
Implementing a Speech Recognition System on a GPU using CUDA
MUMT611: Music Information Acquisition, Preservation, and Retrieval Presentation on Timbre Similarity Alexandre Savard March 2006.
LML Speech Recognition Speech Recognition Introduction I E.M. Bakker.
Automatic Speech Recognition: Conditional Random Fields for ASR Jeremy Morris Eric Fosler-Lussier Ray Slyh 9/19/2008.
Korean Phoneme Discrimination Ben Lickly Motivation Certain Korean phonemes are very difficult for English speakers to distinguish, such as ㅅ and ㅆ.
Overview ► Recall ► What are sound features? ► Feature detection and extraction ► Features in Sphinx III.
A Baseline System for Speaker Recognition C. Mokbel, H. Greige, R. Zantout, H. Abi Akl A. Ghaoui, J. Chalhoub, R. Bayeh University Of Balamand - ELISA.
Singer similarity / identification Francois Thibault MUMT 614B McGill University.
Multi-Speaker Modeling with Shared Prior Distributions and Model Structures for Bayesian Speech Synthesis Kei Hashimoto, Yoshihiko Nankaku, and Keiichi.
Speaker Identification by Combining MFCC and Phase Information Longbiao Wang (Nagaoka University of Technologyh, Japan) Seiichi Nakagawa (Toyohashi University.
Speech Communication Lab, State University of New York at Binghamton Dimensionality Reduction Methods for HMM Phonetic Recognition Hongbing Hu, Stephen.
1 CRANDEM: Conditional Random Fields for ASR Jeremy Morris 11/21/2008.
Combining Speech Attributes for Speech Recognition Jeremy Morris November 9, 2006.
Speech controlled keyboard Instructor: Dr. John G. Harris TA: M. Skowronski Andréa Matsunaga Maurício O. Tsugawa ©2002,
Performance Comparison of Speaker and Emotion Recognition
Automatic Speech Recognition A summary of contributions from multiple disciplines Mark D. Skowronski Computational Neuro-Engineering Lab Electrical and.
Arlindo Veiga Dirce Celorico Jorge Proença Sara Candeias Fernando Perdigão Prosodic and Phonetic Features for Speaking Styles Classification and Detection.
Bayesian Speech Synthesis Framework Integrating Training and Synthesis Processes Kei Hashimoto, Yoshihiko Nankaku, and Keiichi Tokuda Nagoya Institute.
A. R. Jayan, P. C. Pandey, EE Dept., IIT Bombay 1 Abstract Perception of speech under adverse listening conditions may be improved by processing it to.
Statistical Models for Automatic Speech Recognition Lukáš Burget.
1 Electrical and Computer Engineering Binghamton University, State University of New York Electrical and Computer Engineering Binghamton University, State.
Spoken Language Group Chinese Information Processing Lab. Institute of Information Science Academia Sinica, Taipei, Taiwan
1 Experiments with Detector- based Conditional Random Fields in Phonetic Recogntion Jeremy Morris 06/01/2007.
Cross-Dialectal Data Transferring for Gaussian Mixture Model Training in Arabic Speech Recognition Po-Sen Huang Mark Hasegawa-Johnson University of Illinois.
Mustafa Gokce Baydogan, George Runger and Eugene Tuv INFORMS Annual Meeting 2011, Charlotte A Bag-of-Features Framework for Time Series Classification.
Spectral and Temporal Modulation Features for Phonetic Recognition Stephen A. Zahorian, Hongbing Hu, Zhengqing Chen, Jiang Wu Department of Electrical.
Statistical Models for Automatic Speech Recognition
Computational NeuroEngineering Lab
CRANDEM: Conditional Random Fields for ASR
Statistical Models for Automatic Speech Recognition
Jeremy Morris & Eric Fosler-Lussier 04/19/2007
Elise A. Piazza, Marius Cătălin Iordan, Casey Lew-Williams 
Automatic Speech Recognition: Conditional Random Fields for ASR
Ala’a Spaih Abeer Abu-Hantash Directed by Dr.Allam Mousa
Ju Lin, Yanlu Xie, Yingming Gao, Jinsong Zhang
Speaker Identification:
Emre Yılmaz, Henk van den Heuvel and David A. van Leeuwen
Presentation transcript:

Study of Word-Level Accent Classification and Gender Factors CSCE 666 Term Project Presentation Study of Word-Level Accent Classification and Gender Factors Xing Wang, Peihong Guo, Tian Lan, Guoyu Fu Dec 11th, 2013

Background Motivation: Accent Recognition(AR) helps improve Speech Recognition system and Speaker Identification system Our work Word-level Classifiers are built using different types of features, words and learning methods Speaker variation affect the AR system, gender factor is considered in this work.

Outline Feature Processing Classifier Experiments Word Alignment MFCC, Formants Classifier GMM, HMM Experiments Comparison of features, classifiers and words Gender effect

Data preparation Feature Processing Audios and corresponding phonetic transcriptions Biographical data for speakers Web crawler (Python) All metadata stored in Database Faster to locate and extract audio information

Alignment Feature Processing The Penn Phonetics Lab Forced Aligner is used to segment words apart. We are doing the word-based accent recognition system.

Feature Extraction Feature Processing Frame MFCCs Formants Window size, 25ms Window shift, 10ms MFCCs Formants

Classifiers GMM Number of components are determined via cross validation EM Algorithm to train HMM Observation: MFCC or Formants Hidden states: Single Gaussian component EM algorithm for training

Comparison of Features Experiments Comparison of Features Experiments Process 5-fold cross validation Training and test Repeat 5 times on random samples HMM is fixed as the classifier Features for comparison MFCCs (c0~c12) F0F1F2 F1F2

Comparison of Classifiers Experiments Comparison of Classifiers MFCC is fixed as the approach to extract features Classifiers for comparison Non-temporal: GMM Temporal: HMM

Comparison of Words Experiments Factors of classification accuracies of different words Certain vowels and consonants C1-C2 trajectory of word ‘OF’

Experiments Gender Effect Gender classification

Thanks! Q&A