Korean Phoneme Discrimination

Korean Phoneme Discrimination Ben Lickly

Motivation Certain Korean phonemes are very difficult for English speakers to distinguish: ㅅ (IPA: s) and ㅆ (IPA: s͈).

Cepstral Analysis Sounds must be converted into a representation meaningful to the network. Mel-Frequency Cepstral Coefficients (MFCCs) are a popular feature-extraction method: a discrete Fourier transform is applied to each audio frame, the resulting spectrum is warped onto the mel scale (a perceptually motivated frequency scale), and the log filterbank energies are decorrelated into cepstral coefficients.
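The pipeline above can be sketched in NumPy. This is a minimal illustration, not the exact parameters used in the presentation; frame sizes, filter counts, and coefficient counts are assumed typical defaults.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr, n_fft=512, n_mels=26, n_ceps=13):
    """Minimal MFCC sketch: frame -> DFT power spectrum -> mel
    filterbank -> log -> DCT. Parameters are illustrative defaults."""
    # Frame the signal (25 ms frames, 10 ms hop) and apply a Hamming window
    frame_len = int(0.025 * sr)
    hop = int(0.010 * sr)
    frames = np.array([signal[i:i + frame_len]
                       for i in range(0, len(signal) - frame_len + 1, hop)])
    frames = frames * np.hamming(frame_len)
    # Power spectrum via the discrete Fourier transform
    spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # Triangular filters spaced evenly on the mel scale (the "modified scale")
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fbank[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[m - 1, k] = (r - k) / max(r - c, 1)
    log_mel = np.log(spec @ fbank.T + 1e-10)
    # DCT-II decorrelates the log energies, yielding cepstral coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_mels)))
    return log_mel @ dct.T
```

Each row of the result is one MFCC vector for one frame; a sequence of such vectors is what would be fed to the network, one frame per time step.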

Publication of Interest Recurrent Neural Networks for Phoneme Recognition Takuya Koizumi, Mikio Mori, Shuji Taniguchi, and Mitsutoshi Maruya Dept. of Information Science, Fukui University, Japan Applied recurrent neural networks to classify phonemes from a Japanese word database

Overview of recurrent neural networks In contrast with feed-forward networks, recurrent neural networks can have cycles. These cycles let the network carry state from one time step to the next, so the input can be presented over multiple time steps. In this publication, two types of recurrent neural networks were studied.
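The forward pass of a simple recurrent (Elman-style) network illustrates the idea; this is a generic sketch, not the specific Type 1 or Type 2 architecture from the paper, and the weight names are illustrative.

```python
import numpy as np

def elman_forward(x_seq, W_xh, W_hh, W_hy):
    """Forward pass of a simple recurrent network. The hidden state h
    feeds back into itself (the cycle), carrying information across
    time steps; a feed-forward network has no such state."""
    h = np.zeros(W_hh.shape[0])
    outputs = []
    for x in x_seq:  # one input frame per time step (e.g. one MFCC vector)
        h = np.tanh(W_xh @ x + W_hh @ h)  # recurrent connection: h feeds back
        outputs.append(W_hy @ h)          # per-step output (e.g. class scores)
    return np.array(outputs), h
```

Because `h` accumulates over the loop, the score at each step depends on all earlier frames, which is what lets such networks handle time-varying information.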

Type 1 RNN

Type 2 RNN

Benefits of recurrent neural networks “[F]eedforward multi-layer neural networks are inherently unable to deal with time-varying information” In particular, some consonants are difficult to distinguish.

Group Classification Scheme In addition to having a single network classify all phonemes, a two-level hierarchy was developed: Classify to which phonetic group a phoneme belongs (unvoiced plosives, voiced plosives, unvoiced fricatives, voiced fricatives+glides, nasals, vowels). Classify phonemes within a specific phonetic group.
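The two-level decision can be sketched as follows. The `group_net` and `intra_nets` callables are hypothetical stand-ins for trained networks (in the paper, RNNs); only the control flow of the scheme is shown.

```python
import numpy as np

def hierarchical_classify(features, group_net, intra_nets):
    """Two-level scheme: first pick the phonetic group, then classify
    within that group using a group-specific network.

    group_net:  callable mapping features -> group scores
    intra_nets: mapping from group index -> callable for that group
    """
    group = int(np.argmax(group_net(features)))          # level 1
    phoneme = int(np.argmax(intra_nets[group](features)))  # level 2
    return group, phoneme
```

Splitting the problem this way means each intra-group network only has to separate a handful of acoustically similar phonemes, rather than all phonemes at once.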

Results Overall, recurrent neural networks were superior to feed-forward multi-layer neural networks (MLNN), and the group classification scheme was more effective than a single RNN. In most cases, the Type 1 RNN outperformed the Type 2 RNN: "[T]raining affects weights of all the connections in the Type 1 RNN, while it affects only part of the connections in the Type 2 RNN"

Detailed Results Accuracies (%):

                                      Type 1 RNN   Type 2 RNN   MLNN
Single Network                        84.9         75.1         68.5
Group Classification                  91.9         88.1         81.3
Intra-group Recognition (average)     95.2         92.2         89.8
Overall Group Classification Scheme   --

Application to Korean Classification Problem For unvoiced fricatives, the group to which ㅅ and ㅆ belong, the networks performed as follows:

               Type 1 RNN   Type 2 RNN   MLNN
Accuracy (%)   87.6         84.0         81.1

Questions?