Korean Phoneme Discrimination

Korean Phoneme Discrimination Ben Lickly

Motivation Certain Korean phonemes are very difficult for English speakers to distinguish: ㅅ (IPA: s) and ㅆ (IPA: s͈).

Cepstral Analysis Sounds must be converted into a representation meaningful to the network. Mel-Frequency Cepstral Coefficients (MFCCs) are a popular feature-extraction method: a discrete Fourier transform is applied to each audio frame, the resulting spectrum is warped onto the mel scale (a perceptually motivated frequency scale), and the log filterbank energies are decorrelated into cepstral coefficients.
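The pipeline above can be sketched in NumPy. This is a minimal illustration, not the exact parameters used in the presentation; frame sizes, filter counts, and coefficient counts are assumed typical defaults.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr, n_fft=512, n_mels=26, n_ceps=13):
    """Minimal MFCC sketch: frame -> DFT power spectrum -> mel
    filterbank -> log -> DCT. Parameters are illustrative defaults."""
    # Frame the signal (25 ms frames, 10 ms hop) and apply a Hamming window
    frame_len = int(0.025 * sr)
    hop = int(0.010 * sr)
    frames = np.array([signal[i:i + frame_len]
                       for i in range(0, len(signal) - frame_len + 1, hop)])
    frames = frames * np.hamming(frame_len)
    # Power spectrum via the discrete Fourier transform
    spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # Triangular filters spaced evenly on the mel scale (the "modified scale")
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fbank[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[m - 1, k] = (r - k) / max(r - c, 1)
    log_mel = np.log(spec @ fbank.T + 1e-10)
    # DCT-II decorrelates the log energies, yielding cepstral coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_mels)))
    return log_mel @ dct.T
```

Each row of the result is one MFCC vector for one frame; a sequence of such vectors is what would be fed to the network, one frame per time step.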

Publication of Interest Recurrent Neural Networks for Phoneme Recognition Takuya Koizumi, Mikio Mori, Shuji Taniguchi, and Mitsutoshi Maruya Dept. of Information Science, Fukui University, Japan Applied recurrent neural networks to classify phonemes from a Japanese word database

Overview of recurrent neural networks In contrast with feed-forward networks, recurrent neural networks can have cycles. These cycles let the network carry state from one time step to the next, so the input can be presented over multiple time steps. In this publication, two types of recurrent neural networks were studied.
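The forward pass of a simple recurrent (Elman-style) network illustrates the idea; this is a generic sketch, not the specific Type 1 or Type 2 architecture from the paper, and the weight names are illustrative.

```python
import numpy as np

def elman_forward(x_seq, W_xh, W_hh, W_hy):
    """Forward pass of a simple recurrent network. The hidden state h
    feeds back into itself (the cycle), carrying information across
    time steps; a feed-forward network has no such state."""
    h = np.zeros(W_hh.shape[0])
    outputs = []
    for x in x_seq:  # one input frame per time step (e.g. one MFCC vector)
        h = np.tanh(W_xh @ x + W_hh @ h)  # recurrent connection: h feeds back
        outputs.append(W_hy @ h)          # per-step output (e.g. class scores)
    return np.array(outputs), h
```

Because `h` accumulates over the loop, the score at each step depends on all earlier frames, which is what lets such networks handle time-varying information.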

Type 1 RNN

Type 2 RNN

Benefits of recurrent neural networks “[F]eedforward multi-layer neural networks are inherently unable to deal with time-varying information” In particular, some consonants are difficult to distinguish.

Group Classification Scheme In addition to having a single network classify all phonemes, a two-level hierarchy was developed: Classify to which phonetic group a phoneme belongs (unvoiced plosives, voiced plosives, unvoiced fricatives, voiced fricatives+glides, nasals, vowels). Classify phonemes within a specific phonetic group.
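The two-level decision can be sketched as follows. The `group_net` and `intra_nets` callables are hypothetical stand-ins for trained networks (in the paper, RNNs); only the control flow of the scheme is shown.

```python
import numpy as np

def hierarchical_classify(features, group_net, intra_nets):
    """Two-level scheme: first pick the phonetic group, then classify
    within that group using a group-specific network.

    group_net:  callable mapping features -> group scores
    intra_nets: mapping from group index -> callable for that group
    """
    group = int(np.argmax(group_net(features)))          # level 1
    phoneme = int(np.argmax(intra_nets[group](features)))  # level 2
    return group, phoneme
```

Splitting the problem this way means each intra-group network only has to separate a handful of acoustically similar phonemes, rather than all phonemes at once.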

Results Overall, recurrent neural networks were superior to feed-forward multi-layer neural networks (MLNN), and the group classification scheme was more effective than a single RNN. In most cases, the Type 1 RNN outperformed the Type 2 RNN: "[T]raining affects weights of all the connections in the Type 1 RNN, while it affects only part of the connections in the Type 2 RNN"

Detailed Results Accuracies (%):

                                      Type 1 RNN   Type 2 RNN   MLNN
Single Network                        84.9         75.1         68.5
Group Classification                  91.9         88.1         81.3
Intra-group Recognition (average)     95.2         92.2         89.8
Overall Group Classification Scheme   --

Application to Korean Classification Problem For unvoiced fricatives, the group to which ㅅ and ㅆ belong, the networks performed as follows:

               Type 1 RNN   Type 2 RNN   MLNN
Accuracy (%)   87.6         84.0         81.1

Questions?