1
Korean Phoneme Discrimination
Ben Lickly
2
Motivation
Certain Korean phonemes are very difficult for English speakers to distinguish, for example:
ㅅ (IPA: s)
ㅆ (IPA: s͈)
3
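Cepstral Analysis
Sounds must be converted into a format that is meaningful to the network.
Mel Frequency Cepstral Coefficients (MFCCs) are a popular method of feature extraction. To compute them, each short frame of audio is passed through a discrete Fourier transform, the resulting power spectrum is warped onto the mel scale (a perceptually motivated frequency scale), the logarithm of the filter energies is taken, and a discrete cosine transform is applied.
Figure: the mel scale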
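The MFCC steps above can be sketched in NumPy as follows. This is a minimal sketch of the standard pipeline, not the feature extractor used in the talk: the mel formula 2595·log10(1 + f/700) is one common variant, and the frame length, hop, FFT size, filter count, and number of coefficients are typical defaults for 16 kHz audio assumed here.

```python
import numpy as np

def hz_to_mel(f):
    """Convert Hz to mels using the common 2595 * log10(1 + f / 700) formula."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    fbank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank

def dct_ii(x, n_out):
    """Type-II DCT along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    idx = np.arange(n)[None, :]
    basis = np.cos(np.pi / n * (idx + 0.5) * k)  # shape (n_out, n)
    return x @ basis.T

def mfcc(signal, sample_rate, frame_len=400, hop=160, n_fft=512, n_filters=26, n_coeffs=13):
    """Return an array of shape (n_frames, n_coeffs); assumes len(signal) >= frame_len."""
    signal = np.asarray(signal, dtype=float)
    # 1. Split into overlapping frames and apply a Hamming window
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)
    # 2. Discrete Fourier transform of each frame -> power spectrum
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # 3. Warp onto the mel scale with the triangular filterbank, then take the log
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)  # small offset avoids log(0)
    # 4. DCT decorrelates the log energies; keep the low-order coefficients
    return dct_ii(log_energies, n_coeffs)

# Usage: one second of a 440 Hz tone sampled at 16 kHz
sr = 16000
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
features = mfcc(tone, sr)
```

Each row of `features` is one frame's feature vector, which is the kind of per-time-step input a recurrent network can consume.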
4
Publication of Interest
Recurrent Neural Networks for Phoneme Recognition
Takuya Koizumi, Mikio Mori, Shuji Taniguchi, and Mitsutoshi Maruya
Dept. of Information Science, Fukui University, Japan
Applied recurrent neural networks to classify phonemes from a Japanese word database.
5
Overview of recurrent neural networks
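Unlike feed-forward networks, recurrent neural networks can contain cycles: a unit's output at one time step can be fed back as an input at the next. This allows an input, such as a sequence of speech frames, to be presented over multiple time steps rather than all at once. In this publication, two types of recurrent neural networks were studied.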
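The feedback cycle described above can be illustrated with a generic Elman-style RNN. The Type 1 and Type 2 architectures appear only as figures and are not reproduced here, so this sketch is an assumption and is not either network from the paper; the layer sizes, tanh activation, and softmax output are illustrative choices.

```python
import numpy as np

def rnn_forward(inputs, W_in, W_rec, b_h, W_out, b_out):
    """Run a simple Elman-style recurrent network over a sequence.

    inputs: shape (T, n_in), one feature vector (e.g. an MFCC frame) per time step.
    Returns class probabilities computed from the final hidden state, shape (n_out,).
    """
    h = np.zeros(W_rec.shape[0])  # hidden state carried from one time step to the next
    for x_t in inputs:
        # W_rec @ h is the cycle: the previous hidden state feeds back into the network
        h = np.tanh(W_in @ x_t + W_rec @ h + b_h)
    logits = W_out @ h + b_out
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    return exp / exp.sum()  # softmax over output classes

# Usage with random (untrained) weights: 20 frames of 13 features, 6 output classes
rng = np.random.default_rng(0)
n_in, n_hidden, n_out, T = 13, 16, 6, 20
params = (
    rng.normal(0.0, 0.1, (n_hidden, n_in)),      # W_in
    rng.normal(0.0, 0.1, (n_hidden, n_hidden)),  # W_rec
    np.zeros(n_hidden),                          # b_h
    rng.normal(0.0, 0.1, (n_out, n_hidden)),     # W_out
    np.zeros(n_out),                             # b_out
)
frames = rng.normal(size=(T, n_in))
probs = rnn_forward(frames, *params)
```

Because the hidden state accumulates over the sequence, feeding the same frames in reverse order generally yields a different output, whereas a feed-forward network would need all frames concatenated into one fixed-size input.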
6
Type 1 RNN
7
Type 2 RNN
8
Benefits of recurrent neural networks
“[F]eedforward multi-layer neural networks are inherently unable to deal with time-varying information”
In particular, some consonants are difficult to distinguish, since the cues that separate them vary over time.
9
Group Classification Scheme
In addition to having a single network classify all phonemes, a two-level hierarchy was developed:
1. Classify which phonetic group a phoneme belongs to (unvoiced plosives, voiced plosives, unvoiced fricatives, voiced fricatives + glides, nasals, vowels).
2. Classify the phoneme within that phonetic group.
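The two-level scheme above can be sketched as a dispatch from a group classifier to per-group classifiers. This is a minimal sketch under assumptions: the function name, the use of plain callables for the trained networks, and the toy classifiers and phoneme labels in the usage lines are all hypothetical stand-ins for trained networks.

```python
from typing import Callable, Dict, List, Sequence

# A classifier maps a feature sequence to the index of its predicted class.
Classifier = Callable[[Sequence[float]], int]

def classify_two_level(
    features: Sequence[float],
    group_net: Classifier,
    groups: List[str],
    intra_nets: Dict[str, Classifier],
    phonemes: Dict[str, List[str]],
) -> str:
    """Stage 1 picks a phonetic group; stage 2 picks a phoneme within that group."""
    group = groups[group_net(features)]   # e.g. "unvoiced fricatives"
    index = intra_nets[group](features)   # network trained only on this group's phonemes
    return phonemes[group][index]

# Hypothetical toy classifiers standing in for trained networks
groups = ["unvoiced fricatives", "vowels"]
group_net = lambda f: 0 if f[0] > 0.5 else 1
intra_nets = {
    "unvoiced fricatives": lambda f: 0 if f[1] > 0.5 else 1,
    "vowels": lambda f: 0,
}
phonemes = {"unvoiced fricatives": ["s", "h"], "vowels": ["a"]}

result = classify_two_level([0.9, 0.8], group_net, groups, intra_nets, phonemes)
```

One consequence of this design is that each intra-group network only has to separate a few similar phonemes, but a wrong first-stage decision cannot be recovered in the second stage.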
10
Results
Overall, recurrent neural networks were superior to feed-forward multi-layer neural networks (MLNN).
Overall, the group classification scheme was more effective than a single RNN.
In most cases, the Type 1 RNN outperformed the Type 2 RNN: “[T]raining affects weights of all the connections in the Type 1 RNN, while it affects only part of the connections in the Type 2 RNN”
11
Detailed Results
Accuracy (%)                           Type 1 RNN   Type 2 RNN   MLNN
Single network                         84.9         75.1         68.5
Group classification                   91.9         88.1         81.3
Intra-group recognition (average)      95.2         92.2         89.8
Overall, group classification scheme   --           --           --
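The overall figures for the group classification scheme are not reproduced here. As a rough consistency check only, if the two stages made independent errors and the average intra-group accuracy applied equally to every group (both simplifying assumptions, not claims from the publication), the end-to-end accuracy would be about the product of the group and intra-group accuracies from the table above.

```python
# Accuracies (%) copied from the table above.
group_acc = {"Type 1 RNN": 91.9, "Type 2 RNN": 88.1, "MLNN": 81.3}
intra_acc = {"Type 1 RNN": 95.2, "Type 2 RNN": 92.2, "MLNN": 89.8}
single_acc = {"Type 1 RNN": 84.9, "Type 2 RNN": 75.1, "MLNN": 68.5}

def estimated_overall(net: str) -> float:
    """Back-of-envelope estimate: P(group correct) * P(phoneme correct | group correct).

    Assumes independent stage errors and a uniform intra-group accuracy;
    this is an illustration, not a figure reported in the publication.
    """
    return round(group_acc[net] * intra_acc[net] / 100, 1)

for net in group_acc:
    print(f"{net}: estimated {estimated_overall(net)}% vs single network {single_acc[net]}%")
```

Under these assumptions the estimates come out at roughly 87.5%, 81.2%, and 73.0%, each above the corresponding single-network accuracy, which is consistent with the finding that the group scheme was more effective.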
12
Application to Korean Classification Problem
For unvoiced fricatives (the group to which ㅅ and ㅆ belong), the networks performed as follows:
              Type 1 RNN   Type 2 RNN   MLNN
Accuracy (%)  87.6         84.0         81.1
13
Questions?