Phoneme Recognition Using Neural Networks by Albert VanderMeulen
The Problem The ultimate goal of phoneme recognition is to list all the phonemes in a continuous waveform of speech. This problem is broken down by sliding a window along the waveform and analyzing each segment. In each window we try to determine which, if any, phoneme is present.
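A minimal sketch of the sliding-window step, assuming NumPy and a synthetic sine wave in place of real speech. The function name `sliding_windows`, the 16 kHz sample rate, the 512-sample window and the 256-sample hop are illustrative assumptions, not values from the project.

```python
import numpy as np

def sliding_windows(waveform, window_size, hop_size):
    """Yield (start_index, segment) pairs for fixed-length windows."""
    last_start = len(waveform) - window_size
    for start in range(0, last_start + 1, hop_size):
        yield start, waveform[start:start + window_size]

# Synthetic stand-in for a speech recording: 1 second at an assumed 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
waveform = np.sin(2 * np.pi * 440 * t)

# Assumed window and hop sizes; consecutive windows overlap by half.
segments = list(sliding_windows(waveform, window_size=512, hop_size=256))
```

Each segment would then be passed to a classifier that reports which phoneme, if any, it contains.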
What is a Phoneme? Continuous speech cannot easily be segmented into words from the waveform alone: when we speak we do not pause between words, even though we separate them with spaces when we write. In speech recognition we instead decompose the waveform into phonemes, or speech sounds. A phoneme is the smallest unit of sound that distinguishes one word from another in a language. Examples: /p/, /f/, /ʒ/
The Method The two methods most commonly used for phoneme recognition are Hidden Markov Models and Artificial Neural Networks. For my project I will implement several types of neural networks and compare their results.
Types of Neural Nets Single-Layer Perceptron; Recurrent Neural Network (feedback network); Multi-Layer Perceptron (feed-forward network); Time-Delay Neural Network
The Data A continuous recording of 50 male English speakers was mined for more than 4,000 samples of five phonemes: /ʃ/, /ɑː/, /d/, /iː/, /ɔː/. Each sample was converted into a 256-value spectrogram using the FFT.
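One way a sample could be reduced to 256 spectral values is sketched below, assuming NumPy. The slides do not say how the 256 values were computed, so the 512-point FFT, the Hann window, the dropped Nyquist bin and the name `magnitude_spectrum` are all assumptions made for illustration.

```python
import numpy as np

def magnitude_spectrum(sample, n_bins=256):
    """Return n_bins magnitude values from a real FFT of one sample."""
    n_fft = 2 * n_bins  # assumed: a 512-point real FFT gives 257 bins
    frame = sample[:n_fft]
    # Assumed Hann window to reduce spectral leakage at the frame edges.
    windowed = frame * np.hanning(len(frame))
    # rfft zero-pads to n_fft when the frame is shorter than n_fft.
    spectrum = np.abs(np.fft.rfft(windowed, n=n_fft))
    return spectrum[:n_bins]  # drop the Nyquist bin to keep 256 values

# Synthetic stand-in for one phoneme sample: a 1 kHz tone at an assumed 16 kHz.
sample_rate = 16000
t = np.arange(512) / sample_rate
sample = np.sin(2 * np.pi * 1000 * t)
features = magnitude_spectrum(sample)
```

The resulting 256-element vector is the kind of fixed-length input a neural network can take, one value per input unit.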