Real-Time Speech Recognition
Thang Pham
Advisor: Shane Cotter

Background
  Types of speech recognition systems: word recognition, connected speech recognition, speech understanding systems
  Simplest: user-dependent with a limited vocabulary
  Hard to design any system:
    Variations of speech, e.g. amplitude, duration, and signal-to-noise ratio
    Background noise
    Reverberation noise
  Implemented in banking, telephone systems, etc.
  Example: IBM ViaVoice

Project Outline
  Design a user-dependent speech recognition system to control the movement of a small remote control car
  Limited vocabulary: Backward, Forward, Left, and Right
  Trained to my voice
  Different speech recognition algorithms were examined to understand the advantages and disadvantages of each:
    Linear Predictive Coding
    Cepstrum Coefficients
    Mel-frequency Cepstrum Coefficients

System Design
  Input: microphone connected to the TI 6713 DSP Board
  1. Sample the word at 8 kHz
  2. Segment the word into time frames
  3. Find the mel-cepstrum coefficients for each frame
  4. Compare the input word to a codebook of defined words using dynamic time warping
  Output: recognized word
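The segmentation step can be sketched as follows. This is only an illustration: the slides give the 8 kHz sample rate but not the frame length or overlap, so the 20 ms frame and 10 ms hop below are assumptions, and `split_into_frames` is a hypothetical helper name.

```python
def split_into_frames(samples, sample_rate=8000, frame_ms=20, hop_ms=10):
    """Split a list of samples into overlapping fixed-length frames.

    frame_ms and hop_ms are assumed values; the slides do not state them.
    """
    frame_len = sample_rate * frame_ms // 1000  # 160 samples at 8 kHz
    hop_len = sample_rate * hop_ms // 1000      # 80 samples at 8 kHz
    frames = []
    start = 0
    # Keep only complete frames; a trailing partial frame is dropped.
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop_len
    return frames


if __name__ == "__main__":
    one_second = [0.0] * 8000
    print(len(split_into_frames(one_second)))
```

Each frame would then be passed to the feature extraction step.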

Components List
  Texas Instruments TMS320C6713 DSP Board
  Audio Technica Omnidirectional Microphone ATR35S
  Two step motors

Linear Predictive Coding
  Provides a good model of the speech signal
  Approximates a speech sample at time n from the past p samples:
    $s(n) \approx a_1 s(n-1) + a_2 s(n-2) + \dots + a_p s(n-p)$
  where $a_1, a_2, \dots, a_p$ are coefficients that weight each past sample
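One common way to estimate the weights a_1, ..., a_p for a frame is the autocorrelation method solved with the Levinson-Durbin recursion. The slides do not say which method the project used, so the sketch below is an assumption, and `autocorrelation` and `lpc_coefficients` are hypothetical names.

```python
def autocorrelation(frame, max_lag):
    """Return [r(0), ..., r(max_lag)] for one frame of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc_coefficients(frame, order):
    """Return [a_1, ..., a_p] so that s(n) is approximated by
    a_1*s(n-1) + ... + a_p*s(n-p), using Levinson-Durbin."""
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    error = r[0]             # prediction error energy
    for i in range(1, order + 1):
        if error == 0:
            break            # silent frame: leave remaining weights at zero
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error      # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= (1.0 - k * k)
    return a[1:]


if __name__ == "__main__":
    # A decaying exponential obeys s(n) = 0.9 * s(n-1).
    frame = [0.9 ** n for n in range(200)]
    print(lpc_coefficients(frame, 2))
```

For the decaying exponential in the usage example, the first weight should come out close to 0.9 and the second close to zero.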

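Mel-frequency Cepstrum Coefficients
  Research has shown mel-frequency cepstrum coefficients to perform better than cepstrum coefficients and LPC
  Modeled around the human auditory system (ear)
  In the standard form:
    $c_n = \sum_{k=1}^{K} (\log S_k) \cos\left[ n \left( k - \tfrac{1}{2} \right) \frac{\pi}{K} \right]$
  where $c_n$ is the $n$th-order mel-frequency cepstrum coefficient, $S_k$ is the power of the $k$th mel filter, and $K$ is the number of mel filters
  12 mel-frequency cepstrum coefficients characterize each time frame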
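As a rough illustration, the coefficients can be computed from the mel filter powers with a cosine transform of the log powers, c_n = sum over k of log(S_k) * cos(n * (k - 1/2) * pi / K). This is the commonly cited form rather than code taken from the project; the base-10 logarithm, the filter count K, and the name `mel_cepstrum` are assumptions.

```python
import math


def mel_cepstrum(filter_powers, num_coeffs=12):
    """Return [c_1, ..., c_num_coeffs] from the mel filter powers S_1..S_K.

    Assumes every power is positive; log10 is an assumed choice of base.
    """
    K = len(filter_powers)
    log_powers = [math.log10(p) for p in filter_powers]
    return [
        sum(log_powers[k - 1] * math.cos(n * (k - 0.5) * math.pi / K)
            for k in range(1, K + 1))
        for n in range(1, num_coeffs + 1)
    ]


if __name__ == "__main__":
    # 20 filters is an illustrative count, not a value from the slides.
    powers = [1.0 + k for k in range(20)]
    print(mel_cepstrum(powers))
```

A flat set of filter powers gives coefficients of zero for every n >= 1, which is a quick sanity check on an implementation.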

Dynamic Time Warping
  The mel-frequency cepstrum coefficients are arranged into vectors, one per frame
  Dynamic time warping is used to find the best match
  It compares words that are uttered with different timing or duration
  The reference word is the word being listened for
  The sampled word is the input utterance
  The goal is to compare the sampled and reference words and see if they match
  The comparison uses the mel-frequency cepstrum coefficients of each frame of speech
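A minimal sketch of the matching step, assuming a Euclidean distance between frame vectors and the basic three-way step pattern (match, insertion, deletion). The slides do not give the local distance or path constraints that were used, and `frame_distance`, `dtw_distance`, and `recognize` are hypothetical names.

```python
import math


def frame_distance(u, v):
    """Euclidean distance between two coefficient vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw_distance(sample, reference):
    """Accumulated cost of the best alignment between two frame sequences."""
    n, m = len(sample), len(reference)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(sample[i - 1], reference[j - 1])
            # Extend the cheapest of the three allowed predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


def recognize(sample, codebook):
    """Return the codebook word whose template is closest to the sample."""
    return min(codebook, key=lambda word: dtw_distance(sample, codebook[word]))


if __name__ == "__main__":
    # One-dimensional toy "frames"; real frames would hold 12 coefficients.
    codebook = {
        "left": [[0.0], [1.0], [2.0]],
        "right": [[5.0], [5.0], [5.0]],
    }
    slow_left = [[0.0], [0.0], [1.0], [2.0], [2.0]]
    print(recognize(slow_left, codebook))
```

In the usage example the sampled word is a stretched copy of the "left" template, so the warping absorbs the difference in duration.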

Dynamic Time Warping
  [Figure: example of DTW]

Dynamic Time Warping
  [Figure: solution to the DTW example]

Results
  Word        Recognition rate
  Backward    50%
  Forward     70%
  Left        90%
  Right       40%

  Sources of error:
    1. Noise, e.g. computer fan, fluorescent light
    2. Voice changes, e.g. a word spoken on one day might not sound the same on the next day
    3. Trained with only one template per word

Problems Encountered
  Warping the frequency domain into mel-frequency, e.g. log10
  Translating MATLAB code into C, e.g. dynamic arrays, debugging process
  Dynamic time warping, e.g. theory, algorithm
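For reference, the mel warping is often written as mel = 2595 * log10(1 + f / 700). The slides mention log10 but not the constants, so this particular formula is an assumption, as are the helper names and the filter count in the usage example.

```python
import math


def hz_to_mel(freq_hz):
    """Warp a frequency in Hz onto the mel scale (assumed common formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_centers(num_filters, sample_rate=8000):
    """Center frequencies in Hz, equally spaced in mel from 0 to Nyquist."""
    top = hz_to_mel(sample_rate / 2.0)
    step = top / (num_filters + 1)
    return [mel_to_hz(step * (i + 1)) for i in range(num_filters)]


if __name__ == "__main__":
    # 20 filters is an illustrative count, not a value from the slides.
    for center in mel_filter_centers(20):
        print(round(center, 1))
```

The centers come out closely spaced at low frequencies and spread apart toward 4 kHz, which is the point of the warping.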

Future Work
  The C implementation of this system is being developed.
  The implementation will be uploaded onto the TI 6713 DSP Board once it is completed.
  The code will be modified to allow the recognition system to operate in real time.
  More comprehensive testing of the system will be performed under a variety of noise conditions.
