Reducing uncertainty in speech recognition
Controlling mobile devices through voice-activated commands
Neil Gow (GWXNEI001)
Stephen Breyer-Menke (BRYSTE003)
Supervisor: Audrey Mbogho
Introduction
Speech recognition has a wide variety of applications:
- Word processing
- In-car voice activation
- Over-the-phone automated business systems
- Mobile phone interaction
- Biometric identification
Introduction
- Early speech recognition research was carried out at AT&T Bell Labs
- Limited processing power was the initial barrier
- Speeds of up to 160 words per minute (wpm) are possible, with 95% accuracy
Introduction
Why use command-based interfaces on cell phones?
- Small keypads are awkward to use
- Hands-free operation
- No visual feedback required
- Quick access to common functions
How it works
- Analogue sound waves are converted to a digital signal
- The acoustic model breaks the digitized input down into phonemes
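The two steps on this slide can be sketched in a few lines of Python. This is a toy illustration, not how Sphinx or NICO implement their front ends: the 8 kHz sample rate, 16-bit quantization and 25 ms frames with a 10 ms hop are assumed values (common in telephone-band speech processing), and `digitize` and `frame` are hypothetical helper names. Real systems typically compute acoustic features from each frame before the acoustic model scores it.

```python
import math


def digitize(signal, duration_s, sample_rate=8000, bits=16):
    """Sample a continuous signal (a function of time in seconds) and quantize it."""
    n_samples = round(duration_s * sample_rate)
    max_level = 2 ** (bits - 1) - 1  # 32767 for 16-bit audio
    samples = []
    for n in range(n_samples):
        t = n / sample_rate                    # time of the n-th sample
        x = max(-1.0, min(1.0, signal(t)))     # clip to the range [-1, 1]
        samples.append(round(x * max_level))  # uniform quantization to integers
    return samples


def frame(samples, sample_rate=8000, frame_ms=25, hop_ms=10):
    """Split samples into short overlapping frames for the acoustic model."""
    frame_len = sample_rate * frame_ms // 1000
    hop = sample_rate * hop_ms // 1000
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def tone(t):
    """A 440 Hz tone standing in for the analogue input (hypothetical)."""
    return 0.5 * math.sin(2 * math.pi * 440 * t)


pcm = digitize(tone, duration_s=0.1)
frames = frame(pcm)
print(len(pcm), len(frames), len(frames[0]))  # → 800 8 200
```

Overlapping frames are used because speech is roughly stationary only over a few tens of milliseconds; the hop controls how finely the input is sliced in time.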
How it works
- Each phoneme is analysed in the context of the phonemes around it
- A statistical model is then used to identify the most probable spoken word
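Context-dependent phonemes are often written as triphones, and the statistical decision is usually framed as choosing the word W that maximises P(X | W) · P(W), where X is the acoustic input, P(X | W) comes from the acoustic model and P(W) from a language model or command grammar. The sketch below assumes both scores are already available as log-probabilities; the "left-centre+right" label format, the phoneme symbols, the function names and all scores are hypothetical, chosen only for illustration.

```python
def triphones(phonemes):
    """Label each phoneme with its neighbours as left-centre+right."""
    padded = ["sil"] + phonemes + ["sil"]  # assume silence at the word boundaries
    return [f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
            for i in range(1, len(padded) - 1)]


def most_probable_word(acoustic_logprob, prior_logprob):
    """Return the word W maximising log P(X | W) + log P(W)."""
    return max(acoustic_logprob,
               key=lambda w: acoustic_logprob[w] + prior_logprob[w])


print(triphones(["k", "ao", "l"]))  # → ['sil-k+ao', 'k-ao+l', 'ao-l+sil']

# Hypothetical scores: "tall" matches the audio slightly better,
# but "call" is far more likely as a phone command.
acoustic = {"call": -12.0, "tall": -11.5}
prior = {"call": -2.0, "tall": -6.0}
print(most_probable_word(acoustic, prior))  # → call
```

Working with log-probabilities turns the product into a sum, which avoids numerical underflow when many small probabilities are combined.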
Available models
- Neural networks
- Dynamic time warping (DTW)
- Knowledge-based speech recognition
- Hidden Markov models (HMMs)
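Dynamic time warping compares an utterance with a stored template while allowing either to be stretched in time, so the same word spoken more slowly can still match. A minimal sketch, assuming one scalar value per frame and absolute difference as the local distance; a practical recogniser would compare feature vectors instead. The function name and example sequences are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance between two frames
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


template = [1, 2, 3, 4, 3, 2]
slow = [1, 1, 2, 2, 3, 3, 4, 4, 3, 2]  # same shape, spoken more slowly
other = [4, 3, 2, 1, 1, 1]             # a different shape
print(dtw_distance(template, slow))    # → 0.0
print(dtw_distance(template, other))
```

For isolated-word recognition, the utterance would be compared with one template per command and the closest template chosen.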
The toolkits we will be using
- The Sphinx Project: hidden Markov models
- The NICO Toolkit: artificial neural networks
Our problem domain
- Evaluating the performance of the two models
- Assessing the applicability of the models in mobile environments
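The slides do not yet define the evaluation criteria. One widely used measure for comparing recognisers is the word error rate (WER). It is the number of substituted, deleted and inserted words needed to turn the recogniser's output into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch using word-level edit distance. The function name and example sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)


print(word_error_rate("call home now", "call phone now"))  # → 0.3333333333333333
```

Because insertions count as errors, WER can exceed 1.0 for a very poor recogniser.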
Our approach
- Implement and compare two software packages
- Scale the packages down for mobile devices
- Test them in a simulated mobile environment
- If feasible, implement the preferred package on a mobile device
The Sphinx Project
- Developed at Carnegie Mellon University, funded by DARPA
- Open source (BSD-style licence)
- Latest version (Sphinx-4) written in Java
- Based on hidden Markov models
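HMM-based recognisers search for the most likely sequence of hidden states given the observed audio. The textbook algorithm for this is Viterbi decoding. The toy example below is not Sphinx's API or its search strategy. It assumes a hypothetical two-state model (silence vs. speech) with made-up probabilities and discrete "quiet"/"loud" observations. It works in log space to avoid numerical underflow on long inputs.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence and its log-probability."""
    # best[s]: log-prob of the best path ending in state s; paths[s]: that path
    best = {s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
            for s in states}
    paths = {s: [s] for s in states}
    for obs in observations[1:]:
        new_best, new_paths = {}, {}
        for s in states:
            # Best predecessor state for arriving in s at this step
            prev = max(states, key=lambda p: best[p] + math.log(trans_p[p][s]))
            new_best[s] = (best[prev] + math.log(trans_p[prev][s])
                           + math.log(emit_p[s][obs]))
            new_paths[s] = paths[prev] + [s]
        best, paths = new_best, new_paths
    last = max(states, key=best.get)
    return paths[last], best[last]


states = ("sil", "speech")
start_p = {"sil": 0.8, "speech": 0.2}
trans_p = {"sil": {"sil": 0.7, "speech": 0.3},
           "speech": {"sil": 0.2, "speech": 0.8}}
emit_p = {"sil": {"quiet": 0.9, "loud": 0.1},
          "speech": {"quiet": 0.3, "loud": 0.7}}
obs = ["quiet", "loud", "loud", "quiet"]

path, logp = viterbi(obs, states, start_p, trans_p, emit_p)
print(path)  # → ['sil', 'speech', 'speech', 'speech']
```

In a real recogniser the states would be sub-phoneme units and the emission probabilities would come from the acoustic model's scores for each frame.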
The NICO Toolkit
- Neural Inference COmputation
- Developed during
- Open source (BSD)
- Written in C for UNIX
- Focused on speech recognition, but also usable as general neural network software
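In the neural network approach, a network maps each frame's acoustic features to a probability for each phoneme. The sketch below is a generic one-hidden-layer feed-forward network in plain Python. It is not the NICO API, which is a C toolkit. The two input features, the three phoneme classes, the function names and all weights are hypothetical. In practice the weights would be learned from labelled speech rather than set by hand.

```python
import math


def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(features, w_hidden, b_hidden, w_out, b_out):
    """One tanh hidden layer followed by a softmax over phoneme classes."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w_out, b_out)]
    return softmax(logits)


phonemes = ["k", "ao", "l"]
w_hidden = [[0.5, -0.2], [0.1, 0.4]]             # 2 features -> 2 hidden units
b_hidden = [0.0, 0.1]
w_out = [[1.0, -1.0], [0.2, 0.3], [-0.5, 0.8]]   # 2 hidden units -> 3 phonemes
b_out = [0.0, 0.0, 0.0]

posteriors = forward([0.9, 0.1], w_hidden, b_hidden, w_out, b_out)
print(dict(zip(phonemes, (round(p, 3) for p in posteriors))))
```

The per-frame phoneme probabilities produced this way would still need a search step over time to turn them into words.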
Division of work
Both:
- Design evaluation criteria
Neil:
- Research hidden Markov models
- Implement and scale Sphinx
- Evaluate Sphinx
Steve:
- Research neural networks
- Implement and scale NICO
- Evaluate NICO
Both:
- Mobile implementation
Timeline
Risks
- Failure to implement and scale the packages
- Insufficient documentation for the packages
- Failure to understand how the packages work
- Falling behind schedule
Goals
- Further research on speech recognition
- Determine the effectiveness of these algorithms in mobile environments
- Produce a working prototype that runs on mobile devices