SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY, SHRUTHI SAMBASIVAN


Introduction What is speech recognition? Automatic speech recognition (ASR) is the process by which a computer maps an acoustic speech signal to text.

CLASSIFICATION OF SPEECH RECOGNITION SYSTEM Users - Speaker dependent system - Speaker independent system - Speaker adaptive system Vocabulary - small vocabulary: tens of words - medium vocabulary: hundreds of words - large vocabulary: thousands of words - very-large vocabulary: tens of thousands of words.

CLASSIFICATION OF SPEECH RECOGNITION SYSTEM Word Pattern - isolated-word system: single words at a time - continuous speech system: words are connected together

HOW SPEECH RECOGNITION WORKS

APPLICATIONS Healthcare Military (helicopters, training air traffic controllers) Telephony and other domains

WHY SPEECH RECOGNITION? Speech is the easiest and most common way for people to communicate. Speech is also faster than typing on a keypad and more expressive than clicking on a menu item. Speech interfaces serve users with low literacy. Cellphones have proliferated widely in the market.

CHALLENGES ON MOBILE DEVICES Limited available storage space Cheap and variable microphones No hardware support for floating-point arithmetic Low processor clock frequency Small cache of 8-32 KB Highly variable and challenging acoustic environments, ranging from heavy background traffic noise to a small room with reverberation from multiple speakers talking simultaneously High energy consumption during algorithm execution
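The floating-point constraint is concrete enough to illustrate: without hardware floating point, arithmetic is typically done in fixed point. Below is a minimal sketch of Q15 fixed-point multiplication (illustrative only, not taken from any particular ASR codebase):

```python
# Sketch: Q15 fixed-point arithmetic, used when a device lacks hardware
# floating point. Values in [-1, 1) are stored as 16-bit integers
# scaled by 2**15. Names are illustrative, not from a library.

Q = 15
SCALE = 1 << Q  # 32768

def to_q15(x: float) -> int:
    """Quantize a float in [-1, 1) to Q15, clamping to the int16 range."""
    return max(-SCALE, min(SCALE - 1, round(x * SCALE)))

def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 numbers; the double-width product is rescaled."""
    return (a * b) >> Q

def from_q15(a: int) -> float:
    return a / SCALE

a, b = to_q15(0.5), to_q15(0.25)
print(from_q15(q15_mul(a, b)))  # 0.125
```

The same pattern (integer multiply plus shift) replaces every floating-point operation in a fixed-point ASR front end.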

ASR MODELS Embedded speech recognition Speech recognition in the cloud Distributed speech recognition Shared speech recognition with user-based adaptation (proposed model of use)

EMBEDDED MOBILE SPEECH RECOGNITION

Advantages Does not rely on any communication with a central server Cost-effective Not affected by network latency

EMBEDDED MOBILE SPEECH RECOGNITION Disadvantages Cannot perform complex computations Limited in terms of speed and memory To achieve reliable performance, modifications must be made to every sub-system of the ASR system to take both factors into account.

SPEECH RECOGNITION IN THE CLOUD

Advantages Improves speed and accuracy. Provides an easy way to upgrade or modify the central speech recognition system. Can be used for speech recognition with low-end mobile devices such as cheap cellphones.

SPEECH RECOGNITION IN THE CLOUD Disadvantages Performance degradation Acoustic models on the central server need to account for large variations in the different channels. Each data transfer over the telephone network can cost money for the end user.

DISTRIBUTED SPEECH RECOGNITION

Advantages Does not require high-quality speech Improved word error rates

DISTRIBUTED SPEECH RECOGNITION Disadvantages The major disadvantage of this mode remains cost and the need for a continuous and reliable cellular connection. There is a need for standardized feature extraction processes that account for variability arising from differences in channel, multilinguality, accents, gender, etc.

SHARED SPEECH RECOGNITION WITH USER BASED ADAPTATION

Advantages The ability to function even without network connectivity. Works well for the limited set of conditions it encounters. It can be supported successfully by existing mobile devices, if trained or adapted accordingly. Server capacity has to be provisioned only for average, not peak, use.

Speech recognition Process in detail

Front-end Process Involves spectral analysis that derives feature vectors capturing the salient spectral characteristics of the speech input. Back-end Process Combines word-level matching and sentence-level search to perform the inverse operation: decoding the message from the speech waveform.
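As a sketch of the front-end idea, the snippet below splits a waveform into overlapping frames and computes one log-energy feature per frame. Real front ends derive richer features such as MFCCs; the 25 ms window and 10 ms shift at 16 kHz are common assumptions, not values from the slides.

```python
import math

# Sketch: minimal front end. Split the waveform into overlapping frames
# and compute a log-energy feature per frame. Real systems compute
# richer spectral features (e.g. MFCCs) per frame.

def frame_log_energy(samples, frame_len=400, shift=160):
    """400-sample window, 160-sample shift = 25 ms / 10 ms at 16 kHz."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, shift):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame)
        feats.append(math.log(energy + 1e-10))  # floor avoids log(0)
    return feats

# 0.1 s of a 440 Hz tone sampled at 16 kHz
wave = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
print(len(frame_log_energy(wave)))  # 8 frames
```

The resulting feature-vector sequence is what the back end matches against its acoustic models.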

Acoustic model Provides a method of calculating the likelihood of any feature vector sequence Y given a word W. Each phone is represented by an HMM.
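As a sketch of how such a likelihood is computed, the forward algorithm below evaluates the probability of an observation sequence under a toy discrete HMM. The parameters are made up for illustration; real acoustic models emit continuous feature vectors via Gaussian mixtures or neural networks rather than discrete symbols.

```python
# Sketch: likelihood of an observation sequence under a small discrete
# HMM, computed with the forward recursion. Toy numbers only.

def forward(obs, init, trans, emit):
    """Return P(obs | HMM) by summing over all state paths."""
    n = len(init)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
            for s in range(n)
        ]
    return sum(alpha)

init = [1.0, 0.0]                 # always start in state 0
trans = [[0.6, 0.4], [0.0, 1.0]]  # left-to-right topology
emit = [[0.8, 0.2], [0.3, 0.7]]   # P(symbol | state)
print(forward([0, 0, 1], init, trans, emit))  # 0.2208
```

For word models, the per-phone HMMs are concatenated and the same recursion runs over the whole chain.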

Language Model The purpose of the language model is to take advantage of linguistic constraints to compute the probability of different word sequences Assuming a sequence of words W = {w1, w2, …, wk}, the probability P(W) can be expanded as P(W) = P(w1) P(w2|w1) … P(wk|w1, …, wk−1) We generally make the simplifying assumption that any word depends only on the previous N−1 words in the sequence This is known as an N-gram model Grammars - Use context-free grammars represented by Finite State Automata (FSA)
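The N-gram idea can be sketched with a tiny bigram model (N = 2) estimated by counting word pairs; the corpus below is made up for illustration and no smoothing is applied.

```python
from collections import Counter

# Sketch: bigram language model from a toy corpus. P(W) factors as the
# product of P(w_i | w_{i-1}); "<s>" marks the sentence start.

corpus = [["call", "home"], ["call", "office"], ["call", "home"]]

unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    words = ["<s>"] + sent
    unigrams.update(words[:-1])                # history counts
    bigrams.update(zip(words[:-1], words[1:])) # pair counts

def p_bigram(w, prev):
    """Maximum-likelihood estimate P(w | prev)."""
    return bigrams[(prev, w)] / unigrams[prev]

def p_sentence(sent):
    p, prev = 1.0, "<s>"
    for w in sent:
        p *= p_bigram(w, prev)
        prev = w
    return p

print(p_sentence(["call", "home"]))  # P(call|<s>) * P(home|call) = 2/3
```

Real language models add smoothing so that unseen word pairs do not get zero probability.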

Overview of Statistical Speech recognition Statistical Speech recognition model

Word sequence is postulated and the language model computes its probability. Each word is converted into sounds, or phones, using a pronunciation dictionary. Each phoneme has a corresponding statistical Hidden Markov Model (HMM). The HMMs of the phonemes are concatenated to form a word model, and the likelihood of the data given the word sequence is computed. This process is repeated for many word sequences, and the best is chosen as the output.
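The final step amounts to choosing the word sequence that maximizes P(Y|W) * P(W), usually in log space with a tunable language-model weight. A toy sketch, with made-up scores standing in for the acoustic and language model outputs:

```python
# Sketch: picking the best hypothesis by combining acoustic and
# language model log-probabilities. All scores are invented numbers.

hypotheses = {
    "call home": {"log_p_acoustic": -12.0, "log_p_lm": -1.2},
    "call Rome": {"log_p_acoustic": -11.5, "log_p_lm": -4.0},
    "cold foam": {"log_p_acoustic": -13.0, "log_p_lm": -5.5},
}

LM_WEIGHT = 1.0  # real decoders tune this scaling factor

def score(h):
    s = hypotheses[h]
    return s["log_p_acoustic"] + LM_WEIGHT * s["log_p_lm"]

best = max(hypotheses, key=score)
print(best)  # "call home": -13.2 beats -15.5 and -18.5
```

Note that the language model overturns the acoustically best hypothesis ("call Rome"), which is exactly why both models are combined.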

Speech recognition on embedded platforms Embedded ASR can be deployed either locally or in a distributed environment, each with advantages and disadvantages. For LVCSR (large-vocabulary continuous speech recognition), embedded devices are limited in terms of CPU power and memory. Most importantly, speed is a limiting factor.

Decoding algorithm Asynchronous stack-based decoder - memory efficient but complex. Viterbi-based decoder - most efficient. Three types of search implementation: combination of static graph and static search space; static graph with dynamic search space; dynamic graph.
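A minimal sketch of the Viterbi recursion on a toy discrete HMM (illustrative parameters, not a production decoder): instead of summing over all state paths, it keeps only the best predecessor per state and backtracks to recover the single best path.

```python
# Sketch: Viterbi decoding for a small discrete HMM. Returns the most
# likely state sequence for the observations. Toy numbers only.

def viterbi(obs, init, trans, emit):
    n = len(init)
    delta = [init[s] * emit[s][obs[0]] for s in range(n)]
    back = []  # best predecessor of each state, per time step
    for o in obs[1:]:
        step, new_delta = [], []
        for s in range(n):
            best_prev = max(range(n), key=lambda p: delta[p] * trans[p][s])
            step.append(best_prev)
            new_delta.append(delta[best_prev] * trans[best_prev][s] * emit[s][o])
        back.append(step)
        delta = new_delta
    # Backtrack from the best final state.
    path = [max(range(n), key=lambda s: delta[s])]
    for step in reversed(back):
        path.append(step[path[-1]])
    return list(reversed(path))

init = [1.0, 0.0]
trans = [[0.6, 0.4], [0.0, 1.0]]
emit = [[0.8, 0.2], [0.3, 0.7]]
print(viterbi([0, 0, 1], init, trans, emit))  # best state path
```

Because only one predecessor per state survives each step, memory and computation stay bounded, which is why Viterbi-based decoders dominate on embedded platforms.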

Mobile speech frameworks Nuance - Dragon Mobile SDK OpenEars Sphinx CeedVocal SDK Vlingo

Dragon Mobile SDK The Dragon Mobile SDK provides speech recognition and text-to-speech functionality. The Speech Kit framework provides the classes necessary to perform network-based speech recognition and text-to-speech synthesis. It uses the SystemConfiguration and AudioToolbox frameworks.

Speech Kit architecture

OpenEars OpenEars is an iOS framework for iPhone voice recognition and speech synthesis (TTS). It uses the open-source CMU Pocketsphinx, CMU Flite, and CMUCLMTK libraries. OpenEars performs recognition on the device, without using the network.

Sphinx CMU Sphinx is an open-source toolkit for speech recognition developed by Carnegie Mellon University. CMU Sphinx is a speaker-independent, large-vocabulary, continuous speech recognizer. Pocketsphinx - lightweight recognizer library written in C. Sphinx4 - adjustable, modifiable recognizer written in Java.

CeedVocal SDK CeedVocal SDK is an isolated-word speech recognition SDK for iOS. It operates locally on the device and supports six languages: English, French, German, Dutch, Spanish, and Italian.

Mobile applications using speech recognition Google Now Siri S-Voice Dragon Search Dragon Dictation Trippo-Mondo Verbally

References 1. Rethinking Speech Recognition on Mobile Devices, Anuj Kumar, Anuj Tewari, Seth Horrigan, Matthew Kam, Florian Metze, and John Canny. 2. Towards Large Vocabulary ASR on Embedded Platforms, Miroslav Novak. 3. Speech Recognition: Statistical Methods, L. R. Rabiner and B.-H. Juang.