Speech Recognition Application

Presentation transcript:

Speech Recognition Application: Voice Enabled Phone Directory
Yousef Rabah (رباح يوسف)

Why a Speech Enabled Phone Directory?
  Growing technology
  Easy access
  Mainly used for:
    Educational purposes
    People with certain disabilities
    Mobile use

Problem
  Automatic, speech-driven phone directory assistance

Automatic Speech Recognition - Sphinx
  Speaker dependent vs. independent
  Acoustic modeling
  Isolated vs. continuous
  HMM: probabilities, parameters, training
  Language model
    Unigrams: <s> & </s>
    Bigrams: P(word2 | word1)
  Phonemes
  Lexicon structure
    ZERO  Z IH R OW
    TWO   T UW
    H A HEIGH H
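The bigram entry above, P(word2 | word1), can be illustrated with a small maximum-likelihood estimate. This is a minimal sketch, not the language model shipped with Sphinx: the function name train_bigrams and the toy corpus of spelled names are assumptions made for illustration, and real models usually add smoothing and backoff.

```python
from collections import Counter

def train_bigrams(sentences):
    """Estimate P(word2 | word1) by maximum likelihood from tokenized sentences."""
    history_counts = Counter()
    bigram_counts = Counter()
    for words in sentences:
        # <s> and </s> mark the sentence boundaries, as on the slide
        tokens = ["<s>"] + list(words) + ["</s>"]
        # A token counts as a history only when another token follows it
        history_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return {
        (w1, w2): count / history_counts[w1]
        for (w1, w2), count in bigram_counts.items()
    }

# Hypothetical toy corpus: three names spelled letter by letter
corpus = [["S", "A", "M"], ["S", "U", "E"], ["A", "M", "Y"]]
bigram_prob = train_bigrams(corpus)
```

Here bigram_prob[("<s>", "S")] is the estimated probability that a spelled name starts with S, and bigram_prob[("A", "M")] is the probability that M follows A.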

Input / Output
FWDVIT: H E L L (null)
24003 samples in file /usr/local/share/sphinx3/model/lm/an4/hell.raw
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil>
INFO: live.c(239): live_nfeatvec: 12
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> A(2)
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> EIGHTH
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E L
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E L OH
Backtrace (null)
LatID  SFrm  EFrm      AScr     LScr  Type
  254     0    45   -391470   -74100    -1  <sil>
  594    46    81   -472155  -148846     0  H
 1291    82   102   -288621  -148846     0  E
 1850   103   126   -235274  -148846     0  L
 2599   127   147   -430694  -148846     0  L
 2650   148   148         0  -148846     0  </s>
          0   148  -1818214  -818330        (Total)
FWDVIT: H E L L (null)
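The backtrace above lists one row per decoded segment, and its last row gives the totals. The sketch below parses the segment rows and re-derives the totals and the final hypothesis. It assumes the column layout exactly as it appears in this log (LatID, SFrm, EFrm, AScr, LScr, Type, then the word); parse_backtrace is a hypothetical helper, not part of Sphinx.

```python
# Segment rows copied from the backtrace in the log above
BACKTRACE = """\
254 0 45 -391470 -74100 -1 <sil>
594 46 81 -472155 -148846 0 H
1291 82 102 -288621 -148846 0 E
1850 103 126 -235274 -148846 0 L
2599 127 147 -430694 -148846 0 L
2650 148 148 0 -148846 0 </s>
"""

def parse_backtrace(text):
    """Turn whitespace-separated backtrace rows into dictionaries."""
    rows = []
    for line in text.splitlines():
        lat_id, start, end, ascr, lscr, seg_type, word = line.split()
        rows.append({
            "lat_id": int(lat_id),
            "start_frame": int(start),
            "end_frame": int(end),
            "ascr": int(ascr),   # acoustic score
            "lscr": int(lscr),   # language model score
            "type": int(seg_type),
            "word": word,
        })
    return rows

segments = parse_backtrace(BACKTRACE)
total_ascr = sum(seg["ascr"] for seg in segments)
total_lscr = sum(seg["lscr"] for seg in segments)
# Drop silence and end-of-sentence markers to recover the spoken letters
hypothesis = " ".join(
    seg["word"] for seg in segments if seg["word"] not in ("<sil>", "</s>")
)
```

The sums match the (Total) row of the log, and the remaining words give the same letters as the FWDVIT line.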

Difficulties
  Hardware issues
  ASR software issues
  Letter phonemes
  Time

Solution
  4-stage process:

Solution
  Database (PostgreSQL)
    Names
    Phone numbers
    Fast access
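A directory of names and phone numbers with fast access by name might be laid out as follows. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs without a server, and the table name people, its columns, the index name, and find_by_first_name are all assumptions, not the schema of the original application. The single row is the example entry from the slides.

```python
import sqlite3

# In-memory SQLite database standing in for the PostgreSQL directory
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people ("
    "first_name TEXT NOT NULL, last_name TEXT NOT NULL, phone TEXT NOT NULL)"
)
# An index on the search column keeps name lookups fast as the table grows
conn.execute("CREATE INDEX people_first_name_idx ON people (first_name)")
conn.execute(
    "INSERT INTO people VALUES (?, ?, ?)", ("SAM", "SMITH", "765-973-2145")
)
conn.commit()

def find_by_first_name(connection, first_name):
    """Return (first_name, last_name, phone) rows matching a first name."""
    cursor = connection.execute(
        "SELECT first_name, last_name, phone FROM people "
        "WHERE first_name = ? ORDER BY last_name",
        (first_name.upper(),),  # names are stored in upper case in this sketch
    )
    return cursor.fetchall()

matches = find_by_first_name(conn, "sam")
```

The query uses a bound parameter rather than string concatenation, so a decoded name is never interpreted as SQL.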

Solution
  Architecture of application
  Example:
    db.pm
    people.pm
    people.pl
    record.pl
    wav_to_raw.pl
    get_speech.pl
    display_speech.pm
    display_speech.pl
    VEPD.pm
    VEPD.pl
  Example:
    …
    PC: press space bar before and after you speak:
    User: S AH EM
    PC: Decoded as, SAM ?
    Results | 1
    1. SAM | SMITH | 765-973-2145
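The dialog above goes from spoken letters to a name and then to a directory entry. A rough sketch of that last step is shown below. The helpers letters_to_name and format_results, and the in-memory DIRECTORY list, are hypothetical and are not taken from the Perl modules listed above. The sketch assumes the recognizer returns space-separated letters, possibly with silence markers and pronunciation-variant suffixes such as A(2).

```python
import re

# Markers that may appear in a hypothesis but are not letters
NON_LETTER_TOKENS = {"<sil>", "<s>", "</s>", "(null)"}

def letters_to_name(hypothesis):
    """Join a letter-by-letter hypothesis such as 'S A M' into 'SAM'."""
    letters = []
    for token in hypothesis.split():
        if token in NON_LETTER_TOKENS:
            continue
        # Strip a pronunciation-variant suffix, e.g. 'A(2)' -> 'A'
        letters.append(re.sub(r"\(\d+\)$", "", token))
    return "".join(letters)

def format_results(rows):
    """Render matches in the 'Results | N' layout shown in the dialog."""
    lines = [f"Results | {len(rows)}"]
    for number, (first, last, phone) in enumerate(rows, start=1):
        lines.append(f"{number}. {first} | {last} | {phone}")
    return "\n".join(lines)

# Example entry from the slide; a real application would query the database
DIRECTORY = [("SAM", "SMITH", "765-973-2145")]

name = letters_to_name("<sil> S A(2) M </s>")
hits = [row for row in DIRECTORY if row[0] == name]
report = format_results(hits)
```

Asking the user to confirm the decoded name before searching, as the dialog does, guards against acting on a misrecognized letter.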

Solution

Results
  A first step towards a hands-free, speech-enabled phone directory
  Speaker independent
  Application's features:
    Adding a user
    Retrieving a user (via speech)
    Manual search
    Viewing the current phone directory

Possible Future Enhancements
  ASR enabled for:
    Adding users
    Phone number search
  Word recognition (instead of letters)
  More accurate ASR (as the technology grows)
  Graphical interface (via Perl/Tk)
  Communication through VoiceXML

Special Thanks
  To friends and family
  Jim Rogers
  Hassan Halta
  Skylar Thompson
  Kushboo Goel
  Rabah family
  El-Shabab el-taybeh

Questions/Comments