Speech Recognition Application: Voice Enabled Phone Directory - Yousef Rabah رباح يوسف
Why a Speech Enabled Phone Directory?
- Growing technology
- Easy access
- Mainly used for:
  - Educational purposes
  - People with certain disabilities
  - Mobile use
Problem
- Automatic, speech-interactive phone directory assistance
Automatic Speech Recognition - Sphinx
- Speaker dependent vs. independent
- Acoustic modeling
- Isolated vs. continuous
- HMM: probabilities, parameters, training
- Language model
  - Unigrams: <s> & </s>
  - Bigrams: P(word2 | word1)
- Phonemes
- Lexicon structure
  - ZERO Z IH R OW
  - TWO T UW
  - H A HEIGH H
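To make the bigram and lexicon items above concrete, here is a minimal Python sketch. It is not the code used in the project and not Sphinx's own training tooling. It assumes plain maximum-likelihood estimates with no smoothing, a toy corpus of spelled names invented for illustration, and a lexicon format of one word followed by its phonemes per line, as in the ZERO and TWO entries on the slide. The function names `bigram_probs` and `parse_lexicon` are hypothetical.

```python
from collections import Counter


def bigram_probs(sentences):
    """Estimate P(word2 | word1) by maximum likelihood (no smoothing)."""
    history_counts = Counter()
    bigram_counts = Counter()
    for words in sentences:
        # Wrap every utterance in the sentence markers <s> and </s>.
        tokens = ["<s>"] + list(words) + ["</s>"]
        # Every token except the final </s> acts as a history (word1).
        history_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return {
        (w1, w2): count / history_counts[w1]
        for (w1, w2), count in bigram_counts.items()
    }


def parse_lexicon(text):
    """Map each word to its phoneme list; one 'WORD PH1 PH2 ...' entry per line."""
    lexicon = {}
    for line in text.splitlines():
        fields = line.split()
        if fields:
            lexicon[fields[0]] = fields[1:]
    return lexicon


# Toy corpus of spelled names (illustrative data, not from the talk).
corpus = [["S", "A", "M"], ["S", "U", "E"], ["A", "M", "Y"]]
probs = bigram_probs(corpus)

# The two unambiguous lexicon entries shown on the slide.
lexicon = parse_lexicon("ZERO Z IH R OW\nTWO T UW\n")

print(probs[("A", "M")])   # P(M | A)
print(lexicon["ZERO"])     # phonemes for ZERO
```

In this toy corpus the letter A is always followed by M, so P(M | A) comes out as 1.0, while P(S | <s>) is 2/3 because two of the three names start with S.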
Input / Output
FWDVIT: H E L L (null)
24003 samples in file /usr/local/share/sphinx3/model/lm/an4/hell.raw
INFO: live.c(239): live_nfeatvec: 13
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil>
INFO: live.c(239): live_nfeatvec: 12
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> A(2)
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> EIGHTH
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E L
INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E L OH
Backtrace (null)
 LatID  SFrm  EFrm      AScr     LScr  Type
   254     0    45   -391470   -74100    -1  <sil>
   594    46    81   -472155  -148846     0  H
  1291    82   102   -288621  -148846     0  E
  1850   103   126   -235274  -148846     0  L
  2599   127   147   -430694  -148846     0  L
  2650   148   148         0  -148846     0  </s>
           0   148  -1818214  -818330        (Total)
FWDVIT: H E L L (null)
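The backtrace can be read programmatically. The Python sketch below is an illustration, not part of the original Perl application. It assumes each row has six integer columns (LatID, SFrm, EFrm, AScr, LScr, Type) followed by the word, with whitespace between the Type value and the word. It rebuilds the hypothesis and checks that the per-word acoustic and language scores add up to the (Total) row. The helper name `parse_backtrace` is hypothetical.

```python
# Backtrace rows transcribed from the decoder output on this slide.
# Assumption: whitespace separates the Type column from the word.
BACKTRACE = """\
254 0 45 -391470 -74100 -1 <sil>
594 46 81 -472155 -148846 0 H
1291 82 102 -288621 -148846 0 E
1850 103 126 -235274 -148846 0 L
2599 127 147 -430694 -148846 0 L
2650 148 148 0 -148846 0 </s>
"""

FILLERS = {"<sil>", "<s>", "</s>"}


def parse_backtrace(text):
    """Turn each backtrace row into a dict of named fields."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7:
            continue  # skip headers, totals, and blank lines
        lat_id, sfrm, efrm, ascr, lscr, type_ = (int(f) for f in fields[:6])
        rows.append({
            "lat_id": lat_id, "sfrm": sfrm, "efrm": efrm,
            "ascr": ascr, "lscr": lscr, "type": type_, "word": fields[6],
        })
    return rows


rows = parse_backtrace(BACKTRACE)
# Drop silence and sentence markers to get the spelled-out hypothesis.
hypothesis = " ".join(r["word"] for r in rows if r["word"] not in FILLERS)
total_ascr = sum(r["ascr"] for r in rows)
total_lscr = sum(r["lscr"] for r in rows)

print(hypothesis)   # H E L L
print(total_ascr)   # -1818214, matching the (Total) row
print(total_lscr)   # -818330, matching the (Total) row
```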
Difficulties
- Hardware issues
- ASR software issues
- Letter phonemes
- Time
Solution
- 4-stage process
Solution
- Database (PostgreSQL)
  - Names
  - Phone numbers
  - Fast access
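As a rough illustration of the names and phone numbers store, the sketch below sets up a directory table with an index on the first name and a lookup function. It is not the project's actual schema. To keep the example self-contained it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL. The table name `people`, its columns, and the function `find_by_first_name` are all assumptions. The single row is the sample entry shown in the example session of this talk.

```python
import sqlite3

# In-memory SQLite database standing in for the PostgreSQL server.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per directory entry.
conn.execute(
    """
    CREATE TABLE people (
        id         INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL,
        phone      TEXT NOT NULL
    )
    """
)
# An index on the search column keeps name lookups fast as the table grows.
conn.execute("CREATE INDEX people_first_name_idx ON people (first_name)")

conn.execute(
    "INSERT INTO people (first_name, last_name, phone) VALUES (?, ?, ?)",
    ("SAM", "SMITH", "765-973-2145"),
)
conn.commit()


def find_by_first_name(connection, name):
    """Return (first_name, last_name, phone) tuples for an exact name match."""
    cursor = connection.execute(
        "SELECT first_name, last_name, phone FROM people "
        "WHERE first_name = ? ORDER BY last_name",
        (name.upper(),),  # names are stored in upper case, like decoder output
    )
    return cursor.fetchall()


matches = find_by_first_name(conn, "sam")
print(matches)
```

Parameterized queries (the `?` placeholders) are used so that a decoded name is never pasted directly into the SQL string.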
Solution
- Architecture of application
  - Example: db.pm, people.pm, people.pl, record.pl, wav_to_raw.pl, get_speech.pl, display_speech.pm, display_speech.pl, VEPD.pm, VEPD.pl
- Example:
  …
  PC: press space bar before and after you speak:
  User: S AH EM
  PC: Decoded as, SAM ?
  Results | 1
  1. SAM | SMITH | 765-973-2145
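The last steps of the example session, turning a spelled-out hypothesis into a name and printing the matching entries, might look like the Python sketch below. This is an assumption-laden illustration rather than the logic in VEPD.pl. It assumes the decoder returns one token per letter, that the number after "Results |" is the match count, and that the directory is a simple in-memory list. The function names are hypothetical.

```python
FILLERS = {"<sil>", "<s>", "</s>"}

# In-memory stand-in for the database, holding the entry from the example.
DIRECTORY = [("SAM", "SMITH", "765-973-2145")]


def letters_to_name(hypothesis):
    """Join a letter-by-letter hypothesis such as '<sil> S A M' into 'SAM'."""
    return "".join(tok for tok in hypothesis.split() if tok not in FILLERS)


def lookup(name, directory):
    """Return all entries whose first name equals the decoded name."""
    return [entry for entry in directory if entry[0] == name]


def format_results(matches):
    """Render matches in the 'Results | N' layout shown in the session."""
    lines = [f"Results | {len(matches)}"]
    for i, (first, last, phone) in enumerate(matches, start=1):
        lines.append(f"{i}. {first} | {last} | {phone}")
    return "\n".join(lines)


name = letters_to_name("<sil> S A M")
print(f"Decoded as, {name} ?")
report = format_results(lookup(name, DIRECTORY))
print(report)
```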
Results
- A first step towards a hands-free, speech-enabled phone directory
- Speaker independent
- Application's features:
  - Adding a user
  - Retrieving a user (via speech)
  - Manual search
  - Viewing the current phone directory
Possible Future Enhancements
- ASR enabled for:
  - Adding users
  - Phone number search
- Word recognition (instead of letters)
- More accurate ASR (as the technology matures)
- Graphical interface (via Perl/Tk)
- Communication through VoiceXML
Special Thanks
- To friends and family
- Jim Rogers
- Hassan Halta
- Skylar Thompson
- Kushboo Goel
- Rabah family
- El-Shabab el-taybeh
Questions/Comments