Speech Recognition Application
Voice-Enabled Phone Directory
Yousef Rabah
Process of Speech Recognition
- Speaker dependent vs. speaker independent
- Vocabulary
- Isolated vs. continuous
- Frequency changes
- Pronunciation
- Speech processing
- HMM: probabilities, parameters, training
- Phonemes to words
Problem
- Automated, speech-driven phone directory assistance that works without a human operator.
Automatic Speech Recognition - Sphinx
- Acoustic modeling
- Language model
  - Unigrams: P(word), including the sentence markers <s> and </s>
  - Bigrams: P(word2 | word1)
  - Trigrams: P(word3 | word1, word2)
- Lexicon structure
    ZERO   Z IH R OW
    ONE    W AH N
    TWO    T UW
    <sil>
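The bigram entry P(word2 | word1) can be illustrated with a minimal sketch that estimates it by plain counting. The toy corpus and the function name `bigram_probabilities` are made up for illustration. Language models used in practice normally add smoothing and backoff, which this sketch leaves out.

```python
from collections import Counter

def bigram_probabilities(sentences):
    """Estimate P(word2 | word1) by relative frequency (no smoothing)."""
    history_counts = Counter()
    bigram_counts = Counter()
    for words in sentences:
        # Pad each sentence with the start and end markers.
        tokens = ["<s>"] + list(words) + ["</s>"]
        for w1, w2 in zip(tokens, tokens[1:]):
            history_counts[w1] += 1
            bigram_counts[(w1, w2)] += 1
    return {
        pair: count / history_counts[pair[0]]
        for pair, count in bigram_counts.items()
    }

# Hypothetical toy corpus built from the lexicon words on the slide.
corpus = [["ONE", "TWO"], ["ONE", "ZERO"], ["TWO", "ZERO"]]
probs = bigram_probabilities(corpus)
print(probs[("ONE", "TWO")])  # count(ONE TWO) / count(ONE as history)
```

For every history word, the estimated probabilities of all following words sum to 1, which is a quick sanity check on any such table.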
Input / Output
    FWDVIT: H E L L (null)
    24003 samples in file /usr/local/share/sphinx3/model/lm/an4/hell.raw
    INFO: live.c(239): live_nfeatvec: 13
    INFO: main_live_pretend.c(92): PARTIAL HYP: <sil>
    INFO: live.c(239): live_nfeatvec: 12
    INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> A(2)
    INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> EIGHTH
    INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H
    INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E
    INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E L
    INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E L OH
    Backtrace(null)
    LatID   SFrm   EFrm       AScr      LScr   Type
      254      0     45    -391470    -74100     -1   <sil>
      594     46     81    -472155   -148846      0   H
     1291     82    102    -288621   -148846      0   E
     1850    103    126    -235274   -148846      0   L
     2599    127    147    -430694   -148846      0   L
     2650    148    148          0   -148846      0   </s>
               0    148   -1818214   -818330          (Total)
    FWDVIT: H E L L (null)
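An application that wraps the recognizer needs to pull the final hypothesis out of this output. The sketch below does that with a regular expression. It assumes the result always appears on a line of the form `FWDVIT: <words> (<id>)`, as in the excerpt above. The function name `final_hypothesis` is hypothetical.

```python
import re

def final_hypothesis(log_text):
    """Return the words from the last 'FWDVIT:' line, or None if absent."""
    # Capture everything between 'FWDVIT:' and the parenthesized id.
    matches = re.findall(r"FWDVIT:\s*(.*?)\s*\(", log_text)
    return matches[-1].split() if matches else None

# Shortened sample taken from the log format shown on the slide.
sample_log = (
    "INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E\n"
    "INFO: main_live_pretend.c(92): PARTIAL HYP: <sil> H E L OH\n"
    "FWDVIT: H E L L (null)\n"
)
print(final_hypothesis(sample_log))
```

The partial hypotheses are ignored on purpose: as the log shows, they can change from frame to frame (for example `H E L OH`) before the final result settles.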
Difficulties
- Hardware issues
- ASR software issues
- Letter phonemes: the “e-set” (letters such as B, D, E, P, and T that sound alike when spoken)
- Time
Solution
- Database (PostgreSQL)
  - Names
  - Numbers
  - Phone number
- Fast access
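A name-to-number lookup of this kind might look like the sketch below. It uses Python's built-in `sqlite3` module as a stand-in so the example runs without a server. The slides name PostgreSQL, and the table name `directory`, its columns, and the sample rows are all invented for illustration. The index on `name` is one common way to get fast access.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE directory (name TEXT NOT NULL, phone TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_directory_name ON directory (name)")
conn.executemany(
    "INSERT INTO directory (name, phone) VALUES (?, ?)",
    [("SAM", "555-0100"), ("SARA", "555-0101")],  # made-up entries
)

def lookup_phone(connection, name):
    """Return every phone number stored for the given name."""
    rows = connection.execute(
        "SELECT phone FROM directory WHERE name = ?", (name.upper(),)
    ).fetchall()
    return [phone for (phone,) in rows]

print(lookup_phone(conn, "sam"))
```

The query uses a bound parameter rather than string formatting, which matters here because the name comes from recognizer output and not from trusted input.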
Solution
Architecture of application
Example (general idea):
…
PC: Say the letters of the first name. Press the space bar before and after you speak.
User: S AA EM
PC: Did you say, SAM?
Architecture of application
- User interaction
- Connects to the database
- Communicates with Sphinx
- Uses C, Perl, and shell scripts
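The confirmation step in the example dialog can be sketched as follows. This assumes the recognizer returns each spoken letter as a one-character word, possibly with a pronunciation-variant suffix such as `A(2)` and markers such as `<sil>`. Both function names are hypothetical, and this is not the application's actual code.

```python
import re

def letters_to_name(hypothesis):
    """Join recognized single letters into a name, skipping other tokens."""
    letters = []
    for word in hypothesis.split():
        # Drop a pronunciation-variant suffix, e.g. 'A(2)' becomes 'A'.
        word = re.sub(r"\(\d+\)$", "", word)
        if len(word) == 1 and word.isalpha():
            letters.append(word.upper())
    return "".join(letters)

def confirmation_prompt(name):
    """Build the question the PC asks before querying the directory."""
    return f"Did you say, {name}?"

name = letters_to_name("<sil> S A(2) M")
print(confirmation_prompt(name))
```

Asking for confirmation before the database query gives the user a chance to correct misrecognized letters, which is relevant given the similar-sounding letters listed under Difficulties.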
Solution
Check List
- Reading
- ASR system
- Database - PSQL
- Applications in C, Perl, PHP, VXML, shell
Timeline