USA AREA CODES APPLICATION
by Koffi Eddy Ihou
May 6, 2011
Florida Institute of Technology


Problem Statement
- The zip code algorithm is fairly straightforward: the program maps each city to its corresponding zip code [13].
- This is done with the Java HashMap class, which associates keys with values. Unfortunately, each key (a city) can refer to only one value, so one city corresponds to one zip code and one zip code only.

- The new application removes this restriction: it maps a single area code to all the corresponding cities.

Why is the New Approach Better?
- In the new algorithm, each key (an area code) can point to many cities (values). In other words, a key can have more than one value.
- To fix the problem found in the zip code application, the single-value entries are replaced by arrays that contain the corresponding cities.
- The area code application is an extension of the zip code application.
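The one-key-to-many-values idea can be sketched in Java as follows. This is a minimal illustration, not the application's actual code: the class name `AreaCodeIndex`, its methods, and the use of a `List` in place of a raw array are assumptions, and the sample entries are for demonstration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one area code (key) maps to a list of cities (values).
class AreaCodeIndex {
    private final Map<String, List<String>> citiesByAreaCode = new HashMap<>();

    // Append a city to the list stored under its area code,
    // creating the list the first time the area code is seen.
    void add(String areaCode, String city) {
        citiesByAreaCode.computeIfAbsent(areaCode, k -> new ArrayList<>()).add(city);
    }

    // Return every city stored for the area code, or an empty list if it is unknown.
    List<String> lookup(String areaCode) {
        return citiesByAreaCode.getOrDefault(areaCode, Collections.emptyList());
    }

    public static void main(String[] args) {
        AreaCodeIndex index = new AreaCodeIndex();
        index.add("321", "Melbourne");
        index.add("321", "Apopka");
        System.out.println(index.lookup("321"));
    }
}
```

A plain `HashMap<String, String>` would overwrite "Melbourne" with "Apopka" on the second `put`, which is the one-value-per-key limitation described for the zip code application.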

II-The Area Codes application
II-1-The Database
- The database is a text file that lists all the cities in the USA with their associated area codes and GPS coordinates (longitude and latitude). The coordinates are converted internally from degrees/minutes/seconds format into decimal degrees.
- A grammar file defines the digits the recognizer accepts. The area code application is a 3-digit recognizer, so the grammar file associates each digit with its corresponding word, from “zero” to “nine”. Every possible 3-digit number is formed from these words. For the record, no area code in the USA starts with zero.
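The coordinate conversion mentioned above follows the standard formula: decimal degrees = degrees + minutes/60 + seconds/3600, negated for the southern and western hemispheres. A minimal sketch, assuming a hypothetical helper class `DmsConverter` (the application's own conversion code is not shown in the slides):

```java
// Hypothetical helper: converts degrees/minutes/seconds to decimal degrees.
class DmsConverter {
    // hemisphere is 'N', 'S', 'E' or 'W'; south and west yield negative values.
    static double toDecimalDegrees(int degrees, int minutes, double seconds, char hemisphere) {
        double decimal = degrees + minutes / 60.0 + seconds / 3600.0;
        return (hemisphere == 'S' || hemisphere == 'W') ? -decimal : decimal;
    }

    public static void main(String[] args) {
        // Illustrative coordinates only, not taken from the application's database.
        System.out.println(toDecimalDegrees(28, 30, 0.0, 'N'));  // prints 28.5
        System.out.println(toDecimalDegrees(80, 45, 0.0, 'W'));  // prints -80.75
    }
}
```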

II-2-The Recognizer
II-2-1-General Information
- The recognizer used is Sphinx-4, a powerful speech recognition system written entirely in Java. It was designed as a collaboration between the Sphinx group at Carnegie Mellon University, Sun Microsystems Laboratories, Mitsubishi Electric Research Labs, and Hewlett-Packard, with contributions from MIT and the University of California at Santa Cruz.
- In terms of performance, Sphinx-4 is capable of many different types of recognition tasks.

II-2-2-Performance
- The Sphinx-4 recognizer can perform many different recognition tasks. Its flexibility and capabilities allow recognition of both discrete and continuous speech. Sphinx-4 includes pluggable implementations of pre-emphasis, Hamming window, FFT, Mel frequency filter bank, discrete cosine transform, cepstral mean normalization, and feature extraction of cepstra, delta cepstra, and double delta cepstra features. It also includes an acoustic model architecture, language model support for ASCII and binary versions of unigram, bigram, and trigram models, and a generalized pluggable front end [1].
- Finally, Sphinx-4 provides pluggable search management, with support for breadth-first and word-pruning searches.
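The first two front-end stages named above have simple textbook definitions, sketched here for illustration. This is not Sphinx-4 source code: the class `FrontEndSketch`, its method names, the pre-emphasis factor of 0.97, and the choice to treat the sample before the first one as zero are all assumptions.

```java
import java.util.Arrays;

// Hypothetical sketch of two front-end stages; not Sphinx-4 source code.
class FrontEndSketch {
    // Pre-emphasis: y[n] = x[n] - alpha * x[n-1], which boosts high frequencies.
    // The sample before the first one is assumed to be zero.
    static double[] preEmphasis(double[] x, double alpha) {
        double[] y = new double[x.length];
        for (int n = 0; n < x.length; n++) {
            double previous = (n == 0) ? 0.0 : x[n - 1];
            y[n] = x[n] - alpha * previous;
        }
        return y;
    }

    // Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1)),
    // applied sample by sample to taper the edges of a frame.
    static double[] hammingWindow(double[] frame) {
        int size = frame.length;
        double[] out = new double[size];
        for (int n = 0; n < size; n++) {
            double w = (size == 1) ? 1.0
                    : 0.54 - 0.46 * Math.cos(2.0 * Math.PI * n / (size - 1));
            out[n] = frame[n] * w;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] frame = {1.0, 1.0, 1.0, 1.0, 1.0};
        // 0.97 is a commonly used pre-emphasis factor, assumed here.
        double[] windowed = hammingWindow(preEmphasis(frame, 0.97));
        System.out.println(Arrays.toString(windowed));
    }
}
```

In a full front end, each windowed frame would then go through the FFT, the Mel filter bank, and the discrete cosine transform to produce cepstral features.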

II-2-2-Performance
Test | S3.3 WER | S4 WER | S3.3 RT | S4 RT(1) | S4 RT(2) | Vocabulary size | Language model
TI | | | | | | | Isolated digits recognition
TIDIGITS | | | | | | | Continuous digits
AN4 | | | | | | | trigram
RM1 | | | | | | …,000 | trigram
WSJ5K | | | | | | …,000 | trigram
HUB4 | | | | | | ~…,000 | trigram

II-2-2-Performance
- WER: word error rate (%); lower is better.
- RT: real-time ratio of processing time to audio time; lower is better.
- S3.3 RT: result for a single- or dual-CPU configuration.
- S4 RT(1): results for a single-CPU configuration.
- S4 RT(2): results for a dual-CPU configuration.
- Small vocabulary (AN4): extends the vocabulary to around 100 words, with input such as spoken words and words spelled out letter by letter.
- Medium vocabulary (RM1): extends the vocabulary to approximately 1,000 words.
- Medium vocabulary (WSJ5K): extends the vocabulary to approximately 5,000 words.
- Large vocabulary (HUB4): extends the vocabulary to approximately 64,000 words.
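The two metrics defined above can be computed as in the following sketch. It uses the usual definition of WER as the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. The class `RecognitionMetrics` and its method names are hypothetical, and both transcripts are assumed to be non-empty.

```java
// Hypothetical sketch of the two metrics; names are illustrative.
class RecognitionMetrics {
    // WER (%) = 100 * (substitutions + deletions + insertions) / reference words,
    // computed with a word-level Levenshtein distance.
    // Both strings are assumed to contain at least one word.
    static double wordErrorRate(String reference, String hypothesis) {
        String[] ref = reference.trim().split("\\s+");
        String[] hyp = hypothesis.trim().split("\\s+");
        int[][] d = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) d[i][0] = i;  // delete every reference word
        for (int j = 0; j <= hyp.length; j++) d[0][j] = j;  // insert every hypothesis word
        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int cost = ref[i - 1].equals(hyp[j - 1]) ? 0 : 1;
                d[i][j] = Math.min(d[i - 1][j - 1] + cost,
                          Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1));
            }
        }
        return 100.0 * d[ref.length][hyp.length] / ref.length;
    }

    // RT = processing time / audio time; values below 1 are faster than real time.
    static double realTimeRatio(double processingSeconds, double audioSeconds) {
        return processingSeconds / audioSeconds;
    }

    public static void main(String[] args) {
        System.out.println(wordErrorRate("three two one", "three two nine"));
        System.out.println(realTimeRatio(2.0, 4.0));  // prints 0.5
    }
}
```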

II-2-3-HMM-Based speech recognition system
- Figure 1 [1] shows how the words are processed from the entry node to the exit node.

II-2-3-HMM-Based speech recognition system
- Figure 2 [1] shows the whole process that leads to the highest score.
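Finding the highest-scoring path through an HMM is classically done with the Viterbi algorithm. The sketch below is a generic textbook version in the log domain, offered as an assumption about the kind of scoring the figures depict. It is not Sphinx-4's search implementation, and the class `ViterbiSketch`, its parameters, and the toy model in `main` are purely illustrative.

```java
import java.util.Arrays;

// Hypothetical textbook Viterbi decoder; not Sphinx-4's search code.
class ViterbiSketch {
    // start[s]    : probability of starting in state s
    // trans[p][s] : probability of moving from state p to state s
    // emit[s][o]  : probability of state s emitting observation o
    // obs         : sequence of observation indices
    // Returns the state sequence with the highest total score.
    static int[] decode(double[] start, double[][] trans, double[][] emit, int[] obs) {
        int states = start.length;
        int steps = obs.length;
        double[][] score = new double[steps][states];  // best log score ending in each state
        int[][] back = new int[steps][states];         // best predecessor of each state

        for (int s = 0; s < states; s++) {
            score[0][s] = Math.log(start[s]) + Math.log(emit[s][obs[0]]);
        }
        for (int t = 1; t < steps; t++) {
            for (int s = 0; s < states; s++) {
                double best = Double.NEGATIVE_INFINITY;
                int bestPrev = 0;
                for (int p = 0; p < states; p++) {
                    double candidate = score[t - 1][p] + Math.log(trans[p][s]);
                    if (candidate > best) {
                        best = candidate;
                        bestPrev = p;
                    }
                }
                score[t][s] = best + Math.log(emit[s][obs[t]]);
                back[t][s] = bestPrev;
            }
        }

        // Pick the best final state, then follow the back-pointers.
        int last = 0;
        for (int s = 1; s < states; s++) {
            if (score[steps - 1][s] > score[steps - 1][last]) last = s;
        }
        int[] path = new int[steps];
        path[steps - 1] = last;
        for (int t = steps - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]];
        }
        return path;
    }

    public static void main(String[] args) {
        // Toy two-state model with three observation symbols (illustrative values).
        double[] start = {0.6, 0.4};
        double[][] trans = {{0.7, 0.3}, {0.4, 0.6}};
        double[][] emit = {{0.5, 0.4, 0.1}, {0.1, 0.3, 0.6}};
        System.out.println(Arrays.toString(decode(start, trans, emit, new int[] {0, 1, 2})));
    }
}
```

Working in log probabilities turns products into sums and avoids numerical underflow on long observation sequences.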

II-3-The Mapping system
- The mapping function takes the 3-digit number obtained from the recognizer (the result with the highest score) and searches the area code database for all the cities and areas that share that area code.
- For example, if the words “three two one” are spoken into the microphone, the recognizer returns the number 321, provided the scoring process succeeds. This number is then used to load from the database all the cities that have 321 as their area code. The GPS coordinates (longitude and latitude) allow those cities to be plotted on a map of the USA.
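The step from recognized words to a 3-digit area code can be sketched as follows. The class `SpokenAreaCode` and its method are hypothetical, and the sketch assumes the recognizer returns its result as space-separated digit words.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: turns recognized digit words into an area code string.
class SpokenAreaCode {
    // The list index of each word is the digit it stands for.
    private static final List<String> DIGIT_WORDS = Arrays.asList(
            "zero", "one", "two", "three", "four",
            "five", "six", "seven", "eight", "nine");

    // Converts e.g. "three two one" to "321".
    // Returns null unless the text is exactly three known digit words.
    static String toAreaCode(String recognizedText) {
        String[] words = recognizedText.trim().toLowerCase(Locale.ROOT).split("\\s+");
        if (words.length != 3) {
            return null;
        }
        StringBuilder code = new StringBuilder();
        for (String word : words) {
            int digit = DIGIT_WORDS.indexOf(word);
            if (digit < 0) {
                return null;  // not a digit word
            }
            code.append(digit);
        }
        return code.toString();
    }

    public static void main(String[] args) {
        System.out.println(toAreaCode("three two one"));  // prints 321
    }
}
```

The resulting string would then serve as the key for the database lookup that returns the matching cities and their coordinates.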

III-Results
- The area code application simulation is a success. It was able to provide all the cities for a given area code.
- As an example, 321 matches Melbourne, Altamonte Springs, Apopka, and Casselberry, which are all located in the state of Florida.
- However, we observed that the area code returned by the recognizer sometimes differs from the one the speaker intended. It is therefore important to recognize the difficulty and complexity of the search [2]. For small vocabularies it is often possible to perform an optimal search; for large vocabularies, however, pruning becomes necessary. Pruning may introduce search errors that affect recognition accuracy [2].

IV-Conclusion and Future Work
- The area code application is more flexible than the zip code algorithm: we were able to visualize on the map all the cities associated with a given USA area code. The zip code algorithm should not be seen as a failure; rather, the area code application can be considered an extension of it.
- Future work could combine the zip code and area code applications into a single piece of software for tourists visiting the USA.
- Overall, the performance of speech recognition components has improved significantly: within only ten years we have gone from systems able to recognize isolated words uttered by a single speaker using a limited lexicon of around 50 words to systems able to recognize continuous speech with an unlimited vocabulary uttered by any speaker, or to systems able to carry on a spontaneous dialog over the telephone with a vocabulary of a few thousand words (e.g. information on train or airplane schedules) [3].

References
[1]
[2] Kai-Fu Lee, Automatic Speech Recognition: The Development of the SPHINX System, foreword by Raj Reddy, Kluwer Academic Publishers.
[3] Wolfgang Minker, Samir Bennacef, Speech and Human-Machine Dialog, Kluwer Academic Publishers.

Thank You
Questions …