

How Spread Works

Spread
Spread stands for Speech and Phoneme Recognition as Educational Aid for the Deaf and Hearing Impaired Children. It is a game used to visually motivate deaf and hearing impaired children to learn to speak.

How does Spread work?
Client: Selection, Record, Feedback
Server: Transcribe (Sphinx), Scoring
Client to server: .wav file + current word
Server to client: result

Selection (client)
The user is presented with a screen showing the word to pronounce.

Recording (client)
Recording begins once the user clicks the record button.

Transmission (client to server)
Transmission begins once the stop button is pressed. The wav file, the current word and the training phoneme are sent to the server for processing.
[Diagram: the client transmits to the server; example phonemes K AA R, with the training phoneme marked]
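The transmission step can be sketched as a small payload builder. This is a minimal sketch in Python, not Spread's actual protocol: the slides do not say how the data is encoded, so the JSON layout, the field names, and the functions `build_payload` and `parse_payload` are all assumptions made for illustration.

```python
import base64
import json
from typing import Tuple


def build_payload(wav_bytes: bytes, current_word: str, training_phoneme: str) -> str:
    """Bundle the three items the client sends to the server.

    The JSON layout and field names are hypothetical; the slides only say
    that the wav file, the current word and the training phoneme are sent.
    """
    return json.dumps({
        # Raw audio is base64-encoded so it can travel inside JSON text.
        "wav": base64.b64encode(wav_bytes).decode("ascii"),
        "word": current_word,
        "training_phoneme": training_phoneme,
    })


def parse_payload(payload: str) -> Tuple[bytes, str, str]:
    """Server side: recover the audio, the word and the training phoneme."""
    data = json.loads(payload)
    return base64.b64decode(data["wav"]), data["word"], data["training_phoneme"]
```

A round trip such as `parse_payload(build_payload(audio, "CAR", "AA"))` should return the same three values, which is the only property this sketch is meant to show.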

Transcribing & Sphinx (server)
Once the wav file arrives at the server, it is fed into Sphinx to recognize what the user said.

Sphinx
Sphinx is a Java-based Hidden Markov Model speech recognition system developed by Carnegie Mellon University.

To decode the wav file, Sphinx needs three data sets
– Acoustic Model
– Dictionary
– Language Model

Acoustic Model
The Acoustic Model maps sound features to units of speech called phonemes. It is derived through the sampling of a large data set of spoken words called a speech corpus.
Example phonemes: K AA R

Dictionary
The dictionary maps words into phonemes
...
CAN   K AE N
CAR   K AA R
CAT   K AE T
T...
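The word-to-phoneme mapping above can be illustrated with a short parser. This is a sketch only: `load_dictionary` is a hypothetical helper written for this example, not the loader Sphinx itself uses, and the input is limited to the three complete entries shown on the slide.

```python
from typing import Dict, List

# Entries copied from the slide: one word per line, followed by its phonemes.
DICTIONARY_TEXT = """\
CAN K AE N
CAR K AA R
CAT K AE T
"""


def load_dictionary(text: str) -> Dict[str, List[str]]:
    """Parse 'WORD PH1 PH2 ...' lines into a word -> phoneme-list mapping."""
    entries: Dict[str, List[str]] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or incomplete lines
        word, phonemes = parts[0], parts[1:]
        entries[word] = phonemes
    return entries


dictionary = load_dictionary(DICTIONARY_TEXT)
print(dictionary["CAR"])  # ['K', 'AA', 'R']
```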

Language Model
The language model indicates the probability of a particular word appearing given the previous words
– Not used since Spread only needs to recognize individual words

Decoding
Sphinx in Spread is configured to detect what phonemes were pronounced by the user.
[Diagram: Sphinx decodes the recording into the phonemes K AA R]

Increasing Accuracy
To increase accuracy, Sphinx in Spread is only made to recognize a limited number of phonemes per level. 7 levels means 7 individually configured Sphinxes.
– Sphinx Level1: CAR, JAR, STAR…
– Sphinx Level2: BED, NET, TENT…
– Sphinx Level3: PLAY, PARTY, CIRCLE…
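One way to picture the per-level restriction is as a filter that hands each level's recognizer only its own slice of the dictionary. The sketch below assumes that design; the slides do not describe how the seven Sphinx instances are actually configured. `LEVEL_WORDS` and `vocabulary_for_level` are hypothetical names, and the JAR and BED pronunciations are assumed for illustration.

```python
from typing import Dict, List

# Word lists taken from the slide. Only three of the seven levels are shown
# there, and each list is cut off with an ellipsis in the source.
LEVEL_WORDS: Dict[int, List[str]] = {
    1: ["CAR", "JAR", "STAR"],
    2: ["BED", "NET", "TENT"],
    3: ["PLAY", "PARTY", "CIRCLE"],
}


def vocabulary_for_level(
    level: int, full_dictionary: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Return only the dictionary entries that a given level may recognize."""
    allowed = LEVEL_WORDS[level]
    return {word: full_dictionary[word] for word in allowed if word in full_dictionary}


full_dictionary = {
    "CAR": ["K", "AA", "R"],   # from the Dictionary slide
    "JAR": ["JH", "AA", "R"],  # assumed pronunciation, not in the slides
    "BED": ["B", "EH", "D"],   # assumed pronunciation, not in the slides
}
level1 = vocabulary_for_level(1, full_dictionary)
print(sorted(level1))  # ['CAR', 'JAR']
```

STAR is missing from the output only because this toy `full_dictionary` has no entry for it.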

Scoring (server)
The server compares the decoded result against the expected result, taking note of the training phoneme.
[Diagram: Sphinx output "You said: K AA R" checked against the expected phonemes and the training phoneme]

Final result (server to client)
The result is sent over to the client to give feedback to the user.

Preliminary results
Tested with adult members of the hearing impaired community
– Very positive.
– "I wish I had this when I was learning speech"
Problems: Too enthusiastic
– Loud cheering noises reduced recognition rates

SPREAD was tested with hearing impaired students of the SPED division of the Batino Elementary School in Proj. 3, Quezon City
– Accuracy testing and software evaluation

Working with the children
Of the 40 students, only 5 volunteered to test the software
– The children were generally shy and hesitant to perform the speech
The children only knew very few words
– They knew how to sign some of the words but not to vocalize them
– General mood was as if they were taking an exam that they were not prepared for
Surprisingly, children were very good at conversational phrases
– “Good morning”
– “Good bye and thank you!”

Working with the teachers
Teachers still need to help the students vocalize some words
– System as yet cannot be left unsupervised with the students
Noisy screen distracts students
– Need to have a simpler screen to focus on

Recognition Rates
Sphinx recognition rates were low
– Hampered by noisy environment

Conclusion
Need to work closely with SPED teachers on speech curriculum
– Test on just recently learned words
Conversational phrases
– Hearing impaired children use simple phrases rather than words.
– Conversational phrases spoken, other words signed
UI improvements, simple is better
Accuracy improvements urgently needed

The Spread Team

Image Sources Microphone - Crystal Project - Wave form -

Extra slides follow…

Scoring
There are three possible outcomes
– EXCELLENT
– Good
– Sorry

Getting the training phoneme correct as well as the correct word length (number of phonemes) gets an EXCELLENT score.
Expected: K AA R
You said: K AA R
Checks: 3 phonemes long; got the training phoneme

Note that Spread is only looking for the correct pronunciation of the training vowel.
Expected: K AA R
You said: K AA T

Not getting the correct word length gets a Good score.
Expected: K AA R
You said: K AA R T

Not getting the training vowel means the user will have to try again
– Length is no longer checked
Expected: K AA R
You said: K AE R
Sorry =(
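The three outcomes can be written down as a small function. This is a minimal sketch of the rules as the slides state them, not the team's actual implementation: the function name `score` is hypothetical, the training phoneme for CAR is taken to be AA because the K AE R example fails, and "getting the training phoneme" is assumed to mean that it appears anywhere in the decoded sequence.

```python
from typing import List


def score(expected: List[str], decoded: List[str], training_phoneme: str) -> str:
    """Apply the three-outcome rule described on the Scoring slides."""
    # Missing the training phoneme ends scoring; length is not checked.
    if training_phoneme not in decoded:
        return "Sorry"
    # Training phoneme present and same number of phonemes as expected.
    if len(decoded) == len(expected):
        return "EXCELLENT"
    # Training phoneme present but the word length differs.
    return "Good"


expected = ["K", "AA", "R"]
print(score(expected, ["K", "AA", "R"], "AA"))       # EXCELLENT
print(score(expected, ["K", "AA", "T"], "AA"))       # EXCELLENT
print(score(expected, ["K", "AA", "R", "T"], "AA"))  # Good
print(score(expected, ["K", "AE", "R"], "AA"))       # Sorry
```

Under these rules K AA T scores the same as K AA R, which matches the note that only the training vowel is being checked.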

Updates
SPREAD has undergone BETA testing with a group of hearing impaired adults
– Testing of original (pass/fail) algorithm
Results
– Low recognition rates even for recognizable speech
– Puzzling due to high recognition rates with lab speech

Recognition Rate

Word     Rate   Close word
Apple    60%    Apple (60%)
Art      6%     Bat (66%)
Banana   13%    Apple (73%)
Bat      66%    Bat (66%)
Car      0%     Hand (46%)
Fan      0%     Hand (53%)
Hand     20%    Bat (33%)
Jar      0%     Hand (60%)
Lamb     0%     Apple (33%)
Sofa     0%     Hand (46%)
Star     0%     Fan (26%)
Table    0%     Apple (46%)
Van      0%     Art (26%)
Wallet   0%     Hand (60%)
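A table like this could be tallied from a log of attempts. The sketch below is an assumption about how such figures might be computed, not the team's method: it reads "Close word" as the word the recognizer returned most often for each target, the helper `recognition_summary` is hypothetical, and the sample attempts are made up for illustration rather than taken from the test data.

```python
from collections import Counter
from typing import Dict, List, Tuple


def recognition_summary(
    results: List[Tuple[str, str]]
) -> Dict[str, Tuple[float, str, float]]:
    """Summarize (expected word, recognized word) pairs, one per attempt.

    Returns word -> (recognition rate, most frequently recognized word,
    share of attempts that produced that word).
    """
    by_word: Dict[str, Counter] = {}
    for expected, recognized in results:
        by_word.setdefault(expected, Counter())[recognized] += 1

    summary: Dict[str, Tuple[float, str, float]] = {}
    for word, counts in by_word.items():
        total = sum(counts.values())
        close_word, close_count = counts.most_common(1)[0]
        # counts[word] is 0 when the target word was never recognized.
        summary[word] = (counts[word] / total, close_word, close_count / total)
    return summary


# Made-up attempts, for illustration only.
attempts = [
    ("Car", "Hand"), ("Car", "Hand"), ("Car", "Bat"),
    ("Bat", "Bat"), ("Bat", "Bat"), ("Bat", "Art"),
]
summary = recognition_summary(attempts)
```

With these sample attempts, `summary["Car"]` has a recognition rate of 0.0 with "Hand" as the most frequent output, mirroring the shape of the Car row above.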

Analysis
[Figure: microphone; lab test data and live data]

Recommendations
Better microphone/setup
– Sphinx has preprocessing modules for less noise
Per word recognition
– Use creative word combinations to isolate training phoneme w/o having to go into per phoneme recognition
Check out phoneme recognizers

Per phoneme recognition
Per phoneme recognition is worse
– Spread is highly dependent on full words for increased recognition rates

Recognizing: Lamb2.wav
I heard: ae ah m
Recognizing: Lamb3.wav
I heard: ae m
Recognizing: Sofa1.wav
I heard: s ow l ow
Recognizing: Sofa2.wav
I heard: s ae
Recognizing: Sofa3.wav
I heard: s ow hh aa
Recognizing: Star1.wav
I heard: ao t
Recognizing: Star2.wav
I heard: s d aa r
Recognizing: Star3.wav
I heard: s d aa r
Recognizing: Table1.wav
I heard: ah d l
Recognizing: Table2.wav
I heard: ae ah
Recognizing: Table3.wav
I heard: ae ah
Recognizing: Van1.wav
I heard: m ae
Recognizing: Van2.wav
I heard: m ae