How Spread Works. Spread Spread stands for Speech and Phoneme Recognition as Educational Aid for the Deaf and Hearing Impaired Children It is a game used.


1 How Spread Works

2 Spread Spread stands for Speech and Phoneme Recognition as Educational Aid for the Deaf and Hearing Impaired Children. It is a game used to visually motivate deaf and hearing impaired children to learn to speak.

3 How does Spread work? [Diagram: the CLIENT (Selection, Record) sends the .wav file + current word to the SERVER (Sphinx: Transcribe, then Scoring), which returns the result to the client as Feedback.]

4 Selection [Diagram: CLIENT (Selection) and SERVER.]

5 Selection The user is presented with a screen showing the word to pronounce

6 Selection The user is presented with a screen showing the word to pronounce

7 Selection The user is presented with a screen showing the word to pronounce

8 Recording [Diagram: CLIENT (Selection, Record) and SERVER.]

9 Recording Recording begins once the user clicks the record button.

10 Transmission [Diagram: the CLIENT (Selection, Record) sends the .wav file + current word to the SERVER.]

11 Transmission Transmission begins once the stop button is pressed. The wav file, the current word, and the training phoneme are sent from the client to the server for processing. Example: the word K AA R, with its training phoneme marked.
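
The three items sent to the server can be pictured as one small record. The following is only a sketch: the slides do not say how Spread packages or transports the data, so the class name, the field names, and the `build_submission` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    """What the client sends once recording stops (hypothetical field names)."""
    wav_bytes: bytes        # raw contents of the recorded .wav file
    current_word: str       # the word shown on the selection screen, e.g. "CAR"
    training_phoneme: str   # the phoneme being practised, e.g. "AA"


def build_submission(wav_bytes: bytes, current_word: str, training_phoneme: str) -> Submission:
    """Bundle the recording with the word and phoneme it should be scored against."""
    if not wav_bytes:
        raise ValueError("nothing was recorded")
    # Upper-case both labels so they match the CAR / K AA R style used on the slides.
    return Submission(wav_bytes, current_word.upper(), training_phoneme.upper())


# Placeholder bytes stand in for a real recording.
submission = build_submission(b"RIFF placeholder", "car", "aa")
```

Sending the expected word and training phoneme together with the audio means the server can score the attempt without keeping any per-user state.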

12 Transcribing & Sphinx [Diagram: the CLIENT (Selection, Record) sends the .wav file + current word to the SERVER, where Sphinx transcribes it.]

13 Transcribing Once the wav file arrives at the server, it is fed into Sphinx to recognize what the user said.

14 Sphinx Sphinx is a Java-based Hidden Markov Model speech recognition system developed by Carnegie Mellon University.

15 Sphinx To decode the wav file, Sphinx needs three data sets – Acoustic Model – Dictionary – Language Model

16 Acoustic Model The Acoustic Model maps sound features to units of speech called phonemes (for example, K AA R). It is derived by sampling a large data set of spoken words called a speech corpus.

17 Dictionary The dictionary maps words into phonemes: ... CAN = K AE N; CAR = K AA R; CAT = K AE T; ...
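
As a rough illustration of the word-to-phoneme mapping, the sketch below parses the three entries shown on the slide into a lookup table. The one-entry-per-line text layout and the `load_dictionary` name are assumptions made for this example, not a description of the actual dictionary file Spread uses.

```python
# The three entries shown on the slide, one per line: WORD PHONEME PHONEME ...
DICTIONARY_TEXT = """\
CAN K AE N
CAR K AA R
CAT K AE T
"""


def load_dictionary(text: str) -> dict[str, list[str]]:
    """Parse 'WORD P1 P2 ...' lines into a word -> phoneme-list mapping."""
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, phonemes = parts[0], parts[1:]
        entries[word] = phonemes
    return entries


dictionary = load_dictionary(DICTIONARY_TEXT)
print(dictionary["CAR"])  # ['K', 'AA', 'R']
```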

18 Language Model The language model indicates the probability of a particular word appearing given the previous words – Not used since Spread only needs to recognize individual words

19 Decoding Sphinx in Spread is configured to detect which phonemes the user pronounced, for example K AA R.

20 Increasing Accuracy To increase accuracy, Sphinx in Spread is made to recognize only a limited number of phonemes per level. 7 levels means 7 individually configured Sphinx instances. Sphinx Level 1: CAR, JAR, STAR… Sphinx Level 2: BED, NET, TENT… Sphinx Level 3: PLAY, PARTY, CIRCLE…
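
One way to picture the per-level restriction is to derive, from a level's word list, the set of phonemes its recognizer has to tell apart. This is a sketch under assumptions: only the Level 1 words named on the slide are used (the slide's own list is longer), the JAR and STAR pronunciations are CMUdict-style entries supplied for illustration, and `level_phonemes` is a hypothetical helper, not Spread's configuration mechanism.

```python
# Level 1 words named on the slide (the slide's list continues beyond these).
LEVEL_1_WORDS = ["CAR", "JAR", "STAR"]

# CAR is from the dictionary slide; JAR and STAR are assumed CMUdict-style entries.
PRONUNCIATIONS = {
    "CAR": ["K", "AA", "R"],
    "JAR": ["JH", "AA", "R"],
    "STAR": ["S", "T", "AA", "R"],
}


def level_phonemes(words: list[str], pronunciations: dict[str, list[str]]) -> list[str]:
    """Collect the distinct phonemes a level's recognizer needs to handle."""
    phonemes = set()
    for word in words:
        phonemes.update(pronunciations[word])
    return sorted(phonemes)


print(level_phonemes(LEVEL_1_WORDS, PRONUNCIATIONS))
# ['AA', 'JH', 'K', 'R', 'S', 'T']
```

A smaller phoneme set gives the recognizer fewer candidates to confuse, which is the motivation the slide gives for configuring each level separately.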

21 Scoring [Diagram: the CLIENT (Selection, Record) sends the .wav file + current word to the SERVER, where Sphinx transcribes it and the result is scored.]

22 Scoring The server compares the decoded result against the expected result, taking note of the training phoneme. You said: K AA R. Expected: K AA R.

23 Final result [Diagram: the CLIENT (Selection, Record) sends the .wav file + current word to the SERVER (Sphinx: Transcribe, then Scoring), which returns the result to the client as Feedback.]

24 The result is sent over to the client to give feedback to the user

25 Preliminary results Tested with adult members of the hearing impaired community – Very positive. – "I wish I had this when I was learning speech." Problems: testers were too enthusiastic – Loud cheering noises reduced recognition rates.

26 Preliminary results SPREAD was tested with hearing impaired students of the SPED division of the Batino Elementary School in Proj. 3, Quezon City – Accuracy testing and software evaluation

27 Working with the children Of the 40 students, only 5 volunteered to test the software – The children were generally shy and hesitant to perform the speech

28 Working with the children The children knew only a few words – They knew how to sign some of the words but not how to vocalize them. The general mood was as if they were taking an exam they had not prepared for.

29 Working with the children Surprisingly, children were very good at conversational phrases – “Good morning” – “Good bye and thank you!”

30 Working with the teachers Teachers still need to help the students vocalize some words – The system cannot yet be left unsupervised with the students

31 Working with the teachers Noisy screen distracts students – Need to have a simpler screen to focus on

32 Recognition Rates Sphinx recognition rates were low – Hampered by noisy environment

33 Conclusion Need to work closely with SPED teachers on the speech curriculum – Test on recently learned words. Conversational phrases – Hearing impaired children use simple phrases rather than words. – Conversational phrases are spoken, other words signed. UI improvements: simple is better. Accuracy improvements urgently needed.

34 The Spread Team

35 Image Sources Microphone - http://mmflc.com/images/microphone-stock-image.jpg Crystal Project - http://www.everaldo.com/crystal/ Wave form - http://bipinb.com/converting-wav-file-to-gsm-file.htm

36 Extra slides follow…

37 Scoring There are three possible outcomes – EXCELLENT – Good – Sorry

38 Scoring Getting the training phoneme right and matching the expected number of phonemes earns an EXCELLENT score. Expected: K AA R. You said: K AA R. Checks shown: 3 Phonemes Long, Got the Training Phoneme.

39 Scoring Note that Spread only checks the pronunciation of the training vowel. Expected: K AA R. You said: K AA T. Checks shown: 3 Phonemes Long, Got the Training Phoneme.

40 Scoring Not getting the correct word length earns a Good score. Expected: K AA R. You said: K AA R T. Checks shown: 3 Phonemes Long, Got the Training Phoneme.

41 Scoring Not getting the training vowel means the user will have to try again – Length is no longer checked. Expected: K AA R. You said: K AE R. Checks shown: 3 Phonemes Long, Got the Training Phoneme. Result: Sorry =(
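
The three outcomes described on these scoring slides can be summarised in a few lines. This is a minimal sketch of one reading of the rules, not Spread's actual implementation: the function name and signature are hypothetical, and checking whether the training phoneme appears anywhere in the decoded result (rather than at a particular position) is a simplifying assumption.

```python
def score(expected: list[str], heard: list[str], training_phoneme: str) -> str:
    """Return EXCELLENT, Good, or Sorry for one attempt."""
    # Missing the training phoneme means try again; length is not checked.
    if training_phoneme not in heard:
        return "Sorry"
    # Training phoneme present and the same number of phonemes as expected.
    if len(heard) == len(expected):
        return "EXCELLENT"
    # Training phoneme present but the length is off.
    return "Good"


expected = ["K", "AA", "R"]
print(score(expected, ["K", "AA", "R"], "AA"))       # EXCELLENT
print(score(expected, ["K", "AA", "T"], "AA"))       # EXCELLENT
print(score(expected, ["K", "AA", "R", "T"], "AA"))  # Good
print(score(expected, ["K", "AE", "R"], "AA"))       # Sorry
```

Under this reading, K AA T still scores EXCELLENT because only the training vowel and the phoneme count are compared; the other consonants are ignored.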

42 Updates SPREAD has undergone beta testing with a group of hearing impaired adults – Testing of the original (pass/fail) algorithm. Results – Low recognition rates even for recognizable speech – Puzzling, given the high recognition rates with lab speech

43 Recognition Rate
Word     Rate   Close word
Apple    60%    Apple (60%)
Art      6%     Bat (66%)
Banana   13%    Apple (73%)
Bat      66%    Bat (66%)
Car      0%     Hand (46%)
Fan      0%     Hand (53%)
Hand     20%    Bat (33%)
Jar      0%     Hand (60%)
Lamb     0%     Apple (33%)
Sofa     0%     Hand (46%)
Star     0%     Fan (26%)
Table    0%     Apple (46%)
Van      0%     Art (26%)
Wallet   0%     Hand (60%)
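
For readers who want to see how a row of this kind could be produced, the sketch below tallies a list of recognizer outputs for one target word. It rests on assumptions: the slides do not say how the rates were computed or how many attempts were made, the `summarize` helper is hypothetical, and the attempt list is made-up illustration data chosen to reproduce the "Car" row, not the team's test log.

```python
from collections import Counter


def summarize(target: str, recognized: list[str]) -> tuple[int, str, int]:
    """Return (rate %, most frequent recognized word, its share %), truncated."""
    if not recognized:
        raise ValueError("no attempts to summarize")
    counts = Counter(recognized)
    total = len(recognized)
    rate = 100 * counts[target] // total  # a missing key counts as 0
    close_word, close_count = counts.most_common(1)[0]
    return rate, close_word, 100 * close_count // total


# Made-up attempts for the target word CAR (illustration only).
attempts = ["HAND"] * 7 + ["BAT"] * 5 + ["FAN"] * 3
print(summarize("CAR", attempts))  # (0, 'HAND', 46)
```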

44 Analysis [Figure labels: Microphone, Lab test data, Live data.]

45 Recommendations Better microphone/setup – Sphinx has preprocessing modules to reduce noise. Per word recognition – Use creative word combinations to isolate the training phoneme without having to go into per phoneme recognition. Check out phoneme recognizers.

46 Per phoneme recognition Per phoneme recognition is worse – Spread is highly dependent on full words for increased recognition rates. Recognizer output:
Recognizing: Lamb2.wav I heard: ae ah m
Recognizing: Lamb3.wav I heard: ae m
Recognizing: Sofa1.wav I heard: s ow l ow
Recognizing: Sofa2.wav I heard: s ae
Recognizing: Sofa3.wav I heard: s ow hh aa
Recognizing: Star1.wav I heard: ao t
Recognizing: Star2.wav I heard: s d aa r
Recognizing: Star3.wav I heard: s d aa r
Recognizing: Table1.wav I heard: ah d l
Recognizing: Table2.wav I heard: ae ah
Recognizing: Table3.wav I heard: ae ah
Recognizing: Van1.wav I heard: m ae
Recognizing: Van2.wav I heard: m ae

