Speech Recognition and Assessment
Tomer Meshorer

Agenda
This presentation covers three uses of speech recognition:
- HCI for spastic dysarthria patients [M. Hasegawa-Johnson]: "HMM-Based and SVM-Based Recognition of the Speech of Talkers with Spastic Dysarthria"
- Identifying the progression of Parkinson's disease from the speech signal [A. Tsanas]: "Enhanced Classical Dysphonia Measures and Sparse Regression for Telemonitoring of Parkinson's Disease Progression"
- An auditory microswitch [G. E. Lancioni]: "Extending the Evaluation of a Computer System Used as a Microswitch for Word Utterances of Persons with Multiple Disabilities"

MOTIVATION
- Dysarthria; the most common form is spastic dysarthria.
- Adults with cerebral palsy often find it hard to type.
- Idea: replace the keyboard with ASR.
- The paper studies three talkers and one control subject; all three talkers have spastic dysarthria due to cerebral palsy.
- The subjects tend to delete word-initial consonants; one subject exhibits a slow stutter.
- Two algorithms:
  - Digit recognition using HMMs
  - Digit recognition using SVMs

Experiment
- Array of 8 microphones, of which 7 were used.
- Four types of speech data:
  - Isolated digits
  - The letters of the international radio alphabet
  - Nineteen computer commands
  - Read text: a balanced passage (129 words) and 56 TIMIT sentences
- Total training data: 541 words (395 distinct words).
- Intelligibility tests were performed using 40 different words selected from the TIMIT sentences.
- The listeners were the author and two students.

Results: per-listener intelligibility

Listener   F01     M01     M02     M03
L1         22.5%           90%     30%
L2         17.5%   20%     90%     27.5%
L3         17.5%   15%     97.5%   30%
Avg        19.2%           92.5%   29.2%

Listener errors
- Errors are analyzed by consonant position: word-initial, word-medial, and word-final.
- Three types of consonant errors:
  - Deletion ("sport" heard as "port")
  - Insertion ("on" heard as "coin")
  - Substitution ("for" heard as "bore")
- Other errors:
  - Vowel substitution ("and" heard as "end")
  - The number of syllables can change
  - The entire word can be deleted
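As a rough illustration (not the authors' analysis code), the three consonant-error types at word-initial position can be sketched as a simple string comparison. The vowel test and the decision rules here are deliberate simplifications:

```python
VOWELS = set("aeiou")

def initial_consonant_error(intended, heard):
    """Crudely classify a word-initial consonant error between the
    intended word and what the listener reported hearing."""
    i_cons = intended[0] not in VOWELS
    h_cons = heard[0] not in VOWELS
    if i_cons and heard == intended[1:]:
        return "deletion"      # "sport" heard as "port"
    if not i_cons and h_cons:
        return "insertion"     # "on" heard as "coin"
    if i_cons and h_cons and intended[0] != heard[0]:
        return "substitution"  # "for" heard as "bore"
    return "other"             # e.g. vowel substitution: "and" -> "end"
```

A real analysis would work on phoneme transcriptions rather than spellings, but the three-way split is the same.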

Listener errors analysis

ASR
- Four experiments: two speaker-dependent HMM systems and two speaker-dependent SVM systems.
- HMM:
  - First test:
    - Test data: 19 command words + 26 letters + 10 digits
    - Training data: TIMIT sentences + the Grandfather passage + one utterance per digit
  - Second test:
    - Test data: digits only
    - Training data: as in the first test

ASR results (column legend)
- H: word recognition accuracy (WRA) when all microphones are recognized independently
- HV: WRA when the microphones vote to determine the final system output
- Word: accuracy of one SVM trained to distinguish isolated digits
- WF: adds the outputs of 170 binary word-feature SVMs
- WFV: like WF, but single-microphone recognizers vote to determine the system output
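The per-microphone voting used in the HV and WFV configurations amounts to a majority vote across recognizer outputs; a minimal sketch:

```python
from collections import Counter

def vote(mic_outputs):
    """Majority vote across per-microphone recognizer outputs:
    the word hypothesized by the most microphones becomes the
    system output (ties broken arbitrarily)."""
    return Counter(mic_outputs).most_common(1)[0][0]

# e.g. seven microphones, five of which agree
print(vote(["five", "five", "nine", "five", "two", "five", "five"]))
```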

SVM-based ASR
- Fixed-length isolated-word recognition.
- Tested on digits only.
- Two kinds of SVM were used: a 10-ary SVM and binary word-feature SVMs.
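A minimal sketch of a 10-ary digit SVM using scikit-learn. The data here is synthetic (one Gaussian cluster per digit class standing in for fixed-length acoustic feature vectors); the real system's features and training setup differ:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_classes, dim, n_per = 10, 20, 30

# One well-separated cluster per digit class, standing in for
# fixed-length acoustic feature vectors.
centers = rng.normal(0.0, 5.0, size=(n_classes, dim))
X = np.vstack([centers[c] + rng.normal(0.0, 0.5, size=(n_per, dim))
               for c in range(n_classes)])
y = np.repeat(np.arange(n_classes), n_per)

clf = SVC(kernel="linear")  # multiclass "10-ary" SVM over the digits
clf.fit(X, y)
acc = clf.score(X, y)
```

The binary word-feature SVMs of the WF configuration would instead each predict one phonetic feature, with the digit decided from the combined feature outputs.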

Conclusion
- ASR can be used to recognize digits for talkers with very low intelligibility.
- The HMM was successful for two subjects but failed for the subject who deletes consonants.
- The SVM was successful for two subjects but failed for the subject with a stutter.
- Hence, HMMs should be used when word length fluctuates, and SVMs when consonant deletion is the problem.
- But: a 10-word vocabulary is too small for HCI.

MOTIVATION
- Parkinson's disease (PD) is the second most common neurodegenerative disorder after Alzheimer's.
- Strong evidence has emerged linking speech degradation with PD progression.
- Current PD progress monitoring relies on empirical tests and physical exams, which are time-consuming and costly.
- Results are mapped to the Unified Parkinson's Disease Rating Scale (UPDRS):
  - Motor-UPDRS
  - Total-UPDRS, where 176 denotes total disability
- Goal: use speech signal processing to map voice disorders to UPDRS scores.

Data
- Sustained-vowel speech recordings from 52 subjects with an idiopathic PD diagnosis.
- Subjects were physically assessed and given UPDRS scores at baseline, three months, and six months into the trial.
- Subjects took tests at home weekly using the Intel At Home Testing Device (AHTD).
- Subjects were required to sustain "ahh" for as long and as steadily as possible.
- Total of 5,875 signals; the signals were processed in Matlab.
- 42 subjects: mean age 64; motor-UPDRS 20.84; total-UPDRS 11.52.


Features
- Dysphonia measures were calculated using Praat:
  - Frequency perturbations (jitter)
  - Amplitude perturbations (shimmer)
- The log of each measure was also added as a feature.
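For illustration, one classical frequency-perturbation measure (local jitter, among those Praat reports) can be computed from consecutive pitch periods; this is a sketch, not the paper's code:

```python
def local_jitter(periods):
    """Local jitter: mean absolute difference between consecutive
    pitch periods, divided by the mean period (periods in seconds).
    A perfectly steady voice has zero jitter."""
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))
```

Amplitude perturbation (shimmer) follows the same pattern over peak amplitudes instead of periods.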

Linear regression
- UPDRS values were obtained at 0, 3, and 6 months, but recordings were weekly, so linear interpolation was used to obtain weekly UPDRS values.
- The feature vector x is mapped to the UPDRS output y.
- The authors ultimately used Lasso regression.
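The weekly interpolation step can be sketched with numpy; the exam scores below are invented for illustration:

```python
import numpy as np

# Clinical UPDRS exams at baseline, ~3 months and ~6 months,
# approximated here as weeks 0, 13 and 26 (scores are made up).
exam_weeks = np.array([0.0, 13.0, 26.0])
exam_updrs = np.array([20.0, 24.0, 30.0])

# One linearly interpolated UPDRS target per weekly recording session.
weeks = np.arange(27)
weekly_updrs = np.interp(weeks, exam_weeks, exam_updrs)
```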

Results
- Mapping performance was analyzed by training on 5,287 phonations and testing on 588.
- The metric is the MAE (mean absolute error): MAE = (1/n) Σ |u_i - û_i|, where u_i is the true UPDRS value and û_i is the predicted one.
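The MAE metric written out directly:

```python
def mae(u_true, u_pred):
    """Mean absolute error between true and predicted UPDRS scores:
    MAE = (1/n) * sum_i |u_i - u_hat_i|."""
    return sum(abs(t - p) for t, p in zip(u_true, u_pred)) / len(u_true)
```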

Conclusion
- Overall success in prediction: an MAE of 6.6 for motor-UPDRS and 8.4 for total-UPDRS.
- During the work the authors discovered that a better method exists to measure dysphonia, removing the need for the log transformation.
- The Lasso regression clearly shows that log-transformed classical dysphonia measures convey superior clinical information compared to the raw measures.

G. E. Lancioni et al.

Motivation
- Students with multiple disabilities are often unable to engage in constructive activity or play a positive role in their daily context.
- Goal: explore the use of verbal utterances to exert control over environmental events.
- Microswitches are technical tools that may help them improve their status.
- Main idea: build an utterance-based microswitch and test it with students.

Participants
- Tania, Alex, and Dennis: 18, 27, and 26 years old.
- All have a severe intellectual disability.
- Alex and Dennis are totally blind, while Tania can discriminate light.
- All have normal hearing and can produce a number of words and short sentences.

Device: auditory microswitch
- A regular PC with an audio output device.
- Commercially available ASR (Dragon NaturallySpeaking).
- A proprietary control program that links each target utterance emitted by a participant with the words and phrases the commercial software matched to it over different occurrences.
- The emitted words and phrases are grouped into specific categories based on phonetic structure and length.
- The categories serve as recognition targets and as triggers for the activation of stimuli.
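The control program's matching logic amounts to a lookup from recognizer output, through a reference category, to a stimulus. A toy sketch; the category tables, variant strings, and stimulus names below are invented:

```python
# Hypothetical reference categories: the recognizer outputs observed
# for each target utterance during baseline, grouped per category.
CATEGORIES = {
    "music": {"music", "muzic", "new sick"},
    "story": {"story", "storry", "a story"},
}
# Hypothetical stimulus assigned to each category.
STIMULI = {"music": "play local music", "story": "play funny story"}

def microswitch(recognized):
    """Return the stimulus to activate for a recognized word/phrase,
    or None when it falls outside every reference category."""
    for category, variants in CATEGORIES.items():
        if recognized in variants:
            return STIMULI[category]
    return None
```

Grouping the recognizer's varied outputs into categories is what makes the switch robust to the inconsistent recognition of impaired speech.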

Selection of stimuli
- Each stimulus is connected to a participant's target utterance:
  - Tania: a funny story, a special song
  - Alex: a singer's hit song, a person whistling
  - Dennis: a pet's voice, local music
- Recognition of an utterance by the computer system produced the matching stimulus for a set number of seconds.

Experiment
- Baseline:
  - Participants spoke samples of their target utterances; no stimuli sounded.
  - 70 trials over several days.
  - Recordings were made of the words/phrases, and reference categories were built.
- Intervention:
  - Three groups of utterances: Tania (3, 2, 2 words); Alex and Dennis (4, 4, 4 words).
  - Groups were introduced one at a time, each preceded by its own baseline.
  - Sessions of a fixed number of minutes; recognitions were recorded.
- Post-intervention:
  - 2 months after the intervention.
  - Sessions such as those occurring during the intervention.

Result

Summary
- About 80% of the utterances were correctly recognized by the computer system.
- Some of the utterances had a level of occurrence significantly higher (P < 0.01) than expected by chance.
- The computer system was an adequate microswitch for the participants' word utterances.
- The use of the system can be considered a valuable strategy to increase the participants' constructive verbal engagement and to allow their self-determination in seeking positive environmental stimulation.