New technologies supporting people with severe speech disorders
Mark Hawley
Barnsley District General Hospital and University of Sheffield


Research Projects
STARDUST (Speech Training and Recognition for Disabled Users of Assistive Technology)
–DoH NEAT programme
OLP (Ortho Logo Paedia)
–EU Vth Framework (Quality of life and management of living resources)


Research Team
Department of Medical Physics and Clinical Engineering, Barnsley District General Hospital (Mark Hawley, Simon Brownsell)
Institute of General Practice and Primary Care, University of Sheffield (Pam Enderby, Mark Parker, Rebecca Palmer)
Department of Computer Science, University of Sheffield (Phil Green, Nassos Hatzis, James Carmichael)


Dysarthria
A neurological motor speech impairment characterised by slow, weak, imprecise and/or uncoordinated movements of the speech musculature.
Frequently associated with other physical disabilities
170/ (Emerson & Enderby 1995)
Speech is often difficult to understand (unintelligible) and variable (inconsistent)
Severe = <40% intelligible


Intelligibility and Consistency
‘Normal’ speech will be almost 100% intelligible, with few articulatory differences over time (consistent).
‘Severe’ dysarthria may be completely unintelligible to the naïve listener and will show high variability (inconsistent)
–but may show consistency of key elements, which will make it more intelligible to the familiar listener.
STARDUST is concerned with consistency, OLP with intelligibility


STARDUST
To develop demonstrators of speech-driven environmental control and voice output communication devices for people with dysarthria
To develop a reliable small vocabulary speech recogniser for dysarthric speakers
To develop a computer training program to help to stabilise the speech of dysarthric speakers


Speech recognition systems
Speech-input writing programmes (eg Dragon)
–large vocabulary, speaker adaptive
Speech-input control systems
–small vocabulary, speaker dependent
–Environmental control systems
–speech-in - speech-out
–Mobile phones
–Smart homes


Speech-input writing programmes
Normal speech - with recognition training can get >90% recognition rates (Rose and Galdo, 1999)
Dysarthric speech - mild: 10-15% lower recognition rates (Ferrier, 1992)
Declining as speech deteriorates - by 30-50% for single words (Thomas-Stonell, 1998; Hawley, 2002)

Performance of a commercial speech-recognition environmental controller (in ‘ideal’ conditions)

Recognition technology
Small vocabulary
Speaker dependent
Uses hidden Markov models based on HTK (University of Cambridge)

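As a rough illustration of what small-vocabulary, speaker-dependent recognition with hidden Markov models involves, the sketch below keeps one HMM per command word and picks the word whose model gives the utterance the highest likelihood (forward algorithm). It is not the STARDUST recogniser and does not use HTK: the command words, the two-state models, their probabilities, and the discrete symbols 0-2 standing in for acoustic frames are all hypothetical, chosen only to keep the example self-contained.

```python
import math


def _log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(obs, start, trans, emit):
    """log P(obs | model) for a discrete HMM, via the forward algorithm."""
    n = len(start)
    alpha = [_log(start[s]) + _log(emit[s][obs[0]]) for s in range(n)]
    for symbol in obs[1:]:
        alpha = [
            log_sum_exp([alpha[p] + _log(trans[p][s]) for p in range(n)])
            + _log(emit[s][symbol])
            for s in range(n)
        ]
    return log_sum_exp(alpha)


def recognise(obs, models):
    """Return (best_word, scores): the word whose model best explains obs."""
    scores = {w: forward_log_likelihood(obs, *m) for w, m in models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical left-to-right models, one per command word, as they might be
# estimated from one speaker's recordings: (start, transitions, emissions).
LEFT_TO_RIGHT = [[0.7, 0.3], [0.0, 1.0]]
MODELS = {
    "lamp": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
    "radio": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.1, 0.8], [0.1, 0.8, 0.1]]),
}

word, scores = recognise([0, 0, 1, 1], MODELS)
print(word, scores)
```

Because every model is trained on a single speaker's own productions, such a recogniser can in principle learn that speaker's habitual pronunciation of each word, provided the productions are reasonably consistent. A real system would score continuous acoustic features rather than discrete symbols.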
Recognition increases with amount of training data
STARDUST recogniser performance (N=number of words used for training)

STARDUST
To develop demonstrators of speech-driven environmental control and voice output communication devices for people with dysarthria
To develop a reliable small vocabulary speech recogniser for dysarthric speakers
To develop a computer training program to help to stabilise the speech of dysarthric speakers (ie improve consistency)


Training tool
Quantitative and qualitative real-time visual feedback to improve consistency
–at word level
–at sub-word level
Can be used by the client alone or with carer or therapist
Training tool records examples of words - used to build recogniser

Consistency measure (word level)
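The slides do not say how the word-level consistency measure is computed. Purely as an illustration of how consistency across repetitions of one word could be quantified, the sketch below uses the mean pairwise dynamic time warping (DTW) distance between repetitions. This is a stand-in technique, not the STARDUST measure; the function names and the 1-D feature sequences (imagine one energy value per frame) are hypothetical.

```python
from itertools import combinations


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # stretch b
                cost[i][j - 1],      # stretch a
                cost[i - 1][j - 1],  # advance both
            )
    return cost[len(a)][len(b)]


def inconsistency(repetitions):
    """Mean pairwise DTW distance; lower means more consistent productions."""
    pairs = list(combinations(repetitions, 2))
    if not pairs:
        raise ValueError("need at least two repetitions of the word")
    return sum(dtw_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical data: three attempts at the same word.
steady = [[1, 2, 3, 2, 1], [1, 2, 2, 3, 2, 1], [1, 1, 2, 3, 2, 1]]
variable = [[1, 2, 3, 2, 1], [3, 1, 0, 2, 4], [0, 4, 1, 1, 2]]

print(inconsistency(steady), inconsistency(variable))
```

DTW is used here because repetitions of a word differ in timing even when the articulation is the same; warping the time axis means a slower but otherwise identical attempt is not penalised.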
OLT visual feedback (sub-word) (Hatzis and Green 1999)

STARDUST - conclusions
Recogniser that recognises severely dysarthric speech - well but not perfectly
–next step to test in real usage
Computer-based training program to improve consistency
–word level and sub-word level in future
–collects lots of speech data for recogniser
Develop demonstrators of environmental controller and speech-output device


OLP
To supplement speech therapy by providing computer-based tools for audio-visual feedback to improve clients’ speech production
–intelligibility as well as consistency
To make this available remotely using distance learning techniques


Partners
Institute for Language and Speech Processing, Greece
University of Sheffield, UK
Royal Institute of Technology, Sweden
Polytechnic University of Madrid, Spain
ARCHES, France
Unisoft Software Applications SA, Greece
Logos Centre for Speech-Voice Pathology, Greece
Barnsley District General Hospital, UK


Client groups
Dysarthria
Pre-lingual and severe hearing impairment
Cleft lip and palate and velopharyngeal incompetence


Changing speech patterns
Accurate and consistent feedback
Provide target speech patterns and feed back deviations
Repeated practice
Drawbacks of conventional therapy
–feedback may become inconsistent
–lack of time leads to lack of practice

OLT visual feedback (sub-word)
Current systems IBM Speech Viewer Indiana Speech Training Aid Video Voice Speech Training System Speech Rehabilitation Speech Training Aid for the Hearing Impaired (HARP)

Desirable features
Should provide contrastive visual training, ie the correct model of a reference speaker and the deviant production of the client are shown simultaneously so that they can be compared
The visual pattern must be
–attractive
–easily comprehensible
–shown without delay


OLP features
Ability to contrast desired target and undesirable utterances (on same display)
Mapping of articulatory information to 2D visual display in real-time
Motivating displays and exercises
Flexibility - can be individualised by therapist
Can be used at home by client