Clinical Applications of Speech Technology
Phil Green
Speech and Hearing Research Group, Dept of Computer Science, University of Sheffield

CAST December 2007

Talk Overview
- SPandH: Speech and Hearing in Sheffield
- The CAST group
- Building automatic speech recognisers: conventional methodology
- ASR for clients with speech disorders
- Kinematic maps
- Voice-driven environmental control
- VIVOCA
- Customising voices
- Future directions

SPandH
The Speech and Hearing Research Group draws on:
- Phonetics & Linguistics
- Hearing & Acoustics
- Electrical Engineering & Signal Processing
- Speech & Language Therapy
Research themes include Auditory Scene Analysis, Missing Data Theory, Glimpsing, and CAST.

The CAST group:
- Prof Mark Hawley, School of Health and Related Research, University of Sheffield (Assistive Technology)
- Prof Pam Enderby, Institute of General Practice and Primary Care, University of Sheffield (Speech Therapy)
- Prof Phil Green and Prof Roger K Moore, Speech and Hearing Research Group, Department of Computer Science, University of Sheffield (Speech Technology)
- Dr Stuart Cunningham, Department of Human Communication Sciences, University of Sheffield (Speech Perception, Speech Technology)
Contact:

Conventional Automatic Speech Recogniser Construction
The standard technique uses generative statistical models:
- Each speech unit is modelled by an HMM with a number of states.
- Each state is characterised by a Gaussian mixture distribution over the components of the acoustic vector x.
- Parameters of the distributions are estimated in training (EM: the Baum-Welch algorithm).
- All this is the acoustic model; there will also be a language model.
- Decoding finds the model and state sequence most likely to generate the observation sequence X.
- Training is based on a large pre-recorded, speaker-independent speech corpus.
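The decoding step above can be illustrated with a minimal Viterbi sketch. This is a toy, not a real recogniser: it assumes a two-state HMM with a single 1-D Gaussian per state (rather than mixture models over multi-dimensional acoustic vectors, and with no language model), and every parameter value is invented for the example.

```python
import math

# Toy Viterbi decoding: 2 states, one 1-D Gaussian emission per state.
# All parameter values are invented for illustration.

def gauss_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def viterbi(obs, log_trans, means, vars_):
    """Most likely state sequence (uniform start probabilities)."""
    n = len(means)
    delta = [gauss_logpdf(obs[0], means[s], vars_[s]) for s in range(n)]
    back = []
    for x in obs[1:]:
        ptr, new = [], []
        for s in range(n):
            p = max(range(n), key=lambda q: delta[q] + log_trans[q][s])
            ptr.append(p)
            new.append(delta[p] + log_trans[p][s]
                       + gauss_logpdf(x, means[s], vars_[s]))
        delta = new
        back.append(ptr)
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(back):     # backtrace the best path
        state = ptr[state]
        path.append(state)
    return path[::-1]

log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]
print(viterbi([0.1, 0.2, 2.9, 3.1], log_trans,
              means=[0.0, 3.0], vars_=[1.0, 1.0]))  # → [0, 0, 1, 1]
```

Observations near 0 are assigned to state 0 and those near 3 to state 1; the sticky transition probabilities (0.9 self-loop) discourage spurious state changes.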

Dysarthria
- Loss of control of the speech articulators
- Causes include stroke, cerebral palsy, MS...
- Affects 170 per 100,000 population
- Severe cases are unintelligible to strangers
- Often accompanied by physical disability
Example command words: channel, lamp, radio

STARDUST: ASR for Dysarthric Speakers
- NHS NEAT funding
- Environmental control
- Small vocabulary, isolated words
- Speaker-dependent
- Sparse training data
- Variable training data

STARDUST Methodology
Initial recordings → train recogniser → confusability analysis → client practises for consistency → new recordings → retrain

STARDUST Training Results
For each client (CC, PH, GR, JT, KD, MR, FL): sentence intelligibility (%), word intelligibility (%), vocabulary size, and recognition accuracy pre- and post-training (%).
ECS trial: halved the average time to execute a command.

STARDUST Consistency Training

STARDUST Clinical Trial

OPTACIA: Kinematic Maps
- Pronunciation training aid, EC funding
- Speech acoustics mapped to an (x, y) position in a map window in real time
- Mapping performed by a trained neural network (ANN)
- Customisable for exercises and clients
Pipeline: speech → signal processing → ANN mapping → map display (regions for 's', 'sh', 'i', ...)
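The real-time mapping idea can be sketched as a small feed-forward network projecting each acoustic feature vector to an (x, y) screen position. This is a minimal sketch: the layer sizes, weights and inputs below are made-up illustrative numbers, not values from the OPTACIA system.

```python
import math

# Sketch of acoustics-to-map-position via a trained feed-forward net.
# Weights and inputs are invented for illustration only.

def tanh_layer(vec, weights, biases):
    """One fully-connected layer with tanh activation."""
    return [math.tanh(sum(w * v for w, v in zip(row, vec)) + b)
            for row, b in zip(weights, biases)]

def acoustics_to_xy(features, hidden_w, hidden_b, out_w, out_b):
    """Map an acoustic feature vector to (x, y) map coordinates."""
    hidden = tanh_layer(features, hidden_w, hidden_b)
    # linear output layer: the position to plot in the map window
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(out_w, out_b)]

xy = acoustics_to_xy([0.2, 0.2],
                     hidden_w=[[1.0, -1.0], [0.5, 0.5]], hidden_b=[0.0, 0.0],
                     out_w=[[1.0, 0.0], [0.0, 1.0]], out_b=[0.5, 0.5])
print(xy)
```

In the real system the network is trained so that target sounds (e.g. 's' vs 'sh') occupy distinct regions of the window, giving the client immediate visual feedback as they speak.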

Example: Vowel Map

SPECS: Speech-Driven Environmental Control Systems
- NHS HTD funding
- Industrial exploitation
- STARDUST on a 'balloon board'

VIVOCA: Voice Input Voice Output Communication Aid
- NHS NEAT funding
- Assists communication with strangers
- Client: 'buy tea' [unintelligible]
- VIVOCA: 'A cup of tea with milk and no sugar please' [intelligible synthesised speech]
- Runs on a PDA
Pipeline: dysarthric speech → ASR → text generation → speech synthesis → intelligible speech
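The text generation stage in the pipeline above can be sketched as a client-specific lookup: the speaker-dependent recogniser yields a short command, which is expanded into a full phrase for the synthesiser. `PHRASES`, `vivoca_output` and the entries in the table are hypothetical illustrations, not the actual VIVOCA design.

```python
# Hypothetical sketch of VIVOCA text generation: expand a recognised
# short command into an intelligible phrase for speech synthesis.

PHRASES = {  # made-up client-specific command-to-phrase table
    "buy tea": "A cup of tea with milk and no sugar please.",
    "bus stop": "Could you tell me when we reach the hospital stop?",
}

def vivoca_output(recognised_command):
    """Return the phrase to hand to the speech synthesiser; a real
    system would then speak it aloud on the PDA."""
    return PHRASES.get(recognised_command, "Sorry, please try again.")

print(vivoca_output("buy tea"))
```

The key point the sketch captures is that only the short command needs to be recognised from dysarthric speech; the intelligible output sentence is generated, not recognised.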

Voices for VIVOCA
- It is possible to build voices from training data
- A local voice is preferable
- Yorkshire voices: Ian McMillan, Christa Ackroyd

Concatenative Synthesis
Voice building (e.g. with Festvox): speech recordings → unit segmentation → unit database (units such as 'i', 'a', 'sh')
Synthesis: text input → unit selection → concatenation + smoothing → synthesised speech
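The unit selection step is commonly framed as dynamic programming: pick one candidate unit from the database per target position so that the total target cost (mismatch with the specification) plus join cost (discontinuity between neighbouring units) is minimal. The sketch below uses toy unit names and cost values of my own invention.

```python
# Sketch of unit selection as dynamic programming over candidate
# units, minimising target cost + join cost. Toy data throughout.

def select_units(candidates, target_cost, join_cost):
    """candidates: one list of unit ids per target position."""
    # best[i][j] = (cumulative cost, backpointer) for unit j at position i
    best = [[(target_cost(0, u), None) for u in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for u in candidates[i]:
            prev = [best[i - 1][k][0] + join_cost(p, u)
                    for k, p in enumerate(candidates[i - 1])]
            k = min(range(len(prev)), key=prev.__getitem__)
            row.append((prev[k] + target_cost(i, u), k))
        best.append(row)
    # backtrace the cheapest path through the unit database
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return list(reversed(path))

candidates = [["a1", "a2"], ["b1", "b2"]]
tcost = {(0, "a1"): 0.1, (0, "a2"): 0.5, (1, "b1"): 0.3, (1, "b2"): 0.2}
jcost = {("a1", "b1"): 0.9, ("a1", "b2"): 0.1,
         ("a2", "b1"): 0.1, ("a2", "b2"): 0.8}
print(select_units(candidates,
                   lambda i, u: tcost[(i, u)],
                   lambda p, u: jcost[(p, u)]))  # → ['a1', 'b2']
```

Note that the globally cheapest sequence is not simply the per-position minimum: the join cost can favour a slightly worse unit that concatenates more smoothly with its neighbour.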

Concatenative Synthesis: Pros and Cons
+ High quality
+ Natural sounding
+ Sounds like the original speaker
- Needs a lot of data (~600 sentences)
- Can be inconsistent
- Difficult to manipulate prosody

HMM Synthesis

HMM Synthesis: Adaptation
Training (HTS): speech recordings → average speaker model
Adaptation: a small set of target-speaker recordings → adapted speaker model
Synthesis: text input + adapted speaker model → synthesised speech

HMM Synthesis: Pros and Cons
+ Consistent
+ Intelligible
+ Easier to manipulate prosody
+ Needs relatively little adaptation data (>5 sentences)
- Less natural than concatenative synthesis
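Why does adaptation need so little data? The core idea can be sketched as shifting an average-voice model parameter toward statistics of the target speaker, MAP-style, with a prior weight deciding how far to trust the sparse data. This is an illustrative sketch only, not the HTS implementation (real systems use techniques such as MLLR and MAP over full model sets); `tau` and all values are invented.

```python
# Illustrative MAP-style mean adaptation: interpolate an average-voice
# model mean toward the target speaker's data. Not the HTS algorithm;
# tau is a made-up prior weight.

def map_adapt_mean(avg_mean, speaker_frames, tau=10.0):
    """With little data the average-voice prior dominates; with more
    data the speaker's own statistics take over."""
    n = len(speaker_frames)
    speaker_mean = sum(speaker_frames) / n
    return (tau * avg_mean + n * speaker_mean) / (tau + n)

print(map_adapt_mean(0.0, [1.0] * 10))   # → 0.5 (prior and data balanced)
print(map_adapt_mean(0.0, [1.0] * 990))  # → 0.99 (data dominates)
```

This trade-off is what makes a handful of adaptation sentences enough to personalise an average voice, where building a concatenative voice from scratch needs hundreds.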

Personalisation for Individuals with Progressive Speech Disorders
- Voice banking: before deterioration
- Capturing the essence of a voice: during deterioration

HMM Synthesis: Adaptation for Dysarthric Speech
Training (HTS): speech recordings → average speaker model
Adaptation: target-speaker recordings, with duration, phonation and energy information → adapted speaker model
Synthesis: text input → synthesised speech

Helping People Who Have Lost Their Voice
- Operations like laryngectomy remove the vocal cords completely
- If recordings were made before the operation, a synthetic voice can be reconstructed
- Example: HMM synthesis from 7 minutes of poor-quality adaptation data (original vs synthesised samples)

REDRESS (with the University of Hull)
- Small magnets placed on the lips and tongue
- Changes in the magnetic field are detected
- This data can be used as the basis for speech recognition
- Accurate results demonstrated on a 50-word vocabulary

Future Directions
- Personal Adaptive Listeners (PALs)
- 'Home Service'
- Companions

The PALs Concept
A PAL is a portable device (PDA, wearable...) which you own. Your PAL is like your valet:
- It knows a lot about you: the way you speak, the words you like to use, your interests, contacts and networks
- You talk with it: this knowledge makes conversational dialogues viable
- It does things for you: bookings, appointments, reminders; communication; access to services
- It learns to do a better job: by automatic adaptation (acoustic models, language models, dialogue models) and by explicit training ('this is how I refer to things, these are the names I use...')
USER-AS-TEACHER