Objective intelligibility assessment of pathological speakers
Catherine Middag, Gwen Van Nuffelen, Jean-Pierre Martens, Marc De Bodt
ELIS-DSSP, Sint-Pietersnieuwstraat 41, B-9000 Gent. SPACE Symposium, 05/02/09

Presentation transcript:

Slide 1: Objective intelligibility assessment of pathological speakers
Catherine Middag, Gwen Van Nuffelen, Jean-Pierre Martens, Marc De Bodt
ELIS-DSSP, Sint-Pietersnieuwstraat 41, B-9000 Gent. SPACE Symposium, 05/02/09

Slide 2: Introduction
Intelligibility = popular measure for pathological speech assessment
Perceptual assessment is affected by non-speech information:
– familiarity with the speaker and the type of disorder
– use of linguistic context
Word intelligibility tests are designed to eliminate bias due to linguistic context
Replacing the human listener by an automatic speech recognizer (ASR) can solve the other problems, but is the ASR sufficiently reliable?
– test case: automation of the Dutch Intelligibility Assessment (DIA)

Slide 3: Dutch Intelligibility Assessment (DIA)
– 50 isolated CVC words
– intelligibility = percent phonemes correct
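The scoring rule on this slide (intelligibility = percent phonemes correct) can be illustrated with a minimal sketch. The slide does not say how listener responses are compared with the targets, so the position-by-position comparison of three-phoneme words below is an assumption, and the word list is hypothetical.

```python
def percent_phonemes_correct(targets, responses):
    """Return the percentage of target phonemes reproduced in the responses.

    targets and responses are equally long lists of CVC words, each word
    given as a tuple of three phoneme symbols. Phonemes are compared
    position by position (an assumption, not taken from the slides).
    """
    if len(targets) != len(responses):
        raise ValueError("need exactly one response per target word")
    total = 0
    correct = 0
    for target, response in zip(targets, responses):
        for t, r in zip(target, response):
            total += 1
            correct += (t == r)
    return 100.0 * correct / total


# Hypothetical three-word excerpt; the real test uses 50 words.
targets = [("p", "o", "l"), ("m", "A", "l"), ("s", "o", "m")]
responses = [("p", "o", "l"), ("m", "A", "n"), ("z", "o", "m")]
score = percent_phonemes_correct(targets, responses)
print(round(score, 1))
```

Seven of the nine phonemes match in this toy case, so the score is 7/9 of 100.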

Slide 4: How to apply ASR in the DIA?
Two approaches:
– let the ASR recognize the words and count the percentage of correct decisions
– let the ASR check how well the acoustics match the phonetic transcription of the target word (= alignment)
Our experience:
– intelligibility scores emerging from the first approach are insufficiently reliable
– therefore we developed a system based on alignment

Slide 5: System architecture: flow chart
acoustic feature sequence X_t + target speech transcription → Speech aligner → speaker features → Intelligibility Prediction Model → objective score

Slide 6: System architecture: flow chart
acoustic feature sequence X_t + target speech transcription → Speech aligner → speaker features → Intelligibility Prediction Model → objective score
Two systems (speech aligners):
– complex state-of-the-art HMM-based system (ASR-ESAT)
– simple system with a phonological layer (ASR-ELIS), which points more directly to articulatory problems

Slide 7: System architecture: flow chart
acoustic feature sequence X_t + target speech transcription → Speech aligner → speaker features → Intelligibility Prediction Model → objective score
Two feature sets (speaker features):
– Phonemic features (patient has trouble pronouncing a certain phoneme)
– Phonological features (patient has problems with voicing, manner or place of articulation)

Slide 8: Extraction of phonemic features (PMF)
Speech aligner = ASR-ESAT

Frame  Phoneme  P(s_t|X_t)
1      #        0.7
2      #        0.5
3      /p/      0.4
4      /p/      0.8
5      /o/      0.6
6      /o/      0.8
7      /l/      0.6
8      #        0.3

Phonemic features:
#   : (0.7 + 0.5 + 0.3) / 3
/p/ : (0.4 + 0.8) / 2
/o/ : (0.6 + 0.8) / 2
/l/ : 0.6
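The averaging on this slide can be sketched in a few lines: each phonemic feature is the mean of the frame posteriors P(s_t|X_t) over the frames aligned to that phoneme. The function name and data layout are illustrative choices, not taken from the slides; the numbers are the eight frames of the slide's table.

```python
from collections import defaultdict


def phonemic_features(alignment):
    """Average the frame posteriors P(s_t | X_t) per aligned phoneme."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for phoneme, posterior in alignment:
        sums[phoneme] += posterior
        counts[phoneme] += 1
    return {phoneme: sums[phoneme] / counts[phoneme] for phoneme in sums}


# Frame-level alignment from the slide: (phoneme, P(s_t | X_t)).
alignment = [
    ("#", 0.7), ("#", 0.5),
    ("/p/", 0.4), ("/p/", 0.8),
    ("/o/", 0.6), ("/o/", 0.8),
    ("/l/", 0.6),
    ("#", 0.3),
]
pmf = phonemic_features(alignment)
for phoneme, value in pmf.items():
    print(phoneme, round(value, 2))
```

Note that the silence frames at the start and end of the word are pooled into one feature for #, as on the slide.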

Slide 9: Extraction of phonological features (PLF)
Speech aligner = ASR-ELIS
[Table: per-frame posteriors voiced P(K_1|X_t), back P(K_2|X_t) and burst P(K_3|X_t) for frames 1 to 8, aligned to #, #, /pcl/, /p/, /o/, /o/, /l/, #; most values are missing from the transcript]
Phonological features:
Burst  : 0.6
Back   : ( … ) / 2
Voiced : ( … ) / 3

Slide 10: Extraction of phonological features (PLF)
Speech aligner = ASR-ELIS
[Table: same per-frame posteriors as on slide 9]
Phonological features:
Not burst  : ( …
Not back   : ( …
Not voiced : ( …

Slide 11: Extraction of phonological features (PLF)
Speech aligner = ASR-ELIS
[Table: same per-frame posteriors as on slide 9, with some cells marked "Irrelevant features for these phones"]
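The three PLF slides suggest the following scheme: for every phonological class K, average P(K|X_t) over the frames whose phone should have the class (the positive feature), average separately over the frames whose phone should not have it (the "not" feature), and skip frames for which the class is irrelevant. The sketch below follows that reading. The canonical class table and all posteriors are hypothetical (the slide values did not survive), and whether the "not" features average P or 1 − P is not recoverable from the transcript; this sketch averages P.

```python
from collections import defaultdict

# Hypothetical canonical values: True = class expected, False = class not
# expected, None = class irrelevant for this phone.
CANONICAL = {
    "#":     {"voiced": False, "back": None,  "burst": False},
    "/pcl/": {"voiced": False, "back": None,  "burst": False},
    "/p/":   {"voiced": False, "back": None,  "burst": True},
    "/o/":   {"voiced": True,  "back": True,  "burst": False},
    "/l/":   {"voiced": True,  "back": False, "burst": False},
}


def phonological_features(frames, canonical):
    """Mean posterior per class, split into expected and not-expected frames."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for phone, posteriors in frames:
        for feature, p in posteriors.items():
            expected = canonical[phone][feature]
            if expected is None:  # irrelevant for this phone: skip the frame
                continue
            key = feature if expected else "not " + feature
            sums[key] += p
            counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical per-frame posteriors P(K | X_t) for the word /p o l/.
frames = [
    ("#",     {"voiced": 0.1, "back": 0.2, "burst": 0.0}),
    ("#",     {"voiced": 0.1, "back": 0.3, "burst": 0.1}),
    ("/pcl/", {"voiced": 0.2, "back": 0.1, "burst": 0.2}),
    ("/p/",   {"voiced": 0.3, "back": 0.2, "burst": 0.6}),
    ("/o/",   {"voiced": 0.8, "back": 0.7, "burst": 0.1}),
    ("/o/",   {"voiced": 0.9, "back": 0.5, "burst": 0.0}),
    ("/l/",   {"voiced": 0.7, "back": 0.4, "burst": 0.1}),
    ("#",     {"voiced": 0.0, "back": 0.1, "burst": 0.0}),
]
plf = phonological_features(frames, CANONICAL)
for name in sorted(plf):
    print(name, round(plf[name], 2))
```

With these made-up numbers the positive features have the same shape as on slide 9: burst comes from one frame, back from two and voiced from three.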

Slide 12: System architecture: flow chart
acoustic feature sequence X_t + target speech transcription → Speech aligner → speaker features → Intelligibility Prediction Model → objective score

Slide 13: Intelligibility prediction model (IPM)
Objective:
– map speaker features (PMF, PLF or combinations) to a speaker intelligibility score
Model training:
– train on DIA recordings
– pathological speakers (+ some normal control speakers)
Model type and size:
– limited number of pathological speakers
– high number of features
→ linear regression model
→ feature selection
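A linear regression model with feature selection could look like the sketch below. The slides do not say which selection procedure was used, so greedy forward selection on the training error is an assumption here, as are the function names and the toy data (six speakers, three features, scores built as 20 + 60·x0 + 30·x2). A real setup would select on held-out data.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit(rows, y, cols):
    """Least-squares weights (bias first) using only the given feature columns."""
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    k = len(cols) + 1
    ata = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    aty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve(ata, aty)


def predict(weights, row, cols):
    return weights[0] + sum(w * row[c] for w, c in zip(weights[1:], cols))


def sse(rows, y, cols):
    """Sum of squared training errors of the model restricted to cols."""
    weights = fit(rows, y, cols)
    return sum((predict(weights, r, cols) - t) ** 2 for r, t in zip(rows, y))


def forward_select(rows, y, max_features):
    """Greedily add the feature that lowers the training error the most."""
    chosen = []
    remaining = list(range(len(rows[0])))
    while remaining and len(chosen) < max_features:
        best = min(remaining, key=lambda c: sse(rows, y, chosen + [c]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical speaker features (one row per speaker) and perceptual scores.
rows = [(0.9, 0.2, 0.8), (0.5, 0.9, 0.6), (0.7, 0.1, 0.3),
        (0.2, 0.5, 0.4), (0.4, 0.7, 0.9), (0.8, 0.4, 0.5)]
y = [98.0, 68.0, 71.0, 44.0, 71.0, 83.0]
chosen = forward_select(rows, y, max_features=2)
weights = fit(rows, y, chosen)
print(chosen, [round(w, 3) for w in weights])
```

Keeping the model linear and the feature count small is what makes it trainable on a limited number of speakers, which is the constraint the slide names.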

Slide 14: Reference material (DIA)
211 speakers:
– 51 normals
– 60 dysarthric
– 12 clefts
– 42 hearing impaired
– 37 with laryngectomy
– 7 with dysphonia
– 2 others
Pathological speakers: mean of 78.7 %
Normals: mean of 93.3 %
Few with very low score

Slide 15: Results: individual systems
Based on five-fold cross validation
Measure = Pearson Correlation Coefficient (PCC)
ELIS: PLF: PCC = 0.78
ESAT: PMF: PCC = 0.80
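The evaluation protocol (five-fold cross validation, scored with the Pearson correlation coefficient) can be sketched as follows. The slides do not state how the folds were formed or whether the PCC was computed per fold or on pooled predictions; this sketch assigns speakers to folds by index and pools all held-out predictions, both of which are assumptions. The single-feature model and the ten speakers are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope


def cross_validated_pcc(xs, ys, folds=5):
    """Predict every speaker from a model trained without that speaker's fold."""
    predictions = [0.0] * len(xs)
    for fold in range(folds):
        test = [i for i in range(len(xs)) if i % folds == fold]
        train = [i for i in range(len(xs)) if i % folds != fold]
        intercept, slope = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        for i in test:
            predictions[i] = intercept + slope * xs[i]
    return pearson(predictions, ys)


# Hypothetical data: one speaker feature and the perceptual score per speaker.
feature = [0.2, 0.9, 0.5, 0.7, 0.3, 0.8, 0.4, 0.95, 0.6, 0.1]
perceptual = [53.0, 93.0, 71.0, 80.0, 60.0, 90.0, 63.0, 96.0, 77.0, 45.0]
pcc = cross_validated_pcc(feature, perceptual)
print(round(pcc, 3))
```

Because every prediction comes from a model that never saw that speaker, the resulting PCC measures generalization rather than fit to the training data.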

Slide 16: Results: combined system
PMF + PLF: PCC = 0.86

Slide 17: Results: pathology-specific IPM
Instead of creating one general IPM, one can create IPMs for specific pathologies:
– still trained on all speakers (enough speakers)
– model selection based on performance on speakers of that pathology (importance of features depends on type of disorder)
[Table: PCC for dysarthria, laryngectomy and hearing impairment; values are missing from the transcript]

Slide 18: Results: pathology-specific IPM
Dysarthria: 0.94 (red circles)
Dispersion of other speakers is increased
Largest deviations in low intelligibility area:
– scarce data in that area
– can be solved by adding more weight to patients with very low intelligibility
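The remedy in the last bullet, giving more weight to patients with very low intelligibility, corresponds to weighted least squares. The sketch below shows the effect on a single-feature model. The weighting rule (weight 5 below a score of 50, weight 1 otherwise) and the data are hypothetical; the slides give no concrete scheme.

```python
def fit_weighted_line(xs, ys, weights):
    """Weighted least squares for y = intercept + slope * x."""
    total = sum(weights)
    mx = sum(w * x for w, x in zip(weights, xs)) / total
    my = sum(w * y for w, y in zip(weights, ys)) / total
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical speakers: one very low score, five scores in the usual range.
feature = [0.1, 0.5, 0.6, 0.7, 0.8, 0.9]
score = [20.0, 70.0, 75.0, 80.0, 85.0, 90.0]

# Unweighted fit: every speaker counts equally.
plain = fit_weighted_line(feature, score, [1.0] * len(score))

# Weighted fit: hypothetical rule that boosts very low intelligibility scores.
weights = [5.0 if s < 50.0 else 1.0 for s in score]
boosted = fit_weighted_line(feature, score, weights)

# Absolute prediction error for the low-intelligibility speaker.
plain_error = abs(plain[0] + plain[1] * feature[0] - score[0])
boosted_error = abs(boosted[0] + boosted[1] * feature[0] - score[0])
print(round(plain_error, 2), round(boosted_error, 2))
```

The weighted model fits the scarce low-score region more closely, at the cost of a slightly worse fit for the remaining speakers.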

Slide 19: Development of DIA-tool
PMF and PLF can predict intelligibility of pathological speech:
– combining PMF and PLF yields high PCCs: 0.86 for the general model, over 0.91 for the pathology-specific models
– PCCs for specific pathologies compete with subjective inter-rater agreements (0.91)
This opens up possibilities for the development of an automated version of the DIA (see demonstration later) based on PLF + PMF

Slide 20: New feature set: context-dependent phonological features (CD-PLF)
Until now:
– PMF: Does the patient have trouble pronouncing a certain phoneme?
– PLF: Does the patient have problems with voicing, manner or place of articulation?
New: Does the patient have problems with a desired change of voicing, manner or place of articulation?
→ CD-PLFs: how well is a change in PLF realized?

Slide 21: Extraction of context-dependent phonological features (CD-PLF)
Speech aligner = ASR-ELIS
[Table: per-segment scores for voiced, burst, … over the segments #, /pcl/, /p/, /o/, /s/, #, /m/, /A/, /l/, #; most values are missing from the transcript]
CD-PLF features:

Voicing               Burst
Off, on, off : +0.6   Yes, no, no : +0.1
On, on, on   : +0.8   No, no, no  : +0.0
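One way to read this slide is that every segment's score for a class is filed under the triple of canonical class values of the previous, current and next segment (for example "off, on, off" for voicing), and the scores within each triple are then combined. The sketch below implements that reading for voicing. Taking the mean per context is an assumption, as are the canonical voicing table and all segment scores; only the segment sequence and the two voicing examples (+0.6 and +0.8) echo the slide.

```python
from collections import defaultdict

# Hypothetical canonical voicing per phone (True = voiced).
VOICED = {"#": False, "/p/": False, "/o/": True, "/s/": False,
          "/m/": True, "/A/": True, "/l/": True}


def cd_plf(segments, canonical):
    """Mean segment score per (previous, current, next) canonical context."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    # The first and last segment lack a neighbour and are skipped.
    for i in range(1, len(segments) - 1):
        context = tuple(canonical[segments[j][0]] for j in (i - 1, i, i + 1))
        sums[context] += segments[i][1]
        counts[context] += 1
    return {context: sums[context] / counts[context] for context in sums}


# Hypothetical per-segment voicing scores for two words, /p o s/ and /m A l/.
segments = [("#", 0.1), ("/p/", 0.2), ("/o/", 0.6), ("/s/", 0.3),
            ("#", 0.1), ("/m/", 0.7), ("/A/", 0.8), ("/l/", 0.7), ("#", 0.1)]
features = cd_plf(segments, VOICED)
for context, value in features.items():
    label = ", ".join("on" if v else "off" for v in context)
    print(label, round(value, 2))
```

Here the vowel /o/ between two voiceless segments lands in the "off, on, off" context and /A/ between two voiced segments in "on, on, on", matching the two voicing examples on the slide.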

Slide 22: Results for CD-PLF
CD-PLFs alone compete with previous best
PLF + PMF: 0.86
CD-PLF + PMF: 0.90 → new best!
Pathology-specific results for CD-PLF + PMF:
[Table: PCC for dysarthria, laryngectomy and hearing impairment; values are missing from the transcript]

Slide 23: Conclusions and future work
PMF, PLF and CD-PLF can predict intelligibility of pathological speech
– CD-PLFs seem to play an important role: CD-PLF: PCC = 0.87, CD-PLF + PMF: PCC = 0.90
→ not the articulation pattern but the change in the articulation pattern matters?
– More research is needed before adding this feature set to the tool
High PCCs open up new possibilities for:
– more profound articulatory assessment, which is directly related to determination of appropriate therapy
– monitoring of effectiveness of chosen therapy → tool
– using more natural speech (words, phrases) in tests

Slide 24: Questions?