Published by Lionel St-Arnaud. Modified over 5 years ago.
1
Towards Multimodal Evaluation of Speech Pathologies
Elmar Nöth, Andreas Maier, Michael Stürmer, Maria Schuster
Friday, 13 September 2019
2
Outline
- Peaks: a system for the evaluation of pathologic speech
- Examples where multimodality is important
  - Emotional disorders: eye tracking & bio signals
  - Facial paralysis: facial expression
- 3D information using Time-of-Flight camera
- Real-time transmission of multimodal data
- Outlook & summary
3
Cleft Lip and Palate
- Structural malformations of
  - the nose
  - the throat
  - the mouth
  - the jaw
- Negative effects on
  - respiration
  - nutrition
  - hearing
  - speaking
  - psychosocial competence
- Prevalence: 1 :
4
Laryngectomees
- Removal of the larynx due to cancer
- Breathing is rerouted through the tracheostoma
- Speaking is enabled by a substitute voice
6
Motivation
- Problem: there is no objective, validated method to measure intelligibility reliably
- In clinical practice: subjective evaluation only
- Solution: apply an automatic speech recognition (ASR) system to assess intelligibility
7
Approach
- Recording of the speech data:
  - Client PC with unknown operating system
  - Different tests for different patients
- Automatic analysis of the speech data on a server system
- A few minutes after the recording, an automatically generated report is available
8
Architecture (client/server diagram)
- Client: audio data recording, feature extraction (MFCC), report
- Secure transmission between client and server: audio data, speech features, recognized word chain, report
- Server: speech analysis, speech recognition, scoring, report
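The scoring step in the diagram turns the recognized word chain into a number. The slides do not say which measure is used; the sketch below assumes word accuracy, a common ASR measure, computed from the word-level edit distance between the reference text and the recognizer output. The function names are illustrative and not taken from Peaks.

```python
def edit_operations(reference, hypothesis):
    """Minimum number of word substitutions, deletions and insertions
    needed to turn `reference` into `hypothesis` (Levenshtein distance)."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_accuracy(reference_text, recognized_text):
    """Word accuracy in percent: 100 * (N - errors) / N, N = reference length.
    Assumed scoring measure; the slides do not name the one actually used."""
    reference = reference_text.split()
    hypothesis = recognized_text.split()
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = edit_operations(reference, hypothesis)
    return 100.0 * (len(reference) - errors) / len(reference)
```

Word accuracy can become negative when the recognizer inserts many extra words, so it should not be read as a plain percentage of correct words.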
9
Subjective Evaluation
- Evaluation of the audio data by speech experts
- On a scale from 1 to 5
- For each turn
- Averaging for each speaker leads to a continuous scale from 1 to 5
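The averaging step can be sketched as follows. The input layout, a flat list of (speaker, rating) pairs covering all turns and all experts, is an assumption made for illustration; the slides only state that turn-level ratings are averaged per speaker.

```python
from collections import defaultdict
from statistics import mean


def speaker_scores(turn_ratings):
    """Average turn-level expert ratings (1..5) into one score per speaker.

    `turn_ratings` is an iterable of (speaker_id, rating) pairs; this input
    layout is hypothetical.
    """
    by_speaker = defaultdict(list)
    for speaker, rating in turn_ratings:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating {rating} outside the 1..5 scale")
        by_speaker[speaker].append(rating)
    return {speaker: mean(ratings) for speaker, ratings in by_speaker.items()}
```

Although each single rating is an integer, the per-speaker mean can take any value between 1 and 5, which is what makes the scale continuous.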
10
Speech Intelligibility (children)
Two loudspeakers, top left and bottom right
11
Speech Intelligibility (adults)
Two loudspeakers, top left and bottom right
12
Speech Intelligibility (general)

Pathology             Correlation
CLP children          0.89
Laryngectomees        0.88
Oral cancer           0.93
Dysarthric speakers   0.90

Two loudspeakers, top left and bottom right
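The slide reports correlations between the automatic and the subjective evaluation without naming the coefficient. A minimal sketch, assuming Pearson's correlation coefficient and one (automatic score, mean expert rating) pair per speaker:

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

The sign of the coefficient depends on how the two scales are oriented; the table lists magnitudes only.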
13
Outline: Examples where multimodality is important
14
Need for Multimodality: Emotional disorders
21
Outline: 3D information using Time-of-Flight camera
22
3D Camera: Principle
23
Time-of-Flight (ToF) 3D Camera
- up to 50 Hz
- more than 25k 3D points (176*144 pixels)
- eye-safe infrared light / no exposure
24
Need for Multimodality: Facial Paresis
27
Outline: Real-time transmission of multimodal data
28
Real-Time Transmission for Telemedicine
- In many cases: complex disease pattern
- Need for specially trained therapists
- Reduced mobility of the patient
- Telemedical treatment
- Real-time transmission of multimodal data
29
Telemedicine
- Secure transmission
- Sufficient bandwidth
- Video streaming with open-source software FFmpeg (
30
MPEG: YUV Coding
- Y: 8 bit / pixel
- U: 8 bit / 4 pixels
- V: 8 bit / 4 pixels
- YUV: 12 bit / pixel
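The 12 bit/pixel figure follows from adding the full-resolution Y sample to the two subsampled chroma samples. A small sketch of that arithmetic; the function names are illustrative:

```python
def bits_per_pixel(bits_per_sample=8, pixels_per_chroma_sample=4):
    """Average bits per pixel when Y is stored for every pixel and
    U and V are each stored once per block of pixels."""
    luma = bits_per_sample
    chroma = 2 * bits_per_sample / pixels_per_chroma_sample  # U and V
    return luma + chroma


def frame_bytes(width, height):
    """Size of one uncompressed frame in bytes under this sampling."""
    return width * height * bits_per_pixel() / 8
```

With the slide's values, `bits_per_pixel()` gives 8 + 2 + 2 = 12, and a 176*144 frame occupies 1.5 bytes per pixel before compression.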
32
Video Information to be Transmitted
- ≈ 15 frames/second
- currently 176*144 pixels/frame; next version: 204*204 pixels/frame
- Per pixel:
  - amplitude currently ignored
  - depth encoded with 8 bit and transmitted in the Y channel of the YUV coding
  - XYZ coordinates ignored; they can be recovered from depth and camera parameters
  - U & V channels (4 bit/pixel) transmitted but currently ignored (can be used to transmit amplitude or to improve depth resolution)
- 15*176*144*12 bit/second + audio ≈ 0.66 MByte/second
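The uncompressed data rate on this slide can be reproduced with a few lines of arithmetic. The slide does not break down the audio share; the sketch assumes uncompressed 16-bit mono audio at 44.1 kHz (the bit depth is an assumption), which brings the total to roughly the stated 0.66 MByte/second when 1 MByte is taken as 10^6 bytes.

```python
def raw_stream_bytes_per_second(fps=15, width=176, height=144,
                                video_bits_per_pixel=12,
                                audio_rate_hz=44100,
                                audio_bits=16,       # assumed, not on the slide
                                audio_channels=1):
    """Return (video, audio, total) data rates in bytes per second
    for the uncompressed stream."""
    video = fps * width * height * video_bits_per_pixel / 8
    audio = audio_rate_hz * audio_bits * audio_channels / 8
    return video, audio, video + audio


video, audio, total = raw_stream_bytes_per_second()
print(f"video {video / 1e6:.2f} MByte/s, audio {audio / 1e6:.2f} MByte/s, "
      f"total {total / 1e6:.2f} MByte/s")
```

Video alone accounts for about 0.57 MByte/second, so the compressed rate reported in the experiments corresponds to a reduction of more than a factor of ten.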
33
Experimental Results
- Speed:
  - FFmpeg transmission of 3D video + audio (MP3) in real time (15 frames/second, depth only, 44.1 kHz mono)
  - < 50 kByte/second, which can be done via standard DSL
- Accuracy:
  - depends on range; here: minimum distance = 50 cm, range = maximum distance - 50 cm
  - range quantized with 256 steps (limited by the 8-bit Y channel)
  - MPEG compression adds additional error
  - error measured after MPEG encoding/decoding
  - software-based averaging over 5 frames
- MP3: 128 kBit/s = 16 kByte/s
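How a distance ends up in the 8-bit Y channel can be sketched as a uniform quantizer over the working range. The exact mapping used by the authors is not given; the version below assumes 256 equal steps starting at the minimum distance and decodes to the centre of each step, so the quantization error alone is at most half a step.

```python
def encode_depth(distance_cm, min_cm=50.0, range_cm=50.0, steps=256):
    """Map a distance to an 8-bit code (0..255); out-of-range values are clipped."""
    step = range_cm / steps
    clipped = min(max(distance_cm, min_cm), min_cm + range_cm)
    return min(int((clipped - min_cm) / step), steps - 1)


def decode_depth(code, min_cm=50.0, range_cm=50.0, steps=256):
    """Map a code back to the centre of its quantization step, in cm."""
    step = range_cm / steps
    return min_cm + (code + 0.5) * step
```

This covers only the quantization part of the error; the additional error from MPEG compression and the effect of averaging over 5 frames are not modelled here.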
35
Experimental Results
- Original
- Range: 35 cm, error: 1.6 mm
- Range: 50 cm, error: 2.2 mm
- Range: 90 cm, error: 3.7 mm
- MP3: 128 kBit/s = 16 kByte/s
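To put the reported errors in context, the sketch below computes the size of one quantization step for each range, assuming the range is divided into 256 equal steps as on the previous slide. The comparison is illustrative arithmetic, not part of the original evaluation.

```python
def quantization_step_mm(range_cm, steps=256):
    """Width of one quantization step in millimetres."""
    return range_cm * 10.0 / steps


# Range in cm -> error in mm, as reported on the slide.
reported_error_mm = {35: 1.6, 50: 2.2, 90: 3.7}

for range_cm, error_mm in reported_error_mm.items():
    step_mm = quantization_step_mm(range_cm)
    print(f"range {range_cm} cm: step {step_mm:.2f} mm, reported error {error_mm} mm")
```

Under this assumption the step widths are about 1.37 mm, 1.95 mm and 3.52 mm, so each reported error lies slightly above one quantization step and grows roughly in proportion to the range.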
36
Outlook
- The 3D image has low resolution but a high-resolution depth map
- Registration of low-resolution 3D with high-resolution 2D: high-quality videos for real-time telemedical therapy
- Localization of eyes, mouth, etc. in 3D images is faster and less error-prone than in 2D images: improved symmetry features for therapy and diagnosis
- Implementation of real-time prototypes for telemedical and biofeedback therapy:
  - Audio + 3D-ToF + 2D webcam
  - Audio + eye tracking + biosignals
37
Summary
- Peaks: a system for the evaluation of pathologic speech
  - Offline, audio only, tested on different pathologies
- Examples where multimodality is important
  - Emotional disorders: eye tracking & bio signals
  - Facial paralysis: facial expression in 3D
- 3D information using Time-of-Flight camera
- Real-time transmission of multimodal data
  - Standard video streaming and information reduction: acceptable-quality 3D images with standard DSL
- Outlook: registration of 3D with 2D image
  - High-quality visualization, error-free symmetry features
38
Thank you for your kind attention
Supported by
- Deutsche Forschungsgemeinschaft (DFG)
- Deutsche Krebshilfe (German Cancer Aid)