Elmar Nöth, Andreas Maier, Michael Stürmer, Maria Schuster Towards Multimodal Evaluation of Speech Pathologies Friday, 13 September 2019.


1 Elmar Nöth, Andreas Maier, Michael Stürmer, Maria Schuster: Towards Multimodal Evaluation of Speech Pathologies (Friday, 13 September 2019)

2 Outline
Peaks: a system for the evaluation of pathologic speech
Examples where multimodality is important
  Emotional disorders → eye tracking & bio signals
  Facial paralysis → facial expression
3D information using Time-of-Flight camera
Real-time transmission of multimodal data
Outlook & summary

3 Cleft Lip and Palate
Structural malformations of the nose, the throat, the mouth, and the jaw
Negative effects on respiration, nutrition, hearing, speaking, and psychosocial competence
Prevalence: 1 :

4-5 Laryngectomees
Removal of the larynx due to cancer
Breathing is detoured through the tracheostoma
Speaking is enabled by a substitute voice

6 Motivation
Problem: there is no objective, validated method for measuring intelligibility reliably
In clinical practice: subjective evaluation only
Solution: apply an automatic speech recognition (ASR) system to assess intelligibility

7 Approach
Recording of the speech data:
  Client PC with unknown operating system
  Different tests for different patients
Automatic analysis of the speech data on a server system
A few minutes after the recording, an automatically generated report is available

8 Architecture
[Diagram labels: client; server; audio data recording; feature extraction (MFCC); secure transmission; audio data; speech analysis; speech recognition; secure transmission; speech features; recognized word chain; scoring; report]

9 Subjective Evaluation
Evaluation of the audio data by speech experts
On a scale from 1 to 5
For each turn
Averaging for each speaker leads to a continuous scale from 1 to 5
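The averaging step on this slide can be illustrated with a minimal sketch. The function name `speaker_scores`, the speaker IDs, and the ratings are hypothetical; the slide only states that per-turn ratings on a 1-to-5 scale are averaged per speaker.

```python
from collections import defaultdict
from statistics import mean


def speaker_scores(turn_ratings):
    """Average per-turn expert ratings (1..5) into one score per speaker.

    turn_ratings: iterable of (speaker_id, rating) pairs, one per rated turn.
    Returns a dict mapping speaker_id to the mean rating (a continuous 1..5 value).
    """
    per_speaker = defaultdict(list)
    for speaker, rating in turn_ratings:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating {rating!r} is outside the 1..5 scale")
        per_speaker[speaker].append(rating)
    return {speaker: mean(ratings) for speaker, ratings in per_speaker.items()}


# Hypothetical ratings, for illustration only.
ratings = [("spk01", 2), ("spk01", 3), ("spk01", 3), ("spk02", 5), ("spk02", 4)]
scores = speaker_scores(ratings)
print(scores)
```

Because each speaker's score is a mean of integer ratings, it can take any value between 1 and 5, which is what makes the resulting scale continuous.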

10 Speech Intelligibility (children)
(Two loudspeakers: top left and bottom right)

11 Speech Intelligibility (adults)
(Two loudspeakers: top left and bottom right)

12 Speech Intelligibility (general)
Pathology             Correlation
CLP children          0.89
Laryngectomees        0.88
Oral Cancer           0.93
Dysarthric Speakers   0.90
(Two loudspeakers: top left and bottom right)
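The slide does not say how the correlation values were computed. As a sketch only, assuming they are Pearson correlation coefficients between an automatic per-speaker measure and the averaged expert ratings, the computation could look like this. The data values and variable names below are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-speaker values, for illustration only:
# an automatic recognition score (percent) and the mean expert rating (1..5).
automatic_score = [72.0, 55.0, 40.0, 63.0, 30.0]
expert_rating = [1.5, 2.8, 3.9, 2.2, 4.6]

r = pearson_r(automatic_score, expert_rating)
print(f"r = {r:.2f}, |r| = {abs(r):.2f}")
```

If a higher automatic score goes with a lower (better) expert rating, as in the made-up data, r is negative, so a comparison with the table would use its magnitude.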

13 Outline: Examples where multimodality is important

14-20 Need for Multimodality: Emotional disorders

21 Outline: 3D information using Time-of-Flight camera

22 3D Camera: Principle

23 Time-of-Flight (ToF) 3D Camera
up to 50 Hz
more than 25k 3D points (176*144 pixels)
eye-safe infrared light / no exposure

24-26 Need for Multimodality: Facial Paresis

27 Outline: Real-time transmission of multimodal data

28 Real-Time Transmission for Telemedicine
In many cases: complex disease pattern
Need for specially trained therapists
Reduced mobility of patient
→ Telemedical treatment
→ Real-time transmission of multimodal data

29 Telemedicine
Secure transmission
Sufficient bandwidth
Video streaming with Open Source software FFmpeg (

30 MPEG: YUV Coding
[Figure: Y, U, and V channels]

31 MPEG: YUV Coding
Y: 8 bit / pixel
U: 8 bit / 4 pixels
V: 8 bit / 4 pixels
YUV: 12 bit / pixel
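The 12 bit/pixel total follows from sharing each U and V sample among four pixels. A minimal sketch of that arithmetic is below; the function names are hypothetical, and the frame size 176*144 is the camera resolution quoted elsewhere in the talk.

```python
def bits_per_pixel(y_bits=8, chroma_bits=8, pixels_per_chroma_sample=4):
    """Average bits per pixel when each U and V sample is shared by several pixels."""
    # One Y sample per pixel, plus one U and one V sample per group of pixels.
    return y_bits + 2 * chroma_bits / pixels_per_chroma_sample


def frame_bytes(width, height, bpp):
    """Uncompressed size of one frame in bytes."""
    return width * height * bpp / 8


bpp = bits_per_pixel()               # 8 + 2 * 8 / 4
size = frame_bytes(176, 144, bpp)    # one uncompressed 176*144 frame
print(bpp, size)
```

With the default values this gives 12 bits per pixel, of which 4 bits per pixel are chroma (U and V together).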

32 Video Information to be Transmitted
≈ 15 frames/second
Currently 176*144 pixels/frame; next version: 204*204 pixels/frame
Per pixel:
  amplitude currently ignored
  depth encoded with 8 bit and transmitted in the Y channel of YUV coding
  XYZ coordinates ignored; they can be recovered from depth & camera parameters
  U & V channels (4 bit/pixel) transmitted but currently ignored (can be used to transmit amplitude or to improve depth resolution)
15*176*144*12 bit/second + audio ≈ 0.66 MByte/second
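The uncompressed data rate on this slide can be reproduced with a short calculation. The video term comes directly from the slide. The audio format is not stated there, so the sketch assumes uncompressed 16-bit mono PCM at 44.1 kHz, which is an assumption chosen because it reproduces the quoted total. All names are hypothetical.

```python
FRAMES_PER_SECOND = 15
WIDTH, HEIGHT = 176, 144
BITS_PER_PIXEL = 12  # 8 bit Y + 4 bit shared U/V

# ASSUMPTION (not stated on the slide): uncompressed 16-bit mono PCM at 44.1 kHz.
AUDIO_SAMPLE_RATE_HZ = 44_100
AUDIO_BYTES_PER_SAMPLE = 2


def video_bytes_per_second(fps, width, height, bits_per_pixel):
    """Uncompressed video data rate in bytes per second."""
    return fps * width * height * bits_per_pixel // 8


def audio_bytes_per_second(sample_rate_hz, bytes_per_sample, channels=1):
    """Uncompressed PCM audio data rate in bytes per second."""
    return sample_rate_hz * bytes_per_sample * channels


video = video_bytes_per_second(FRAMES_PER_SECOND, WIDTH, HEIGHT, BITS_PER_PIXEL)
audio = audio_bytes_per_second(AUDIO_SAMPLE_RATE_HZ, AUDIO_BYTES_PER_SAMPLE)
total = video + audio
print(f"video {video} B/s + audio {audio} B/s = {total / 1e6:.2f} MByte/s")
```

Under these assumptions the video alone accounts for about 0.57 MByte/second and the audio for roughly 0.09 MByte/second.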

33 Experimental Results
Speed: FFmpeg transmission of 3D video + audio (MP3) in real time (15 frames/second, depth only, 44.1 kHz mono)
  → < 50 kByte/second
  → can be done via standard DSL
Accuracy: depends on range; here: minimum distance = 50 cm
  → range = maximum distance - 50 cm
  → range quantized with 256 steps (limited by the 8-bit Y channel)
  MPEG compression adds additional error
  → error measured after MPEG encoding/decoding
  software-based averaging over 5 frames
MP3: 128 kBit/s = 16 kByte/s

34 Experimental Results
MP3: 128 kBit/s = 16 kByte/s

35 Experimental Results
Original
Range: 35 cm, error: 1.6 mm
Range: 50 cm, error: 2.2 mm
Range: 90 cm, error: 3.7 mm
MP3: 128 kBit/s = 16 kByte/s
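The talk states that the depth range is quantized with 256 steps to fit the 8-bit Y channel. The sketch below shows one plain uniform quantizer under that constraint and prints the step size for the three reported ranges. The exact mapping used in the system is not given in the slides, so the functions, their names, and the mid-point reconstruction are assumptions.

```python
LEVELS = 256  # 8-bit Y channel


def quantization_step_mm(range_cm, levels=LEVELS):
    """Width of one quantization step in millimetres."""
    return range_cm * 10 / levels


def quantize(depth_cm, min_cm, range_cm, levels=LEVELS):
    """Map a depth in [min_cm, min_cm + range_cm] to an integer code 0..levels-1."""
    rel = (depth_cm - min_cm) / range_cm
    rel = min(max(rel, 0.0), 1.0)  # clamp depths outside the range
    return min(int(rel * levels), levels - 1)


def dequantize(code, min_cm, range_cm, levels=LEVELS):
    """Reconstruct a depth (centre of the quantization bin) from a code."""
    return min_cm + (code + 0.5) * range_cm / levels


for range_cm in (35, 50, 90):
    print(f"range {range_cm} cm: step {quantization_step_mm(range_cm):.2f} mm")

# Round trip for a hypothetical depth of 75 cm with minimum distance 50 cm.
code = quantize(75.0, min_cm=50.0, range_cm=50.0)
restored_cm = dequantize(code, min_cm=50.0, range_cm=50.0)
```

The step sizes come out at about 1.37 mm, 1.95 mm, and 3.52 mm, so the reported errors of 1.6 mm, 2.2 mm, and 3.7 mm are of the same order as one quantization step at each range.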

36 Outlook
3D image has low resolution but high resolution depth map
Registration of low-resolution 3D with high-resolution 2D → high-quality videos for real-time telemedical therapy
Localization of eyes, mouth, etc. in 3D images is fast and less error-prone than in 2D images → improved symmetry features for therapy diagnosis
Implementation of real-time prototypes for telemedical and biofeedback therapy:
  Audio + 3D-ToF + 2D webcam
  Audio + eye tracking + biosignals

37 Summary
Peaks: a system for the evaluation of pathologic speech
  Offline, audio only, tested on different pathologies
Examples where multimodality is important
  Emotional disorders → eye tracking & bio signals
  Facial paralysis → facial expression in 3D
3D information using Time-of-Flight camera
Real-time transmission of multimodal data
  Standard video streaming and information reduction → acceptable-quality 3D images with standard DSL
Outlook: registration of 3D with 2D image → high-quality visualization, error-free symmetry features

38 Thank you for your kind attention
Supported by
  Deutsche Forschungsgemeinschaft (DFG)
  Deutsche Krebshilfe (German Cancer Aid)

