Schizophrenia and Depression – Evidence in Speech Prosody
Student: Yonatan Vaizman
Advisor: Prof. Daphna Weinshall
Joint work with Roie Kliper and Dr. Shirley Portuguese

Agenda
- Presenting the approach
  - Speech prosody
  - Mental states, mental pathologies, schizophrenia
- Our work
  - Data
  - Methods
  - Results
- Future directions

Information in speech prosody
- Syntactic disambiguation from intonation
- Prosodic content as a signal for the speaker's mental state

Large variability
[Figure: waveform and spectrogram]

Estimating mental states

Previous work
- Voice analysis to detect cancer of the larynx (Murry, T. and Doherty, E., 1980)
- Differences in speech between patients with schizophrenia and healthy controls (e.g. Stassen, H. et al., 1995)

Our work – Data
Speech tasks, ranging from strict to free expression:
- Reading a list of words
- Reading a passage
- Free interview (with Shirley Portuguese, M.D.)
[Figure: task durations, time (sec)]

Our work – Methods
Auditory signal processing; collecting acoustic features:
- Duration features
- Variability features
- Speech density features

Acoustic feature extraction
Duration features (from segmentation of the signal into utterances and gaps):
- Mean utterance duration
- Mean log utterance duration
- Mean gap duration
- Mean log gap duration
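As a minimal sketch of how such duration features could be computed, assume an upstream voice activity detector has already segmented the recording into utterances, given as (start, end) times in seconds; the function name and input format below are illustrative, not the study's actual pipeline.

```python
import math

def duration_features(segments):
    """Duration features from an utterance/gap segmentation.

    `segments` is an ordered list of (start, end) times (seconds) of
    detected utterances; the pauses between consecutive utterances are
    treated as gaps.
    """
    utter = [end - start for start, end in segments]
    gaps = [segments[i + 1][0] - segments[i][1] for i in range(len(segments) - 1)]
    mean = lambda xs: sum(xs) / len(xs)
    return {
        "mean_utterance_duration": mean(utter),
        "mean_log_utterance_duration": mean([math.log(d) for d in utter]),
        "mean_gap_duration": mean(gaps) if gaps else 0.0,
        "mean_log_gap_duration": mean([math.log(g) for g in gaps]) if gaps else 0.0,
    }
```

The log variants compress the long tail of pause durations, so a few very long pauses do not dominate the per-speaker mean.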

Acoustic feature extraction
Variability features:
- Meso scale: pitch (f0) / period variability, power variability
- Micro scale: jitter, shimmer
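The micro-scale features admit a simple common definition: local jitter is the mean absolute difference between consecutive pitch periods, normalized by the mean period, and local shimmer is the same measure applied to cycle amplitudes. The slides do not specify which variant was used, so the sketch below shows only this one standard form.

```python
def local_perturbation(values):
    """Mean absolute difference between consecutive cycle values,
    normalized by the mean value: local jitter when `values` are
    pitch periods, local shimmer when they are cycle amplitudes."""
    diffs = [abs(values[i + 1] - values[i]) for i in range(len(values) - 1)]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))

# Illustrative cycle-level measurements for one voiced stretch.
periods = [0.0100, 0.0102, 0.0099, 0.0101]    # seconds per glottal cycle
jitter = local_perturbation(periods)           # period perturbation
amplitudes = [0.80, 0.78, 0.81, 0.79]          # peak amplitude per cycle
shimmer = local_perturbation(amplitudes)       # amplitude perturbation
```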

Results – acoustic features
[Figure: mean utterance duration (sec) and mean jitter score during task, for the list-of-words and fluent-text conditions]

Results – classification
Linear classification with SVM; correct classification rates estimated with leave-one-out (LOU).
[Table: sample sizes per group (Control, Schizophrenia, Depression) and task (list of words, passage, interview)]
[Table: correct classification rates (LOU) for C vs. S, C vs. D, and S vs. D, per task]
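The leave-one-out protocol itself is simple to state: each subject in turn is held out, the classifier is trained on everyone else, and accuracy is the fraction of held-out subjects predicted correctly. The sketch below uses a nearest-centroid rule as a stand-in for the linear SVM reported on the slide, since the exact pipeline is not given.

```python
import numpy as np

def leave_one_out_accuracy(X, y):
    """Leave-one-out evaluation with a nearest-centroid classifier
    (a linear stand-in for the SVM used in the study).

    X: per-subject acoustic feature vectors; y: group labels.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    correct = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i            # hold out subject i
        Xtr, ytr = X[mask], y[mask]
        centroids = {c: Xtr[ytr == c].mean(axis=0) for c in np.unique(ytr)}
        pred = min(centroids, key=lambda c: np.linalg.norm(X[i] - centroids[c]))
        correct += int(pred == y[i])
    return correct / len(y)
```

With small clinical samples, LOU makes the most of the data but yields high-variance accuracy estimates, which is worth keeping in mind when reading the per-task rates.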

Results – classification separated by gender
[Table: correct classification rates for C vs. S, C vs. D, and S vs. D (both genders / male only / female only), for the list-of-words and interview tasks]

Results – correlation
Schizophrenia (correlation of acoustic features with SANS total, reported as (cc, p-val)):
- Utterance duration: ( , )
- Gap duration: (0.4223, )
- Spoken ratio: ( , )
- Fragmented speech: ( , )
- Emphasis: ( , )
- Inflection: ( , )
Depression (correlation of acoustic features with HAM-D):
- Utterance duration: (0.4661, )
- Inflection: ( , )
- Jitter: ( , )
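The cc values above are plausibly Pearson correlation coefficients between a per-subject acoustic feature and the clinical scale score (SANS total or HAM-D); the slide does not say so explicitly, so the sketch below shows only the standard Pearson formula, without the accompanying p-value computation.

```python
import math

def pearson_cc(xs, ys):
    """Pearson correlation coefficient between a per-subject acoustic
    feature (xs) and a clinical scale score (ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```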

Results – summary
- A signal of mental state can be found in prosodic features
- Local and micro-scale features are discriminative
- The type of speech task influences discrimination
- The signal is possibly more prominent in females

Future directions
- Spectral features
- Using linguistic elements: phonemes, words
- Forced alignment and functional data analysis
- Language and speaker identification methods
- Personal directions:
  - Other signal domains: general sounds, music…

Thank you!