Dan Jurafsky Lecture 8: Medical Applications: Intoxication, Depression, Trauma, Alzheimers, General Medical Health CS 424P/ LINGUIST 287 Extracting Social.

Presentation transcript:

Dan Jurafsky Lecture 8: Medical Applications: Intoxication, Depression, Trauma, Alzheimers, General Medical Health CS 424P/ LINGUIST 287 Extracting Social Meaning and Sentiment

Topic 1: Intoxication

Hollien et al 2001 Methods: 35 young adults (19 males, 16 females) given a series of doses of alcohol; speech collected at 4 BAC stages. Materials: Rainbow passage, difficult words (buttercup, shapupie), extemporaneous speech (“Tell us about your favorite TV program”). Recorded with head-mounted mikes. Investigated: F0 mean and variance, duration/rate of speech, intensity, disfluencies.

Hollien et al 2001 Results: F0

Hollien et al 2001 Results: Duration

Hollien et al 2001 Results: Disfluencies

Hollien et al 2001 Results: Magnitudes

Hollien et al 2001 Results: Speaker Specific Effects. What did they find?

A famous case study Johnson, K., Pisoni, D. & Bernacki, R. (1990) Do voice recordings reveal whether a person is intoxicated?: A case study. Phonetica. 47:

Exxon Valdez

Was Captain Hazelwood drunk? Not clear if this is relevant, since he was asleep below deck. The third mate was in charge of the wheelhouse, and the ship’s radar was broken. But it is a well-studied case.

Johnson et al examined 3 kinds of cues: segmental effects, disfluencies, suprasegmental effects.

Keith Johnson’s /s/ and /ʃ/

/ ʃ /: Captain Hazelwood

Duration

F0

Summary

Questions: Johnson et al. examined various possible causes. What other kinds of speaker state could cause a drop in F0, slower speech, and disfluencies?

New Corpus! Alcohol Language Corpus, Florian Schiel et al 2009, muenchen.de/forschung/Bas/BasALCeng.html. 124 speakers, 11,160 recordings; recorded in a car (sometimes with engine running). Materials: tongue twisters, command and control speech (“turn off the radio”), spontaneous dialogue and monologue.

Automatic Classification: “Use of prosodic speech characteristics for automated detection of alcohol intoxication”, Michael Levit, Richard Huber, Anton Batliner, Elmar Noeth. Break the utterance into phrases automatically, based on fundamental frequency (where possible), zero-crossing rate, and energy.
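
As a rough illustration of two of the cues named above, the sketch below computes per-frame zero-crossing rate and energy with NumPy. It is not the segmentation procedure of Levit et al.; the function name `frame_features`, the 25 ms frame and 10 ms hop (at an assumed 16 kHz sampling rate), and the synthetic signals are all assumptions made for the example.

```python
import numpy as np

def frame_features(signal, frame_len=400, hop=160):
    """Return per-frame (zero-crossing rate, energy) arrays for a 1-D signal."""
    signal = np.asarray(signal, dtype=float)
    zcr, energy = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        signs = np.sign(frame)
        # Fraction of adjacent sample pairs whose signs differ.
        zcr.append(np.mean(signs[:-1] != signs[1:]))
        # Mean squared amplitude of the frame.
        energy.append(np.mean(frame ** 2))
    return np.array(zcr), np.array(energy)

# Synthetic check: a low-frequency tone (voiced-like) vs. quiet white noise (unvoiced-like).
sr = 16000                                   # assumed sampling rate
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 120 * t)
noise = np.random.default_rng(0).normal(scale=0.1, size=sr)

tone_zcr, tone_energy = frame_features(tone)
noise_zcr, noise_energy = frame_features(noise)
```

Voiced-like frames show a low zero-crossing rate and high energy, while noise-like frames show the opposite, which is why such measures can help place phrase boundaries when F0 is unavailable.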

Then use 4 classes of features. Prosodic: F0 max, F0 min, energy max, energy min, pause length; duration of voiced regions, unvoiced regions, etc.; jitter and shimmer; average cepstrum and cepstral slope.

Methods: Alcoholized speech samples collected at the Police Academy of Hessen, Germany. 120 readings (87 minutes) of a fable; 33 male speakers; BAC between 0 and .24/mille. Binary task: above or below 0.8/mille. Leave-one-out cross-validation; neural net classifier.

Results of Levit et al.: Used a dev set to find the best classifier. This used two feature classes: prosodic features and jitter/shimmer. Results with this classifier: 62% phrase accuracy; 69% for the whole speech sample (by voting over the phrases).
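
The step from phrase-level decisions to one decision per speech sample can be sketched as a simple majority vote. The slide does not say which voting rule was used, so the rule below (including the tie-break) is an assumption, and the phrase labels are invented for the example.

```python
from collections import Counter

def vote(phrase_labels):
    """Majority vote over a non-empty list of phrase-level labels.

    Ties go to the tied label that appears first in the list (an assumption).
    """
    counts = Counter(phrase_labels)
    best = max(counts.values())
    for label in phrase_labels:
        if counts[label] == best:
            return label

# Hypothetical phrase-level classifier outputs for one reading (not study data).
phrases = ["above", "below", "above", "above", "below"]
sample_label = vote(phrases)
```

Pooling several noisy phrase decisions this way is one plausible reason the sample-level accuracy can exceed the phrase-level accuracy.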

Automatic detection features in the Bavarian corpus. Humans: 62%-75%. Machine, features used to date: F0, duration, rhythm (correlated with duration but doesn’t require word transcripts), formants (F1 mean and F4 variance). Future work!!! Disfluencies; other segmental features: s versus sh. But Schiel finding: more hyperarticulation in vowels in women in their corpus.

Topic 2: Depression

Stirman and Pennebaker: Suicidal poets. 300 poems from early, middle, and late periods of 9 suicidal poets and 9 non-suicidal poets.

Stirman and Pennebaker: 2 models. Durkheim disengagement model: the suicidal individual has failed to integrate into society sufficiently and is detached from social life; such individuals detach from the source of their pain, withdraw from social relationships, and become more self-oriented. Prediction: more self-reference, fewer group references. Hopelessness model: suicide takes place during extended periods of sadness and desperation, pervasive feelings of helplessness, thoughts of death. Prediction: more negative emotion words, fewer positive, more references to death.

Methods: 156 poems from 9 poets who committed suicide. Poets had to be published, well-known in English, and to have written within 1 year of committing suicide. Control poets matched for nationality, education, sex, era.

The poets

Stirman and Pennebaker: Results

Significant factors. Disengagement theory: I, me, mine; we, our, ours. Hopelessness theory: death, grave. Other: sexual words (lust, breast).

Rude et al: Language use of depressed and depression-vulnerable college students. Beck (1967), cognitive theory of depression: depression-prone individuals see the world and themselves in pervasively negative terms. Pyszczynski and Greenberg (1987): depressed individuals think about themselves after the loss of a central source of self-worth and are unable to exit a self-regulatory cycle concerned with efforts to regain what was lost; this results in self-focus and self-blame. Durkheim, social integration/disengagement: perception of self as not integrated into society is key to suicidality and possibly depression.

Methods: College freshmen: 31 currently depressed (standard inventories), 26 formerly depressed, 67 never depressed. Session 1: take depression inventory. Session 2: write essay: “please describe your deepest thoughts and feelings about being in college… write continuously off the top of your head. Don’t worry about grammar or spelling. Just write continuously.”

Results: Depressed students used more “I, me” than never-depressed students (this turned out to be only “I”) and used more negative emotion words. There was not enough “we” to check the Durkheim model. Formerly depressed participants used more “I” in the last third of the essay.

Ramirez-Esparza et al: Depression in English and Spanish. Study 1: Use LIWC counts on 320 posts from English and Spanish forums: 80 posts each from depression forums in English and Spanish, and 80 control posts each from breast cancer forums. Run the following LIWC categories: I, we, negative emotion, positive emotion.
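
A LIWC-style count is, at its core, the percentage of a text's tokens that fall in each dictionary category. The sketch below shows only that idea. The word lists are tiny hypothetical stand-ins (the real LIWC dictionaries are much larger and are not reproduced here), and the tokenizer and the name `category_rates` are likewise assumptions.

```python
import re

# Hypothetical mini word lists standing in for the four LIWC categories above.
CATEGORIES = {
    "i": {"i", "me", "my", "mine", "myself"},
    "we": {"we", "us", "our", "ours", "ourselves"},
    "negemo": {"sad", "hurt", "alone", "hopeless", "afraid"},
    "posemo": {"happy", "good", "nice", "love", "hope"},
}

def category_rates(text):
    """Percentage of tokens that belong to each category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens) or 1          # avoid division by zero on empty text
    return {
        name: 100.0 * sum(tok in words for tok in tokens) / total
        for name, words in CATEGORIES.items()
    }

# Invented example post, not taken from the study.
post = "I feel so alone and I am afraid that nothing good will happen to me"
rates = category_rates(post)
```

Comparing such rates between depression-forum posts and control posts, per language, is the kind of contrast Study 1 describes.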

Results of Study 1

Conclusions?

Study 2: From depression forums: 404 English posts, 404 Spanish posts. Create a term-by-document matrix of content words (the 200 most frequent content words). Do a factor analysis (dimensionality reduction in the term-document matrix). Used 5 factors.

English Factors

Spanish Factors

Implications? Problems? New applications?

Topic 3: Trauma

Cohn, Mehl, Pennebaker: Linguistic Markers of Psychological Change Surrounding September 11, 2001. LiveJournal users: all blog entries for 2 months before and after 9/11. Lumped the prior two months into one “baseline” corpus. Investigated changes after 9/11 compared to that baseline, using LIWC categories.

Variables examined. Emotional positivity: difference between LIWC scores for positive emotion words (happy, good, nice) and negative emotion words (kill, ugly, guilty). Cognitive processing: think, question, because; concerned with organizing and intellectually understanding issues. Social orientation: talk, share, friends, and personal pronouns besides I/me (essentially counts the number of references to other people).

Last factor: Psychological Distancing. Factor-analytic: + articles, + words > 6 letters long, - I/me/mine, - would/should/could, - present tense verbs. Low score: personal, experiential language, focus on here and now. High score: abstract, impersonal, rational tone.

Results

Implications? Methodological problems? Ideas for exciting new studies?

Topic 4: Alzheimers

The Nun Study: “Linguistic Ability in Early Life and the Neuropathology of Alzheimer’s Disease and Cerebrovascular Disease: Findings from the Nun Study”, D.A. Snowdon, L.H. Greiner, and W.R. Markesbery. The Nun Study: a longitudinal study of aging and Alzheimer’s disease. Cognitive and physical function assessed annually. All participants agreed to brain donation at death. At the first exam, given between 1991 and 1993, the 678 participants were 75 to 102 years old. This study: subset of 74 participants for whom we had handwritten autobiographies from early life, all of whom had died.

The data: In September 1930 the leader of the School Sisters of Notre Dame religious congregation requested that each sister write “a short sketch of her own life. This account should not contain more than two to three hundred words and should be written on a single sheet of paper... include the place of birth, parentage, interesting and edifying events of one's childhood, schools attended, influences that led to the convent, religious life, and its outstanding events.” Handwritten diaries were found in two participating convents, Baltimore and Milwaukee.

The linguistic analysis. Grammatical complexity: Developmental Level metric (Cheung/Kemper); sentences classified from 0 (simple one-clause sentences) to 7 (complex sentences with multiple embedding and subordination). Idea density: average number of ideas expressed per 10 words; ideas are elementary propositions, typically a verb, adjective, adverb, or prepositional phrase. Complex propositions that stated or inferred causal, temporal, or other relationships between ideas also were counted. Prior studies suggest: idea density is associated with educational level, vocabulary, and general knowledge; grammatical complexity is associated with working memory, performance on speeded tasks, and writing skill.

Idea density “I was born in Eau Claire, Wis., on May 24, 1913 and was baptized in St. James Church.” (1) I was born, (2) born in Eau Claire, Wis., (3) born on May 24, 1913, (4) I was baptized, (5) was baptized in church (6) was baptized in St. James Church, (7) I was born...and was baptized. There are 18 words or utterances in that sentence. The idea density for that sentence was 3.9 (7/18 * 10 = 3.9 ideas per 10 words).
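
The arithmetic on this slide can be checked directly. In the sketch below the 7 propositions are taken from the hand count above (nothing is extracted automatically), and counting words by splitting on whitespace is an assumption that happens to reproduce the slide's figure of 18.

```python
def idea_density(num_ideas, num_words):
    """Ideas expressed per 10 words."""
    return num_ideas / num_words * 10

sentence = ("I was born in Eau Claire, Wis., on May 24, 1913 "
            "and was baptized in St. James Church.")

num_words = len(sentence.split())      # whitespace-separated tokens
density = idea_density(7, num_words)   # 7 propositions, counted by hand on the slide
rounded = round(density, 1)
```

Automating the proposition count itself is the hard part and is not attempted here.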

Results: Participants with neuropathologically defined Alzheimer’s disease had lower idea density scores than those without Alzheimer’s disease. Correlations between idea density scores and mean neurofibrillary tangle counts: −0.59 for the frontal lobe, −0.48 for the temporal lobe, −0.49 for the parietal lobe.

Explanations? Early studies found the same results with a college-educated subset of the population who were teachers, suggesting education was not the key factor. They suggest: low linguistic ability in early life may reflect suboptimal neurological and cognitive development, which might increase susceptibility to the development of Alzheimer’s disease pathology in late life.

Garrard et al: British writer Iris Murdoch; last novel published 1995, diagnosed with Alzheimers 1997. Compared three novels: Under the Net (first), The Sea, The Sea (in her prime), Jackson's Dilemma (final novel). All her books were written in longhand with little editing.

Type to token ratio in the 3 novels
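
For reference, a type-to-token ratio is the number of distinct word forms divided by the total number of words. The sketch below is a minimal version with an assumed lowercase, letters-only tokenizer. It is not the procedure of Garrard et al., and since the ratio falls as texts get longer, comparisons are only meaningful between samples of similar length.

```python
import re

def type_token_ratio(text):
    """Distinct word forms divided by total word tokens (0.0 for empty text)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0

# Two short sentences that appear in the Jackson's Dilemma excerpt quoted in this lecture.
ttr = type_token_ratio("He had seen her die. He had no other siblings.")
```

Here 8 distinct forms over 10 tokens gives a ratio of 0.8, because "he" and "had" each occur twice.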

Syntactic Complexity

Mean proportions of usages of the 10 most frequently occurring words in each book that appear twice within a series of short intervals, ranging from consecutive positions in the text to a separation of three intervening words. Garrard P et al. Brain 2005;128: Brain Vol. 128 No. 2 © Guarantors of Brain 2004; all rights reserved

Parts of speech

Comparative distributions of values of: (A) frequency and (B) word length in the three books. Garrard P et al. Brain 2005;128:

From Under the Net, 1954: "So you may imagine how unhappy it makes me to have to cool my heels at Newhaven, waiting for the trains to run again, and with the smell of France still fresh in my nostrils. On this occasion, too, the bottles of cognac, which I always smuggle, had been taken from me by the Customs, so that when closing time came I was utterly abandoned to the torments of a morbid self-scrutiny."
From Jackson's Dilemma, 1995: "His beautiful mother had died of cancer when he was 10. He had seen her die. When he heard his father's sobs he knew. When he was 18, his younger brother was drowned. He had no other siblings. He loved his mother and his brother passionately. He had not got on with his father. His father, who was rich and played at being an architect, wanted Edward to be an architect too. Edward did not want to be an architect."

Lancashire and Hirst Vocabulary Changes in Agatha Christie’s Mysteries as an Indication of Dementia: A Case Study Ian Lancashire and Graeme Hirst 2009

Vocabulary Changes in Agatha Christie’s Mysteries as an Indication of Dementia: A Case Study. Ian Lancashire and Graeme Hirst 2009. Examined all of Agatha Christie’s novels. Features, following Nicholas, M., Obler, L. K., Albert, M. L., Helm-Estabrooks, N. (1985). Empty speech in Alzheimer’s disease and fluent aphasia. Journal of Speech and Hearing Research, 28: 405–10: number of unique word types; number of different repeated n-grams up to 5; number of occurrences of “thing”, “anything”, and “something”.
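
The three features listed above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the tokenizer is a simple lowercase regex, "n-grams up to 5" is read as n = 1 to 5, and a repeated n-gram is one that occurs more than once in the text. The name `vocabulary_features` and the example sentence are invented.

```python
import re
from collections import Counter

INDEFINITE = {"thing", "anything", "something"}

def vocabulary_features(text, max_n=5):
    """Count word types, distinct repeated n-grams, and indefinite words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Count every n-gram for n = 1..max_n (the range is an assumption).
    ngrams = Counter(
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    )
    return {
        "word_types": len(set(tokens)),
        "repeated_ngrams": sum(1 for count in ngrams.values() if count > 1),
        "indefinite_words": sum(tok in INDEFINITE for tok in tokens),
    }

features = vocabulary_features(
    "Something is wrong. Something is very wrong, and anything could happen."
)
```

In a real comparison across novels the counts would need to be taken over equal-sized text samples, since all three grow with text length.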

Results

Topic 5: Writing and physical health. People asked to write about traumatic experiences subsequently exhibit better physical health than people asked to write about superficial topics. Intuition: people who write about emotional topics report that the experiment makes them think differently about their experience. Hypothesis: Do changes in writing style correlate with improved health? Could we find these changes automatically?

Singular Value Decomposition: Singular Value Decomposition (SVD) is a form of factor analysis. Any m × n matrix A can be written using an SVD of the form A = U D V^T, where: U is an m × n matrix (a ‘hanger’ matrix), D is an n × n diagonal matrix (a ‘stretcher’ matrix), V^T is an n × n matrix (an ‘aligner’ matrix) (see
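
The shapes in this definition can be verified numerically. The sketch below uses NumPy's `numpy.linalg.svd` on an arbitrary random matrix; the sizes m = 6 and n = 4 are assumptions chosen only for the example, and the shapes shown hold for m ≥ n.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))            # arbitrary m x n matrix with m=6, n=4

# full_matrices=False gives the "thin" SVD: U is m x n, d holds n singular
# values, and Vt (that is, V^T) is n x n.
U, d, Vt = np.linalg.svd(A, full_matrices=False)
D = np.diag(d)                         # n x n diagonal 'stretcher' matrix

reconstructed = U @ D @ Vt             # should equal A up to rounding error
```

NumPy returns the singular values in descending order, which is what makes truncating to the first few dimensions meaningful.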

Application of SVD to LSA: Assemble a large corpus of natural language. Parse the corpus into meaningful passages. Form a matrix with passages as rows and words as columns. SVD is applied to re-represent the words and passages as vectors in a high-dimensional ‘semantic space’.

SVD: an example (1) Titles of Technical Memos
c1: Human machine interface for ABC computer applications
c2: A survey of user opinion of computer system response time
c3: The EPS user interface management system
c4: System and human system engineering testing of EPS
c5: Relation of user perceived response time to error measurement
m1: The generation of random, binary, ordered trees
m2: The intersection graph of paths in trees
m3: Graph minors IV: Widths of trees and well-quasi-ordering
m4: Graph minors: A survey

LSA: This example is taken from: Deerwester, S., Dumais, S.T., Landauer, T.K., Furnas, G.W. and Harshman, R.A. (1990). "Indexing by latent semantic analysis." Journal of the American Society for Information Science, 41(6), Slides are from a presentation by Tom Landauer and Peter Foltz, adapted by Melanie Martin

A Small Example: Technical Memo Titles
c1: Human machine interface for ABC computer applications
c2: A survey of user opinion of computer system response time
c3: The EPS user interface management system
c4: System and human system engineering testing of EPS
c5: Relation of user perceived response time to error measurement
m1: The generation of random, binary, ordered trees
m2: The intersection graph of paths in trees
m3: Graph minors IV: Widths of trees and well-quasi-ordering
m4: Graph minors: A survey

A Small Example – 2: r(human, user) = -.38; r(human, minors) = -.29

A Small Example – 3: Singular Value Decomposition: {A} = {U}{S}{V}^T. Dimension Reduction: {~A} ≈ {~U}{~S}{~V}^T

A Small Example – 4 {U} =

A Small Example – 5 {S} =

A Small Example – 6 {V} =

A Small Example – 7: r(human, user) = .94; r(human, minors) = -.83
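
The before-and-after correlations in this example can be recomputed from the nine memo titles. The sketch below rests on several assumptions: index terms are taken to be the non-stopword words that occur in at least two titles, the small stopword list is my own, and the dimension reduction keeps k = 2 singular values. The raw correlations it produces round to the -.38 and -.29 reported earlier; the reduced-space values should come out strongly positive for human/user and strongly negative for human/minors, in line with the .94 and -.83 on this slide.

```python
import re
import numpy as np

titles = {
    "c1": "Human machine interface for ABC computer applications",
    "c2": "A survey of user opinion of computer system response time",
    "c3": "The EPS user interface management system",
    "c4": "System and human system engineering testing of EPS",
    "c5": "Relation of user perceived response time to error measurement",
    "m1": "The generation of random, binary, ordered trees",
    "m2": "The intersection graph of paths in trees",
    "m3": "Graph minors IV: Widths of trees and well-quasi-ordering",
    "m4": "Graph minors: A survey",
}
STOPWORDS = {"a", "and", "for", "in", "of", "the", "to"}   # assumed list

docs = {
    name: [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    for name, text in titles.items()
}
# Index terms: words that occur in at least two titles (assumption).
terms = sorted(
    {w for toks in docs.values() for w in toks
     if sum(w in other for other in docs.values()) >= 2}
)
# Term-by-document count matrix: one row per term, one column per title.
A = np.array([[toks.count(t) for toks in docs.values()] for t in terms], dtype=float)

def term_correlation(M, t1, t2):
    """Pearson correlation between two term rows of matrix M."""
    return np.corrcoef(M[terms.index(t1)], M[terms.index(t2)])[0, 1]

# Rank-2 reconstruction: keep only the two largest singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
A2 = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

raw_user = term_correlation(A, "human", "user")
raw_minors = term_correlation(A, "human", "minors")
reduced_user = term_correlation(A2, "human", "user")
reduced_minors = term_correlation(A2, "human", "minors")
```

The point of the example is that "human" and "user" never share a title, yet the reduced space places them close together because they occur with the same neighboring terms.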

A Small Example – 2 reprise: r(human, user) = -.38; r(human, minors) = -.29

Pennebaker results Pronouns: I, my, it, you, me, she, he, her, we, they, your, him, his, them, our, myself, their, us, its

Implications?