1/17/2016 Emotion in Meetings: Business and Personal Julia Hirschberg CS 4995/6998

Spotting “Hot Spots” in Meetings: Human Judgments and Prosodic Cues - Britta Wrede, Elizabeth Shriberg
Questions:
–Can human listeners agree on utterance-level judgments of speaker involvement?
–Do judgments of involvement correlate with automatically extractable prosodic cues?
–Why might this be useful for meetings?

Corpus
ICSI Meeting Corpus
–75 unscripted, naturally occurring meetings on scientific topics
–71 hours of recording time
–Each meeting contains between 3 and 9 participants
–Pool of 53 unique speakers (13 female, 40 male)
–Speakers recorded by both far-field and individual close-talking microphones; recordings from the close-talking microphones were used here

Method
Subset of 13 meetings (4–8 speakers) selected
Utterances analyzed for involvement:
–Amusement, disagreement, other
“Hot Spots” labeled where at least one speaker showed high involvement
–Labeled as amused, disagreeing, or other
–Why didn't the raters get to hear surrounding context?
–Why use (9) raters who knew the speakers?
–Why ask them to base their judgments as much as possible on the acoustics?
Inter-rater agreement measured using Fleiss' Kappa for pair-wise and overall agreement

Inter-rater Agreement
Cohen's kappa: 2 raters, categorical data

Fleiss' kappa: generalizes Cohen's kappa to multiple raters, categorical data
Krippendorff's alpha: measures agreement among multiple raters, any type of data
–Based on observed vs. expected disagreement
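To make the statistic concrete, here is a minimal Python sketch of Fleiss' kappa (an illustration, not code from the paper); it assumes a counts matrix in which every item was rated by the same number of raters:

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' kappa for an (items x categories) matrix of rating counts;
    assumes every item was rated by the same number of raters."""
    counts = np.asarray(counts, dtype=float)
    n_items = counts.shape[0]
    n_raters = counts[0].sum()
    # Per-item agreement: fraction of rater pairs that agree
    p_i = (np.sum(counts ** 2, axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()                                # observed agreement
    p_j = counts.sum(axis=0) / (n_items * n_raters)   # category proportions
    p_e = np.sum(p_j ** 2)                            # chance agreement
    return (p_bar - p_e) / (1 - p_e)

# Toy example: 5 utterances, 8 raters each, 2 classes (involved / not)
ratings = [[6, 2], [8, 0], [1, 7], [4, 4], [7, 1]]
print(round(fleiss_kappa(ratings), 3))
```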

Inter-rater agreement
Nine listeners, all of whom were familiar with the speakers, provided ratings for at least 45 utterances each, but only 8 ratings per utterance were used.

Inter-rater Agreement for Meetings
Agreement on the high-level distinction between involved and non-involved yielded a kappa of .59 (p < .01) -- reasonable
When computed over all four categories, kappa dropped to .48 (p < .01)
–Raters had more difficulty distinguishing among types of involvement (amused, disagreeing, and other)

Pair-wise agreement

Native vs. nonnative raters

Acoustic Cues to Involvement
Why prosody?
–Not enough data in the corpus to allow robust language modeling
–Prosody does not require ASR output, which might be unavailable for some audio-browsing applications or perform poorly on meeting data

Potential Acoustic Cues to Involvement
Certain prosodic features, such as F0, correlate well with certain emotions
Studies have shown that acoustic features tend to depend more on dimensions such as activation and evaluation than on discrete emotion categories
Pitch-related measures, energy, and duration can all be useful indicators of emotion

Acoustic Features Examined
F0- and energy-based features were computed for each word (mean, minimum, and maximum considered)
Utterance scores obtained by averaging over all words
Both absolute and normalized values were tried (see the sketch below)
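A rough Python sketch of this feature pipeline. It assumes frame-level F0 and energy values have already been extracted by some pitch tracker, and the z-score normalization shown is one plausible scheme, not necessarily the paper's:

```python
import numpy as np

def word_stats(frames):
    """Mean / min / max over frame-level values for one word,
    skipping zero frames (unvoiced for F0, silence for energy)."""
    v = np.asarray(frames, dtype=float)
    v = v[v > 0]
    return {"mean": v.mean(), "min": v.min(), "max": v.max()}

def utterance_score(word_stats_list, key):
    """Utterance-level score: average of one word-level statistic."""
    return float(np.mean([w[key] for w in word_stats_list]))

def speaker_znorm(value, spk_mean, spk_std):
    """One plausible normalization: z-score relative to speaker stats."""
    return (value - spk_mean) / spk_std

# e.g. two words with hypothetical frame-level F0 values (Hz)
words = [word_stats([0, 180, 190, 185]), word_stats([200, 210, 0, 205])]
f0_mean_utt = utterance_score(words, "mean")
```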

Correlations with Perceived Involvement
Class assigned to each utterance was determined as a weighted version of its ratings
–A soft decision, accounting for disagreement among raters (see the sketch below)
Difference between the two classes significant for many features
–Most predictive features all F0-based
–Normalized features more useful than absolute features
–Patterns remain similar: the most distinguishing features are roughly the same when within-speaker features are analyzed
Normalization removes a significant part of the variability across speakers
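The slide does not spell out the weighting scheme; as a purely illustrative sketch, a soft decision might average hypothetical per-category weights over the 8 ratings of an utterance:

```python
import numpy as np

# Hypothetical weights: the slide does not give the actual scheme
WEIGHTS = {"not_involved": 0.0, "amused": 1.0, "disagreeing": 1.0, "other": 1.0}

def soft_involvement(ratings, threshold=0.5):
    """Weighted average of the ratings for an utterance, followed by
    a two-way involved / not-involved decision."""
    score = np.mean([WEIGHTS[r] for r in ratings])
    return score, ("involved" if score >= threshold else "not involved")

print(soft_involvement(["amused", "amused", "not_involved", "other",
                        "amused", "not_involved", "amused", "amused"]))
```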

Conclusions
Despite the subjective nature of the task, raters show significant agreement in distinguishing involved from non-involved utterances
Native and non-native raters differed in their judgments
Prosodic features of rated utterances indicate that involvement can be characterized by deviations in F0 and energy
–Possibly a general effect across speakers
–If true, mean, variance, and baseline normalizations may be able to remove most variability between speakers

Analysis of the occurrence of laughter in meetings - Kornel Laskowski, Susanne Burger

Analysis of the occurrence of laughter in meetings - Kornel Laskowski, Susanne Burger
Questions:
–What is the quantity of laughter, relative to the quantity of speech?
–How does the durational distribution of episodes of laughter differ from that of episodes of speech?
–How do meeting participants affect each other in their use of laughter, relative to their use of speech?

Method
Analysis framework
–Bouts, calls, and spurts
–Laughed speech
Talk spurt segmentation
–Using word-level forced alignments in the ICSI Meeting Recorder Dialog Act (MRDA) Corpus
–300 ms pause threshold, based on the value adopted by the NIST Rich Transcription Meeting Recognition evaluations (see the sketch below)
Selection of annotated laughter instances
–Vocal-sound and comment instances
Laugh bout segmentation
–Semi-automatic segmentation
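A minimal sketch of the 300 ms spurt-merging rule, assuming word start/end times (in ms) from the forced alignments are given:

```python
def words_to_spurts(words, gap_ms=300):
    """Merge time-aligned words into talk spurts: consecutive words from
    one speaker whose inter-word gap is at most gap_ms form one spurt.
    `words` is a list of (start_ms, end_ms) tuples sorted by start time."""
    spurts = []
    cur_start, cur_end = words[0]
    for start, end in words[1:]:
        if start - cur_end <= gap_ms:
            cur_end = max(cur_end, end)   # extend the current spurt
        else:
            spurts.append((cur_start, cur_end))
            cur_start, cur_end = start, end
    spurts.append((cur_start, cur_end))
    return spurts

# A 100 ms gap merges; a 500 ms pause starts a new spurt
print(words_to_spurts([(0, 400), (500, 900), (1400, 1800)]))
# -> [(0, 900), (1400, 1800)]
```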

Analysis
Quantity of laughter
–The average participant vocalizes for 14.8% of the time spent in meetings
–Of this time, 8.6% is spent laughing and an additional 0.8% laughing while talking
–Participants differ in how much time they spend vocalizing and in what proportion of that time is laughter
–Importantly, laughing time and speaking time do not appear to be correlated across participants

Analysis
Laughter duration and separation
–Duration of laugh bouts and temporal separation between bouts for a participant
–Duration and separation of “islands” of laughter, produced by merging overlapping bouts from all participants (see the sketch below)
–Bout and bout-“island” durations follow a lognormal distribution, while spurt and spurt-“island” durations appear to be the sum of two lognormal distributions
–Bout durations and bout-“island” durations have apparently identical distributions, suggesting that bouts are produced either in isolation or in synchrony, since “island” construction does not lead to longer phenomena
–In contrast, construction of speech “islands” does appear to affect the distribution, as expected
–The distribution of bout and bout-“island” separations appears to be the sum of two lognormal distributions
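Island construction is straightforward interval merging; a small sketch, assuming (start, end) bout times in seconds pooled across participants:

```python
def merge_islands(intervals):
    """Merge overlapping (start, end) intervals from all participants
    into 'islands', as in the bout-island construction above."""
    intervals = sorted(intervals)
    islands = [list(intervals[0])]
    for start, end in intervals[1:]:
        if start <= islands[-1][1]:        # overlaps the last island
            islands[-1][1] = max(islands[-1][1], end)
        else:
            islands.append([start, end])
    return [tuple(i) for i in islands]

# Two speakers laughing in overlap form a single island:
print(merge_islands([(1.0, 2.0), (1.5, 2.8), (5.0, 5.6)]))
# -> [(1.0, 2.8), (5.0, 5.6)]
```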

Analysis
Interactive aspects (multi-participant behavior)
–Laughter distribution computed over different degrees of overlap
–Laughter shows significantly more overlap than speech: 8.1% of meeting speech time is overlapped, versus 39.7% of meeting laughter time
–The amount of time in which 4 or more participants vocalize simultaneously is 25 times higher when laughter is considered
–Both exclusion and inclusion of “laughed speech” were examined

Interactive aspects (continued)
Probabilities of transition between various degrees of overlap:
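The transition table itself is not reproduced in this transcript. As a sketch of how such probabilities can be estimated, assuming a frame-level sequence of overlap degrees (number of participants vocalizing per frame):

```python
import numpy as np

def transition_matrix(degrees, max_degree=4):
    """Estimate P(next degree | current degree) from a frame-level
    sequence of overlap degrees; degrees above max_degree are pooled."""
    k = max_degree + 1
    counts = np.zeros((k, k))
    for cur, nxt in zip(degrees[:-1], degrees[1:]):
        counts[min(cur, max_degree), min(nxt, max_degree)] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts),
                     where=row_sums > 0)

# Toy frame sequence: mostly 0/1 overlap with a brief 2-way episode
seq = [0, 1, 1, 2, 2, 1, 0, 0, 1, 0]
print(transition_matrix(seq, max_degree=2).round(2))
```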

Conclusions
Laughter accounts for ~9.5% of all vocalizing time; this proportion varies extensively from participant to participant and appears not to be correlated with speaking time
Laugh bout durations have smaller variance than talk spurt durations
Laughter is responsible for a significant amount of vocal-activity overlap in meetings, and transitioning out of laughter overlap is much less likely than out of speech overlap
The authors quantified these effects in meetings, for the first time, in terms of probabilistic transition constraints on the evolution of conversations involving arbitrary numbers of participants