Emotion in Meetings: Hot Spots and Laughter

Corpus used ICSI Meeting Corpus – 75 unscripted, naturally occurring meetings on scientific topics – 71 hours of recording time – Each meeting contains between 3 and 9 participants – Drawn from a pool of 53 unique speakers (13 female, 40 male). – Speakers were recorded by both far field and individual close-talking microphones – The recordings from the close-talking microphones were used

Analysis of the occurrence of laughter in meetings - Kornel Laskowski, Susanne Burger

Questions asked What is the quantity of laughter, relative to the quantity of speech? How does the durational distribution of episodes of laughter differ from that of episodes of speech? How do meeting participants affect each other in their use of laughter, relative to their use of speech?

Question? What could be gained by answering these questions?

Method
– Analysis framework: bouts, calls and spurts; laughed speech
– Data preprocessing
– Talk spurt segmentation: using the word-level forced alignments in the ICSI Dialog Act (MRDA) Corpus, with a 300 ms pause threshold, a value adopted from the NIST Rich Transcription Meeting Recognition evaluations
– Selection of annotated laughter instances: vocal sound and comment instances
– Laugh bout segmentation: semi-automatic segmentation
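The talk-spurt segmentation step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes word alignments are available as (start, end) time pairs sorted by start time, and uses the 300 ms pause threshold cited from the NIST-RT evaluations.

```python
def segment_spurts(words, gap_threshold=0.300):
    """Merge time-aligned words into talk spurts.

    `words` is a list of (start, end) times in seconds, sorted by start.
    A new spurt begins whenever the pause since the previous word
    exceeds `gap_threshold` (300 ms here, following the NIST-RT
    convention mentioned in the paper).
    """
    spurts = []
    for start, end in words:
        if spurts and start - spurts[-1][1] <= gap_threshold:
            # Short pause: extend the current spurt.
            spurts[-1] = (spurts[-1][0], max(spurts[-1][1], end))
        else:
            # Long pause: start a new spurt.
            spurts.append((start, end))
    return spurts
```

Three words with a 100 ms gap, then a 600 ms gap, yield two spurts.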

Analysis: Quantity of laughter
– The average participant vocalizes for 14.8% of the time that they spend in meetings.
– Of this effort, 8.6% is spent on laughing and an additional 0.8% is spent on laughing while talking.
– Participants differ both in how much time they spend vocalizing and in what proportion of that is laughter.
– Importantly, laughing time and speaking time do not appear to be correlated across participants.
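The claim that laughing time and speaking time are uncorrelated across participants reduces to a Pearson correlation over per-participant totals. A minimal sketch (the input vectors here are hypothetical, not corpus values):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation, e.g. between per-participant speaking
    time and laughing time (values near 0 mean no linear relation)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

With 53 speakers, one would pass 53-element vectors of speaking and laughing proportions.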

Question? What is laughed speech? Examples?

Analysis: Laughter duration and separation
– Duration of laugh bouts and the temporal separation between bouts for each participant
– Duration and separation of “islands” of laughter, produced by merging overlapping bouts from all participants
– Bout and bout-“island” durations follow a lognormal distribution, while spurt and spurt-“island” durations appear to be the sum of two lognormal distributions.
– Bout durations and bout-“island” durations have an apparently identical distribution, suggesting that bouts are produced either in isolation or in synchrony, since bout-“island” construction does not lead to longer phenomena.
– In contrast, construction of speech “islands” does affect the distribution, as expected.
– The distribution of bout and bout-“island” separations appears to be the sum of two lognormal distributions.
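A lognormal fit of the kind underlying these duration claims exploits the fact that a variable is lognormal exactly when its logarithm is normal: the maximum-likelihood parameters are the mean and standard deviation of the log-durations. A sketch, with hypothetical durations:

```python
import math
import statistics

def lognormal_fit(durations):
    """MLE fit of a lognormal: return (mu, sigma) of the underlying
    normal distribution of log-durations (durations in seconds, > 0)."""
    logs = [math.log(d) for d in durations]
    return statistics.mean(logs), statistics.pstdev(logs)
```

The fitted median duration is then `exp(mu)`; goodness of fit (e.g. on a log-probability plot) is what distinguishes a single lognormal from a sum of two.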

Analysis: Interactive aspects (multi-participant behavior)
– The laughter distribution was computed over different degrees of overlap.
– Laughter has significantly more overlap than speech: in relative terms, 8.1% of meeting speech time versus 39.7% of meeting laughter time.
– The amount of time in which 4 or more participants are simultaneously vocalizing is 25 times higher when laughter is considered.
– The analysis was run both excluding and including “laughed speech”.
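The time spent at each degree of overlap can be computed with an event sweep over vocalization intervals. A sketch, assuming a hypothetical pooled list of per-participant (start, end) intervals:

```python
from collections import defaultdict

def overlap_degrees(segments):
    """Return {degree: total time} for each degree of simultaneous
    vocalization, given (start, end) intervals pooled over participants."""
    events = []
    for start, end in segments:
        events.append((start, +1))  # a participant starts vocalizing
        events.append((end, -1))    # a participant stops
    events.sort()
    time_at = defaultdict(float)
    degree, prev_t = 0, None
    for t, delta in events:
        if prev_t is not None and degree > 0:
            # Credit the elapsed interval to the current overlap degree.
            time_at[degree] += t - prev_t
        degree += delta
        prev_t = t
    return dict(time_at)
```

Two intervals (0, 2) and (1, 3) give 2 s at degree 1 and 1 s at degree 2.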

Question? Anything odd with the results?

Interactive aspects (continued): probabilities of transition between various degrees of overlap.
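The transition probabilities referred to here can be estimated from a frame-level sequence of overlap degrees by counting adjacent pairs, i.e. fitting a first-order Markov chain. A sketch, with a hypothetical degree sequence sampled at a fixed frame rate:

```python
from collections import Counter, defaultdict

def transition_probs(degrees):
    """Estimate P(next degree | current degree) from a frame-level
    sequence of overlap degrees (one value per fixed-length frame)."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(degrees, degrees[1:]):
        counts[cur][nxt] += 1
    # Normalize each row of counts into a probability distribution.
    return {cur: {nxt: c / sum(row.values()) for nxt, c in row.items()}
            for cur, row in counts.items()}
```

The paper's observation that exiting laughter overlap is unlikely would show up as a high self-transition probability for high-degree states.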

Conclusions
– Laughter accounts for approximately 9.5% of all vocalizing time, a proportion that varies extensively from participant to participant and appears not to be correlated with speaking time.
– Laugh bout durations have a smaller variance than talk spurt durations.
– Laughter is responsible for a significant amount of vocal-activity overlap in meetings, and transitioning out of laughter overlap is much less likely than transitioning out of speech overlap.
– The authors quantify these effects in meetings, for the first time, in terms of probabilistic transition constraints on the evolution of conversations involving arbitrary numbers of participants.

Have the questions been answered?

Question? Enhancements to this work?

Spotting “Hot Spots” in Meetings: Human Judgments and Prosodic Cues - Britta Wrede, Elizabeth Shriberg

Questions asked Can human listeners agree on utterance-level judgments of speaker involvement? Do judgments of involvement correlate with automatically extractable prosodic cues?

Question? What could be the potential uses of such a study?

Method
– A subset of 13 meetings was selected and analyzed with respect to involvement.
– Utterances within hot spots were rated as amused, disagreeing, other, or not particularly involved.
– Acoustics vs. context: the raters were asked to base their judgment as much as possible on the acoustics.
– Example rating…

Question? How many utterances per hotspot, possible correlations?

Inter-rater agreement
To assess how consistently listeners perceive involvement, inter-rater agreement was measured by Kappa, both for pair-wise comparisons of raters and for overall agreement. Kappa computes agreement after taking chance agreement into account. Nine listeners, all of whom were familiar with the speakers, provided ratings for at least 45 utterances, but only 8 ratings per utterance were used.

Inter-rater agreement
Inter-rater agreement for the high-level distinction between involved and not involved yielded a Kappa of .59 (p < .01), a value considered quite reasonable for subjective categorical tasks. When Kappa was computed over all four categories, it dropped to .48 (p < .01), indicating that distinguishing among the types of involvement (amused, disagreeing, and other) is harder than making the high-level judgment of whether involvement is present.
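For a pair of raters, Kappa has a compact closed form: observed agreement corrected by the agreement expected by chance. A minimal sketch of Cohen's kappa (the two-rater case; the paper also reports overall agreement across all raters, which requires a multi-rater variant such as Fleiss' kappa):

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters labeling the same items:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    ca, cb = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement: probability both raters pick the same label
    # if each rater labels independently at their own marginal rates.
    chance = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (observed - chance) / (1 - chance)
```

Two raters who agree on 3 of 4 binary labels, each using the labels at balanced-ish rates, get a kappa of 0.5, well below their 75% raw agreement.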

Question? The authors raise the question of whether fine-tuning the classes would improve the Kappa coefficient; do you think it would?

Pair-wise agreement

Native vs. nonnative raters

Question? Would it be reasonable to assume that non-native rater agreement would be high? Could context have played a hidden part in this disparity?

Acoustic cues to involvement
Why prosody?
– There is not enough data in the corpus to allow robust language modeling.
– Prosody does not require the output of an automatic speech recognizer, which might not be available for certain audio-browsing applications or might perform poorly on the meeting data.

Acoustic cues to involvement
– Certain prosodic features, such as F0, show good correlation with certain emotions.
– Studies have shown that acoustic features tend to depend more on dimensions such as activation and evaluation than on specific emotions.
– Pitch-related measures, energy, and duration can be useful indicators of emotion.

Acoustic features
– F0- and energy-based features were computed.
– For each word, either the average, minimum, or maximum was considered.
– To obtain a single value for the utterance, the average over all the words was computed.
– Either absolute or normalized values were used.
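The feature computation above can be sketched as: collapse per-word measurements (e.g. each word's maximum F0) into an utterance value by averaging, then optionally z-score normalize against the speaker's overall statistics. A minimal illustration with hypothetical values, not the authors' exact normalization scheme:

```python
import statistics

def utterance_feature(word_values, speaker_values=None):
    """Average per-word measurements (e.g. per-word max F0) into one
    utterance-level value; if `speaker_values` is given, z-score
    normalize by that speaker's mean and standard deviation."""
    value = statistics.mean(word_values)
    if speaker_values:
        mu = statistics.mean(speaker_values)
        sigma = statistics.pstdev(speaker_values)
        value = (value - mu) / sigma
    return value
```

Normalization of this sort is what lets the same feature threshold generalize across speakers with different baseline pitch.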

Correlations with perceived involvement
– The class assigned to each utterance was determined as a weighted version of the ratings (a soft decision, accounting for the different ratings in an adequate way).
– The differences between the two classes are significant for many features.
– The most affected features are all F0-based.
– Normalized features lead to greater distinction than absolute features.
– Patterns remain similar, and the most distinguishing features are roughly the same, when within-speaker features are analyzed.
– Normalization removes a significant part of the variability across speakers.
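One simple form such a soft decision could take is the fraction of raters who heard involvement of any type. This is an illustrative assumption, not the authors' exact weighting; the category names mirror the four rating classes from the method slide:

```python
def involvement_score(ratings):
    """Soft involvement label in [0, 1]: the fraction of raters who
    chose any involved category (amused / disagreeing / other) rather
    than 'not particularly involved'. (Hypothetical weighting scheme.)"""
    involved = {"amused", "disagreeing", "other"}
    return sum(r in involved for r in ratings) / len(ratings)
```

An utterance could then be assigned to the "involved" class when its score exceeds 0.5, while the score itself preserves rater disagreement.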

Question? How could the weighted ratings have been used in the comparison of features?

Conclusions
– Despite the subjective nature of the task, raters show significant agreement in distinguishing involved from non-involved utterances.
– Differences in performance between native and nonnative raters indicate that judgments of involvement are also influenced by the native language of the listener.
– The prosodic features of the rated utterances indicate that involvement can be characterized by deviations in F0 and energy.
– This is likely a general effect across speakers, since for at least one speaker the most affected individual features were similar to the most affected features computed over all speakers. If this holds true for all speakers, it indicates that the applied mean and variance normalizations, as well as the baseline normalizations, are able to remove most of the variability between speakers.

Have the questions been answered?