Predicting Student Emotions in Computer-Human Tutoring Dialogues Diane J. Litman & Kate Forbes-Riley University of Pittsburgh Department of Computer Science

Presentation transcript:

Predicting Student Emotions in Computer-Human Tutoring Dialogues Diane J. Litman & Kate Forbes-Riley University of Pittsburgh Department of Computer Science Learning Research and Development Center In Proceedings of ACL 2004

Abstract We examine the utility of speech and lexical features for predicting student emotions in computer-human spoken tutoring dialogues, and compare the results of machine learning experiments using these features alone and in combination to predict student emotions.

Introduction Intelligent tutoring dialogue systems have become more prevalent in recent years. Our long-term goal is to merge dialogue and affective tutoring research by enhancing our intelligent tutoring spoken dialogue system to automatically predict and adapt to student emotions, and to investigate whether this improves learning and other measures of performance.

Computer-Human Dialogue Data ITSPOKE (Intelligent Tutoring SPOKEn dialogue system) is a spoken dialogue tutor built on top of the Why2-Atlas conceptual physics text-based tutoring system (VanLehn et al., 2002). –The Why2-Atlas back-end uses 3 different approaches (symbolic, statistical, and hybrid) to parse the student essay into propositional representations and to resolve temporal and nominal anaphora. –During the dialogue, student speech is digitized from microphone input and sent to the Sphinx2 recognizer. Sphinx2's best "transcription" is then sent to the Why2-Atlas back-end for syntactic, semantic, and dialogue analysis. Finally, the text response produced by Why2-Atlas is sent to the Cepstral text-to-speech system and played to the student. After the dialogue, the student revises the essay, thereby either ending the tutoring or triggering another round of tutoring/essay revision.

ITSPOKE In ITSPOKE, a student first types an essay answering a qualitative physics problem. ITSPOKE then analyzes the essay and engages the student in spoken dialogue to correct misconceptions and to elicit complete explanations.
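The per-turn flow described above can be summarized as a simple pipeline. The sketch below is illustrative only: the wrapper objects standing in for the Sphinx2 recognizer, the Why2-Atlas back-end, and the Cepstral synthesizer are hypothetical, not the actual ITSPOKE code.

```python
# Minimal sketch of one ITSPOKE dialogue turn, assuming hypothetical wrappers
# (asr, why2_atlas, tts) around the components named on the slide.

def process_student_turn(audio, asr, why2_atlas, tts):
    """Run one student turn through the spoken tutoring pipeline."""
    # 1. Speech recognition: Sphinx2 produces its best transcription.
    transcription = asr.recognize(audio)

    # 2. Syntactic, semantic, and dialogue analysis by the Why2-Atlas back-end,
    #    which returns the tutor's next text response.
    tutor_text = why2_atlas.analyze_and_respond(transcription)

    # 3. Text-to-speech: the tutor's response is synthesized and played back.
    tutor_audio = tts.synthesize(tutor_text)
    return transcription, tutor_text, tutor_audio
```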

Computer-Human Dialogue Data The corpus was collected from subjects who are University of Pittsburgh students who have never taken college physics and who are native English speakers. Subjects first read a short document of background physics material, then work through 5 problems with ITSPOKE. The corpus contains 100 dialogues from 20 subjects; 15 dialogues (333 turns in total) have been annotated for emotion.

Prior work In earlier experiments we used the Ripper rule induction machine learning program, which takes as input the classes to be learned, with 25-fold cross-validation. Text + speech rules: –If (duration > 0.65) ^ (text has "i") then neg –Else if (duration >= 2.98) then neg –Else if (duration >= 2.98) ^ (start time >= 297.62) then pos –Else if (text has "right") then pos –Else neutral Estimated mean error and standard error: 33.03% +/- 2.45%

Prior work Text-only rules: –If (text has "i") ^ (text has "don't") then neg –Else if (text has "um") ^ (text has "") then neg –Else if (text has "the") ^ (text has "") then neg –Else if (text has "right") then pos –Else if (text has "so") then pos –Else if (text has "(laugh)") ^ (text has "that's") then pos –Else neutral Estimated mean error and standard error: 39.03% +/- 2.45%
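To make the structure of these learned models concrete, here is a minimal sketch of applying the text + speech rule list above to a single turn. Ripper rules fire top to bottom and fall through to a default class; the thresholds and ordering are copied from the slide as transcribed, so they should be read as illustrative rather than exact.

```python
# Sketch of applying the slide's text+speech Ripper rule list to one turn.
# Rules fire in order; unmatched turns fall through to "neutral".
def classify_turn(duration, start_time, text):
    words = text.lower().split()
    if duration > 0.65 and "i" in words:
        return "neg"
    if duration >= 2.98:
        return "neg"
    if duration >= 2.98 and start_time >= 297.62:
        return "pos"   # shadowed by the previous rule as transcribed above
    if "right" in words:
        return "pos"
    return "neutral"

# Example: a long turn containing "i" is labeled negative.
print(classify_turn(duration=3.2, start_time=120.0, text="um i don't know"))  # neg
```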

Annotating Student Turns These emotions are viewed along a linear scale and are defined as follows: –Negative: a student turn expressing emotions such as confusion, boredom, or irritation. Evidence of a negative emotion can come from many knowledge sources, such as lexical items (e.g., "I don't know") and/or acoustic-prosodic features (e.g., prior-turn pausing). –Neutral: a student turn expressing neither a negative nor a positive emotion. Evidence comes from moderate loudness, pitch, and tempo. –Positive: a student turn expressing emotions such as confidence. Evidence comes from louder speech and faster tempo. –Mixed: a student turn expressing both positive and negative emotions.

Annotating Student Turns Negative, Positive, Neutral (NPN): –mixeds are conflated with neutrals. –Agreement between the 2 annotators: 0.4 Negative, Non-Negative (NnN): –positives, mixeds, and neutrals are conflated as non-negatives. –Agreement between the 2 annotators: 0.5 Emotional, Non-Emotional (EnE): –negatives, positives, and mixeds are conflated as emotional; neutrals are non-emotional. –Agreement between the 2 annotators: 0.5
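As a sketch of how these schemes relate to the underlying four-way annotation, the code below conflates toy annotator labels under each scheme and measures agreement with Cohen's Kappa via scikit-learn. The label lists are invented for illustration and are not the paper's data.

```python
# Conflating the 4-way emotion labels into the NPN / NnN / EnE schemes and
# measuring inter-annotator agreement with Cohen's Kappa.
from sklearn.metrics import cohen_kappa_score

def conflate(label, scheme):
    if scheme == "NPN":   # mixeds conflated with neutrals
        return "neutral" if label == "mixed" else label
    if scheme == "NnN":   # everything except negative becomes non-negative
        return "negative" if label == "negative" else "non-negative"
    if scheme == "EnE":   # neutrals vs. all emotional labels
        return "non-emotional" if label == "neutral" else "emotional"
    raise ValueError(scheme)

# Toy example labels for two annotators (illustrative only).
annotator1 = ["negative", "mixed", "positive", "neutral", "negative"]
annotator2 = ["negative", "neutral", "positive", "positive", "neutral"]

for scheme in ("NPN", "NnN", "EnE"):
    a = [conflate(x, scheme) for x in annotator1]
    b = [conflate(x, scheme) for x in annotator2]
    print(scheme, round(cohen_kappa_score(a, b), 2))
```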

Extracting Features from Turns We extracted the following set of features, restricted to those that can be computed in real time. Acoustic-prosodic features –4 fundamental frequency (f0): max, min, mean, standard deviation –4 energy (RMS): max, min, mean, standard deviation –4 temporal: amount of silence in turn, turn duration, duration of pause prior to turn (start_time), speaking rate Lexical features –Human-transcribed lexical items in the turn –ITSPOKE-recognized lexical items in the turn Identifier features –Subject ID, gender, problem ID
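A rough sketch of how such per-turn acoustic-prosodic features could be computed offline is shown below, using the librosa library as a stand-in. ITSPOKE computed its features in real time with its own tools, so the function, thresholds, and pitch range here are assumptions, not the system's implementation.

```python
# Sketch of the 12 acoustic-prosodic features listed on the slide, computed
# per turn from a WAV file with librosa (a stand-in, not ITSPOKE's extractor).
import numpy as np
import librosa

def turn_features(wav_path, start_time, prev_turn_end, n_words):
    y, sr = librosa.load(wav_path, sr=None)
    duration = len(y) / sr

    # 4 f0 features: max, min, mean, standard deviation over voiced frames.
    f0, voiced, _ = librosa.pyin(y, fmin=75, fmax=400, sr=sr)
    f0 = f0[~np.isnan(f0)]
    f0_stats = [f0.max(), f0.min(), f0.mean(), f0.std()] if len(f0) else [0, 0, 0, 0]

    # 4 RMS energy features: max, min, mean, standard deviation.
    rms = librosa.feature.rms(y=y)[0]
    rms_stats = [rms.max(), rms.min(), rms.mean(), rms.std()]

    # 4 temporal features: silence in turn (crude low-energy estimate),
    # turn duration, pause prior to turn, speaking rate (words per second).
    silence = float(np.mean(rms < 0.01)) * duration
    prior_pause = start_time - prev_turn_end
    speaking_rate = n_words / duration if duration > 0 else 0.0

    return f0_stats + rms_stats + [silence, duration, prior_pause, speaking_rate]
```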

Predicting Student Emotions We created 10 feature sets to study the effects that various feature combinations had on predicting emotion: –sp: 12 acoustic-prosodic features –lex: human-transcribed lexical items –asr: ITSPOKE-recognized lexical items –sp+lex: combined sp and lex features –sp+asr: combined sp and asr features –+id: each of the above sets plus the 3 identifier features We use the Weka machine learning software (Witten and Frank, 1999) with boosted decision trees to automatically learn our emotion prediction models.
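The sketch below shows how one of these feature sets might be trained with boosted decision trees. Since Weka is a Java toolkit, scikit-learn's AdaBoost is used here as a rough stand-in, and the feature matrices (X_sp, X_lex) and labels y are assumed to be precomputed from the annotated turns.

```python
# Rough stand-in for Weka's boosted decision trees using scikit-learn's
# AdaBoost (which boosts shallow decision trees by default).
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def train_feature_set(X, y, n_estimators=50):
    """Fit one emotion prediction model on a single feature set."""
    return AdaBoostClassifier(n_estimators=n_estimators).fit(X, y)

# Combined sets are just column-wise concatenations of the base sets, e.g.:
#   X_sp_lex = np.hstack([X_sp, X_lex])
#   model = train_feature_set(X_sp_lex, y)
```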

Predicting Agreed Turns 10 runs of 10-fold cross-validation. Baseline (MAJ): always predict the majority class. Results are compared statistically to the relevant baseline accuracy, computed automatically in Weka using a 2-tailed t-test (p < .05).
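A sketch of that evaluation protocol, again using scikit-learn in place of Weka's cross-validation and significance machinery, might look as follows (the model and feature matrix are assumed to come from the previous sketch):

```python
# Repeated 10-fold cross-validation compared against a majority-class baseline.
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

def evaluate(model, X, y, runs=10, folds=10):
    cv = RepeatedStratifiedKFold(n_splits=folds, n_repeats=runs, random_state=0)
    acc = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    maj = cross_val_score(DummyClassifier(strategy="most_frequent"), X, y, cv=cv)
    print(f"model {acc.mean():.1%} +/- {acc.std():.1%}  "
          f"baseline (MAJ) {maj.mean():.1%}")
    return acc, maj
```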

Predicting Consensus Turns Consensus labeling was used in order to increase the usable data set for prediction and to include the more difficult annotation cases.

Comparison with Human Tutoring We used the same experimental methodology for spoken human tutoring dialogues (same subject pool, physics problems, web and audio interface, etc.). 2 annotators annotated the human tutoring data: –NnN agreement: 88.96% [77.8%] –EnE agreement: 77.26% [66.1%] –NPN agreement: 75.06% [60.7%] The table shows the results for the agreed data.

Discussion We hypothesize that such differences arise in part from differences between the 2 corpora: –(1) Student turns with the computer tutor are much shorter than with the human tutor, and thus contain less emotional content, making both annotation and prediction more difficult. –(2) Students respond to the computer tutor differently, and perhaps more idiosyncratically, than to the human tutor. –(3) The computer tutor is less flexible than the human tutor, which also affects student emotional responses and their expression.

Conclusion and Current Directions Our feature sets outperform a majority baseline, and lexical features outperform acoustic-prosodic features. While adding identifier features typically also improves performance, combining lexical and speech features does not. The results also suggest that prediction on consensus-labeled turns is harder than on agreed turns. Ongoing work is exploring partial automation via semi-supervised machine learning, improving the reliability of the manual annotation, and including features available in our logs (e.g., semantic analysis).