Applications of Discourse Structure for Spoken Dialogue Systems Diane Litman Department of Computer Science & Learning Research and Development Center University of Pittsburgh (Currently Leverhulme Visiting Professor, University of Edinburgh) Joint work with Mihai Rotaru, University of Pittsburgh
Spoken Dialogue Systems Systems that interact with users via speech. Advantages: naturalness, efficiency, eyes- and hands-free operation; naturalness also means users need no special training. Domains: Information access [Raux et al., 2005; Rudnicky et al., 1999; Zue et al., 2000; Van den Bosch and Lendvai, 2005]; Tutoring [Graesser et al., 2001; Litman and Silliman, 2004; Pon-Barry et al., 2006]; Assistants, Troubleshooting & QA [Allen et al., 2001, 2006; Acomb et al., 2007; ROLAQUAD]
ITSPOKE ITSPOKE (Intelligent Tutoring SPOKEn Dialogue System) [Litman and Silliman, 2004] Speech-enabled version of the Why2-Atlas computer tutor [VanLehn, Jordan, Rosé et al., 2002] Domain: qualitative physics. Sample ITSPOKE problem: Suppose a man is in a free-falling elevator and is holding his keys motionless right in front of his face. He then lets go. What will be the position of the keys relative to the man's face as time passes? Explain.
Sample dialogue with ITSPOKE
TUTOR1: To analyze this problem we will first describe the motion of the person and his keys while he is holding them. Then we will look at the motion of the person and his keys after he lets go of them. Let's begin by looking at the motion of the man and his keys while he is holding them. How does his velocity compare to that of his keys?
USER1: it's the same
TUTOR2: Yes. So while he is holding his keys they both have the same velocity. Now let's see what happens when he releases them. To do so we will first find the individual forces exerted on the man and his keys, then both net forces, both accelerations, and finally we will describe and compare the velocities. So what are the forces exerted on the man after he releases his keys? Please, specify their directions (e.g., vertically up).
USER2: vertically down gravity
TUTOR3: Good. Because there is just one force, gravity's, it is trivial to determine the NET force (i.e., the vector sum of all forces). So what is the direction of the NET force on the person? ...
Dialogue structure overlay: Before release; After release (Forces on the man, Net force on the man).
Research problem What is the utility of discourse structure for spoken dialogue systems?
Discourse structure Discourse – a group of utterances (monologue or dialogue). Discourse structure: Grosz & Sidner theory [Grosz and Sidner, 1986]. Components of the theory: linguistic structure (discourse segments), intentional structure (discourse segment purpose/intention, discourse segment hierarchy), attentional state.
Intention/purpose structure Discourse segments and the discourse segment hierarchy, illustrated on the solution walkthrough (the sample dialogue from the previous slide). Two time frames: before release and after release. Before release: compare the man's velocity to the keys' velocity. After release, the recipe for each object is: forces, net force, acceleration, velocity; e.g., for the man: forces on the man, then net force on the man, and so on.
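To make the intentional structure concrete, here is a minimal Python sketch (hypothetical names, not the ITSPOKE implementation) of how the discourse segment hierarchy and segment purposes for this walkthrough could be represented.

```python
# Minimal sketch (hypothetical names): a Grosz & Sidner-style intentional
# structure for the elevator problem, with each discourse segment carrying a
# purpose/intention and a list of child segments.
from dataclasses import dataclass, field
from typing import List

@dataclass
class DiscourseSegment:
    purpose: str                                   # discourse segment purpose/intention
    children: List["DiscourseSegment"] = field(default_factory=list)

    def add(self, purpose: str) -> "DiscourseSegment":
        child = DiscourseSegment(purpose)
        self.children.append(child)
        return child

walkthrough = DiscourseSegment("Solution walkthrough")
walkthrough.add("Before release: compare man's and keys' velocities")
after = walkthrough.add("After release: forces -> net force -> acceleration -> velocity")
man = after.add("Man")
man.add("Forces on the man")
man.add("Net force on the man")
```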
Why discourse structure? Useful for other NLP tasks: Understand specific lexical and prosodic phenomena [Hirschberg and Nakatani, 1996; Levow, 2004; Passonneau and Litman, 1997] Anaphoric expressions [Allen et al., 2001] Natural language generation [Hovy, 1993] Predictive/generative models of posture shifts [Cassell et al., 2001] Useful for spoken dialogue systems? 4 intuitions
Intuition 1 – Conditioning Did the student learn? Correctness: Incorrect, Correct. It is more important to be correct at specific "places in the dialogue". Phenomena related to performance are not uniformly important across the dialogue; they have more weight at specific places in the dialogue. Discourse structure can be used to define "places in the dialogue".
Intuition 2 – Discrimination A student who learned less versus a student who learned more: different discourse structure.
Intuition 3 – Interaction Dialogue phenomena, e.g. certainty: Uncertain, Certain, Neutral. Certainty is not uniformly distributed across the dialogue. Dialogue phenomena are not uniformly distributed across the dialogue; they are more frequent at specific places in the dialogue. Discourse structure can be used to define "places in the dialogue".
Intuition 4 – Visual A graphical representation of the discourse structure makes it easier for users to follow the conversation; users should prefer it and may learn more. The Navigation Map.
Outline System side applications: Discourse transitions – defining "places in the dialogue"; Performance analysis (Intuitions 1 and 2: conditioning, discrimination); Characterization of discourse phenomena (Intuition 3: interaction). User side applications: The Navigation Map; Users' perceived utility of the Navigation Map (Intuition 4: visual).
"Places in the dialogue" Requirements: domain independent, automatic. Approach: discourse structure transitions, i.e. the relationship between the current system turn and the previous system turn, with 6 labels. Ingredients: discourse segment hierarchy, transition labeling. Note that no other elements of the discourse structure (e.g. intention/purpose information) are needed.
Discourse segment hierarchy Automatically annotate the discourse segment hierarchy. Tutoring information is authored in a hierarchical plan structure [VanLehn, Jordan, Rosé et al., 2002]: problem, essay, dialogue with ITSPOKE (questions Q1, Q2, Q3).
ITSPOKE behavior & discourse structure annotation After essay submission & analysis, the dialogue covers questions Q1, Q2, Q5 at the top level, with Q3 and Q4 in a remediation subdialogue. Tutoring information is manually created by a physics expert. Similar automatic annotation is possible in other dialogue managers (e.g. COLLAGEN [Rich and Sidner, 1998], RavenClaw [Bohus and Rudnicky, 2003]).
Discourse structure transitions Labels computed over the discourse segment hierarchy (essay submission & analysis, questions Q1–Q5). Properties: domain independent, automatic. "Places in the dialogue": group turns by transition.
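The transition labels can be made concrete with a small sketch. The rules below are an illustrative assumption, not ITSPOKE's actual definitions; the label names follow the examples used in this talk, and each system turn is identified here by its path in the segment hierarchy (e.g. question Q2.1 as the path (2, 1)).

```python
# Illustrative sketch only: label the transition between two consecutive system
# turns from their positions (paths) in the discourse segment hierarchy.
# The decision rules below are assumptions made for this example.
from typing import Tuple

Path = Tuple[int, ...]

def transition_label(prev: Path, curr: Path) -> str:
    if curr == prev:
        return "SameGoal"                       # the same question is revisited
    if len(curr) > len(prev) and curr[:len(prev)] == prev:
        return "Push"                           # descend into a remediation subdialogue
    if len(curr) == len(prev) and curr[:-1] == prev[:-1]:
        return "Advance"                        # next question within the same segment
    if len(curr) < len(prev):
        # return to a shallower segment: the same question (PopUp) or a later one (PopUpAdv)
        return "PopUp" if curr == prev[:len(curr)] else "PopUpAdv"
    return "NewTopLevel"                        # other jumps, treated here as a new top-level segment

print(transition_label((2,), (2, 1)))           # Push
print(transition_label((2, 2), (2,)))           # PopUp
print(transition_label((2, 2), (3,)))           # PopUpAdv
print(transition_label((2, 1), (2, 2)))         # Advance
```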
Outline System side applications: Discourse transitions – defining "places in the dialogue"; Performance analysis; Characterization of discourse phenomena. User side applications: The Navigation Map; Users' perceived utility of the Navigation Map.
Performance analysis Understand where and why a spoken dialogue system fails or succeeds. Performance models relate performance metrics (e.g. user satisfaction) to interaction parameters (e.g. number of turns, speech recognition performance). PARADISE framework [Walker et al., 2000]: multivariate linear regression from interaction parameters to the performance metric.
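As a concrete illustration of a PARADISE-style model (not the original analysis code), here is a minimal sketch that fits a multivariate linear regression from per-user interaction parameters to a performance metric; all numbers and parameter choices are invented.

```python
# Minimal PARADISE-style sketch: predict a performance metric from interaction
# parameters via multivariate linear regression. Data below are hypothetical.
import numpy as np
from sklearn.linear_model import LinearRegression

# One row per user: [number of turns, word error rate, task completion time (s)]
X = np.array([[40, 0.15, 300],
              [55, 0.30, 420],
              [35, 0.10, 280],
              [60, 0.25, 500]])
y = np.array([4.5, 3.0, 4.8, 2.9])      # performance metric, e.g. user satisfaction

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)     # relative contribution of each parameter
```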
Performance analysis – tutoring In the tutoring domain the performance metric is student learning (pretest and posttest scores). Interaction parameters: correctness, time on task, user affect (e.g. certainty), # of hints, # of help requests. Models: correlation with learning, e.g. [Chi et al., 2001]; PARADISE models [Forbes-Riley and Litman, 2006; Feng et al., 2006]. Previous work makes limited use of the context in which events occur and of dialogue patterns.
Intuition 1 – Conditioning Did the student learn? (posttest minus pretest). Correctness: Incorrect, Correct. It is more important to be correct at specific "places in the dialogue": compare correctness overall versus correctness at specific places in the dialogue, i.e. correctness after discourse transitions (e.g. after a Push).
Intuition 2 – Discrimination A student who learned less versus a student who learned more: different discourse structure. Trajectories: sequences of 2 consecutive transitions (e.g. Push–Push versus Push–Advance).
Experimental setup - corpus Corpus - ITSPOKE 20 students, 5 problems per student 100 dialogues, 2334 student turns Annotations Correctness (manual) “Perfect” recognition “Perfect” understanding Discourse structure transitions (automatic)
Experimental setup - parameters Correctness parameters: counts (#) and percentages (%) for each correctness value per student (e.g. C, PC %). Transition – correctness parameters: counts (#) and percentages (%) for each transition–correctness value per student (e.g. PopUp–C, Push–UA %), plus relative percentages (%rel) (e.g. PopUp–I %rel). Transition – transition parameters: counts (#), percentages (%) and relative percentages (%rel) for each transition–transition value per student (e.g. Push–Push). Comparisons: I1 (conditioning) – correctness overall versus correctness after specific discourse transitions; I2 (discrimination) – discourse structure patterns for low learners versus high learners.
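A minimal sketch of how such per-student parameters could be derived from a sequence of labeled turns; the turn labels and data below are hypothetical, chosen only to show the counting scheme.

```python
# Minimal sketch with hypothetical data: derive per-student interaction
# parameters from a sequence of turns labeled with (transition, correctness).
from collections import Counter

turns = [("Push", "Incorrect"), ("PopUp", "Correct"), ("Advance", "Correct"),
         ("Push", "Incorrect"), ("PopUp", "Incorrect")]

pair_counts = Counter(turns)                        # transition-correctness counts (#)
trans_counts = Counter(t for t, _ in turns)
n = len(turns)
for (trans, corr), c in sorted(pair_counts.items()):
    pct = 100.0 * c / n                             # percentage (%) over all turns
    rel = 100.0 * c / trans_counts[trans]           # relative percentage (%rel)
    print(f"{trans}-{corr}: #={c}  %={pct:.1f}  %rel={rel:.1f}")

# Transition-transition parameters are computed the same way over bigrams of
# consecutive transitions (e.g. Push-Push).
transitions = [t for t, _ in turns]
bigram_counts = Counter(zip(transitions, transitions[1:]))
print(bigram_counts)
```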
Experimental setup - methodology Correlations between parameters and learning: partial Pearson correlation with posttest, controlling for pretest (common practice in tutoring research); we report significant and trend correlations. Experiment 1 (conditioning): correctness parameters versus transition–correctness parameters. Experiment 2 (discrimination): transition–transition parameters.
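For readers unfamiliar with the statistic, here is a small sketch (invented numbers, not the study data) of a partial Pearson correlation of one interaction parameter with posttest while controlling for pretest, using the residual method.

```python
# Minimal sketch: partial Pearson correlation of an interaction parameter with
# posttest, controlling for pretest (residual method). Data are hypothetical.
import numpy as np
from scipy.stats import pearsonr

def partial_corr(x, y, z):
    """Correlation of x and y after regressing out z from both."""
    z = np.column_stack([np.ones_like(z), z])
    rx = x - z @ np.linalg.lstsq(z, x, rcond=None)[0]
    ry = y - z @ np.linalg.lstsq(z, y, rcond=None)[0]
    return pearsonr(rx, ry)

param    = np.array([3, 7, 2, 8, 5, 6], dtype=float)   # e.g. a transition-correctness % per student
posttest = np.array([0.55, 0.40, 0.70, 0.35, 0.60, 0.50])
pretest  = np.array([0.50, 0.45, 0.60, 0.40, 0.55, 0.50])

r, p = partial_corr(param, posttest, pretest)
print(f"partial r = {r:.2f}, p = {p:.3f}")
```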
Results – Experiment 1 (a) Correctness parameters: no trend or significant correlations. Correctness out of context is not very informative for modeling student performance.
Results – Experiment 1 (b) Transition – correctness parameters: PopUp–Correct and PopUp–Incorrect (answering the higher-level question correctly or incorrectly after returning from a remediation subdialogue). Interpretation: these capture successful learning events or failed learning opportunities. Generalizes across corpora. ITSPOKE modification: engage in an additional remediation dialogue.
Results – Experiment 1 (c) Other informative transition–correctness parameters, e.g. PopUpAdv–Correct, NewTopLevel–Incorrect, Advance–Correct. Intuition 1 (conditioning) verified: correctness overall is less informative than correctness after discourse transitions.
Experiment 2 - discrimination A student who learned less versus a student who learned more: different discourse structure. Trajectories of length 2 (pairs of consecutive transitions) give the transition–transition parameters.
Results – Experiment 2 Transition – transition parameters: Push–Push (e.g. descending from Q2 into Q2.1 and then further into Q2.1.1). Interpretation: the system uncovers potential major knowledge gaps; more specific than Push–Incorrect. Other informative parameters: Advance–Advance. Transition – transition parameters are informative: they overlap with the transition–correctness parameters but offer additional insights. Intuition 2 (discrimination) verified.
Related work Most prior work ignores discourse structure (e.g. [Möller, 2005; Walker, 2000]). DATE dialogue act annotation [Walker, 2001]: identifies certain types of discourse segments using a task model (get date, get time, reserve hotel, etc.) and computes the size of each type of discourse segment. Differences: domain-dependent; ignores the discourse structure hierarchy; does not condition metrics on discourse structure information; does not use structure parameters.
Conclusions – performance analysis Discourse structure useful for performance analysis [EMNLP 2006] Parameters derived from discourse structure transitions Transition – correctness (1st intuition - conditioning) Transition – transition (2nd intuition - discrimination) Informative parameters have intuitive interpretations ITSPOKE modifications Monitor for PopUp-Incorrect (failed learning opportunity) Provide additional tutoring User study Experiments with certainty – similar results Performance models (>=2 parameters) Parameters that use certainty improve the quality and generality of performance models [UMUAI 2007]
Outline System side applications: Discourse transitions – defining "places in the dialogue"; Performance analysis; Characterization of discourse phenomena. User side applications: The Navigation Map; Users' perceived utility of the Navigation Map.
Intuition 3 – Interaction Dialogue phenomena are not uniformly distributed across the dialogue. We test for dependencies between discourse transitions and dialogue phenomena, namely user affect (uncertainty) and speech recognition problems, using a χ2 test: the transition into each system turn versus the phenomenon in the following user turn on the dialogue timeline.
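A minimal sketch of the kind of dependency test this calls for: a χ2 test on a contingency table of transition labels against the presence of a phenomenon in the following user turn. The counts are invented for illustration.

```python
# Minimal sketch: chi-square test for a dependency between the discourse
# transition of the system turn and a phenomenon in the following user turn
# (e.g. uncertainty). Counts below are hypothetical.
import numpy as np
from scipy.stats import chi2_contingency

# rows: transitions; columns: user turn uncertain vs. not uncertain
table = np.array([[30, 70],    # Push
                  [12, 88],    # Advance
                  [25, 75],    # PopUpAdv
                  [10, 90]])   # PopUp
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, p = {p:.3f}")
# Observed minus expected counts show which transitions attract more of the
# phenomenon than chance would predict.
print(table - expected)
```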
Results Significant dependencies found; Intuition 3 (interaction) validated. Transition – uncertainty [NAACL 2007]: e.g. increased uncertainty after Push, PopUpAdv. Transition – speech recognition problems (SRP) [Interspeech 2006]: e.g. increased SRP after Push, PopUp.
Outline System side applications: Discourse transitions – defining "places in the dialogue"; Performance analysis; Characterization of discourse phenomena. User side applications: The Navigation Map; Users' perceived utility of the Navigation Map.
Intuition 4 – Visual A graphical representation of the discourse structure makes it easier for users to follow the conversation; users should prefer it and may learn more. The Navigation Map (NM).
Issues in complex domains Illustrated by the sample dialogue shown earlier: increased task complexity, the user has limited task knowledge, and long system turns (though some turns are shorter). Similar issues arise in other complex-domain dialogue systems (troubleshooting, assistants).
Why the NM? Implications for users: in a speech-only system all information travels between user and system over the audio channel; some of it can instead be presented over the visual channel [Mousavi et al., 1995]. What to communicate over the visual channel?
What to communicate? The current ITSPOKE interface shows a dialogue history; other systems use animated talking heads [Graesser et al., 2003]. It is more important to communicate the purpose of the current topic and how the topic relates to the overall discussion: a digested view that sets up expectations and facilitates integration. This is exactly the discourse structure: discourse segment intention and discourse segment hierarchy.
The Navigation Map (NM) A dynamic graphical representation of the discourse segment purpose/intention and the discourse segment hierarchy (a parallel with a geographic map). Additional features: information highlight, limited horizon, correct answers, auto-collapse.
Building the NM: manually annotate a superset of the automatic annotation on the sample dialogue, i.e. discourse segment segmentation, purpose/intention, and hierarchy, plus the information highlight, auto-collapse, correct answers, and limited-horizon features. The manual segmentation is finer grained: e.g. a tutor turn that is one segment in the automatic annotation may be three segments in the manual annotation (see the sketch below for one way such a hierarchy can be rendered).
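An illustrative sketch only (not the ITSPOKE NM code): render a Navigation-Map-style view of the discourse segment hierarchy as an indented outline, marking the active segment and hiding not-yet-reached siblings to mimic the limited-horizon feature. All names and the rendering scheme are assumptions.

```python
# Illustrative sketch: render a Navigation-Map-like outline of the discourse
# segment hierarchy, highlighting the active segment ("information highlight")
# and hiding segments the dialogue has not reached yet ("limited horizon").
from typing import List, Tuple

Tree = Tuple[str, List["Tree"]]       # (segment purpose, child segments)

plan: Tree = ("Solution walkthrough", [
    ("Before release: compare velocities", []),
    ("After release: forces -> net force -> acceleration -> velocity", [
        ("Forces on the man", []),
        ("Net force on the man", []),
    ]),
])

def render(node: Tree, active: Tuple[int, ...], depth: int = 0) -> None:
    purpose, children = node
    mark = ">>" if active == () else "  "              # highlight the current segment
    print("  " * depth + mark + " " + purpose)
    for i, child in enumerate(children):
        if active and i <= active[0]:                  # limited horizon: hide future siblings
            # (-1,) marks an earlier sibling: shown, but its children stay collapsed
            render(child, active[1:] if i == active[0] else (-1,), depth + 1)

render(plan, (1, 0))   # currently discussing "Forces on the man"
```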
Outline System side applications: Discourse transitions – defining "places in the dialogue"; Performance analysis; Characterization of discourse phenomena. User side applications: The Navigation Map; Users' perceived utility of the Navigation Map.
User experiment Intuition: it is easier for users to follow the conversation with the NM. If true, users should prefer the version with the NM (perceived utility) and should learn better with the NM (objective utility). This experiment addresses users' perception of the NM presence. Hypothesis: users will rate the NM version better.
Experimental design Within-subjects design: 1 problem with the NM, 1 without the NM (noNM); users rate the tutor after each problem (16 questions, on a 1 = Strongly Disagree to 5 = Strongly Agree scale). Two conditions to account for order and problem: F (First) – 1st problem NM, 2nd problem noNM; S (Second) – 1st problem noNM, 2nd problem NM. Procedure: reading, pretest, problem 1, problem 2 (with a questionnaire after each problem), posttest, NM survey, interview; differences between the two problems are attributed to the NM.
Experimental design (2) The ITSPOKE dialogue history was disabled, so the comparison is audio-only (noNM) versus audio+visual (NM).
Results – subjective metrics (1) Collected corpus: 28 users (13 in the First condition, 15 in the Second condition), balanced for gender; significant difference between pretest and posttest. Questionnaire analysis: repeated-measures ANOVA with one between-subjects factor; within-subjects factor: NM presence (NMPres); between-subjects factor: condition (Cond); followed by post-hoc tests.
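For readers who want to reproduce this kind of analysis, here is a toy sketch with invented ratings (not the study data) of a mixed repeated-measures ANOVA, assuming the pingouin package is available; the column names are hypothetical.

```python
# Toy sketch with invented data: mixed repeated-measures ANOVA, within-subjects
# factor = NM presence, between-subjects factor = condition.
# Assumes `pip install pingouin`.
import pandas as pd
import pingouin as pg

data = pd.DataFrame({
    "user":   [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "cond":   ["F"] * 6 + ["S"] * 6,
    "nm":     ["NM", "noNM"] * 6,
    "rating": [4, 3, 5, 3, 4, 4, 5, 2, 4, 3, 5, 4],
})

aov = pg.mixed_anova(data=data, dv="rating", within="nm",
                     subject="user", between="cond")
print(aov)   # F and p values for NM presence, condition, and their interaction
```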
Results – subjective metrics (2) NM trend/significant effects on system perception during the dialogue (rating scale: 1 = Strongly Disagree to 5 = Strongly Agree).
Results – subjective metrics (3) NM trend/significant effects on overall system perception
Results – subjective metrics (4) 24 out of 28 users preferred NM over noNM; 4 preferred noNM (2 per condition), citing the divided-attention problem and the NM changing too fast. NM survey: 75-86% of users agreed (4) or strongly agreed (5) that the NM helped them follow the dialogue, learn, concentrate, and update the essay. Open-question interview: the NM as a structured note taker; users would like the NM for additional instruction after the dialogue.
Results – objective metrics Preliminary analysis of objective metrics (1st problem only): more correct turns with the NM; fewer speech recognition problems with the NM (χ2 results: fewer than expected ASR/semantic misrecognitions with the NM). Correctness here is system-judged, via speech recognition and natural language understanding: automatic but noisy. Interpretation: the NM influences users' lexical choice.
Related work Segmented Interaction History [Rich and Sidner, 1998]: does not investigate utility; GUI-based interaction; simpler domain (air travel). Previous computer tutoring studies: adding goal information helps [Singley, 1990; Corbett and Knapp, 1996], but these are GUI-based tutoring and add extra information.
Conclusions - Navigation Map The Navigation Map: a graphical representation of the discourse structure [ACL 2007]. Subjective metrics: positively changes users' perception. Objective metrics: good preliminary results. Open question: does perceived utility translate into objective utility? Between-subjects experiment: 2 conditions (with NM, without NM); objective metrics: learning, correctness, time on task, speech recognition problems.
Conclusions Applications of discourse structure for spoken dialogue systems Useful for system-side and user-side applications Performance analysis [EMNLP 2006, UMUAI 2007] Characterization of discourse phenomena [Interspeech 2006, NAACL 2007] Navigation Map [ACL 2007] Tutoring domain, ITSPOKE Easy to replicate in other complex domains/systems Transitions are domain independent, can be used in text-based systems Non-domain experts can annotate discourse structure for Navigation Map
Current Directions Current work - user experiments back in Pittsburgh Validate the performance analysis modification Objective utility of the Navigation Map Future work - recognition / generation and tagging of discourse structure Plan-based, statistical approaches Necessary for analysis of human-human corpora
Other ITSPOKE Research Affect detection and adaptation in dialogue systems Annotated ITSPOKE Corpus now available! https://learnlab.web.cmu.edu/datashop/index.jsp Reinforcement Learning and user simulation (future DSG talk) Using NLP and psycholinguistics to predict learning (future IDT talk) Cohesion, alignment/convergence, semantics More details: http://www.cs.pitt.edu/~litman/itspoke.html
Acknowledgements ITSPOKE group: Hua Ai, Kate Forbes-Riley, Greg Nicholas, Amruta Purandare, Mihai Rotaru, Scott Silliman, Joel Tetrault, Art Ward. NLP Group @ U. Pitt: Shimei Pan, Pamela Jordan. NSF Grants 0328431 and 0428472.
Thank you! QUESTIONS?
Results – Experiment 1 (c) Transition – correctness parameters (continued): NewTopLevel–Incorrect. Interpretation: ITSPOKE discovers student knowledge gaps. ITSPOKE modification: activate all tutoring topics for a problem; skip a tutoring topic if the first user answer is correct.