Published by Tracey Laurel Dalton. Modified over 6 years ago.
1
Applications of Discourse Structure for Spoken Dialogue Systems
Diane Litman
Department of Computer Science & Learning Research and Development Center, University of Pittsburgh
(Currently Leverhulme Visiting Professor, University of Edinburgh)
Joint work with Mihai Rotaru, University of Pittsburgh
2
Spoken Dialogue Systems
Systems that interact with users via speech
Advantages
  Naturalness (no training needed)
  Efficiency
  Eyes- and hands-free
Domains
  Information access [Raux et al., 2005; Rudnicky et al., 1999; Zue et al., 2000; Van den Bosch and Lendvai, 2005]
  Tutoring [Graesser et al., 2001; Litman and Silliman, 2004; Pon-Barry et al., 2006]
  Assistants, Troubleshooting & QA [Allen et al., 2001, 2006; Acomb et al., 2007; ROLAQUAD]
3
ITSPOKE
ITSPOKE (Intelligent Tutoring SPOKEn Dialogue System) [Litman and Silliman, 2004]
Speech-enabled version of the Why2-Atlas computer tutor [VanLehn, Jordan, Rosé et al., 2002]
Domain: Qualitative physics
Sample ITSPOKE problem: Suppose a man is in a free-falling elevator and is holding his keys motionless right in front of his face. He then lets go. What will be the position of the keys relative to the man's face as time passes? Explain.
4
Sample dialogue with ITSPOKE
TUTOR1: To analyze this problem we will first describe the motion of the person and his keys while he is holding them. Then we will look at the motion of the person and his keys after he lets go of them. Let's begin by looking at the motion of the man and his keys while he is holding them. How does his velocity compare to that of his keys?
USER1: it’s the same
TUTOR2: Yes. So while he is holding his keys they both have the same velocity. Now let's see what happens when he releases them. To do so we will first find the individual forces exerted on the man and his keys, then both net forces, both accelerations, and finally we will describe and compare the velocities. So what are the forces exerted on the man after he releases his keys? Please, specify their directions (e.g., vertically up).
USER2: vertically down gravity
TUTOR3: Good. Because there is just one force, gravity's, it is trivial to determine the NET force (i.e., the vector sum of all forces). So what is the direction of the NET force on the person?
Dialogue structure (figure labels): Before release; After release; Forces on the man; Net force on the man
5
Research problem: What is the utility of discourse structure for spoken dialogue systems?
6
Discourse structure
Discourse – group of utterances (monologue, dialogue)
Discourse structure: Grosz & Sidner theory [Grosz and Sidner, 1986]
Components of the theory
  Linguistic structure: discourse segments
  Intentional structure: discourse segment purpose/intention, discourse segment hierarchy
  Attentional state
7
Discourse segments, intention/purpose structure, discourse segment hierarchy
Annotation of the sample dialogue with ITSPOKE (turns TUTOR1 through TUTOR3, shown earlier):
Solution walkthrough
  Two time frames: before release, after release
  Before release
    Man’s velocity ? keys’ velocity
  After release
    Recipe: Forces, Net force, Acceleration, Velocity
    Man: Forces/acceleration
      Forces on the man
      Net force on the man
    ………….
8
Why discourse structure?
Useful for other NLP tasks:
  Understand specific lexical and prosodic phenomena [Hirschberg and Nakatani, 1996; Levow, 2004; Passonneau and Litman, 1997]
  Anaphoric expressions [Allen et al., 2001]
  Natural language generation [Hovy, 1993]
  Predictive/generative models of posture shifts [Cassell et al., 2001]
Useful for spoken dialogue systems? 4 intuitions
9
Intuition 1 – Conditioning
[Figure: dialogue timeline with turns marked Correct or Incorrect; question: Student learned?]
It is more important to be correct at specific “places in the dialogue”.
Phenomena related to performance are not uniformly important across the dialogue; they have more weight at specific places in the dialogue.
Discourse structure can be used to define “places in the dialogue”.
10
Intuition 2 – Discrimination
[Figure: dialogue of a student that learned less versus a student that learned more]
Different discourse structure
11
Intuition 3 – Interaction
[Figure: dialogue timeline with turns marked Uncertain, Certain or Neutral]
Certainty is not uniformly distributed across the dialogue.
Dialogue phenomena are not uniformly distributed across the dialogue; they are more frequent at specific places in the dialogue.
Discourse structure can be used to define “places in the dialogue”.
12
Intuition 4 – Visual
A graphical representation of the discourse structure: the Navigation Map
  Easier for users to follow the conversation
  Preferred / learn more
13
Outline
System side applications
  Discourse transitions – defining “places in the dialogue”
  Performance analysis (Intuitions 1, 2: Conditioning, Discrimination)
  Characterization of discourse phenomena (Intuition 3: Interaction)
User side applications (Intuition 4: Visual)
  The Navigation Map
  Users’ perceived utility of the Navigation Map
14
“Places in the dialogue”
Requirements: domain independent, automatic
Approach: discourse structure transitions
  Relationship between the current system turn and the previous system turn – 6 labels
Ingredients: discourse segment hierarchy, transition labeling
No other discourse structure elements are needed (e.g. intention/purpose information).
15
Discourse segment hierarchy
Automatically annotate the discourse segment hierarchy
Tutoring information authored in a hierarchical plan structure [VanLehn, Jordan, Rosé et al., 2002]
[Figure labels: Problem, Essay, Dialogue with ITSPOKE; Q1, Q2, Q3]
16
ITSPOKE behavior & discourse structure annotation
ESSAY SUBMISSION & ANALYSIS
Tutoring information manually created by a physics expert.
Similar automatic annotation possible in other dialogue managers (e.g. COLLAGEN [Rich and Sidner, 1998], RavenClaw [Bohus and Rudnicky, 2003])
[Figure labels: Q1, Q2, Q5; Q3, Q4; Remediation subdialogue]
17
Discourse structure transitions
ESSAY SUBMISSION & ANALYSIS
Properties: domain independent, automatic
“Places in the dialogue”: group turns by transition
[Figure labels: Q1, Q2, Q5; Q3, Q4]
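As a rough illustration of labeling each system turn by its relationship to the previous one, here is a minimal sketch. The slides name the labels Push, PopUp, PopUpAdv, Advance and NewTopLevel but do not give the labeling rules. The rules below, the sixth label name `SameGoal`, the dotted question ids and the function names are all assumptions made for illustration, not ITSPOKE's actual procedure.

```python
def depth(node_id):
    """Depth in the hierarchy, assuming dotted ids such as 'Q2' or 'Q2.1'."""
    return node_id.count(".")


def label_transitions(system_turns):
    """Label each system turn relative to the previous system turn.

    Assumed rules (hypothetical, not taken from the slides):
      first turn                        -> NewTopLevel
      same question repeated            -> SameGoal
      deeper than previous              -> Push
      same depth as previous            -> Advance
      shallower, question asked before  -> PopUp
      shallower, new question           -> PopUpAdv
    """
    labels = []
    seen = set()
    previous = None
    for node in system_turns:
        if previous is None:
            label = "NewTopLevel"
        elif node == previous:
            label = "SameGoal"
        elif depth(node) > depth(previous):
            label = "Push"
        elif depth(node) == depth(previous):
            label = "Advance"
        elif node in seen:
            label = "PopUp"
        else:
            label = "PopUpAdv"
        labels.append(label)
        seen.add(node)
        previous = node
    return labels


# Hypothetical sequence of system questions for one dialogue.
print(label_transitions(["Q1", "Q2", "Q2.1", "Q2.2", "Q2", "Q3"]))
```

Because the labels depend only on the shape of the hierarchy and not on what the questions say, a scheme of this kind stays domain independent and automatic, which are the two properties listed on the slide.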
18
Outline
System side applications
  Discourse transitions – defining “places in the dialogue”
  Performance analysis
  Characterization of discourse phenomena
User side applications
  The Navigation Map
  Users’ perceived utility of the Navigation Map
19
Performance analysis
Understand where and why a Spoken Dialogue System fails or succeeds
Performance models
  Performance metrics – e.g. user satisfaction
  Interaction parameters – e.g. number of turns, speech recognition performance
PARADISE framework [Walker et al., 2000]: multivariate linear regression from interaction parameters to the performance metric
20
Performance analysis – tutoring
Tutoring domain: performance metric = student learning
Interaction parameters
  Correctness
  Time on task
  User affect (e.g. certainty)
  # of hints, # of help requests
Models
  Correlation with learning – e.g. [Chi et al., 2001]
  PARADISE models [Forbes-Riley and Litman, 2006; Feng et al., 2006]
Previous work makes limited use of
  Context in which events occur
  Dialogue patterns
[Figure labels: PreTest, PostTest, Interaction parameters, Correlation, Learning]
21
Intuition 1 – Conditioning
[Figure: dialogue timeline with turns marked Correct or Incorrect and a Push transition; question: Student learned? (posttest – pretest)]
It is more important to be correct at specific “places in the dialogue”.
Correctness overall versus correctness at specific places in the dialogue
Correctness overall versus correctness after discourse transitions
22
Intuition 2 – Discrimination
[Figure: a student that learned less versus a student that learned more; transition labels: Push, Push, Push, Advance]
Trajectories: 2 consecutive transitions
Different discourse structure
23
Experimental setup - corpus
Corpus - ITSPOKE
  20 students, 5 problems per student
  100 dialogues, 2334 student turns
Annotations
  Correctness (manual): “perfect” recognition, “perfect” understanding
  Discourse structure transitions (automatic)
24
Experimental setup - parameters
Correctness parameters
  Counts (#) and percentages (%) for each correctness value per student (e.g. C, PC %)
Transition – correctness parameters
  Counts (#) and percentages (%) for each transition–correctness value per student (e.g. PopUp–C, Push–UA %)
  Relative percentage (%rel) (e.g. PopUp–I %rel)
Transition – transition parameters
  Counts (#), percentages (%) and relative percentages (%rel) for each transition–transition value per student (e.g. Push–Push)
Comparisons
  I1 - conditioning: correctness overall versus correctness after specific discourse transitions
  I2 - discrimination: discourse structure patterns for low learners versus discourse structure patterns for high learners
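The transition–correctness parameters can be sketched as a simple tally over one student's turns. This is an illustrative sketch only. The slides do not define %rel, so the code assumes it is the share of turns after a given transition that received a given correctness value. The function name, key names and toy data are hypothetical.

```python
from collections import Counter


def transition_correctness_parameters(turns):
    """Compute count, % and %rel for each transition-correctness pair.

    `turns` is a list of (transition, correctness) pairs for one student,
    e.g. ("PopUp", "I"). Assumption: %rel is relative to the number of
    turns that follow the same transition.
    """
    pair_counts = Counter(turns)
    transition_counts = Counter(transition for transition, _ in turns)
    total = len(turns)
    params = {}
    for (transition, correctness), count in pair_counts.items():
        params[f"{transition}-{correctness}"] = {
            "count": count,
            "pct": 100.0 * count / total,
            "pct_rel": 100.0 * count / transition_counts[transition],
        }
    return params


# Made-up turns for one student (not corpus data).
toy_turns = [("Push", "I"), ("Push", "C"), ("PopUp", "I"), ("Advance", "C")]
toy_params = transition_correctness_parameters(toy_turns)
print(toy_params["Push-I"])
```

Under this assumed definition, %rel normalizes for how often a transition occurs, so a student who rarely reaches a PopUp can still be compared with one who reaches it often.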
25
Experimental setup - methodology
Correlations between parameters and learning
  Partial Pearson correlation with PostTest controlling for PreTest
  Partial correlation is common practice in tutoring research.
  Both significant and trend correlations are reported.
Experiment 1 - conditioning: correctness parameters versus transition – correctness parameters
Experiment 2 - discrimination: transition – transition parameters
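For readers unfamiliar with partial correlation, the following sketch shows one standard way to compute a first-order partial Pearson correlation between a parameter and PostTest while controlling for PreTest. It uses the textbook formula based on the three pairwise correlations. The function names and the toy numbers are hypothetical, and the slides do not say which tool the authors used.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(x, y, z):
    """Correlation between x and y after removing the linear effect of z.

    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    """
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Made-up per-student values (not corpus data).
parameter = [1.0, 2.0, 3.0, 4.0]   # e.g. a PopUp-Correct percentage
posttest = [2.0, 4.0, 6.0, 8.0]
pretest = [1.0, 0.0, 1.0, 0.0]
print(partial_correlation(parameter, posttest, pretest))
```

Controlling for PreTest means a parameter is credited only for the part of PostTest that prior knowledge does not already explain.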
26
Results – Experiment 1 (a)
Correctness parameters No trend/significant correlations Correctness out of context not very informative for modeling student performance
27
Results – Experiment 1 (b)
Transition – correctness parameters
PopUp–Correct, PopUp–Incorrect
  Interpretation: capture successful learning events or failed learning opportunities
  Generalizes across corpora
  ITSPOKE modification: engage in an additional remediation dialogue
[Figure labels: Correctness; Q1, Q2, Q3; Q2.1, Q2.2]
28
Results – Experiment 1 (c)
Other informative transition-correctness parameters E.g. PopUpAdv-Correct, NewTopLevel-Incorrect, Advance-Correct Intuition 1 conditioning - verified Correctness overall < Correctness after discourse transitions
29
Experiment 2 - discrimination
[Figure: a student that learned less versus a student that learned more; transition labels: Push, Push, Push, Advance]
Trajectories of length 2: transition–transition parameters
Different discourse structure
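Trajectories of length 2 amount to counting pairs of consecutive transitions. A minimal sketch, with a hypothetical function name and the transition sequence from the slide used as toy input:

```python
from collections import Counter


def transition_bigrams(transitions):
    """Count pairs of consecutive transitions, e.g. 'Push-Push'."""
    pairs = zip(transitions, transitions[1:])
    return Counter(f"{first}-{second}" for first, second in pairs)


# The sequence shown on the slide for one dialogue.
counts = transition_bigrams(["Push", "Push", "Push", "Advance"])
print(counts)
```

Per-student counts of this kind could then be turned into percentages and correlated with learning in the same way as the transition–correctness parameters.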
30
Results – Experiment 2
Transition – transition parameters
Push–Push
  Interpretation: system uncovers potential major knowledge gaps
  More specific than Push–Incorrect
Other parameters – Advance–Advance
Transition – transition parameters are informative
  They overlap with transition–correctness parameters but offer additional insights
Intuition 2 discrimination - verified
[Figure labels: Q1, Q2, Q3; Q2.1, Q2.2; Q2.1.1, Q2.1.2]
31
Related work
Most work ignores discourse structure (e.g. [Möller, 2005; Walker, 2000])
DATE dialogue act annotation [Walker, 2001]
  Identify certain types of discourse segments
  Task model: get date, get time, reserve hotel, etc.
  Compute size of each type of discourse segment
Differences
  Domain-dependent
  Ignores the discourse structure hierarchy
  Does not condition metrics on discourse structure information
  Does not use structure parameters
32
Conclusions – performance analysis
Discourse structure useful for performance analysis [EMNLP 2006]
  Parameters derived from discourse structure transitions
    Transition – correctness (1st intuition - conditioning)
    Transition – transition (2nd intuition - discrimination)
  Informative parameters have intuitive interpretations
ITSPOKE modifications
  Monitor for PopUp-Incorrect (failed learning opportunity)
  Provide additional tutoring
  User study
Experiments with certainty – similar results
Performance models (>=2 parameters)
  Parameters that use certainty improve the quality and generality of performance models [UMUAI 2007]
33
Outline
System side applications
  Discourse transitions – defining “places in the dialogue”
  Performance analysis
  Characterization of discourse phenomena
User side applications
  The Navigation Map
  Users’ perceived utility of the Navigation Map
34
Intuition 3 – Interaction
Dialogue phenomena are not uniformly distributed across the dialogue
Dependencies between
  Discourse transitions
  Dialogue phenomena
    User affect - Uncertainty
    Speech Recognition Problems
χ2 test of the dependency between the transition (system turn) and the phenomenon (user turn)
[Figure labels: timeline, System turn, User turn, Transition, Phenomena]
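A χ2 test of independence compares observed counts in a transition-by-phenomenon contingency table with the counts expected if the two were unrelated. The sketch below computes only the Pearson χ2 statistic and its degrees of freedom. The function name and the table values are made up for illustration and are not taken from the ITSPOKE corpus.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees_of_freedom) for a contingency table.

    `table` is a list of rows of observed counts, e.g. one row per
    transition and one column per phenomenon value.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Made-up counts: rows = after Push / after other transitions,
# columns = uncertain / not uncertain user turns.
toy_table = [[30, 70], [20, 180]]
print(chi_square_statistic(toy_table))
```

The statistic would then be compared against the χ2 distribution with the returned degrees of freedom to decide whether the dependency is significant.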
35
Results
Significant dependencies
  Transition – Uncertainty [NAACL 2007]
    E.g. increased uncertainty after Push, PopUpAdv
  Transition – Speech Recognition Problems (SRP) [Interspeech 2006]
    E.g. increased SRP after Push, PopUp
Intuition 3 interaction - validated
[Figure labels: timeline, System turn, User turn, Transition, Uncertainty, SRP]
36
Outline
System side applications
  Discourse transitions – defining “places in the dialogue”
  Performance analysis
  Characterization of discourse phenomena
User side applications
  The Navigation Map
  Users’ perceived utility of the Navigation Map
37
Intuition 4 – Visual A graphical representation of the discourse structure Easier for users to follow the conversation Preferred / learn more The Navigation Map (NM)
38
Issues in complex domains
Illustrated with the sample dialogue with ITSPOKE (turns TUTOR1 through TUTOR3, shown earlier)
Issues
  Increased task complexity
  User has limited task knowledge
  Longer system turns (there are shorter turns too)
Similar issues in other complex domain dialogue systems (troubleshooting, assistants)
39
Why the NM? Implications for users
What to communicate over the visual channel?
[Figure labels: User, System, Information, Audio channel, Visual channel] [Mousavi et al., 1995]
40
What to communicate?
Current ITSPOKE interface: dialogue history
Animated talking heads [Graesser et al., 2003]
More important to communicate
  Purpose of the current topic
  How the topic relates to the overall discussion
  Digested view; set up expectations; facilitates integration
Discourse structure
  Discourse segment intention
  Discourse segment hierarchy
41
The Navigation Map (NM)
Dynamic graphical representation of:
  Discourse segment purpose/intention
  Discourse segment hierarchy
Additional features
  Information highlight
  Limited horizon
  Correct answers
  Auto-collapse
Parallel with geography
42
Manually annotate a superset of the automatic annotation
TUTOR1: To analyze this problem we will first describe the motion of the person and his keys while he is holding them. Then we will look at the motion of the person and his keys after he lets go of them. Let's begin by looking at the motion of the man and his keys while he is holding them. How does his velocity compare to that of his keys?
USER1: it’s the same
Manual annotation
  Discourse segment segmentation
  Annotate purpose/intention
  Annotate hierarchy
NM features: information highlight, auto-collapse, correct answers, limited horizon
The tutor turn is 1 segment in the automatic annotation and 3 segments in the manual annotation.
43
Outline
System side applications
  Discourse transitions – defining “places in the dialogue”
  Performance analysis
  Characterization of discourse phenomena
User side applications
  The Navigation Map
  Users’ perceived utility of the Navigation Map
44
User experiment
Intuition: easier for users to follow the conversation with the NM
If true, then:
  Users should prefer the version with the NM (perceived utility)
  Users should learn better with the NM (objective utility)
User experiment - users’ perception of the NM presence
Hypothesis: users will rate the NM version better
45
Experimental design
Within-subjects design
  1 problem with the NM; 1 without the NM (noNM)
  Rate tutor after each problem: 16 questions, 1 (Strongly Disagree) – 5 (Strongly Agree) scale
Two conditions (to account for order and problem)
  F (First): 1st problem NM; 2nd problem noNM
  S (Second): 1st problem noNM; 2nd problem NM
[Figure: experimental procedure for the F and S conditions; labels: Read, Pretest, Problem 1, Problem 2, NM, noNM, Questionnaire, Posttest, NM Survey, Interview; differences due to NM]
46
Experimental design (2)
ITSPOKE dialogue history was disabled Compare Audio-Only versus Audio+Visual (NM) NM noNM
47
Results – subjective metrics (1)
Collected corpus
  28 users: 13 First condition, 15 Second condition
  Balanced for gender
  Significant difference between pretest and posttest
Questionnaire analysis
  Repeated-measures ANOVA with one between-subjects factor
    Within-subjects factor: NM Presence (NMPres)
    Between-subjects factor: Condition (Cond)
  Post-hoc tests
48
Results – subjective metrics (2)
NM trend/significant effects on system perception during the dialogue: Rating scale 1 - Strongly Disagree ……. 5 - Strongly Agree
49
Results – subjective metrics (3)
NM trend/significant effects on overall system perception
50
Results – subjective metrics (4)
24 out of 28 preferred NM over noNM
  4 liked noNM (2 per condition): divided attention problem, NM changing too fast
NM survey
  75-86% of users agreed (4) or strongly agreed (5) that NM helped them: follow the dialogue, learn, concentrate, update essay
Open question interview
  NM as a structured note taker
  Would use NM for additional instruction after the dialogue
51
Results – objective metrics
Preliminary analysis on objective metrics (1st problem only)
  More correct turns with NM
  Fewer speech recognition problems with NM
    χ2 results: fewer than expected ASR/Semantic misrecognitions with NM
  System correctness, speech recognition, natural language understanding: automatic but noisy
Interpretation: the NM influences users’ lexical choice
52
Related work
Segmented Interaction History [Rich and Sidner, 1998]
  Do not investigate utility
  GUI-based interaction
  Simpler domain (air-travel)
Previous computer tutoring studies
  Adding goal information helps [Singley, 1990; Corbett and Knapp, 1996]
  GUI-based tutoring
  Add extra information
53
Conclusions - Navigation Map
The Navigation Map – a graphical representation of the discourse structure [ACL 2007] Subjective metrics – positively changes users’ perception Objective metrics – good preliminary results Perceived utility reflects in objective utility? Between-subjects experiment 2 conditions: with NM, without NM Objective metrics: learning, correctness, time on task, speech recognition problems
54
Conclusions Applications of discourse structure for spoken dialogue systems Useful for system-side and user-side applications Performance analysis [EMNLP 2006, UMUAI 2007] Characterization of discourse phenomena [Interspeech 2006, NAACL 2007] Navigation Map [ACL 2007] Tutoring domain, ITSPOKE Easy to replicate in other complex domains/systems Transitions are domain independent, can be used in text-based systems Non-domain experts can annotate discourse structure for Navigation Map
55
Current Directions
Current work - user experiments back in Pittsburgh
  Validate the performance analysis modification
  Objective utility of the Navigation Map
Future work - recognition / generation and tagging of discourse structure
  Plan-based, statistical approaches
  Necessary for analysis of human-human corpora
56
Other ITSPOKE Research
Affect detection and adaptation in dialogue systems Annotated ITSPOKE Corpus now available! Reinforcement Learning and user simulation (future DSG talk) Using NLP and psycholinguistics to predict learning (future IDT talk) Cohesion, alignment/convergence, semantics More details:
57
Acknowledgements ITSPOKE group NLP Group @ U. Pitt
Hua Ai, Kate Forbes-Riley, Greg Nicholas, Amruta Purandare, Mihai Rotaru, Scott Silliman, Joel Tetrault, Art Ward NLP U. Pitt Shimei Pan, Pamela Jordan NSF Grants and
58
Thank you! QUESTIONS?
60
Results – Experiment 1 (c)
Transition – correctness parameters (continued) Correctness Q1 Q2 Q3 Q2.1 Q2.2 NewTopLevel-Incorrect Interpretation: ITSPOKE discovers student knowledge gaps ITSPOKE modification: Activate all tutoring topics for a problem Skip a tutoring topic if the first user answer is correct