ITCS 6010 VUI Evaluation Paradise & SUM. PARADISE Paradigm for Dialogue System Evaluation Goal: Maximize User Satisfaction.


1 ITCS 6010 VUI Evaluation Paradise & SUM

2 PARADISE Paradigm for Dialogue System Evaluation Goal: Maximize User Satisfaction

3 PARADISE Paradigm for Dialogue System Evaluation. Performance is modeled as a weighted function of a task-based success measure and dialogue-based cost measures, where the weights are computed by correlating user satisfaction with performance. Dialogue tasks are represented as Attribute Value Matrix (AVM) pairs.
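The weighted function described on this slide can be sketched in a few lines. This is a minimal illustration, assuming the commonly cited form in which a normalized task success score is rewarded and normalized dialogue costs are subtracted. The function names, the use of z-score normalization, and the example cost name `turns` are assumptions made for illustration; the weights are taken as given here, and fitting them against user satisfaction ratings is not shown.

```python
from statistics import mean, pstdev


def z_normalize(values):
    """Rescale values to mean 0 and standard deviation 1 (assumed normalization)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: no dialogue differs from the average.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def paradise_performance(success, costs, alpha, cost_weights):
    """Return one performance score per dialogue.

    success      -- list of task success scores, one per dialogue
    costs        -- dict mapping a cost name to a list of per-dialogue values
    alpha        -- weight on normalized task success (supplied by the caller)
    cost_weights -- dict mapping each cost name to its weight
    """
    n_success = z_normalize(success)
    n_costs = {name: z_normalize(vals) for name, vals in costs.items()}
    scores = []
    for i in range(len(success)):
        penalty = sum(cost_weights[name] * n_costs[name][i] for name in n_costs)
        scores.append(alpha * n_success[i] - penalty)
    return scores


# Hypothetical data: three dialogues, one cost measure (number of turns).
scores = paradise_performance(
    success=[1.0, 0.5, 0.0],
    costs={"turns": [4, 8, 12]},
    alpha=0.5,
    cost_weights={"turns": 0.5},
)
```

With these made-up numbers the first dialogue (highest success, fewest turns) scores highest and the last scores lowest, which is the ordering the weighted function is meant to capture.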

4 Attribute Value Matrix (AVM). The AVM represents the dialogue task: the information exchanged between agent and user, represented as ordered pairs of attributes and their possible values. The AVM instantiation of a task is the same regardless of the dialogue.

5 AVM (cont’d). Two example dialogues for the same task, one with agent A and one with agent B. Each utterance is tagged with the attributes it addresses: DC (depart-city), AC (arrival-city), DR (depart-range), DT (depart-time).

Dialogue with agent A:
A1: Hello, this is Train Enquiry Service. Please speak after the tone. Which information do you need? [DC, AC, DR, DT]
U1: I want to go from Torino to Milano. [DC, AC]
A2: Do you want to go from Trento to Milano? Yes or No? [DC, AC]
U2: No. [DC, AC]
A3: Do you want to leave from Trento? Yes or No? [DC]
U3: No. [DC]
A4: Where do you want to leave from? [DC]
U4: I want to leave from Torino. [DC]
A5: Do you want to leave from Torino? Yes or No? [DC]
U5: Yes. [DC]
A6: Do you want to go to Milano? [AC]
U6: Yes. [AC]
A7: At which time do you want to leave? [DR]
U7: I want to travel in the evening. [DR]
A8: Do you want to leave between 6 and 9pm? Yes or No? [DR]
U8: Yes. [DR]
A9: There is a train leaving at 8:00pm. [DT]
….

Dialogue with agent B:
B1: Hello, this is Train Enquiry Service. Please speak after the tone. Which information do you need? [DC, AC, DR, DT]
U1: I want to travel from Torino to Milano. [DC, AC]
B2: At which time do you want to leave from Merano to Milano? [DC, AC, DR]
U2: No, I want to leave from Torino in the evening. [DC, DR]
B3: Do you want to leave from Torino between 6 and 11pm? Please answer Yes or No. [DC, DR]
U3: Yes. [DC, DR]
B4: A train leaves at 8:00pm. [DT]
….

AVM instantiation for this task:
attribute       actual value
depart-city     Torino
arrival-city    Milano
depart-range    Evening
depart-time     8pm
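The AVM for the train enquiry task maps naturally onto a key-value structure. The sketch below stores the four attribute-value pairs from the slide in a Python dict and compares them with the values a system ended up with after a dialogue. The function name `attribute_match_rate` and the `observed` example are hypothetical, and the simple match rate is only a stand-in: PARADISE itself derives task success from a Kappa coefficient over a confusion matrix, which is not shown here.

```python
# AVM key for the task on the slide: attribute -> actual value.
AVM_KEY = {
    "depart-city": "Torino",
    "arrival-city": "Milano",
    "depart-range": "Evening",
    "depart-time": "8pm",
}


def attribute_match_rate(key, observed):
    """Fraction of attributes whose observed value equals the AVM key value."""
    matches = sum(1 for attr, value in key.items() if observed.get(attr) == value)
    return matches / len(key)


# Hypothetical outcome in which the system misrecognized the departure city.
observed = dict(AVM_KEY, **{"depart-city": "Trento"})
rate = attribute_match_rate(AVM_KEY, observed)  # 3 of 4 attributes match
```

Because the AVM is the same regardless of the dialogue, the same key can score both the agent A and agent B dialogues.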

6 PARADISE Paradigm for Dialogue System Evaluation. Advantages: the PARADISE approach addresses both performance and user satisfaction. Disadvantages: too complex to compute; needs a large sample size up front.

7 Alternative Approaches. What’s important? Maximize user satisfaction; maximize task success.

8 User Satisfaction. How do we measure user satisfaction? Questionnaires, interviews, focus groups.

9 Task Success. How do we measure task success? Logging actual use, performance measurement, walkthroughs, pilot testing.

10 Task Success. Establish AVMs for each dialogue and for the entire conversation. Measure task success with respect to task completion time and accuracy or errors (e.g. misinterpretations).

11

12 Conclusions. PARADISE is good, but too complex! Measure user satisfaction and task success. What if user satisfaction is not the most relevant aspect?

13 Speech Usability Metric (SUM). Uses 3 metrics: user satisfaction, accuracy, and task completion time. Eliminates the restriction of relying on a single factor to determine usability.

14 Speech Usability Metric (SUM). SUM = X * User Satisfaction + Y * Accuracy + Z * Completion Time, where X + Y + Z = 1 and X, Y, Z > 0. The weights are determined by the evaluator.
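The SUM formula and its two constraints on the weights translate directly into code. The sketch below is one way to write it, assuming all three metrics have already been scaled to the range 0 to 1 with higher meaning better (the slides do not say how each metric is scaled). The function name `sum_score` and the example numbers are hypothetical.

```python
def sum_score(satisfaction, accuracy, time_score, x, y, z):
    """Weighted sum of three usability metrics.

    Assumes each metric is already scaled to [0, 1], higher is better.
    Enforces the slide's constraints: X, Y, Z > 0 and X + Y + Z = 1.
    """
    weights = (x, y, z)
    if any(w <= 0 for w in weights):
        raise ValueError("all weights must be greater than 0")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return x * satisfaction + y * accuracy + z * time_score


# Hypothetical evaluator who cares most about user satisfaction.
score = sum_score(satisfaction=0.8, accuracy=0.9, time_score=0.5,
                  x=0.5, y=0.3, z=0.2)
```

Since the evaluator picks the weights, the same measurements can yield different SUM values for different applications, for example a higher Z where speed matters most.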

15 User Satisfaction. Surveys, questionnaires, interviews.

16 Accuracy. Misinterpretations: the system recognizes the wrong word. Out-of-vocabulary errors: words not in the system grammar. Wrong choice: the correct word is recognized, but the wrong path is chosen.

17 Task Completion Time. Time to complete task. Time for expert to complete task (ETCT). Maximum time to complete task (MTCT). Expected time to complete task (ExTCT).
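The slides name these time quantities but do not say how they are combined into a single completion-time value. One possible approach, shown purely as an assumption, is to scale a measured time linearly between the expert time (ETCT, treated as the best case) and the maximum time (MTCT, treated as the worst case), so that the result lies between 0 and 1. The function name `time_score` and the example times are hypothetical, and ExTCT is not used in this sketch.

```python
def time_score(actual, expert_time, max_time):
    """Map a completion time to [0, 1]; 1 is as fast as the expert, 0 is at or past the maximum.

    Assumed linear scaling between ETCT (expert_time) and MTCT (max_time);
    this is not a formula taken from the slides.
    """
    if max_time <= expert_time:
        raise ValueError("max_time must be greater than expert_time")
    raw = (max_time - actual) / (max_time - expert_time)
    # Clamp so times faster than the expert or slower than the maximum stay in range.
    return max(0.0, min(1.0, raw))


# Hypothetical times in seconds: expert needs 30, maximum allowed is 90.
halfway = time_score(actual=60, expert_time=30, max_time=90)
```

A value scaled this way is one candidate for the completion-time term of a weighted usability score, where a higher number should mean better performance.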

18 Conclusion. SUM determines the usability of a speech application. It utilizes 3 pre-defined metrics and allows for greater flexibility.

