Final Exam Information
Exam is Friday, December 13, 11AM-1PM
Exam will be cumulative, slightly emphasizing material since the midterm
Extra credit (and advantage) for problems submitted by that we use
Final review on Thursday --- come with your questions
Also: term papers due Thursday

The Societal Impact of Computer (and Cognitive) Science
In 1985:
–computers used by scientists & engineers
–little attention to user interfaces or CHI
–almost no “social” apps
–little interaction between CS and psych.
–little attention to design principles
–web did not exist
In 2002:
–most Americans have web/Internet
–popular culture and CS mixed together
–technology and society increasingly intertwined
–many apps are social
–“civilians” can generate rich technology
–CHI a rich research area
–AI and CHI
Today: Case studies in spoken language interfaces and AI

Spoken Dialogue Systems
Provide automated telephone access to DB
Front end: ASR + TTS
Back end: DB
Middle: dialogue strategy is key component
[Architecture diagram: user <-> ASR/TTS <-> spoken dialogue system <-> DB]

State-Based Design
System state:
–info attributes perceived so far
–individual and average ASR confidences
–other dialogue history info
–data on particular user
Dialogue strategy: mapping from current state to system action
Typically hundreds of states, several reasonable actions from each state
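
To make the state-to-action mapping concrete, here is a minimal Python sketch of a strategy as a lookup table; the state fields and action names are illustrative assumptions, not the talk's actual encoding.

```python
# A minimal sketch of a dialogue strategy as a state -> action table.
from typing import NamedTuple

class DialogueState(NamedTuple):
    attribute: int        # which attribute is being asked about
    asr_confidence: int   # bucketed ASR confidence for the last utterance
    tries: int            # how many times this attribute has been asked

# One action per reachable state; real systems have hundreds of entries.
policy = {
    DialogueState(attribute=1, asr_confidence=0, tries=0): "open_prompt",
    DialogueState(attribute=1, asr_confidence=1, tries=1): "confirm",
}

def choose_action(state: DialogueState) -> str:
    # Fall back to a safe default for states the table does not cover.
    return policy.get(state, "constrained_prompt")
```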

Typical System Design: Sequential Search
Choose state and action spaces
Choose and implement a particular, “reasonable” dialogue strategy
Field system, gather dialogue data (system logs, dialogue trajectories)
Do simple statistical analyses
Re-field improved dialogue strategy
Can only examine a handful of strategies

Issues in Dialogue Strategy Design
Initiative strategy
Confirmation strategy
DB query strategy
Criteria to be optimized

Facts About ASR
Inputs: audio file; grammar/language model; acoustic model
Outputs: utterance matched from grammar, or no match; log-likelihood
Performance precision-recall tradeoff:
–“small” grammar --> high accuracy on constrained utterances, lots of no-matches
–“large” grammar --> match more utterances, but with lower confidence
ASR community pushing these barriers

Initiative Strategy
System initiative vs. user initiative:
–“Please state your departure city.”
–“Please state your desired itinerary.”
–“How can I help you?”
Influences user expectations
ASR grammar must be chosen accordingly
Best choice may differ from state to state!
May depend on user population & task

Confirmation Strategy
High ASR confidence: accept ASR match and move on
Moderate ASR confidence: confirm
Low ASR confidence: re-ask
How to set confidence thresholds?
Early mistakes can be costly later
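
A minimal sketch of the thresholding logic, assuming the ASR returns a log-likelihood score; the cutoff values and names are hypothetical, and the talk's point is precisely that good thresholds are hard to set by hand:

```python
# Hypothetical cutoffs; in practice these are tuned -- or, as in this
# talk, chosen by learning over states rather than set globally by hand.
HIGH_CONFIDENCE = -0.5
LOW_CONFIDENCE = -2.0

def confirmation_action(asr_log_likelihood: float) -> str:
    if asr_log_likelihood >= HIGH_CONFIDENCE:
        return "accept"   # trust the ASR match and move on
    if asr_log_likelihood >= LOW_CONFIDENCE:
        return "confirm"  # ask the user to verify the match
    return "reask"        # discard the match and ask again
```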

Markov Decision Processes
System state s (in S)
System action a (in A)
Transition probabilities P(s’|s,a)
Reward function R(s,a) (stochastic)
Fast algorithms for optimal policy
Our application: P(s’|s,a) models the population of users
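
For reference, a minimal value-iteration sketch for a tabular MDP with known P and R, stored here as nested dicts; the representation is an assumption for illustration, and an episodic dialogue MDP would typically use terminal states with a discount near 1.

```python
# Tabular value iteration; P[s][a] is a list of (next_state, prob) pairs,
# R[s][a] is the expected immediate reward, actions(s) lists legal actions.
def value_iteration(states, actions, P, R, gamma=0.95, tol=1e-6):
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                       for a in actions(s))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Read off the greedy policy with respect to the converged values.
    policy = {s: max(actions(s),
                     key=lambda a: R[s][a] +
                     gamma * sum(p * V[s2] for s2, p in P[s][a]))
              for s in states}
    return V, policy
```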

SDSs as MDPs
Initial system utterance, initial user utterance
Actions have probabilistic outcomes
Estimate transition probabilities... P(next est. state | current est. state & action)
...and rewards... R(current est. state, action)
...from a set of exploratory dialogues (system logs)
Violations of the Markov property! Will this work?
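
Estimating the transition probabilities and rewards from exploratory logs can be done by simple counting. A minimal sketch, assuming each logged dialogue is a list of (state, action, reward, next_state) tuples; this format is an assumption, not the system's actual log schema:

```python
from collections import defaultdict

def estimate_mdp(dialogues):
    # dialogues: iterable of lists of (state, action, reward, next_state)
    counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> s' -> count
    reward_sum = defaultdict(float)                 # (s, a) -> summed reward
    visits = defaultdict(int)                       # (s, a) -> visit count
    for dialogue in dialogues:
        for s, a, r, s2 in dialogue:
            counts[(s, a)][s2] += 1
            reward_sum[(s, a)] += r
            visits[(s, a)] += 1
    # Relative frequencies give the maximum-likelihood estimates.
    P = {sa: {s2: n / visits[sa] for s2, n in nxt.items()}
         for sa, nxt in counts.items()}
    R = {sa: reward_sum[sa] / visits[sa] for sa in visits}
    return P, R
```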

The RL Approach
Build initial system that is deliberately exploratory wrt state and action space
Use dialogue data from initial system to build a Markov decision process (MDP)
Use methods of reinforcement learning to compute optimal strategy of the MDP
Re-field (improved?) system given by the optimal policy

Why Reinforcement Learning?
ASR output is noisy; user population leads to stochastic behavior
Design choices have long-term impact; temporal credit assignment problem
Many design choices can be fixed, but not:
–Initiative strategy
–Confirmation strategy
Many different performance criteria

Caveats
Must still choose states and actions
Must be exploratory with taste
Data sparsity
Violations of the Markov property
A formal framework and methodology, hopefully automating one important step in system design

The Application
Dialogue system providing telephone access to a DB of activities in NJ
Want to obtain 3 attributes:
–activity type (e.g., wine tasting)
–location (e.g., Lambertville)
–time (e.g., morning)
Failure to bind an attribute: query DB with don’t-care

The State Space
current attribute (A = 1,2,3)
value (V = 0,1)
confidence (C = 1,2,3,4,5)
tries (T = 0,1,2,3)
grammar (G = 0,1)
“trouble” history bit (H = 0,1)
N.B. Non-state variables record attribute values; the state does not condition on previous attributes! Will this work?
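
The state vector can be written down directly; a minimal sketch, with the ranges taken from the slide and the meaning of the G encoding assumed:

```python
from typing import NamedTuple

class NJFunState(NamedTuple):
    A: int  # current attribute: 1, 2, 3
    V: int  # value obtained for this attribute: 0, 1
    C: int  # bucketed ASR confidence: 1-5
    T: int  # tries for this attribute: 0-3
    G: int  # grammar: 0, 1 (assumed: constrained vs. open)
    H: int  # "trouble" history bit: 0, 1
```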

Sample Actions
Initiative (when T = 0):
–open or constrained prompt?
–open or constrained grammar?
–N.B. might depend on H, A, …
Confirmation (when V = 1):
–confirm or move on or re-ask?
–N.B. might depend on C, H, A, …
Only allowed “reasonable” actions
Results in 42 states with (binary) choices
Small state space, large policy space

The Experiment
Designed 6 specific tasks, each with a web survey
Gathered 75 internal subjects
Split into training and test, controlling for M/F, native/non-native, experienced/inexperienced
54 training subjects generated 311 dialogues
Exploratory training dialogues used to build the MDP
Optimal strategy for the objective TASK COMPLETION computed and implemented
21 test subjects performed tasks and web surveys for the modified system, generating 124 dialogues
Did statistical analyses of performance changes

Reward Functions
Objective task completion:
–-1 for an incorrect attribute binding
–0,1,2,3 correct attribute bindings
Binary version:
–1 for 3 correct bindings, else 0
Other reward measures: ASR confidence (obj), perceived completion, user satisfaction, future use, perceived understanding, user understanding, ease of use
Optimized for objective task completion, but predicted improvements in some others
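
A minimal sketch of the two task-completion rewards, under one reading of the slide: any incorrect binding scores -1, and otherwise the reward counts the correctly bound attributes. The bindings encoding is an assumption for illustration.

```python
def objective_task_completion(bindings):
    # bindings: one of "correct", "incorrect", "unbound" per attribute
    if any(b == "incorrect" for b in bindings):
        return -1                # any wrong binding: reward -1
    return sum(b == "correct" for b in bindings)  # 0, 1, 2, or 3

def binary_task_completion(bindings):
    return 1 if all(b == "correct" for b in bindings) else 0
```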

Main Results
On all dialogues:
–Objective task completion: train mean ~ 1.722, test mean ~ ; two-sample t-test p-value ~
–Binary task completion: train mean ~ 0.515, test mean ~ ; two-sample t-test p-value ~ 0.05
On expert dialogues 3-6:
–Binary task completion: train mean ~ 0.456, test mean ~ ; two-sample t-test p-value ~ 0.001
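
The train/test comparison above is a standard two-sample t-test; a minimal sketch with scipy, using hypothetical per-dialogue reward lists:

```python
from scipy import stats

train_rewards = [3, 2, -1, 3, 0]  # hypothetical per-dialogue rewards
test_rewards = [3, 3, 2, 3, -1]   # hypothetical per-dialogue rewards

t_stat, p_value = stats.ttest_ind(train_rewards, test_rewards)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```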

Results for Other Rewards
ASR performance (0-3): train ~ 2.483, test ~ 2.671, p ~
User satisfaction (“move to the middle” effect):
–%good: train ~ 0.459, test ~ 0.251, p ~ 0.06
–%bad: train ~ 0.278, test ~ 0.138, p ~ 0.07
Similar significant MTM results for ease of use
Many insignificant instances of MTM
Objectives improve, subjectives MTM

Comparison to Human Design
Fielded comparison infeasible, but exploratory dialogues provide a Monte Carlo proxy of “consistent trajectories”
Test policy performance, binary completion: 0.67 (12)
Policy SysNoConfirm: (11), significant win
Policy SysConfirm: -0.6 (5), significant win
Policy UserNoConfirm: -0.2 (15), significant win
Policy Mixed: (13), significant win
Insignificant: difference with W99, similar to test
Even this is a potential advance...
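
One way to read the “consistent trajectories” proxy: score a hand-designed policy by averaging the final rewards of exploratory dialogues whose logged actions agree with that policy everywhere. A minimal sketch, with an assumed dialogue record format:

```python
def consistent_trajectory_value(policy, logged_dialogues):
    # Each logged dialogue: .steps = [(state, action), ...], .reward = final
    rewards = [d.reward for d in logged_dialogues
               if all(policy[s] == a for s, a in d.steps)]
    n = len(rewards)
    return (sum(rewards) / n if n else None), n  # mean reward, sample size
```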

Cobot: A Software Agent
User/client of LambdaMOO, a well-known Internet text chat and programming environment
Software chat agent providing “social statistics”
Previous functionality:
–Extensive logging of human user behavior and interaction
–Provision of social statistics and comparisons
–Rudimentary chat based on IR applied to large documents
–Proactive social behavior via reinforcement learning
This work:
–Construction, fielding, and testing of a dialogue system providing spoken natural language access to Cobot and LambdaMOO via telephone, using speech recognition and text-to-speech

Sample Dialogue
HFh waves to Buster.
Buster bows gracefully to HFh.
Buster is overwhelmed by all these paper deadlines.
Buster begins to slowly tear his hair out, one strand at a time.
HFh comforts Buster.
HFh [to Buster]: Remember, the mighty oak was once a nut like you.
Buster [to HFh]: Right, but his personal growth was assured. Thanks anyway, though.
Buster feels better now.
Standard verbs and emotes: directed and broadcast speech, hug, wave, bow, nod, eye, poke, zap, grin, laugh, comfort, ...

CobotDS
Spoken dialogue system providing spoken natural language access to Cobot via phone
Interesting issues:
–The “real” versus the “virtual”
–Multiparty, multimodal dialogue systems
–Dialogue systems for entertainment, socialization
–Highly dynamic and unstructured content
–Severe communication imbalance
–Use of summarization, personal grammars

Calling Cobot
Provided a dozen or so “friendly” LambdaMOO users with access to a toll-free CobotDS number
Users call with LambdaMOO user name and numeric password; then enter the main CobotDS command loop
Cobot announces the phone call & user in LambdaMOO
From LambdaMOO to phone user:
–MOO users direct arbitrary utterances or verbs to Cobot, prefixed by the text “phone:”
–Via TTS, Cobot passes the verb or utterance directly to the phone user
–Virtually no noise on this channel
From phone user to LambdaMOO:
–Cobot passes on utterances and verbs from the phone user (with attribution)
–Mixed in with Cobot’s other behavior and activities
–But this channel is very noisy (due to ASR), so…

Basic Phone Commands
38 standard LambdaMOO verbs, directed or not
“Say” command with multiple ASR grammars:
–Smalltalk grammar: 228 useful phrases & exclamations (social pleasantries, assertions of mood, statements of whereabouts)
–Cliché grammar: 2950 English witticisms and sayings
–User-specific personal grammars, periodically updated/modified
–User can control the grammar via the “grammar” command
“Listen” command:
–At every dialogue turn, CobotDS will attempt to describe all activity taking place in LambdaMOO
–Provides the phone user a richer view, allows passive participation
–User has no scrollback; pace of activity can quickly outrun the TTS rate
–Thus filter activity, including via social rules

Other Useful Phone Commands
“Where” and “who” commands
“Summarize” command:
–Intended for use in non-listening mode
–Provides summary of last n minutes of activity
–Describes which users have generated most activity
–Characterizes interactions via verb usage
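
A rough sketch of how such a summary might be computed from timestamped activity events; the event schema and function are hypothetical, not CobotDS's actual implementation:

```python
import time
from collections import Counter

def summarize(events, minutes=5):
    # events: list of dicts like {"time": ..., "user": ..., "verb": ...}
    cutoff = time.time() - 60 * minutes
    recent = [e for e in events if e["time"] >= cutoff]
    return {
        "most_active": Counter(e["user"] for e in recent).most_common(3),
        "top_verbs": Counter(e["verb"] for e in recent).most_common(5),
    }
```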

Sample Transcript

A (Very) Small User Study
5 LambdaMOO users generated 31 dialogues
Averaged 65 turns per dialogue
Some findings:
–Great variation in usage styles, verbs invoked
–Popularity of “listen” command, often in “radio” mode
–Effectiveness of “listen” filtering
–Use of grammar command and personal grammars
–Evolution of personal grammars to express limitations
–ASR problems!

The Media Equation: Media = Real Life [Reeves and Nass]
Politeness
Interpersonal distance
Flattery
Personality
Media and Evolution
Lessons for interface design
Turing Test relevance?