A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems Sriraam Natarajan, Kshitij Judah, Prasad Tadepalli and Alan Fern School of EECS, Oregon State University
Outline
- Introduction
- Decision-Theoretic Model
- Experiment with folder predictor
- Incorporating Relational Hierarchies
- Open Problems
- Conclusion
Motivation
- Several assistant systems have been proposed to
  - Assist users in daily tasks
  - Reduce their cognitive load
- Examples: CALO (CALO 2003), COACH (Boger et al. 2005), etc.
- Problems with previous work
  - Fine-tuned to particular application domains
  - Utilize specialized technologies
  - Lack an overarching framework
Interaction Model
[Figure, built up over several slides: the user (action set U) and the assistant (action set A) take turns. Starting from the initial state W1, user actions and assistant actions move the world through states W2 to W9, where the goal is achieved.]
- Assistant's goal: minimize the user's actions
Markov Decision Process
- MDP: $(S, A, T, R, I)$
- Policy $\pi$: mapping from $S$ to $A$
- $V(\pi) = E\left(\sum_{t=1}^{T} r_t\right)$, where $T$ is the length of the episode
- Optimal policy: $\pi^* = \arg\max_{\pi} V(\pi)$
- A Partially Observable Markov Decision Process (POMDP):
  - $O$ is the set of observations
  - $\mu(o \mid s)$ is a distribution over observations $o \in O$ given the current state $s$
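The definitions of $V(\pi)$ and the optimal policy can be made concrete with a small sketch. The two-state MDP below, its rewards, and the function names `policy_value` and `best_policy` are invented for illustration and do not come from the slides. The search is restricted to stationary deterministic policies, which is a simplification for a finite horizon.

```python
from itertools import product

# Hypothetical 2-state, 2-action MDP; every number here is made up.
STATES = [0, 1]
ACTIONS = ["stay", "move"]
# T[s][a] maps next state -> probability.
T = {
    0: {"stay": {0: 1.0}, "move": {0: 0.2, 1: 0.8}},
    1: {"stay": {1: 1.0}, "move": {0: 0.8, 1: 0.2}},
}
# R[s][a] is the immediate reward.
R = {
    0: {"stay": 0.0, "move": -1.0},
    1: {"stay": 2.0, "move": -1.0},
}
INITIAL = {0: 1.0}  # initial state distribution I
HORIZON = 3         # episode length


def policy_value(policy, horizon=HORIZON):
    """Expected total reward V(pi) of a stationary deterministic policy."""
    v = {s: 0.0 for s in STATES}  # value with zero steps remaining
    for _ in range(horizon):
        # One backup: immediate reward plus expected value of the next state.
        v = {
            s: R[s][policy[s]]
            + sum(p * v[s2] for s2, p in T[s][policy[s]].items())
            for s in STATES
        }
    return sum(p * v[s] for s, p in INITIAL.items())


def best_policy():
    """Enumerate all stationary deterministic policies and return the argmax."""
    candidates = [
        dict(zip(STATES, choice))
        for choice in product(ACTIONS, repeat=len(STATES))
    ]
    return max(candidates, key=policy_value)
```

Enumeration is only feasible for toy problems; it is used here because it mirrors the argmax in the definition directly.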
Decision-Theoretic Model (Fern et al. 07)
- Assistant: history-dependent stochastic policy $\pi'(a \mid w, O)$
- Observables: world states, agent's actions
- Hidden: agent's goals
- Episode begins at state $w$ with goal $g$
- $C(w, g, \pi, \pi')$: cost of the episode, where $\pi$ is the user's policy and $\pi'$ is the assistant's policy
- Objective: compute $\pi'$ that minimizes $E[C(I, G_0, \pi, \pi')]$
Assistant POMDP
- Given the MDP, $G_0$ and the user policy $\pi$, the assistant POMDP is defined as:
  - State space is $W \times G$
  - Action set is $A$
  - Transition function $T'$ is
    $$T'((w,g), a, (w',g')) = \begin{cases} 0 & \text{if } g \neq g' \\ T(w, a, w') & \text{if } g = g',\ a \neq \text{noop} \\ P(T(w, \pi(w,g)) = w') & \text{if } g = g',\ a = \text{noop} \end{cases}$$
  - Cost model $C'$ is
    $$C'((w,g), a) = \begin{cases} C(w, a) & \text{if } a \neq \text{noop} \\ E[C(w, a')] & \text{if } a = \text{noop, where } a' \text{ is distributed according to } \pi(\cdot \mid w, g) \end{cases}$$
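The case analysis above can be written as two short functions. This is a minimal sketch, assuming the user policy is available as a function returning a dictionary of action probabilities. The toy world (positions 0 to 2), the costs, and all names are hypothetical and only serve to exercise the three cases.

```python
NOOP = "noop"


def assistant_transition(T, user_policy, state, a, next_state):
    """T'((w, g), a, (w', g')) of the assistant POMDP."""
    (w, g), (w2, g2) = state, next_state
    if g != g2:
        return 0.0  # the goal does not change within an episode
    if a != NOOP:
        return T(w, a, w2)  # the assistant's own action moves the world
    # noop: the user acts, so average T over the user's action distribution
    return sum(p * T(w, u, w2) for u, p in user_policy(w, g).items())


def assistant_cost(C, user_policy, state, a):
    """C'((w, g), a) of the assistant POMDP."""
    w, g = state
    if a != NOOP:
        return C(w, a)
    # noop: expected cost of the user's action a' ~ pi(. | w, g)
    return sum(p * C(w, u) for u, p in user_policy(w, g).items())


# --- Hypothetical toy problem, invented for illustration ---
def T(w, a, w2):
    """Deterministic world on positions 0..2; 'right' moves one step."""
    target = min(w + 1, 2) if a == "right" else w
    return 1.0 if w2 == target else 0.0


def user_policy(w, g):
    """pi(. | w, g): the user mostly heads right until position g."""
    if w < g:
        return {"right": 0.9, "stay": 0.1}
    return {"stay": 1.0}


def C(w, a):
    """Made-up action costs."""
    return {"right": 1.0, "stay": 0.5}[a]
```

For example, `assistant_transition(T, user_policy, (0, 2), NOOP, (1, 2))` averages the world dynamics over the user's two possible actions, while any transition that changes the goal component has probability zero.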
Assistant POMDP
[Figure: graphical model over the world state W_t, the goal G, the POMDP state S_t and the assistant action A_t, unrolled over time steps t and t+1.]
Approximate Solution Approach
[Figure: the assistant contains a goal recognizer and an action-selection module. Labels: environment, user, world state W_t, user action U_t, observation O_t, goal distribution P(G), assistant action A_t.]
Online action selection cycle:
1) Estimate the posterior goal distribution given the observations
2) Select an action via myopic heuristics
Goal Estimation
[Figure: from the current state W_t with goal posterior P(G | O_t), the user action U_t leads to W_{t+1} and to the updated goal posterior P(G | O_{t+1}).]
- Given
  - $P(G \mid O_t)$: goal posterior given observations up to time $t$
  - $P(U_t \mid G, W_t)$: user policy
  - $O_{t+1}$: new observation of user action and world state
- The user policy must be learned
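One way to combine the three quantities listed above is a standard Bayes update, $P(G \mid O_{t+1}) \propto P(U_t \mid G, W_t)\, P(G \mid O_t)$. The slide text does not show the update rule, so this form is an assumption; it holds when the world transition itself does not depend on the goal. The sketch below uses a hypothetical two-goal user model, and the names `update_goal_posterior` and `toy_user_policy` are illustrative.

```python
def update_goal_posterior(posterior, user_policy, w, u):
    """Return P(G | O_{t+1}) after observing user action u in world state w.

    posterior:   dict goal -> P(G | O_t)
    user_policy: function (u, g, w) -> P(U_t = u | G = g, W_t = w)
    """
    unnormalized = {g: user_policy(u, g, w) * p for g, p in posterior.items()}
    z = sum(unnormalized.values())
    if z == 0.0:
        # The observed action is impossible under every goal: keep the old belief.
        return dict(posterior)
    return {g: v / z for g, v in unnormalized.items()}


def toy_user_policy(u, g, w):
    """Hypothetical user model: the action matches the goal 80% of the time."""
    return 0.8 if u == g else 0.2


prior = {"left": 0.5, "right": 0.5}
after_one = update_goal_posterior(prior, toy_user_policy, w=None, u="right")
after_two = update_goal_posterior(after_one, toy_user_policy, w=None, u="right")
```

Each consistent observation sharpens the posterior, which is what makes repeated prediction useful once more user actions have been seen.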
Action Selection: Assistant POMDP
[Figure: the assistant POMDP over W_t, W_{t+1}, W_{t+2} with user action U, goal G and assistant action A_t, shown next to the corresponding assistant MDP over W_t and W_{t+2}.]
- Assume we know the user goal $G$ and the user policy $\pi$
- Can create a corresponding assistant MDP over assistant actions
- Can compute $Q(A, W, G)$, the value of taking assistive action $A$ when the user's goal is $G$
- Select the action that maximizes the expected (myopic) value:
  $$Q(A, W) = \sum_{G} P(G \mid O_t)\, Q(A, W, G)$$
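The selection rule is a posterior-weighted average of per-goal Q-values followed by an argmax. A minimal sketch follows; the two-door scenario and its Q-values are hypothetical numbers chosen only to show that the assistant waits when the goal is uncertain and acts once the posterior is peaked.

```python
def expected_q(a, w, posterior, q):
    """Q(A, W) = sum over G of P(G | O_t) * Q(A, W, G)."""
    return sum(p * q(a, w, g) for g, p in posterior.items())


def select_action(actions, w, posterior, q):
    """Pick the assistant action with the highest expected (myopic) value."""
    return max(actions, key=lambda a: expected_q(a, w, posterior, q))


# --- Hypothetical example: two doors, the user wants exactly one opened ---
ACTIONS = ["noop", "open_door1", "open_door2"]


def toy_q(a, w, g):
    """Made-up Q-values: +1 for the right door, -2 for the wrong one, 0 for waiting."""
    if a == "noop":
        return 0.0
    return 1.0 if a == "open_" + g else -2.0


confident = {"door1": 0.8, "door2": 0.2}
uncertain = {"door1": 0.5, "door2": 0.5}
choice_confident = select_action(ACTIONS, None, confident, toy_q)
choice_uncertain = select_action(ACTIONS, None, uncertain, toy_q)
```

With these numbers the assistant opens door 1 under the confident posterior and chooses `noop` under the uniform one, since a wrong guess costs more than a correct one gains.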
Folder Predictor
- Previous work (Bao et al. 2006):
  - No repredictions
  - Does not consider new folders
- Decision-Theoretic Model
  - Naturally handles repredictions
  - Considers a mixture density to obtain the distribution
- Data set: a set of Open and saveAs requests
- Folder hierarchy: 226 folders
- Prior distribution initialized according to the model of Bao et al.
  $$P(f) = \mu_0 P_0(f) + (1 - \mu_0) P_l(f)$$
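The mixture $P(f) = \mu_0 P_0(f) + (1 - \mu_0) P_l(f)$ can be computed directly once the two component distributions are given. The slide does not define $P_0$ and $P_l$, so the sketch below simply treats them as two probability distributions over folders; the folder names, probabilities and mixing weight are hypothetical.

```python
def mixture_prior(p0, pl, mu0):
    """Return P(f) = mu0 * P0(f) + (1 - mu0) * Pl(f) for every folder f."""
    if not 0.0 <= mu0 <= 1.0:
        raise ValueError("mu0 must lie in [0, 1]")
    folders = set(p0) | set(pl)
    # A folder missing from one component gets probability 0 from it.
    return {
        f: mu0 * p0.get(f, 0.0) + (1.0 - mu0) * pl.get(f, 0.0)
        for f in folders
    }


# Hypothetical components: p0 ignores folder "c", pl spreads mass uniformly.
p0 = {"a": 0.7, "b": 0.3}
pl = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}
prior = mixture_prior(p0, pl, mu0=0.5)
```

Because the second component gives mass to folder `c`, the mixture assigns it a nonzero probability even though the first component never predicts it. This is one way a mixture can cover folders that a single predictor leaves out.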
[Figure: average number of clicks per Open/saveAs request. Conditions: restricted folder set vs. all folders considered; no reprediction vs. with repredictions. Systems: current TaskTracer; full assistant framework.]
Incorporating Relational Hierarchies
- Tasks are hierarchical
  - Writing a paper
- Tasks have a natural class-subclass hierarchy
  - Papers to ICML or IJCAI involve similar subtasks
- Tasks are chosen based on some attribute of the world
  - Grad students work on a paper closer to the deadline
- Goal: combine these ideas to
  - Specify prior knowledge easily
  - Accelerate learning of the parameters
Doorman Domain
[Figure: relational task hierarchy. Task nodes: Gather(R), Attack(E), Collect(R), Deposit(R,S), DestroyCamp(E), KillDragon(D), Goto(L), Pickup(R), Move(X), Open(D), DropOff(R,S), Kill(D), Destroy(E). Constraints on the edges: L = R.Loc, R.Type = S.Type, L = S.Loc, L = D.Loc, L = E.Loc, E.Type = D.Type.]
Performance of different models
Open Problems
- Partial observability of the user
  - Currently the user completely observes the environment
  - Not the case in the real world: the user need not know what is in the refrigerator
  - The assistant can completely observe the world
- The current system does not consider the user's exploratory actions
- The setting is similar to interactive POMDPs (Doshi et al.)
  - Environment: POMDP
  - Belief states of the POMDP are belief states of the user
  - The state space needs to be extended to capture the user's beliefs
Open Problems
- Large state space
  - Solving the POMDP is impractical
  - Kitchen Domain (Fern et al.) – states
  - Prune certain regions of the search space (Electric Elves)
  - Can use user trajectories as training examples
- Parallel subgoals/actions
  - Assistant and user execute actions in parallel
  - Useful to execute parallel subgoals: the user writes the paper, the assistant runs experiments
  - Identification of the possible parallel actions
  - The assistant can change the goal stack of the user
  - Goal estimation has to include the user's response
Open Problems
- Changing goals
  - The user can change goals midway, e.g. work on a different project
  - Currently, the system would converge to the new goal slowly
  - Explicitly model this possibility
  - Borrow ideas from user modeling to predict changing goals
- Expanding set of goals
  - A large number of dishes can be cooked
- Forgetting subgoals
  - Forgetting to attach a document to the
  - Explicitly model this possibility: borrow ideas from the cognitive science literature
Conclusion
- Proposed a general framework based on decision theory
- Experiments in a real-world domain: repredictions are useful
- Currently working on a relational hierarchical model
- Outlined several open problems
- Motivated the necessity of using sophisticated user models
Thank you!!!