Definition and Complexity of Some Basic Metareasoning Problems
Vincent Conitzer and Tuomas Sandholm
Computer Science Department, Carnegie Mellon University
Bounded rationality: descriptive vs. normative
Full rationality requires doing all possibly useful deliberation
–Computation, information-gathering actions, etc.
In many settings, not all possibly useful deliberation can be done because of time or other constraints
–E.g., full rationality requires solving a (too) hard SAT instance
This leads to bounded rationality
–Try to do as well as you can given the constraint
Two strands of research on bounded rationality:
–Descriptive: How do humans deal with the constraint?
–Normative (prescriptive): How should the constraint be dealt with? Decision theory
The normative strand is useful in the design of agents
Metareasoning
Normative bounded rationality requires (optimally) deciding which deliberation actions to take
This is a computational problem called metareasoning
–“Reasoning about how to do the reasoning”
But, of course, the metareasoning problem itself may be hard!
This is widely acknowledged, but there is a lack of canonical metareasoning problems in the literature
We defined some canonical metareasoning problems and studied their complexity
–Any real-world metareasoning system will likely have to deal with one or more of these problems (in addition to others)
Three problems that we study
PERFORMANCE-PROFILES
–How do you distribute your deliberation time over multiple disjoint problems?
ACTION-EVALUATION
–How do you distribute your deliberation time over multiple options available to you?
STATE-DISAMBIGUATION
–What is the best plan for trying to discern the state of the world?
PERFORMANCE PROFILES
What is a performance profile?
Assume that we can perfectly predict how the quality of the solution will improve with the computation time we spend on it
This function is a performance profile (curve)
[Figure: performance profile, plotting solution quality against deliberation time]
The performance-profiles problem
Suppose:
–We have multiple problems to compute on;
–Our utility is the sum of the solution qualities.
For example: a newspaper company has to solve vehicle routing problems to deliver the newspaper orders for the day
–One for each of the cities it delivers in
–Has until 5am to improve the solutions
–Wants to achieve the maximum (overall) savings over the standard routes
How should it schedule its computation time?
[Figure: performance profile, plotting savings against deliberation time]
Complexity result
Theorem. PERFORMANCE-PROFILES is NP-complete.
–Even with piecewise linear performance profiles
The reduction is from KNAPSACK
[Figure: visual proof, showing step-shaped profiles that jump to quality v1 after time c1 and to quality v2 after time c2, i.e., knapsack items]
But: solvable in polynomial time with concave profiles
–[Boddy and Dean, 1994]
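For the tractable concave case, the idea behind the polynomial-time algorithm can be sketched as an incremental greedy allocation over discrete time units. This is a minimal illustrative sketch, not the algorithm from Boddy and Dean; the `allocate` function and the example profiles are hypothetical.

```python
import math

# Greedy deliberation scheduling: repeatedly give the next unit of time to the
# problem with the largest marginal gain in solution quality. This is optimal
# only when every performance profile is concave (diminishing returns); with
# general piecewise linear profiles the problem is NP-complete.

def allocate(profiles, budget):
    """profiles: list of functions mapping time units to solution quality
    (assumed concave and nondecreasing); budget: total time units available."""
    alloc = [0] * len(profiles)
    for _ in range(budget):
        # Marginal gain of one more time unit on each problem.
        gains = [f(t + 1) - f(t) for f, t in zip(profiles, alloc)]
        alloc[max(range(len(profiles)), key=gains.__getitem__)] += 1
    return alloc

# Two hypothetical concave profiles with diminishing returns:
profiles = [lambda t: math.sqrt(t), lambda t: math.log(1 + t)]
print(allocate(profiles, 5))  # [3, 2]
```

With non-concave (e.g., step-shaped) profiles, the greedy choice can get stuck before any step's jump is reached, which is exactly where the knapsack-like hardness enters.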
ACTION EVALUATION
An action evaluation tree
A robot is considering whether to dig for gold
The prior probability of there being gold is 1/8
The robot can test for gold before digging
–Alternative interpretation: the robot could reason about whether there is gold
Say that:
–P(test positive | gold) = 14/15
–P(test positive | no gold) = 1/15
Then (using Bayes’ rule):
–P(test positive) = 7/40
–P(gold | test positive) = 2/3
–P(gold | test negative) = 1/99
Say the utility of finding gold is 5
Then the action evaluation tree to the right represents the predicament
In general, action evaluation trees can have multiple tests/reasoning steps
[Figure: action evaluation tree; node values 5/8, 10/3, and 5/99 are the expected utilities of digging at that point, and edge labels 7/40 and 33/40 are transition probabilities]
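The numbers in this example can be checked with exact rational arithmetic; a minimal sketch in Python (the variable names are mine):

```python
from fractions import Fraction

# Reproduce the slide's Bayes'-rule computations exactly.
p_gold = Fraction(1, 8)                 # prior probability of gold
p_pos_given_gold = Fraction(14, 15)
p_pos_given_no_gold = Fraction(1, 15)

p_pos = p_pos_given_gold * p_gold + p_pos_given_no_gold * (1 - p_gold)
p_gold_given_pos = p_pos_given_gold * p_gold / p_pos
p_gold_given_neg = (1 - p_pos_given_gold) * p_gold / (1 - p_pos)

print(p_pos)             # 7/40
print(p_gold_given_pos)  # 2/3
print(p_gold_given_neg)  # 1/99

# Expected utility of digging (utility 5 for gold, 0 otherwise):
u_gold = 5
print(u_gold * p_gold)            # 5/8 (before testing)
print(u_gold * p_gold_given_pos)  # 10/3 (after a positive test)
print(u_gold * p_gold_given_neg)  # 5/99 (after a negative test)
```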
The action evaluation problem: example
Now suppose the robot could dig in (at most one of) multiple locations, A, B, and C
–A may have gold (as before)
–B may have silver, C may have copper
(For simplicity of presentation) B and C have errorless tests
Different tests may take different amounts of time
–Say, the tests at A and C take 2 units each, the test at B 3 units
–5 units of time available for testing
[Figure: action evaluation trees for A, B, and C; node values are the expected utilities of digging here now, and edge labels are transition probabilities]
Optimal plan: test at B first; if positive, test at A; otherwise at C
–Maximizes expected utility
General definition and complexity
The general ACTION-EVALUATION problem: given a set of performance profile trees, what is the expected utility of the best deliberation plan?
Theorem. ACTION-EVALUATION is NP-hard even when every tree has depth at most 1 and branching factor 2, and all leaf values are -1, 0, or 1.
Open question. Is the general problem even harder (for example, PSPACE-hard)?
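A brute-force evaluator makes the definition concrete: given one tree per candidate action, an adaptive plan either stops (taking the best-looking action now) or spends time deliberating on one action and then continues optimally. This is a minimal sketch with my own tree encoding and a hypothetical two-action instance, not the paper's representation; its exponential running time is consistent with the hardness result.

```python
# A tree node is (value, test_cost, children): value is the expected utility of
# taking the action now, and children is a tuple of (transition_probability,
# child_node) pairs; a leaf has children == ().

def best_eu(nodes, time_left):
    """Expected utility of the best adaptive deliberation plan, by brute force.
    nodes: tuple holding each action's current tree node."""
    # Option 1: stop deliberating and take the best-looking action now.
    best = max(value for value, cost, children in nodes)
    # Option 2: run an affordable test on some action, then continue optimally.
    for i, (value, cost, children) in enumerate(nodes):
        if children and cost <= time_left:
            eu = sum(p * best_eu(nodes[:i] + (child,) + nodes[i + 1:],
                                 time_left - cost)
                     for p, child in children)
            best = max(best, eu)
    return best

# Hypothetical instance: action A is a known safe bet worth 1; action B looks
# worth 0.5 now, but a 1-unit test reveals it to be worth either 3 or -2.
A = (1, 0, ())
B = (0.5, 1, ((0.5, (3, 0, ())), (0.5, (-2, 0, ()))))
print(best_eu((A, B), 0))  # 1.0: no time to test, so take A
print(best_eu((A, B), 1))  # 2.0: test B, dig there if good, fall back to A
```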
STATE DISAMBIGUATION
State disambiguation: example
A robot encounters a gap in the floor in front of it
The gap can be a staircase (S), a hole (H), or a canyon (C)
There are three actions: descend (d), jump (j), or walk away (w)
–u(S, d) = 2 (new floors are very interesting)
–u(H, j) = 1 (passing a hole is somewhat interesting)
–u(S, w) = u(H, w) = u(C, w) = 0
–u(S, j) = u(H, d) = u(C, j) = u(C, d) = -infinity (the robot is destroyed)
State disambiguation: example (cont.)
To find out what the robot is in front of, it has some tests/queries that it can run
Test (1) answers: “Am I inside a building?”
–“Yes” is consistent only with S
–“No” is consistent with H, C, and S (staircases occur outside, too)
Test (2) answers: “If I drop an item in front of me, do I hear a sound?”
–“Yes” is consistent with S and H
–“No” is consistent with C and H (some holes are very deep)
Test (3) answers: “Can I walk around the gap?”
–“Yes” is consistent with S and H
–“No” is consistent with S, H, and C (the robot may be prevented from walking around by a wall in any case)
State disambiguation: solutions
Given the number of tests the robot can do…
…the robot wants to set a contingency plan for which tests to do…
…to maximize expected utility (of the best action to take after testing)
On the right are the optimal plans with time for one or two tests
[Figure: the optimal 1-step plan (run Test (1), then the deadline arrives) and the optimal 2-step plan (run Test (1) first, then a second test chosen from Test (2) and Test (3) depending on the answer)]
The general state-disambiguation problem
(There are some obvious generalizations of this, all of which are at least as hard as the basic variant.)
We are given:
–A set of possible world states, with a prior distribution
–A utility for every state (how much it is worth to know (for certain) that the world is in that state)
Not knowing the state for certain always gives utility 0
–A set of queries (or tests), each with a set of answers
For each query and each state, some (at least one) answers are consistent with that state and others are not; one of the consistent answers is chosen at random
–A deadline giving the maximum number of reasoning steps
We are asked what the expected utility of the best plan is
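The definition can likewise be turned into a brute-force evaluator over beliefs (posterior distributions over states). This is a minimal illustrative sketch with my own encoding and a small hypothetical two-state instance (not the robot example); it assumes each state picks uniformly at random among its consistent answers, and its exponential running time is consistent with the hardness results.

```python
from fractions import Fraction

def best_eu(belief, steps, queries, utility):
    """Expected utility of the best contingency plan, by brute force.
    belief: dict state -> probability; steps: remaining reasoning steps;
    queries: dict name -> dict state -> tuple of answers consistent with it."""
    # Knowing the state for certain is worth its utility; uncertainty is worth 0.
    stop = utility[next(iter(belief))] if len(belief) == 1 else 0
    if steps == 0:
        return stop
    best = stop
    for consistent in queries.values():
        # Distribution over answers: each state in the belief contributes its
        # probability mass, split uniformly among its consistent answers.
        posteriors = {}
        for s, p in belief.items():
            for a in consistent[s]:
                d = posteriors.setdefault(a, {})
                d[s] = d.get(s, 0) + p / len(consistent[s])
        eu = sum(sum(d.values()) *
                 best_eu({s: ps / sum(d.values()) for s, ps in d.items()},
                         steps - 1, queries, utility)
                 for d in posteriors.values())
        best = max(best, eu)
    return best

# Hypothetical instance: a staircase (S) or a hole (H), equally likely.
# Dropping an item always makes a sound for S, but may or may not for H.
queries = {"sound": {"S": ("yes",), "H": ("yes", "no")}}
utility = {"S": 2, "H": 1}
prior = {"S": Fraction(1, 2), "H": Fraction(1, 2)}
print(best_eu(prior, 1, queries, utility))  # 1/4: only a "no" identifies H
```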
Complexity results
Theorem. STATE-DISAMBIGUATION is NP-hard even when, for each query and each state, there is only one consistent answer.
–The proof reduces from SET-COVER
Theorem. STATE-DISAMBIGUATION is PSPACE-hard in general.
–The proof reduces from STOCHASTIC SATISFIABILITY
Conclusions & future research
We defined 3 fundamental metareasoning problems
–Most real-world metareasoning systems would need to deal with one or more of these problems
We showed that solving these problems is computationally hard
–Metareasoning policies directly suggested by decision theory are not always feasible
Future research should address how to deal with this complexity in metareasoning systems
–Approximation algorithms
–Algorithms for easy special cases
–Algorithms that usually run fast
–Meta-metareasoning
–…
Thank you for your attention!