Slide 1: Title
Definition and Complexity of Some Basic Metareasoning Problems
Vincent Conitzer and Tuomas Sandholm
Presented by Linli He, 04/22/2004
Slide 2: Background
- What is metareasoning? Reasoning about which deliberation actions to take.
- What is bounded rationality? In most real-world settings, due to limited time, an agent cannot perform all potentially useful deliberation actions. As a result, it will generally be unable to act rationally.
- What is descriptive research on bounded rationality? Characterizing how agents, in particular humans, deal with this constraint.
- What is normative (prescriptive) research on bounded rationality? Characterizing how agents should deal with this constraint.
Obviously, the approach of using metareasoning to control reasoning is impractical if the metareasoning problem itself is prohibitively complex. So: what is the complexity of metareasoning?
Slide 3: Allocating Anytime Algorithm Time across Problems
Motivating example: a newspaper delivery routing problem.
[Figure: performance profiles for the routing problem, plotting savings against computation time.]
Slide 4: The Performance-Profiles Problem
Problem definition: Given a list of performance profiles (f_1, f_2, ..., f_m), where each f_i is a non-decreasing function of deliberation time mapping to non-negative real numbers, a number of deliberation steps N, and a target value K. The question is whether we can distribute the deliberation steps across the problem instances to obtain a total performance of at least K.
Theorem: Performance-Profiles is NP-complete, even if each performance profile is continuous and piecewise linear.
Proof sketch:
- Knapsack can be reduced to Performance-Profiles, and Performance-Profiles can in turn be reduced to Knapsack.
- Given an allocation (N_1, ..., N_m), we can verify in polynomial time whether the target value is reached, so the problem is in NP.
- The proof only demonstrates weak NP-completeness, since Knapsack is weakly NP-complete; pseudo-polynomial-time algorithms may therefore exist.
- This holds under the (unrealistic) assumption that the efficacy of deliberation is perfectly predictable.
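The weak NP-completeness noted above suggests a knapsack-style pseudo-polynomial dynamic program once deliberation time is discretized. A minimal sketch (the function name and the tabular profile encoding are illustrative, not from the paper):

```python
# Pseudo-polynomial DP for Performance-Profiles, assuming each profile is
# given as a table f[t] = performance after t deliberation steps (f[0] = 0,
# non-decreasing). Runs in O(m * N^2) time: polynomial in the value of N,
# not in its bit length, matching the weak NP-completeness above.

def best_total_performance(profiles, N):
    dp = [0.0] * (N + 1)          # dp[t]: best total value using t steps so far
    for f in profiles:
        dp = [max(dp[t - s] + f[s] for s in range(min(t, len(f) - 1) + 1))
              for t in range(N + 1)]
    return dp[N]

# Decision version: is target K = 5 reachable with N = 3 steps?
profiles = [[0, 2, 3], [0, 1, 4]]                 # two toy profiles
print(best_total_performance(profiles, 3) >= 5)   # 1 step on the first profile
                                                  # plus 2 on the second gives 6
```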
Slide 5: Dynamically Allocating Evaluation Effort across Options
Metareasoning example: performing tests. Motivating example: a robot looking for precious metals.
A robot can choose among three sites for digging (it can dig at most one site). At site A it may find gold; at site B, silver; at site C, copper. If the robot chooses not to dig anywhere, it gets utility 1 for saving digging costs. If it chooses to dig, the utility of finding nothing is 0; finding gold, 5; finding silver, 3; finding copper, 2. The prior probability of there being gold at site A is 1/8; of silver at site B, 1/2; and of copper at site C, 1/2.
Before deciding, the robot can run tests:
- Test for gold at A: if there is gold, the test is positive with probability 14/15; if there is no gold, it is positive with probability 1/15. This takes 2 time units.
- Test for silver at B: positive with probability 1 if there is silver, and with probability 0 otherwise. This takes 3 time units.
- Test for copper at C: positive with probability 1 if there is copper, and with probability 0 otherwise. This takes 2 time units.
Slide 6: Tree Representation of the Action-Evaluation Instance (by Bayes' Rule)
- The root represents not having done a test yet.
- The left (right) branch represents the test having turned out positive (negative); the values on the edges are the probabilities of each outcome.
- The value at each node is the expected value of digging at this site given the information so far.
Evaluation results:
- Site A (test takes 2 time units): root value 5/8; positive with probability 7/40, leading to value 10/3; negative with probability 33/40, leading to value 5/99.
- Site B (test takes 3 time units): root value 3/2; positive with probability 1/2, leading to value 3; negative with probability 1/2, leading to value 0.
- Site C (test takes 2 time units): root value 1; positive with probability 1/2, leading to value 2; negative with probability 1/2, leading to value 0.
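The site-A numbers in the tree can be checked with Bayes' rule; a small sketch using exact fractions:

```python
from fractions import Fraction as F

# Checking the site-A tree values with Bayes' rule.
# Prior P(gold) = 1/8; P(test+ | gold) = 14/15; P(test+ | no gold) = 1/15;
# utility of finding gold = 5.

prior = F(1, 8)
p_pos_gold = F(14, 15)
p_pos_none = F(1, 15)

p_pos = prior * p_pos_gold + (1 - prior) * p_pos_none   # P(test positive)
post_pos = prior * p_pos_gold / p_pos                   # P(gold | positive)
post_neg = prior * (1 - p_pos_gold) / (1 - p_pos)       # P(gold | negative)

# Edge probability, leaf values, and root value from the slide:
print(p_pos, 5 * post_pos, 5 * post_neg, 5 * prior)     # → 7/40 10/3 5/99 5/8
```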
Slide 7: Definition of an Action-Evaluation Tree
- A root r, representing the start of the evaluation.
- For each non-leaf node w, a cost k_w for investing another step of evaluation effort at this point.
- For each edge e between parent node p and child node c, a probability p_e = p(p, c) of transitioning from p to c upon taking a step of evaluation effort at p.
- For each leaf node l, a value u_l.
At each point in the evaluation of a single action, the agent's only choice is whether to invest further evaluation effort, not how to continue the evaluation.
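A tree in this form can be evaluated recursively once we fix what stopping is worth. A minimal sketch, assuming the agent that stops at a node receives the utility of acting optimally under its current information, and treating the evaluation cost as a direct utility penalty (a simplification; the problem statement below bounds total time instead). All names are illustrative:

```python
from dataclasses import dataclass

# Minimal action-evaluation tree sketch (illustrative encoding, not the
# paper's). stop_value is the utility of acting now; children is None at
# a leaf; cost is the price of one more evaluation step at this node.

@dataclass
class Node:
    stop_value: float
    cost: float = 0.0
    children: list = None                 # [(transition probability, child), ...]

def best_value(node):
    """Expected utility of the optimal stop/continue policy rooted at node."""
    if node.children is None:
        return node.stop_value
    cont = -node.cost + sum(p * best_value(c) for p, c in node.children)
    return max(node.stop_value, cont)

# Site A from the example, with walking away (utility 1) as the fallback:
pos = Node(stop_value=10 / 3)             # dig: worth 5 * P(gold | positive)
neg = Node(stop_value=1.0)                # dig is worth only 5/99, so walk away
root = Node(stop_value=1.0, cost=0.1, children=[(7 / 40, pos), (33 / 40, neg)])
print(best_value(root))                   # testing beats stopping: the test can
                                          # reveal a profitable dig
```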
Slide 8: The Action-Evaluation Problem
Problem definition: Given k action-evaluation trees, indexed 1 through k (corresponding to k different actions, where the transition processes of the trees are independent), and an integer N. The question is whether there exists an online evaluation-control policy that takes its first evaluation step on action 1 and gives maximal expected utility among online evaluation-control policies that spend at most N units of time.
Theorem: Action-Evaluation is NP-hard, even when all trees have depth either 0 or 1, branching factor 2, and all leaf values in {-1, 0, 1}.
Proof sketch: Knapsack can be reduced to Action-Evaluation. There is no proof that the general Action-Evaluation problem is in NP.
Slide 9: Dynamically Choosing How to Disambiguate State
Motivating example: A robot has discovered it is on the edge of the floor, with a gap in front of it. The gap can only be one of three things: a staircase (S), a hole (H), or a canyon (C). Three courses of physical action are available to the robot: attempt a descent down a staircase, attempt to jump over a hole, or simply walk away.
- If the gap is a staircase and the robot descends it, this gives utility 2.
- If the gap is a hole and the robot jumps over it, this gives utility 1.
- If the robot walks away, this gives utility 0 no matter what the gap is.
- Attempting to jump over a staircase or canyon, or trying to descend into a hole or canyon, gives utility −∞.
Slide 10: Tests for State Disambiguation
To determine the nature of the gap, the robot can conduct various tests (queries), for example:
- Am I inside a building? A YES answer is consistent only with S; a NO answer is consistent with S, H, C.
- If I drop a small item into the gap, do I hear it hit the ground? A YES answer is consistent with S, H; a NO answer is consistent with H, C.
- Can I walk around the gap? A YES answer is consistent with S, H; a NO answer is consistent with S, H, C.
After a few queries, the set of states consistent with all the answers is the intersection of the sets consistent with the individual answers. Once the set has been reduced to one element, the robot knows the true state of the gap.
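The intersection step amounts to one line of code. A toy sketch with the example's states, with answer sets taken from the queries above and assuming the gap is in fact a staircase:

```python
# Consistent set after several answers = intersection of per-answer sets.
answers = [{"S"},             # "Am I inside a building?" -> YES
           {"S", "H"},        # "Do I hear the item hit the ground?" -> YES
           {"S", "H"}]        # "Can I walk around the gap?" -> YES
consistent = set.intersection(*answers)
print(consistent)             # → {'S'}: the robot knows the gap is a staircase
```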
Slide 11: The State-Disambiguation Problem
An instance consists of:
- A set Θ = {θ_1, θ_2, ..., θ_r} of possible world states.
- A probability function p over Θ.
- A utility function u: Θ → R_{≥0}, where u(θ_i) gives the utility of knowing for certain that the world is in state θ_i at the end of the metareasoning process. (The utility is 0 if the state of the world is not known for certain.)
- A query set Q, where each q ∈ Q is a list of subsets of Θ. Each such subset corresponds to an answer to the query and indicates the states consistent with that answer. For each state, at least one of the answers is consistent with it: that is, for any q = {a_1, a_2, ..., a_m}, we have a_1 ∪ a_2 ∪ ... ∪ a_m = Θ. When a query is asked, the answer is chosen (uniformly) at random from the answers to that query that are consistent with the world's true state.
- An integer N.
- A target value G.
The question is whether there exists a policy for asking at most N queries that gives expected utility at least G. (Letting π(θ_t) be the probability of identifying the state when it is θ_t, the expected utility is Σ_t p(θ_t) π(θ_t) u(θ_t).)
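An optimal policy can be found by exhaustive search over (belief, queries-remaining) pairs, which takes exponential time, consistent with the PSPACE-hardness shown below. A sketch under the slide's semantics (a state's probability mass splits uniformly over its consistent answers); the function name and encoding are illustrative:

```python
from fractions import Fraction as F

# Exhaustive search for the optimal State-Disambiguation policy value.
# belief: dict state -> probability; u: dict state -> utility of certain
# identification; queries: list of queries, each a list of answer sets.

def best_expected_utility(belief, u, queries, n):
    support = [s for s, w in belief.items() if w > 0]
    if len(support) == 1:                 # state identified with certainty
        return u[support[0]]
    if n == 0:                            # no queries left, state unknown
        return F(0)
    best = F(0)
    for q in queries:
        val = F(0)
        for answer in q:
            # each state's mass splits uniformly over its consistent answers
            post = {s: belief[s] / sum(1 for a in q if s in a)
                    for s in answer if belief.get(s, F(0)) > 0}
            p_answer = sum(post.values())
            if p_answer > 0:
                norm = {s: w / p_answer for s, w in post.items()}
                val += p_answer * best_expected_utility(norm, u, queries, n - 1)
        best = max(best, val)
    return best

# Toy instance: two equally likely states, one perfectly separating query.
belief = {"a": F(1, 2), "b": F(1, 2)}
u = {"a": F(1), "b": F(1)}
q = [{"a"}, {"b"}]
print(best_expected_utility(belief, u, [q], 1))   # → 1
```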
Slide 12: PSPACE-Hardness
- A problem is in PSPACE if it can be solved with memory bounded by a polynomial in the problem size.
- A problem is PSPACE-hard if every problem in PSPACE can be transformed into an equivalent instance of this problem. The transformation is called a reduction, and it must not take too much memory itself (e.g., O(log n) space).
- A PSPACE-hard problem is computationally infeasible for large problem sizes.
(This slide is from the Go Seminar, University of Alberta, March 2003.)
Slide 13: State-Disambiguation is PSPACE-Hard
Theorem 1: State-Disambiguation is NP-hard, even when for each state-query pair there is only one consistent answer.
Proof: Set-Cover can be reduced to State-Disambiguation.
Theorem 2: State-Disambiguation is PSPACE-hard.
Proof: Stochastic-SAT (SSAT) can be reduced to State-Disambiguation.
Theorem 3: Every State-Disambiguation instance is equivalent to another State-Disambiguation instance with a uniform prior p, and to another with a constant utility function. Moreover, these equivalent instances can be constructed in linear time.
Note: This means that even when restricting ourselves to a uniform prior over states, or to a constant utility function over the states, State-Disambiguation remains PSPACE-hard.
Slide 14: Stochastic-SAT (SSAT)
Given a Boolean formula in conjunctive normal form (a set of clauses C over variables x_1, x_2, ..., x_n, y_1, y_2, ..., y_n), we play the following game against nature: we pick a value for x_1, then nature randomly picks a value for y_1, whereupon we pick a value for x_2, after which nature randomly picks a value for y_2, and so on, until all variables have a value. The question is whether there is a policy (contingency plan) for playing this game such that the probability of the formula being satisfied is at least 1/2. SSAT is PSPACE-complete.
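The game can be evaluated by brute force in exponential time but polynomial space, consistent with PSPACE-completeness. A sketch with an illustrative clause encoding (a clause is a list of (variable index, required value) pairs; variables are ordered x_1, y_1, x_2, y_2, ...):

```python
# Brute-force SSAT game value: we maximize over x-variables (even indices),
# nature averages over y-variables (odd indices).

def ssat_value(clauses, assignment, n):
    i = len(assignment)
    if i == 2 * n:                        # all variables assigned: check formula
        sat = all(any(assignment[j] == want for j, want in clause)
                  for clause in clauses)
        return 1.0 if sat else 0.0
    branches = [ssat_value(clauses, assignment + [b], n) for b in (False, True)]
    return max(branches) if i % 2 == 0 else 0.5 * sum(branches)

# (x1 or y1): setting x1 = True satisfies the formula whatever nature does,
# so the satisfaction probability reaches the 1/2 threshold.
print(ssat_value([[(0, True), (1, True)]], [], 1) >= 0.5)   # → True
```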
Slide 15: Conclusion and Future Work
Conclusion:
- The results have general applicability, since most metareasoning systems must somehow deal with one or more of these problems.
- The results show that the metareasoning policies directly suggested by decision theory (perfect rationality) are not always feasible.
Future work:
- Investigating the complexity of metareasoning when deliberation is costly rather than limited.
- Developing efficient (optimal or approximately optimal) metareasoning algorithms, at least for special cases.
- Developing meta-metareasoning algorithms to control metareasoning.