Feng Zhiyong Tianjin University Fall 2008 Planning and Acting in the Real World
12.1 Time, Schedules, and Resources
12.2 Hierarchical Task Network Planning
12.3 Planning and Acting in Nondeterministic Domains
12.4 Conditional Planning
12.5 Execution Monitoring and Replanning
12.6 Continuous Planning
12.7 MultiAgent Planning
12.8 Summary
Planning so far does not specify how long an action takes or when it occurs, except to say that it comes before or after another action.
◦ For example, in the cargo delivery domain, we might like to know when the plane carrying some cargo will arrive, not just that it will arrive when it is done flying.
Job shop scheduling: time is essential.
Critical path method (CPM)
◦ A partial-order plan (with durations)
◦ Critical path (the weakest link): the path with the longest total duration; delaying any action on it delays the whole plan
◦ Slack = LS (latest start) - ES (earliest start)
◦ Schedule = plan + time (durations for actions)
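The CPM bullets above can be sketched in Python as a forward pass (earliest start) and a backward pass (latest start) over the partial order. This is a minimal sketch, not the slides' own algorithm: the function name `critical_path`, the action names, and the durations are all illustrative assumptions.

```python
from collections import defaultdict

def critical_path(durations, before):
    """Return (ES, LS, slack) for a partial-order plan with durations.

    durations: {action: duration}; before: list of (a, b), a precedes b.
    """
    succs, preds = defaultdict(list), defaultdict(list)
    for a, b in before:
        succs[a].append(b)
        preds[b].append(a)

    # Topological order of the actions (Kahn's algorithm).
    indeg = {a: len(preds[a]) for a in durations}
    order, ready = [], [a for a in durations if indeg[a] == 0]
    while ready:
        a = ready.pop()
        order.append(a)
        for b in succs[a]:
            indeg[b] -= 1
            if indeg[b] == 0:
                ready.append(b)

    # Forward pass: earliest start is when all predecessors have finished.
    es = {}
    for a in order:
        es[a] = max((es[p] + durations[p] for p in preds[a]), default=0)
    makespan = max(es[a] + durations[a] for a in durations)

    # Backward pass: latest start that does not delay the whole plan.
    ls = {}
    for a in reversed(order):
        ls[a] = min((ls[s] for s in succs[a]), default=makespan) - durations[a]

    slack = {a: ls[a] - es[a] for a in durations}
    return es, ls, slack

# Illustrative two-car assembly job (names and durations are assumptions).
durations = {"AddEngine1": 30, "AddWheels1": 30, "Inspect1": 10,
             "AddEngine2": 60, "AddWheels2": 15, "Inspect2": 10}
before = [("AddEngine1", "AddWheels1"), ("AddWheels1", "Inspect1"),
          ("AddEngine2", "AddWheels2"), ("AddWheels2", "Inspect2")]
es, ls, slack = critical_path(durations, before)
critical = sorted(a for a in durations if slack[a] == 0)
```

On this data the actions for the second car have zero slack, so they form the critical path (85 time units), while every action for the first car has a slack of 15.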
We augment our representation to include a field of the form RESOURCE: R(k), which means that k units of resource R are required by the action.
◦ The resource requirement is both a prerequisite (the action cannot be performed if the resource is unavailable) and a temporary effect (the availability of R is reduced by k for the duration of the action).
◦ When certain parts are not available, waiting time should be minimized.
◦ Resource constraints make scheduling problems more complicated by introducing additional interactions among actions.
Example: an engine hoist can serve only one car at a time, so we cannot simultaneously add engine E1 to car C1 and engine E2 to car C2.
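The engine example can be checked mechanically. The sketch below tests a timed schedule against resource capacities. The tuple layout, the resource name `EngineHoist`, and the durations are assumptions made for illustration.

```python
def resource_conflicts(schedule, capacity):
    """Find times at which a resource is over-committed.

    schedule: list of (action, start, duration, resource, units).
    capacity: {resource: available units}.
    Usage only rises at start times, so checking those instants is enough.
    """
    conflicts = []
    checkpoints = sorted({(start, res) for _, start, _, res, _ in schedule})
    for t, res in checkpoints:
        running = [(a, k) for a, s, d, r, k in schedule
                   if r == res and s <= t < s + d]
        if sum(k for _, k in running) > capacity[res]:
            conflicts.append((t, res, sorted(a for a, _ in running)))
    return conflicts

capacity = {"EngineHoist": 1}  # hypothetical: one hoist in the shop
parallel = [("AddEngine1", 0, 30, "EngineHoist", 1),
            ("AddEngine2", 0, 60, "EngineHoist", 1)]
serial = [("AddEngine1", 0, 30, "EngineHoist", 1),
          ("AddEngine2", 30, 60, "EngineHoist", 1)]
parallel_conflicts = resource_conflicts(parallel, capacity)
serial_conflicts = resource_conflicts(serial, capacity)
```

The parallel schedule reports a conflict at time 0, and the serial one reports none. This is the extra interaction among actions that resource constraints introduce.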
Hierarchical decomposition
◦ A way of dealing with complexity
◦ Hierarchical methods can result in linear-time instead of exponential-time planning algorithms.
Hierarchical task networks (HTNs)
◦ Plans are refined by applying action decompositions until only primitive actions remain in the plan.
◦ Partial-order planning → pure HTN planning.
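Applying a decomposition d' to an action a' in plan P:
◦ First, the action a' is removed from P.
◦ The next step is to hook up the ordering constraints for a' in the original plan to the steps in d'.
◦ The final step is to hook up causal links.
Ordering every step of d' before or after whatever a' was ordered against might be too strict! Therefore, the best solution is for each ordering constraint to record the reason for the constraint, so that the new constraints can be as relaxed as that reason allows.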
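A much-simplified sketch of decomposition follows. It treats plans as totally ordered lists, so it only shows the first step (removing a' and splicing in the steps of d') and ignores ordering constraints and causal links entirely. The method table and action names are illustrative assumptions, and the loop terminates only if the methods are not recursive.

```python
def decompose(plan, methods):
    """Replace non-primitive actions until only primitive actions remain.

    plan: list of action names; methods: {abstract action: list of steps}.
    An action is primitive if it has no entry in `methods`.
    """
    plan = list(plan)
    while True:
        for i, action in enumerate(plan):
            if action in methods:
                # Remove a' and splice in the steps of its decomposition d'.
                plan[i:i + 1] = methods[action]
                break
        else:
            return plan  # no abstract action left

# Hypothetical house-building methods, for illustration only.
methods = {
    "BuildHouse": ["GetPermit", "HireBuilder", "Construction", "PayBuilder"],
    "Construction": ["BuildFoundation", "BuildFrame", "BuildRoof"],
}
primitive_plan = decompose(["BuildHouse"], methods)
```

A real HTN planner would also choose among alternative decompositions for the same action and would keep the plan partially ordered. The list splice here stands in for both.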
Pure HTN planning is undecidable, even though the underlying state space is finite!
◦ We can rule out recursion, which very few domains require. In that case, all HTN plans are of finite length and can be enumerated.
◦ We can bound the length of the solutions we care about. Because the state space is finite, a plan that has more steps than there are states in the state space must include a loop that visits the same state twice. We would lose little by ruling out HTN solutions of this kind, and we would control the search.
◦ We can adopt the hybrid approach that combines POP and HTN planning. Partial-order planning by itself suffices to decide whether a plan exists, so the hybrid problem is clearly decidable.
A poor couple has only two prized possessions: he a gold watch and she her beautiful long hair. Each plans to buy a present to make the other happy. He decides to trade his watch to buy a silver comb for her hair, and she decides to sell her hair to get a gold chain for his watch. In (b) the partial plan is inconsistent, because there is no way to order the "Give Comb" and "Give Chain" abstract steps without a conflict. (We assume that the "Give Comb" action has the precondition Hair, because if the wife doesn't have her long hair, the action won't have the intended effect of making her happy, and similarly for the "Give Chain" action.) In (c) we decompose the "Give Comb" step with an "installment plan" method. In the first step of the decomposition, the husband takes possession of the comb and gives it to his wife, while agreeing to deliver the watch in payment at a later date. In the second step, the watch is handed over and the obligation is fulfilled. A similar method decomposes the "Give Chain" step. As long as both giving steps are ordered before the delivery steps, this decomposition solves the problem. (Note that it relies on the problem being defined so that the happiness of using the chain with the watch or the comb with the hair persists even after the possessions are surrendered.)
Classical planning
◦ fully observable, static, and deterministic
Four planning methods for handling indeterminacy
◦ Sensorless planning (described in Chapter 3)
◦ Conditional planning, also known as contingency planning
◦ Execution monitoring and replanning
◦ Continuous planning
Conditional planning (CP) in fully observable environments (FOE)
Initial state: the robot is in the right square of a clean world; the environment is fully observable: AtR ∧ CleanL ∧ CleanR.
Goal state: the robot is in the left square of a clean world.
◦ Vacuum world with actions Left, Right, and Suck
◦ Disjunctive effects, if Left sometimes fails: Action(Left, Precond: AtR, Effect: AtL v AtR)
◦ Conditional effects:
Action(Suck, Precond: , Effect: (when AtL: CleanL) ^ (when AtR: CleanR))
Action(Left, Precond: AtR, Effect: AtL v (AtL ^ when CleanL: !CleanL))
◦ Conditional steps for creating conditional plans: if test then planA else planB, e.g., if AtL ^ CleanL then Right else Suck
◦ The search tree for the vacuum world (Fig 12.9): state nodes (squares) and chance nodes (circles)
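The search over state nodes and chance nodes can be sketched as an AND-OR search. This is a minimal sketch under assumptions: a state is encoded as a tuple `(location, clean_left, clean_right)`, moving may dump dirt on a clean destination square as in the conditional-effect version of Left above, and Right is assumed to behave symmetrically. The function names are hypothetical.

```python
def results(state, action):
    """Set of possible outcomes; state = (loc, clean_l, clean_r)."""
    loc, clean_l, clean_r = state
    if action == "Suck":
        return {("L", True, clean_r)} if loc == "L" else {("R", clean_l, True)}
    if action == "Left" and loc == "R":
        # Arrive at L; a clean destination square may get dirt dumped on it.
        return {("L", clean_l, clean_r), ("L", False, clean_r)}
    if action == "Right" and loc == "L":
        return {("R", clean_l, clean_r), ("R", clean_l, False)}
    return set()  # precondition not satisfied

def is_goal(state):
    return state == ("L", True, True)

def or_search(state, path):
    """State node: the agent picks one action that works."""
    if is_goal(state):
        return []
    if state in path:
        return None  # repeated state: abandon this branch
    for action in ("Suck", "Left", "Right"):
        outcomes = results(state, action)
        if not outcomes:
            continue
        branches = and_search(outcomes, path + [state])
        if branches is not None:
            return [action, branches]
    return None

def and_search(states, path):
    """Chance node: the plan must handle every possible outcome."""
    branches = {}
    for s in states:
        plan = or_search(s, path)
        if plan is None:
            return None
        branches[s] = plan
    return branches

plan = or_search(("R", True, True), [])
```

The returned plan is `Left` followed by a branch on the observed state: nothing more to do if the left square is still clean, and `Suck` otherwise. This has the same shape as an "if test then planA else planB" step.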
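Triple Murphy: in addition to the failures above, a move sometimes fails and leaves the robot where it is, so the solution has to be a cyclic plan (keep trying Left until it works).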
CP in partially observable environments (POE)
◦ The initial state is a state set: a belief state
◦ Determining "both squares are clean" with local dirt sensing: the vacuum agent is AtR and knows about R, but what about L? Dirt can sometimes be left behind when the agent leaves a clean square.
◦ A graph representation (Figure 12.12, p. 438)
◦ How does planning in a POE differ from planning in an FOE? Which one is a special case of the other?
Sets of full state descriptions
◦ { (AtR ⋀ CleanR ⋀ CleanL), (AtR ⋀ CleanR ⋀ ¬CleanL) }
Logical sentences that capture exactly the set of possible worlds in the belief state
◦ AtR ⋀ CleanR
Knowledge propositions describing the agent's knowledge
◦ Closed-world assumption: if a knowledge proposition does not appear in the list, it is assumed false.
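The first representation, a belief state as an explicit set of full state descriptions, can be sketched as follows. The tuple encoding `(location, clean_left, clean_right)` and the percept format are assumptions. Sensing is taken to be local: the agent perceives only its location and the dirt status of that square.

```python
from itertools import product

# Every physical state: (location, clean_left, clean_right).
ALL_STATES = set(product("LR", [True, False], [True, False]))

def consistent(state, percept):
    """Local sensing: percept = (location, is the current square clean?)."""
    loc, clean_l, clean_r = state
    here_clean = clean_l if loc == "L" else clean_r
    return percept == (loc, here_clean)

def update(belief, percept):
    """Keep only the states that agree with the percept."""
    return {s for s in belief if consistent(s, percept)}

# Starting from total ignorance, the agent perceives "at R, and R is clean".
belief = update(ALL_STATES, ("R", True))
```

The resulting belief state contains exactly the two states in the first bullet, which is also what the sentence AtR ⋀ CleanR describes. The explicit set grows exponentially with the number of propositions, which is one reason to prefer the more compact representations.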
◦ Automatic sensing: the agent gets all available percepts at every time step
◦ Active sensing: percepts are obtained only by executing specific sensory actions
◦ Action(Left, PRECOND: AtR, EFFECT: K(AtL) ⋀ ¬K(AtR) ⋀ when CleanR: ¬K(CleanR) ⋀ when CleanL: K(CleanL) ⋀ when ¬CleanL: K(¬CleanL))
◦ Action(CheckDirt, EFFECT: when AtL ⋀ CleanL: K(CleanL) ⋀ when AtL ⋀ ¬CleanL: K(¬CleanL) ⋀ when AtR ⋀ CleanR: K(CleanR) ⋀ when AtR ⋀ ¬CleanR: K(¬CleanR))
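The CheckDirt schema can be read as a function from the true world state and the agent's current knowledge to its new knowledge. The sketch below is one possible encoding, not a definitive one: both the world and the knowledge list are sets of strings, and a literal the agent knows is stored directly rather than wrapped in K(...).

```python
def check_dirt(world, knowledge):
    """Active sensing: learn the dirt status of the current square only.

    world: set of atoms that are actually true, e.g. {"AtR", "CleanR"}.
    knowledge: set of literals the agent knows; anything absent is unknown.
    """
    square = "L" if "AtL" in world else "R"
    fact = f"Clean{square}"
    learned = fact if fact in world else f"¬{fact}"
    return knowledge | {learned}

# The agent is at R, R is clean, L is dirty; it initially knows only AtR.
after_right = check_dirt({"AtR", "CleanR"}, {"AtR"})
# At L with a dirty left square, the agent learns the negative literal.
after_left = check_dirt({"AtL", "CleanR"}, {"AtL"})
```

In both cases the agent learns nothing about the other square, which is why a plan in this setting has to include explicit sensing steps.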
Execution monitoring: the agent checks its percepts to see whether everything is going according to plan.
◦ Action monitoring
◦ Plan monitoring
Replanning: the agent knows what to do when something unexpected happens:
◦ call a planner again to come up with a new plan to reach the goal
Before carrying out the next action of the plan, the agent examines its percepts to see whether any preconditions of the plan have unexpectedly become unsatisfied. If they have, the agent will try to get back on track by replanning a sequence of actions that should take it back to some point in the whole-plan.
Return to the chair-table painting problem
◦ Plan: [Start, Open(BC), Paint(Table, Blue), Finish]
◦ What if it missed a spot of green on the table?
◦ The repainting loop is created by the plan-execute-replan cycle; there is no explicit loop in the plan.
◦ Failure is only detected after an action is performed.
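The plan-execute-replan cycle can be sketched as a small loop. Everything here is a toy assumption: the planner simply repaints whatever differs from the goal, the Open(BC) step is omitted, and `make_painter` is a hypothetical actuator whose first few strokes miss a spot.

```python
import itertools

def make_painter(miss_first_n):
    """Hypothetical actuator: the first `miss_first_n` strokes miss a spot."""
    calls = itertools.count()
    def paint(state, obj, color):
        ok = next(calls) >= miss_first_n
        return {**state, obj: color if ok else "Patchy"}
    return paint

def planner(state, goal):
    """Toy planner: repaint every object whose colour differs from the goal."""
    return [(obj, color) for obj, color in goal.items() if state[obj] != color]

def replanning_agent(state, goal, paint, max_steps=10):
    trace = []
    plan = planner(state, goal)
    expected = state
    for _ in range(max_steps):
        if state != expected:          # percept disagrees with the prediction
            trace.append("replan")
            plan = planner(state, goal)
        if not plan:
            break
        obj, color = plan.pop(0)
        expected = {**state, obj: color}   # what the action should achieve
        state = paint(state, obj, color)   # what actually happened
        trace.append(f"Paint({obj},{color})")
    return state, trace

start = {"Chair": "Blue", "Table": "Green"}
goal = {"Chair": "Blue", "Table": "Blue"}
final, trace = replanning_agent(start, goal, make_painter(miss_first_n=1))
```

With one missed stroke the trace is paint, replan, paint. The agent repeats the action without any loop appearing in a plan, and it only notices the failure after the action has been performed.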
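Problem with action monitoring: suppose the agent constructs a plan to solve the painting problem by painting both the chair and the table red, but there is only enough paint for the chair. Action monitoring would not detect the failure until after the chair has been painted.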
Plan monitoring
◦ Detect failure by checking the preconditions for success of the entire remaining plan
◦ Useful when a goal is serendipitously achieved: while you're painting the chair, someone comes along and paints the table the same color
◦ Cut off execution of a doomed plan rather than continuing until the failure actually occurs: while you're painting the chair, someone comes along and paints the table a different color
◦ If one insists on checking every precondition, it might never get around to actually doing anything
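Both uses of plan monitoring can be sketched by simulating the remaining plan before acting. The state layout (a `paint_units` counter plus one colour per object) and the function names are assumptions chosen to match the painting example.

```python
def first_doomed_step(state, plan):
    """Simulate the remaining plan without executing it.

    Returns the index of the first step whose precondition will fail,
    or None if the whole remaining plan can still succeed.
    """
    paint = state["paint_units"]
    for i, (obj, color) in enumerate(plan):
        if paint < 1:      # precondition of Paint: one unit of paint left
            return i
        paint -= 1
    return None

def prune_redundant(state, plan):
    """Drop steps whose effect already holds (serendipitous success)."""
    return [(obj, color) for obj, color in plan if state[obj] != color]

state = {"paint_units": 1, "Chair": "Bare", "Table": "Bare"}
plan = [("Chair", "Red"), ("Table", "Red")]
doomed_at = first_doomed_step(state, plan)

# Someone else paints the table red while the agent is still getting ready.
lucky_state = {**state, "Table": "Red"}
pruned = prune_redundant(lucky_state, plan)
doomed_after_pruning = first_doomed_step(lucky_state, pruned)
```

With one unit of paint, the check flags the second step before the chair is touched. Once the table turns red on its own, the pruned plan needs only the chair step and passes the check.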
An unpainted area will make the agent repaint until the chair is fully painted. Is this different from the repainting loop in conditional planning? The difference lies in the time at which the computation is done and the information that is available to the computation process.
◦ CP (conditional planning) anticipates uneven paint
◦ RP (replanning) monitors during execution
Continuous planning agent
◦ execute some steps that are ready to be executed
◦ refine the plan to resolve standard deficiencies
◦ refine the plan with additional information
◦ fix the plan according to unexpected changes: recover from execution errors, remove steps that have been made redundant
Goal -> Partial Plan -> Some actions -> Monitoring the world -> New Goal
Goal: On(C,D) ^ On(D,B)
Action(Move(x,y), Pre: Clear(x) ^ Clear(y) ^ On(x,z), Eff: On(x,y) ^ Clear(z) ^ !Clear(y) ^ !On(x,z))
Fig: Start is used as the label for the current state
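The Move schema can be applied to a state represented as a set of ground literals. The schema and the goal come from the slide. The initial state below is a hypothetical one chosen only so that the goal is reachable in two moves.

```python
def move(state, x, y, z):
    """Apply Move(x, y), where x currently sits on z.

    Returns the successor state, or None if a precondition fails.
    """
    pre = {("Clear", x), ("Clear", y), ("On", x, z)}
    if not pre <= state:
        return None
    add = {("On", x, y), ("Clear", z)}
    delete = {("Clear", y), ("On", x, z)}
    return (state - delete) | add

GOAL = {("On", "C", "D"), ("On", "D", "B")}

# Hypothetical current state: B, C and D are clear; C sits on F, D on G.
start = {("Clear", "B"), ("Clear", "C"), ("Clear", "D"),
         ("On", "C", "F"), ("On", "D", "G")}

good = move(move(start, "D", "B", "G"), "C", "D", "F")
goal_reached = GOAL <= good

# The other order fails: once C is on D, D is no longer clear.
bad = move(move(start, "C", "D", "F"), "D", "B", "G")
```

The failed second order shows why the plan needs an ordering constraint between the two moves: the effect !Clear(y) of one move deletes a precondition of the other.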
Steps in execution:
◦ Ordering: Move(D,B), then Move(C,D)
◦ Another agent does Move(D,B): change the plan
◦ Remove the redundant step
◦ The agent makes a mistake, so On(C,A) holds; there is still one open condition
◦ Plan one more time: Move(C,D)
◦ Final state: Start -> Finish
So far we have dealt with single-agent environments.
Multiagent planning
◦ requires some form of coordination,
◦ possibly achieved by communication.
A solution to a multiagent planning problem is a joint plan consisting of actions for each agent.
PLAN 1:
◦ A: [Go(A, [Right, Baseline]), Hit(A, Ball)]
◦ B: [NoOp(B), NoOp(B)]
PLAN 2:
◦ A: [Go(A, [Left, Net]), NoOp(A)]
◦ B: [Go(B, [Right, Baseline]), Hit(B, Ball)]
If A chooses plan 2 and B chooses plan 1, then nobody will return the ball.
◦ So the agents need a mechanism for coordination.
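The coordination problem can be made concrete by letting each agent pick its half of a joint plan independently. The success test below is a modeling assumption: the ball counts as returned only when exactly one player hits it.

```python
PLAN_1 = {"A": ["Go(A,[Right,Baseline])", "Hit(A,Ball)"],
          "B": ["NoOp(B)", "NoOp(B)"]}
PLAN_2 = {"A": ["Go(A,[Left,Net])", "NoOp(A)"],
          "B": ["Go(B,[Right,Baseline])", "Hit(B,Ball)"]}

def ball_returned(a_choice, b_choice):
    """A executes its part of a_choice, B executes its part of b_choice."""
    steps = a_choice["A"] + b_choice["B"]
    hits = sum(step.startswith("Hit") for step in steps)
    return hits == 1  # assumption: two simultaneous hits also count as failure

# Outcome for every combination of independent choices.
outcomes = {
    ("plan 1", "plan 1"): ball_returned(PLAN_1, PLAN_1),
    ("plan 2", "plan 2"): ball_returned(PLAN_2, PLAN_2),
    ("plan 2", "plan 1"): ball_returned(PLAN_2, PLAN_1),
    ("plan 1", "plan 2"): ball_returned(PLAN_1, PLAN_2),
}
```

Both mismatched combinations fail. When A follows plan 2 and B follows plan 1 nobody hits the ball, and in the opposite mismatch both players go for it.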
Multibody planning concentrates on the construction of correct joint plans, deferring the coordination issue for the time being.
◦ Based on partial-order planning
◦ The environment is no longer truly static
◦ Need synchronization
For example, Plan 2 for the tennis problem can be represented as this sequence of joint actions: <Go(A, [Left, Net]), Go(B, [Right, Baseline])>, <NoOp(A), Hit(B, Ball)>
The simplest method:
◦ adopt a convention
For example:
◦ the convention "stick to your side of the court" would cause the doubles partners to select plan 2
◦ the convention "one player always stays at the net" would lead them to plan 1
Agents can also use communication. For example, a doubles tennis player could shout "Mine!" or "Yours!" to indicate a preferred joint plan.
Not all multiagent environments involve cooperative agents. Agents with conflicting utility functions are in competition with each other. One example: chess playing.
So an agent must
◦ (a) recognize that there are other agents
◦ (b) compute some of the other agent's possible plans
◦ (c) compute how the other agent's plans interact with its own plans
◦ (d) decide on the best action in view of these interactions
Many actions consume resources.
Time is one of the most important resources.
Hierarchical task network (HTN) planning
Standard planning algorithms assume complete and correct information and deterministic, fully observable environments. Many domains violate this assumption.
Conditional plans allow the agent to sense the world during execution to decide what branch of the plan to follow.
Execution monitoring detects violations of the preconditions for successful completion of the plan.
A replanning agent uses execution monitoring and splices in repairs as needed.
A continuous planning agent creates new goals as it goes and reacts in real time.
Multiagent planning is necessary when there are other agents in the environment with which to cooperate, compete, or coordinate.
Multibody planning constructs joint plans.