1
A Hybrid Linear Programming and Relaxed Plan Heuristic for Partial Satisfaction Planning Problems
J. Benton, Menkes van den Briel, Subbarao Kambhampati
Arizona State University
2
PSP UD: Partial Satisfaction Planning with Utility Dependency (Do, et al., IJCAI 2007)
Actions have cost; goal sets have utility. Objective: Maximize Net Benefit (utility - cost) (Smith, ICAPS 2004; van den Briel, et al., AAAI 2004).

[Figure: map with locations loc1, loc2, loc3 and travel costs 100, 101, 150, 200]

Goal utilities:
  utility((at plane loc3)) = 1000
  utility((at person loc2)) = 1000
  utility((at plane loc3) & (at person loc2)) = 10   (a goal utility dependency)

Example, starting from S0 = {(at plane loc1), (in person plane)}:
  S0: util(S0) = 0, sum cost = 0, net benefit(S0) = 0 - 0 = 0
  (fly plane loc2), cost 150 -> S1 = {(at plane loc2), (in person plane)}
      util(S1) = 0, sum cost = 150, net benefit(S1) = 0 - 150 = -150
  (debark person loc2), cost 1 -> S2 = {(at plane loc2), (at person loc2)}
      util(S2) = 1000, sum cost = 151, net benefit(S2) = 1000 - 151 = 849
  (fly plane loc3), cost 100 -> S3 = {(at plane loc3), (at person loc2)}
      util(S3) = 1000 + 1000 + 10 = 2010, sum cost = 251, net benefit(S3) = 2010 - 251 = 1759
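To make the running-example arithmetic concrete, here is a small illustrative Python sketch (written for this transcript, not from the paper; the set-of-facts state encoding is an assumption) that recomputes the utility, summed cost, and net benefit of S0 through S3.

# Illustrative only: recompute net benefit for the slide's example plan.
goal_utils = [
    ({"(at plane loc3)"}, 1000),
    ({"(at person loc2)"}, 1000),
    ({"(at plane loc3)", "(at person loc2)"}, 10),  # goal utility dependency
]

def utility(state):
    """Sum the utility of every goal set satisfied in `state`."""
    return sum(u for goals, u in goal_utils if goals <= state)

# (action, cost, resulting state) for S1..S3; S0 is the initial state.
plan = [
    ("(fly plane loc2)",     150, {"(at plane loc2)", "(in person plane)"}),
    ("(debark person loc2)",   1, {"(at plane loc2)", "(at person loc2)"}),
    ("(fly plane loc3)",     100, {"(at plane loc3)", "(at person loc2)"}),
]

state, total_cost = {"(at plane loc1)", "(in person plane)"}, 0    # S0
print("S0 net benefit:", utility(state) - total_cost)              # 0
for i, (action, cost, state) in enumerate(plan, start=1):
    total_cost += cost
    print(f"S{i} net benefit:", utility(state) - total_cost)       # -150, 849, 1759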
3
Heuristic search for SOFT GOALS must account for the interaction between action cost and goal achievement, i.e., for plan quality.
  Relaxed planning graph heuristics (Do & Kambhampati, KBCS 2004; Do, et al., IJCAI 2007): cannot take all of these complex interactions into account.
  Integer programming (IP) and LP-relaxation heuristics: current encodings don't scale well and can only be optimal up to some plan step.
  This motivates BBOP-LP.
4
Approach
  1. Build a network flow-based IP encoding
     - No time indices; uses multi-valued variables
  2. Use its LP relaxation for a heuristic value
  3. Perform branch and bound search
     - Uses the LP solution to find a relaxed plan (similar to YAHSP; Vidal, 2004), which gives a second relaxation on the heuristic
5
Building a Heuristic
A network flow model on variable transitions (no time indices):
  - Capture relevant transitions with multi-valued fluents
  - Prevail constraints
  - Initial states and goal states
  - Cost on actions, utility on goals
[Figure: transition networks for the plane and person state variables over loc1, loc2, loc3, with action costs 1, 100, 101, 150, 200 and goal utilities 1000, 1000, 10]
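As a rough illustration of what variable transitions over multi-valued fluents mean here, the sketch below (the class names and data layout are assumptions made for this transcript, not the paper's data structures) represents each state variable with its domain of values, the transitions actions induce on it, and the prevail conditions actions require of it.

from dataclasses import dataclass, field

@dataclass
class Transition:
    action: str   # action causing the transition
    source: str   # value before the transition
    target: str   # value after the transition
    cost: int

@dataclass
class StateVariable:
    name: str
    values: list                                   # multi-valued fluent domain
    init: str                                      # initial value
    transitions: list = field(default_factory=list)
    prevails: dict = field(default_factory=dict)   # action -> value it requires

# The "plane" variable from the running example (edge costs as in the figure).
plane = StateVariable(
    name="plane", values=["loc1", "loc2", "loc3"], init="loc1",
    transitions=[
        Transition("(fly plane loc2)", "loc1", "loc2", 150),
        Transition("(fly plane loc3)", "loc2", "loc3", 100),
    ],
)

# The "person" variable: debarking moves the person, and requires (prevails on)
# the plane being at loc2.
person = StateVariable(
    name="person", values=["in-plane", "loc2"], init="in-plane",
    transitions=[Transition("(debark person loc2)", "in-plane", "loc2", 1)],
)
plane.prevails["(debark person loc2)"] = "loc2"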
6
Building a Heuristic
Constraints of this model:
  1. If an action executes, then all of its effects and prevail conditions must also occur.
  2. If a fact is deleted, then it must be added to re-achieve that value.
  3. If a prevail condition is required, then it must be achieved.
  4. A goal utility dependency is achieved iff all of its goals are achieved.
[Figure: the same plane and person transition networks, annotated with action costs and goal utilities]
7
Building a Heuristic
Constraints of this model:

Variables
  action(a) ∈ Z+         the number of times a ∈ A is executed
  effect(a,v,e) ∈ Z+     the number of times a transition e in state variable v is caused by action a
  prevail(a,v,f) ∈ Z+    the number of times a prevail condition f in state variable v is required by action a
  endvalue(v,f) ∈ {0,1}  equal to 1 if value f is the end value in state variable v
  goaldep(k) ∈ {0,1}     equal to 1 if goal utility dependency k is achieved

Parameters
  cost(a)        the cost of executing action a ∈ A
  utility(v,f)   the utility of achieving value f in state variable v
  utility(k)     the utility of achieving goal utility dependency G_k

1. If an action executes, then all of its effects and prevail conditions must also occur (for every action a and every state variable v it affects or prevails):
     action(a) = Σ_{effects of a in v} effect(a,v,e) + Σ_{prevails of a in v} prevail(a,v,f)

2. If a fact is deleted, then it must be added to re-achieve that value:
     1{if f ∈ s0[v]} + Σ_{effects that add f} effect(a,v,e) = Σ_{effects that delete f} effect(a,v,e) + endvalue(v,f)

3. If a prevail condition is required, then it must be achieved (M is a large constant):
     1{if f ∈ s0[v]} + Σ_{effects that add f} effect(a,v,e) ≥ prevail(a,v,f) / M

4. A goal utility dependency is achieved iff all of its goals are achieved:
     goaldep(k) ≥ Σ_{f in dependency G_k} endvalue(v,f) − |G_k| + 1
     goaldep(k) ≤ endvalue(v,f)   for all f in dependency G_k
8
Objective Function: Maximize Net Benefit
  MAX  Σ_{v ∈ V, f ∈ Dv} utility(v,f)·endvalue(v,f) + Σ_{k ∈ K} utility(k)·goaldep(k) − Σ_{a ∈ A} cost(a)·action(a)

(Variables and parameters as defined on the previous slide.)

Constraints 2 and 3 are updated at each search node, because the initial-value indicator 1{if f ∈ s0[v]} changes with the state being evaluated:
  2. 1{if f ∈ s0[v]} + Σ_{effects that add f} effect(a,v,e) = Σ_{effects that delete f} effect(a,v,e) + endvalue(v,f)
  3. 1{if f ∈ s0[v]} + Σ_{effects that add f} effect(a,v,e) ≥ prevail(a,v,f) / M
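As a minimal sketch of how an encoding of this shape could be built and then relaxed to an LP heuristic, the following PuLP model is offered for illustration only; the problem-data layout (dicts for costs, effects, prevails, values, and dependencies) and the value of M are assumptions, not the planner's actual implementation.

# A minimal PuLP sketch (assumed data layout) of the net-benefit IP and its LP relaxation.
from pulp import LpProblem, LpMaximize, LpVariable, lpSum, value

M = 1000  # big constant for the prevail constraint

def build_model(d, relaxed=True):
    """d bundles the problem data; relaxed=True drops integrality (LP heuristic)."""
    num = "Continuous" if relaxed else "Integer"
    bin_ = "Continuous" if relaxed else "Binary"
    prob = LpProblem("net_benefit", LpMaximize)

    act = {a: LpVariable(f"action_{a}", lowBound=0, cat=num) for a in d["cost"]}
    eff = {e: LpVariable(f"effect_{e}", lowBound=0, cat=num) for e in d["effects"]}
    prev = {p: LpVariable(f"prevail_{p}", lowBound=0, cat=num) for p in d["prevails"]}
    end = {(v, f): LpVariable(f"end_{v}_{f}", 0, 1, cat=bin_) for (v, f) in d["values"]}
    dep = {k: LpVariable(f"dep_{k}", 0, 1, cat=bin_) for k in d["dependencies"]}

    # Objective: value utilities + dependency utilities - action costs.
    prob += (lpSum(d["utility"].get(vf, 0) * x for vf, x in end.items())
             + lpSum(d["dep_utility"][k] * dep[k] for k in dep)
             - lpSum(d["cost"][a] * act[a] for a in act))

    # 1. An executed action implies its effects and prevail conditions in each variable.
    for a in act:
        for v in {v for (v, _) in end}:
            terms = [eff[e] for e, (ea, ev, _, _) in d["effects"].items() if (ea, ev) == (a, v)]
            terms += [prev[p] for p, (pa, pv, _) in d["prevails"].items() if (pa, pv) == (a, v)]
            if terms:
                prob += act[a] == lpSum(terms)

    # 2. Flow per value f of variable v: initial value + adds = deletes + end value.
    for (v, f) in end:
        init = 1 if d["init"].get(v) == f else 0
        adds = [eff[e] for e, (_, ev, add, _) in d["effects"].items() if (ev, add) == (v, f)]
        dels = [eff[e] for e, (_, ev, _, dl) in d["effects"].items() if (ev, dl) == (v, f)]
        prob += init + lpSum(adds) == lpSum(dels) + end[(v, f)]

    # 3. A required prevail condition must be supported (the ">= prevail/M" constraint).
    for p, (_, v, f) in d["prevails"].items():
        init = 1 if d["init"].get(v) == f else 0
        adds = [eff[e] for e, (_, ev, add, _) in d["effects"].items() if (ev, add) == (v, f)]
        prob += M * (init + lpSum(adds)) >= prev[p]

    # 4. A goal utility dependency holds iff all of its goal values hold.
    for k, goals in d["dependencies"].items():
        prob += dep[k] >= lpSum(end[g] for g in goals) - len(goals) + 1
        for g in goals:
            prob += dep[k] <= end[g]

    return prob

# Usage: prob = build_model(data); prob.solve(); heuristic = value(prob.objective)

Solving with relaxed=True gives the LP bound used as the heuristic value at each search node; solving with relaxed=False would give the exact, but far more expensive, IP optimum.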
9
Search: Branch and Bound
  - Branch and bound with a time limit; returns the best plan found (i.e., the best bound)
  - All goals are soft, so all states are goal states
  - Greedy lookahead strategy, to quickly find good bounds
  - LP-solution guided relaxed plan extraction, similar to YAHSP (Vidal, 2004), to add informedness
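A rough skeleton of that search loop, with the successor, LP, lookahead, and evaluation interfaces passed in as functions (all assumed for illustration; this is not the planner's code), might look like this:

import time

def branch_and_bound(s0, expand, lp_bound, lp_lookahead, net_benefit, time_limit=600):
    """
    Anytime branch and bound; all goals are soft, so every state is a goal state.

      expand(s)         -> iterable of (action, successor_state)
      lp_bound(s, plan) -> LP relaxation value: an upper bound on the net benefit
                           of any plan extending `plan` from state s
      lp_lookahead(s)   -> applicable action sequence of a relaxed plan extracted
                           from the LP solution at s
      net_benefit(plan) -> utility of the plan's end state minus its summed cost

    Returns the best (highest net benefit) plan found within the time limit.
    """
    best_plan, best_value = [], net_benefit([])   # the empty plan is a valid incumbent
    start = time.time()
    stack = [(s0, [])]

    while stack and time.time() - start < time_limit:
        state, plan = stack.pop()

        # Prune nodes whose LP upper bound cannot improve on the incumbent.
        if lp_bound(state, plan) <= best_value:
            continue

        # Greedy lookahead: run the LP-guided relaxed plan to find good bounds quickly.
        candidate = plan + lp_lookahead(state)
        if net_benefit(candidate) > best_value:
            best_plan, best_value = candidate, net_benefit(candidate)

        for action, successor in expand(state):
            child_plan = plan + [action]
            if net_benefit(child_plan) > best_value:
                best_plan, best_value = child_plan, net_benefit(child_plan)
            stack.append((successor, child_plan))

    return best_plan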
10
Getting a Relaxed Plan
[Figure, animated over slides 10-14: a relaxed planning graph over the facts (at plane loc1), (at plane loc2), (at plane loc3), (in person plane), (at person loc2) and the actions (fly loc1 loc2), (fly loc1 loc3), (fly loc2 loc3), (fly loc3 loc2), (drop person loc2), showing how actions are selected during extraction]
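One plausible sketch of LP-guided extraction (the graph and LP-solution interfaces below are assumptions made for this transcript, not the planner's code): back-chain from the selected goals, and whenever several actions can support a needed fact, prefer the supporter with the largest value in the LP solution, similar in spirit to YAHSP's relaxed plan extraction.

def extract_relaxed_plan(goals, supporters, lp_value, preconditions, initial_facts):
    """
    Back-chain from `goals` through a relaxed planning graph.
      supporters[f]    -> actions that add fact f
      lp_value[a]      -> value of action(a) in the LP solution (0 if absent)
      preconditions[a] -> facts action a requires
      initial_facts    -> facts already true in the current state
    Returns a set of actions forming a relaxed plan.
    """
    plan, achieved, agenda = set(), set(initial_facts), list(goals)
    while agenda:
        fact = agenda.pop()
        if fact in achieved:
            continue
        candidates = supporters.get(fact, [])
        if not candidates:
            continue                      # fact unreachable in the relaxation; skip it
        # LP guidance: prefer the supporter with the highest value in the LP solution.
        action = max(candidates, key=lambda a: lp_value.get(a, 0.0))
        plan.add(action)
        achieved.add(fact)
        agenda.extend(p for p in preconditions.get(action, []) if p not in achieved)
    return plan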
15
Experimental Setup
  - Three modified IPC-3 domains: zenotravel, satellite, rovers, extended with action costs, goal utilities, and goal utility dependencies (objective: maximize net benefit)
  - Planners compared: BBOP-LP with and without RP lookahead; SPUDS, which uses a relaxed plan-based heuristic; and an admissible cost propagation-based heuristic
  - Each run had a 600 second time limit
16
Results
[Plots: net benefit per problem on zenotravel, satellite, and rovers; higher net benefit is better; optimal solutions are marked]
Found the optimal solution in 15 of 60 problems.
17
Results
18
Summary
  - Novel LP-based heuristic for partial satisfaction planning
  - Branch and bound search with RP lookahead
  - A planner that is sensitive to plan quality: BBOP-LP
Future Work
  - Improve the encoding
  - Explore other lookahead methods