Logic Programming and MDPs for Planning
Alborz Geramifard, Winter 2009
Index
- Introduction
- Logic Programming
- MDP
- MDP + Logic + Programming
Why do we care about planning?
Sequential Decision Making
What is the desired property of a plan?
- Feasible: logic (focuses on feasibility)
- Optimal: MDPs (scaling is hard)
Next: Logic Programming
Logic: Goal-Directed Search
- State: a set of propositions (e.g., ¬Quiet, Garbage, ¬Dinner)
- Feasible plan: a sequence of actions from the start state to the goal state
Actions
Precondition → Action → Effect
Example: Clean Hands, Ingredients → Cook → Dinner Ready
STRIPS, GraphPlan (16.410/16.413)
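As a small illustration of the precondition/effect structure, here is a minimal Python sketch of a STRIPS-style action; the proposition strings and the Action class are assumptions for illustration, not from the slides:

```python
# A minimal STRIPS-style action sketch; propositions are plain strings.
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    preconditions: frozenset  # propositions that must hold before
    add_effects: frozenset    # propositions made true
    del_effects: frozenset    # propositions made false

    def apply(self, state: frozenset) -> frozenset:
        assert self.preconditions <= state, "preconditions not satisfied"
        return (state - self.del_effects) | self.add_effects

cook = Action("cook",
              preconditions=frozenset({"cleanHands", "ingredients"}),
              add_effects=frozenset({"dinnerReady"}),
              del_effects=frozenset())
print(cook.apply(frozenset({"cleanHands", "ingredients"})))
```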
Logic Programming [Levesque et al. 97]
- GraphPlan or forward search: not easily scalable
- GOLOG (ALGOL in LOGIC): restricts the action space with programs; scales more easily, but results depend on the high-level programs
Situation Calculus (Temporal Logic)
- Situation: S0, do(A, S). Example: do(putdown(A), do(walk(L), do(pickup(A), S0)))
- Relational fluent: a relation whose value depends on the situation. Example: is_carrying(robot, p, s)
- Functional fluent: a situation-dependent function. Example: loc(robot, s)
- More expressive than LTL
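A minimal Python sketch of situation terms as nested tuples; the encoding is an assumption for illustration, not from the GOLOG literature:

```python
# Situations as nested terms: S0 or do(action, situation).
S0 = ("S0",)

def do(action, situation):
    """Build the successor situation term do(action, situation)."""
    return ("do", action, situation)

# do(putdown(A), do(walk(L), do(pickup(A), S0)))
s = do(("putdown", "A"), do(("walk", "L"), do(("pickup", "A"), S0)))
print(s)
```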
GOLOG: Syntax [Scott Sanner, ICAPS-08 Tutorial]
[Figure: GOLOG syntax table]
Aren't we hard-coding the whole solution?
Blocks World
- Actions: pickup(b), putOnTable(b), putOn(b1, b2)
- Fluents: onTable(b, s), on(b1, b2, s)
- GOLOG program: while (∃b) ¬onTable(b) do (pickup(b); putOnTable(b)) endWhile
What does it do? (See the sketch below.)
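As a rough Python sketch of the loop's behavior (the state encoding and helper names are assumptions, not GOLOG semantics):

```python
# Simulates the GOLOG loop: while some block is off the table,
# pick a clear block up and put it on the table.
def run_program(state):
    # state: list of stacks; each stack lists blocks bottom to top
    while any(len(stack) > 1 for stack in state):   # (exists b) not onTable(b)
        stack = next(s for s in state if len(s) > 1)
        b = stack.pop()                             # pickup(b): take the clear top block
        state.append([b])                           # putOnTable(b)
    return [s for s in state if s]

print(run_program([["b1", "b2", "b3"], ["b4"]]))
# -> [['b1'], ['b4'], ['b3'], ['b2']]: every block ends up on the table
```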
GOLOG Execution
- Given start S0, goal S', and program δ, call Do(δ, S0, S')
- Proved with a first-order logic proof system (Prolog)
- Execution is non-deterministic
[Figure: search tree over situations S0 ... S4]
Next: MDP
MDP
[Figure: an example MDP over degrees: in state B.S., action Working gives reward +60 and stays in B.S.; action Studying gives reward -50 and leads to M.S.; ...]
MDP
- Policy π(s): how to act in each state
- Value function V(s): how good is it to be in each state?
MDP
Goal: find the optimal policy π*, which maximizes the value function for all states.
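A compact way to state this goal, added here for reference in the standard Bellman optimality form:

```latex
V^{*}(s) = \max_{a}\Big[R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V^{*}(s')\Big],
\qquad
\pi^{*}(s) = \operatorname*{arg\,max}_{a}\Big[R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V^{*}(s')\Big]
```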
MDP
Some algorithms:
- Value iteration (dynamic programming)
- Policy iteration
Example (Value Iteration)
[Figure: a small gridworld with a negative step reward and four movement actions (up, down, left, right); successive grids show the value estimates after each sweep of value iteration, starting from all zeros and converging toward the negated shortest-path costs to the goal.]
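Below is a minimal value-iteration sketch in the spirit of this example; the grid size, step reward, terminal cell, and deterministic moves are assumptions for illustration, not the exact setup on the slide:

```python
# Value iteration on a small gridworld with synchronous sweeps.
N, STEP_REWARD, GAMMA = 4, -1.0, 1.0
TERMINALS = {(0, 0)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(s, a):
    """Deterministic move; bumping into a wall leaves the state unchanged."""
    r, c = s[0] + a[0], s[1] + a[1]
    return (r, c) if 0 <= r < N and 0 <= c < N else s

V = {(r, c): 0.0 for r in range(N) for c in range(N)}
for _ in range(100):
    # One sweep: each new value is a one-step lookahead on the old values.
    V = {s: 0.0 if s in TERMINALS else
            max(STEP_REWARD + GAMMA * V[step(s, a)] for a in ACTIONS)
         for s in V}

print(V[(3, 3)])  # farthest cell from the terminal corner: -6.0
```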
Next: MDP + Logic + Programming
DT-GOLOG: Decision-Theoretic GOLOG [Boutilier et al. 00]
- MDP: scaling is challenging. GOLOG: no optimization.
- DT-GOLOG optimizes all non-deterministic choices using MDP concepts: uncertainty and reward.
[Figure: GOLOG execution trees over situations S0 ... S4]
Decision-Theoretic GOLOG
[Figure: a spectrum from Programming (GOLOG) to Planning (MDP): programming suits problems with a known solution and high-level structure; planning suits problems with only low-level structure, where the solution is not well grasped.]
Example
Build a tower with the least effort.
- MDP decides: which block to use as the base, and in which order to pick up the blocks.
- GOLOG provides the structure: pick a block as the base, then stack all other blocks on top of it.
Is the result guaranteed to be optimal?
DT-GOLOG Execution
- Given start S0, goal S', and program δ
- Add rewards to transitions (e.g., +3, -2, -5) and formulate the problem as an MDP
- Existentials are grounded over the finite domain: (∃b) ¬onTable(b) with b ∈ {b1, ..., bn} becomes ¬onTable(b1) ∨ ¬onTable(b2) ∨ ... ∨ ¬onTable(bn)
[Figure: search tree over situations S0 ... S4 with rewards on branches]
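A tiny Python sketch of the grounding step above; the three-block domain and string formulas are assumptions for illustration:

```python
# Ground (exists b) not onTable(b) over a finite block domain.
blocks = ["b1", "b2", "b3"]  # b in {b1, ..., bn} with n = 3 assumed

grounded = " or ".join(f"not onTable({b})" for b in blocks)
print(grounded)  # not onTable(b1) or not onTable(b2) or not onTable(b3)
```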
First-Order Dynamic Programming [Sanner 07]
- The grounded MDP can still be intractable.
- Idea: exploit logical structure to maintain an abstract value function and avoid the curse of dimensionality!
Symbolic Dynamic Programming (Deterministic)
What changes when we go from tabular to symbolic?
- Representation of rewards and values
- Adding rewards and values
- The max operator
- Finding s
Reward and Value Representation
Case representation (example for a block b1):
{ ∃b, b≠b1, on(b, b1) : 10 ; ∄b, b≠b1, on(b, b1) : 0 }
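A minimal Python sketch of the case representation, assuming formulas are opaque strings and a case statement is a list of (formula, value) pairs whose formulas partition the state space:

```python
# Case statement: list of (formula, value) partitions.
Case = list[tuple[str, float]]

reward: Case = [
    ("exists b, b != b1, on(b, b1)", 10.0),
    ("not exists b, b != b1, on(b, b1)", 0.0),
]
```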
Stochastic Actions
Nature's choice: the agent chooses Move(b); nature chooses whether it succeeds (MoveS) or fails (MoveF):
P(MoveS(b) | Move(b)) = { b = b1 : 0.8 ; b ≠ b1 : 0.9 }
P(MoveF(b) | Move(b)) = { b = b1 : 0.2 ; b ≠ b1 : 0.1 }
Add Symbolically [Scott Sanner, ICAPS-08 Tutorial]
{ A : 10 ; ¬A : 20 } ⊕ { B : 1 ; ¬B : 2 } = { A∧B : 11 ; A∧¬B : 12 ; ¬A∧B : 21 ; ¬A∧¬B : 22 }
⊖ and ⊗ are defined similarly.
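A minimal sketch of the symbolic cross-sum over the (formula, value) case lists from above; conjunction is string concatenation here, while a real implementation would simplify formulas and prune inconsistent cases:

```python
# Cross-sum of two case statements: conjoin partitions, combine values.
def cross_sum(c1, c2, op=lambda x, y: x + y):
    return [(f"({p1}) and ({p2})", op(v1, v2))
            for p1, v1 in c1 for p2, v2 in c2]

print(cross_sum([("A", 10), ("not A", 20)], [("B", 1), ("not B", 2)]))
# -> [('(A) and (B)', 11), ('(A) and (not B)', 12),
#     ('(not A) and (B)', 21), ('(not A) and (not B)', 22)]
```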
Max Operator [Scott Sanner, ICAPS-08 Tutorial]
max { Φ1 : 10 ; Φ2 : 5 ; Φ3 : 3 ; Φ4 : 0 } = { Φ1 : 10 ; ¬Φ1∧Φ2 : 5 ; ¬Φ1∧¬Φ2∧Φ3 : 3 ; ¬Φ1∧¬Φ2∧¬Φ3∧Φ4 : 0 }
Sort the cases by value; each case keeps only the states not already covered by a higher-valued case.
Max Operator Over Actions [Scott Sanner, ICAPS-08 Tutorial]
max_a works the same way: pool the case statements of the candidate actions (a1, a2), sort by value, and guard each case with the negations of the higher-valued ones:
max_a { Φ1 : 10 ; Φ2 : 5 ; Φ3 : 3 ; Φ4 : 0 } = { Φ1 : 10 ; ¬Φ1∧Φ2 : 5 ; ¬Φ1∧¬Φ2∧Φ3 : 3 ; ¬Φ1∧¬Φ2∧¬Φ3∧Φ4 : 0 }
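A minimal sketch of this symbolic max over case lists, reusing the string-formula encoding from above (formula simplification is again omitted):

```python
# Symbolic max: sort all cases by value, then guard each case with the
# negation of every higher-valued partition.
def sym_max(*cases):
    merged = sorted((p for c in cases for p in c), key=lambda pv: -pv[1])
    out, seen = [], []
    for phi, v in merged:
        guard = " and ".join(f"not ({g})" for g in seen)
        out.append((f"{guard} and ({phi})" if guard else phi, v))
        seen.append(phi)
    return out

print(sym_max([("P1", 10), ("P2", 5)], [("P3", 3), ("P4", 0)]))
# -> [('P1', 10), ('not (P1) and (P2)', 5),
#     ('not (P1) and not (P2) and (P3)', 3), ...]
```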
Find s? Isn't It Obvious?
[Figure: s --a--> s']
- Dynamic programming: given V(s'), find V(s).
- In tabular MDPs we have s explicitly; in the symbolic representation it is only implicit, so we have to construct it.
Find s = Goal Regression
regress(Φ1, a): the weakest condition that ensures Φ1 holds after taking action a.
Example: Φ1 = clear(b1), a = put(A, B):
regress(clear(b1), put(A, B)) = (clear(b1) ∧ B ≠ b1) ∨ on(A, b1)
(either b1 was already clear and B is not placed on it, or A is the block being lifted off b1)
[Figure: two block configurations illustrating the two disjuncts]
Symbolic Dynamic Programming (Deterministic)
Tabular backup vs. symbolic backup: the same recursion, in a different representation (see below).
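As a rough paraphrase of the two backups for the deterministic case; the symbolic form follows [Sanner 07], using the case operators introduced above, with T(s, a) the deterministic transition function:

```latex
\text{Tabular: } V^{t+1}(s) = \max_{a}\big[\, R(s,a) + \gamma\, V^{t}(T(s,a)) \,\big]
\\[4pt]
\text{Symbolic: } vCase^{t+1}(s) = \max_{a}\big[\, rCase(s) \oplus \gamma \cdot \mathrm{regress}(vCase^{t}(s), a) \,\big]
```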
Classical Example
[Figure: boxes, trucks, cities]
Goal: have a box in Paris.
Reward: { ∃b, BoxIn(b, Paris) : 10 ; else : 0 }
Classical Example
- Actions: drive(t, c1, c2), load(b, t), unload(b, t), noop
- load and unload have a 10% chance of failure
- Fluents: BoxIn(b, c), BoxOn(b, t), TruckIn(t, c)
- Assumptions: all cities are connected; γ = 0.9
Example [Sanner 07]
s                                                  V*(s)    π*(s)
∃b, BoxIn(b, Paris)                                100.00   noop
else, ∃b,t, TruckIn(t, Paris) ∧ BoxOn(b, t)         89.00   unload(b, t)
else, ∃b,c,t, BoxOn(b, t) ∧ TruckIn(t, c)           80.00   drive(t, c, Paris)
else, ∃b,c,t, BoxIn(b, c) ∧ TruckIn(t, c)           72.00   load(b, t)
else, ∃b,c1,c2,t, BoxIn(b, c1) ∧ TruckIn(t, c2)     64.70   drive(t, c2, c1)
else                                                 0.00   noop
What did we gain by going through all of this? The same six abstract cases solve the problem for any number of boxes, trucks, and cities.
Conclusion
- Logic Programming: Situation Calculus, GOLOG
- Planning: MDP review, value iteration
- MDP + Logic + Programming: DT-GOLOG, symbolic DP
References
Levesque, H., Reiter, R., Lespérance, Y., Lin, F., and Scherl, R., "GOLOG: A Logic Programming Language for Dynamic Domains", Journal of Logic Programming, 31:59-84, 1997.
Sutton, R. S. and Barto, A. G., "Reinforcement Learning: An Introduction", MIT Press, Cambridge, 1998.
Boutilier, C., Reiter, R., Soutchanski, M., and Thrun, S., "Decision-Theoretic, High-Level Agent Programming in the Situation Calculus", AAAI/IAAI 2000, pp. 355-362.
Sanner, S. and Kersting, K., "Symbolic Dynamic Programming", in C. Sammut (ed.), Encyclopedia of Machine Learning, Springer-Verlag, 2007.
Boutilier, C., Reiter, R., and Price, B., "Symbolic Dynamic Programming for First-Order MDPs", Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI), Seattle, pp. 690-697, 2001.
Questions?