Logic Programming and MDPs for Planning
Alborz Geramifard, Winter 2009

Index: Introduction, Logic, Programming, MDP, MDP + Logic + Programming

Why do we care about planning?

Sequential Decision Making. ?? Desired property? Feasible: Logic (focuses on feasibility). Optimal: MDPs (scaling is hard).

Index: Introduction, Logic, Programming, MDP, MDP + Logic + Programming

Logic: Goal-Directed Search (Start → Goal). State: a set of propositions (e.g., ¬Quiet, Garbage, ¬Dinner). Feasible plan: a sequence of actions from the start state to the goal.

Actions: Precondition → Action → Effect. Example: Clean Hands, Ingredients → Cook → Dinner Ready. STRIPS, GraphPlan (16.410/16.413).

Logic Programming [Levesque et al. 97]. GraphPlan or forward search: not easily scalable. GOLOG (ALGOL in LOGIC): restrict the action space with programs; scales more easily, but results depend on the high-level programs.

Situation Calculus (Temporal Logic). Situation: S0, do(A,S). Example: do(putdown(A), do(walk(L), do(pickup(A), S0))). Relational fluent: a relation whose value depends on the situation, e.g., is_carrying(robot, p, s). Functional fluent: a situation-dependent function, e.g., loc(robot, s). More expressive than LTL.
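
To make the term structure concrete, here is a tiny Python sketch (my own encoding, purely illustrative) of situations as nested do(a, s) terms built from the initial situation S0:

```python
# Situations as nested terms: S0, do(a, s).
S0 = "S0"

def do(action, situation):
    """Build the successor situation term do(action, situation)."""
    return ("do", action, situation)

# do(putdown(A), do(walk(L), do(pickup(A), S0)))
s = do(("putdown", "A"), do(("walk", "L"), do(("pickup", "A"), S0)))

def action_history(situation):
    """Recover the action sequence encoded in a situation term, earliest first."""
    actions = []
    while situation != S0:
        _, action, situation = situation
        actions.append(action)
    return list(reversed(actions))

print(action_history(s))  # [('pickup', 'A'), ('walk', 'L'), ('putdown', 'A')]
```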

GOLOG: Syntax [Scott Sanner, ICAPS08 Tutorial] (constructs include primitive actions, tests, sequencing, nondeterministic choice of actions and arguments, iteration, if-then-else, and while loops). ?? Aren't we hard-coding the whole solution?

Blocks World. Actions: pickup(b), putOnTable(b), putOn(b1,b2). Fluents: onTable(b,s), on(b1,b2,s). GOLOG program: while (∃b) ¬onTable(b) do (pickup(b); putOnTable(b)) endWhile. ?? What does it do?
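
One reading of the program: it repeatedly picks some block that is not on the table and puts it on the table, until every block is on the table. A rough Python sketch of that operational reading (my own state encoding, not GOLOG itself; preconditions handled only via the "clear" filter):

```python
import random

def run_program(blocks, on_table, on):
    """Rough reading of: while (exists b) not onTable(b) do (pickup(b); putOnTable(b)).
    on_table: set of blocks currently on the table.
    on: maps a block to the block directly beneath it (only for stacked blocks)."""
    while True:
        # Candidates for the nondeterministic pick of b: not on the table and clear
        # (nothing stacked on top), since clear(b) is pickup's precondition.
        candidates = [b for b in blocks if b not in on_table and b not in on.values()]
        if not candidates:
            break                      # every block is now on the table
        b = random.choice(candidates)  # GOLOG's nondeterministic choice of b
        on.pop(b, None)                # pickup(b)
        on_table.add(b)                # putOnTable(b)
    return on_table, on

# A three-block tower: b3 on b2 on b1, with only b1 on the table.
# The program unstacks it completely: all blocks end up on the table.
print(run_program({"b1", "b2", "b3"}, {"b1"}, {"b2": "b1", "b3": "b2"}))
```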

GOLOG Execution. Given start S0, goal S', and program δ, call Do(δ, S0, S'), proved with a first-order logic proof system (implemented in Prolog). Non-deterministic: the proof searches over situations (S0, S1, S2, S3, S4 in the slide's figure).

Index: Introduction, Logic, Programming, MDP, MDP + Logic + Programming

MDP. Example (figure not reproduced): states such as B.S. and M.S., actions such as Working and Studying, rewards such as +60 and -50.

MDP. Policy π(s): how to act in each state. Value function V(s): how good is it to be in each state?

MDP. Goal: find the optimal policy (π*), which maximizes the value function for all states.
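
In the standard notation these slides rely on (not shown on the slide itself), the value of a policy and the optimality condition are:

```latex
V^{\pi}(s) = \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r_t \,\Big|\, s_0 = s,\ \pi\Big]
\qquad
V^{*}(s) = \max_{a}\Big[ R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^{*}(s') \Big]
\qquad
\pi^{*}(s) = \operatorname*{arg\,max}_{a}\Big[ R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^{*}(s') \Big]
```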

MDP. ?? Some algorithms: Value Iteration (Dynamic Programming), Policy Iteration.

Example (Value Iteration): rewards and actions are shown in a figure on the slide (not reproduced here).
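
Since the figure did not survive the transcript, here is a generic value-iteration sketch as a stand-in, run on a made-up three-state chain (all names and numbers are my own illustration, not the slide's example):

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-6):
    """Generic value iteration. P[s][a] is a list of (prob, next_state) pairs,
    R[s][a] the immediate reward. Returns the value function and a greedy policy."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Bellman backup: best one-step lookahead value over actions.
            q = [R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a]) for a in actions]
            best = max(q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    policy = {s: max(actions,
                     key=lambda a: R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a]))
              for s in states}
    return V, policy

# Toy 3-state chain: move left/right, reward 1 for entering state 2, which is absorbing.
states, actions = [0, 1, 2], ["left", "right"]
P = {0: {"left": [(1.0, 0)], "right": [(1.0, 1)]},
     1: {"left": [(1.0, 0)], "right": [(1.0, 2)]},
     2: {"left": [(1.0, 2)], "right": [(1.0, 2)]}}
R = {0: {"left": 0.0, "right": 0.0},
     1: {"left": 0.0, "right": 1.0},
     2: {"left": 0.0, "right": 0.0}}
print(value_iteration(states, actions, P, R))
```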

Index: Introduction, Logic, Programming, MDP, MDP + Logic + Programming

DT-GOLOG: Decision-Theoretic GOLOG [Boutilier et al. 00]. MDP: scaling is challenging. GOLOG: no optimization. DT-GOLOG optimizes all non-deterministic choices using MDP concepts (uncertainty, reward).

Decision-Theoretic GOLOG: a spectrum from programming (GOLOG) to planning (MDP). Programming end: known solution, high-level structure. Planning end: only low-level structure, not enough grasp of the solution.

Example: build a tower with the least effort. GOLOG (program): pick a block as base, stack all other blocks on top of it. MDP (optimization): which block to use for the base? In which order to pick up the blocks? ?? Guaranteed to be optimal?

DT-GOLOG Execution. Given start S0, goal S', and program δ: add rewards and formulate the problem as an MDP. Grounding example: (∃b) ¬onTable(b), b ∈ {b1,..., bn} becomes ¬onTable(b1) ∨ ¬onTable(b2) ∨ ... ∨ ¬onTable(bn).

First-Order Dynamic Programming [Sanner 07]. The resulting ground MDP can still be intractable. Idea: exploit logical structure to represent an abstract value function and avoid the curse of dimensionality!

Symbolic Dynamic Programming (Deterministic). What is needed, tabular vs. symbolic (?): representation of rewards and values, adding rewards and values, a max operator, and finding s.

Reward and Value Representation: case representation. Example: [∃b, b≠b1, on(b,b1) : 10 ; ∄b, b≠b1, on(b,b1) : 0].

Stochastic Actions (Nature's Choice). P(moveS(b) | move(b)) = [b=b1 : 0.8 ; b≠b1 : 0.9]; P(moveF(b) | move(b)) = [b=b1 : 0.2 ; b≠b1 : 0.1].

Add Symbolically [Scott Sanner, ICAPS08 Tutorial]: [A : 10 ; ¬A : 20] ⊕ [B : 1 ; ¬B : 2] = [A∧B : 11 ; A∧¬B : 12 ; ¬A∧B : 21 ; ¬A∧¬B : 22]. Defined similarly for ⊖ and ⊗.
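
A minimal Python sketch of this cross-sum, treating a case statement as a list of (formula, value) pairs with formulas kept as opaque strings (no logical simplification of inconsistent partitions):

```python
def case_add(c1, c2):
    """Cross-sum of two case statements: one partition per pair of partitions,
    conjoining the formulas and adding the values."""
    return [(f"({phi1}) ∧ ({phi2})", v1 + v2)
            for phi1, v1 in c1
            for phi2, v2 in c2]

c1 = [("A", 10), ("¬A", 20)]
c2 = [("B", 1), ("¬B", 2)]
for phi, v in case_add(c1, c2):
    print(phi, "->", v)
# (A) ∧ (B) -> 11, (A) ∧ (¬B) -> 12, (¬A) ∧ (B) -> 21, (¬A) ∧ (¬B) -> 22
```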

Max Operator [Scott Sanner, ICAPS08 Tutorial]: max over the partitions Φ1 : 10, Φ2 : 5, Φ3 : 3, Φ4 : 0 yields [Φ1 : 10 ; ¬Φ1∧Φ2 : 5 ; ¬Φ1∧¬Φ2∧Φ3 : 3 ; ¬Φ1∧¬Φ2∧¬Φ3∧Φ4 : 0].

Max Operator over actions [Scott Sanner, ICAPS08 Tutorial]: max_a works the same way, with each resulting partition annotated by the maximizing action (a_1, a_2, ...): [Φ1 : 10 ; ¬Φ1∧Φ2 : 5 ; ¬Φ1∧¬Φ2∧Φ3 : 3 ; ¬Φ1∧¬Φ2∧¬Φ3∧Φ4 : 0].
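
A matching sketch of the symbolic max: sort partitions by value, then make them mutually exclusive by conjoining the negations of all higher-valued partitions (again strings only, no logical simplification):

```python
def case_max(*cases):
    """Symbolic max over case statements: order partitions by decreasing value and
    guard each one by the negations of all higher-valued partitions."""
    parts = sorted((part for c in cases for part in c), key=lambda p: -p[1])
    out, negated = [], []
    for phi, v in parts:
        guard = " ∧ ".join(negated + [f"({phi})"])
        out.append((guard, v))
        negated.append(f"¬({phi})")
    return out

for phi, v in case_max([("Φ1", 10), ("Φ3", 3)], [("Φ2", 5), ("Φ4", 0)]):
    print(phi, "->", v)
# (Φ1) -> 10, ¬(Φ1) ∧ (Φ2) -> 5, ¬(Φ1) ∧ ¬(Φ2) ∧ (Φ3) -> 3, ¬(Φ1) ∧ ¬(Φ2) ∧ ¬(Φ3) ∧ (Φ4) -> 0
```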

Find s? Isn't it obvious? Dynamic programming: given V(s'), find V(s) for s → a → s'. In tabular MDPs we have s explicitly; in the symbolic representation we only have it implicitly, so we have to build it.

Find s = Goal Regression. regress(Φ1, a): the weakest relation that ensures Φ1 holds after taking a. Example: for Φ1 = clear(b1) and a = put(A,B), the regression is (Clear(b1) ∧ B≠b1) ∨ On(A,b1) ∨ ?

Symbolic Dynamic Programming (Deterministic). Tabular vs. symbolic backup equations (shown as formulas on the slide, not reproduced here).
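
For reference, the usual form of these backups in the deterministic case (standard notation, not taken from the slide): the tabular Bellman backup with an explicit successor state, and its symbolic counterpart over case statements using casemax, ⊕, ⊗, and regression in place of explicit states.

```latex
% Tabular backup, deterministic successor s' = apply(a, s):
V^{n+1}(s) = \max_{a}\Big[\, R(s) + \gamma\, V^{n}\big(\mathrm{apply}(a, s)\big) \,\Big]
% Symbolic backup over case statements:
vCase^{\,n+1} = \operatorname{casemax}_{a}\Big[\, rCase \;\oplus\; \gamma \otimes \operatorname{Regr}\big(vCase^{\,n}, a\big) \,\Big]
```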

Classical Example (Box, Truck, City). Goal: have a box in Paris. Reward: [∃b, BoxIn(b,Paris) : 10 ; else : 0].

Classical Example. Actions: drive(t,c1,c2), load(b,t), unload(b,t), noop; load and unload have a 10% chance of failure. Fluents: BoxIn(b,c), BoxOn(b,t), TruckIn(t,c). Assumptions: all cities are connected; γ = 0.9.
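
One way to see why the symbolic treatment matters here: the ground state space explodes with the number of objects. A back-of-the-envelope Python sketch (my own illustration, not from the slides), counting the ground fluents BoxIn(b,c), BoxOn(b,t), TruckIn(t,c) and their joint truth assignments, an upper bound on the number of ground states:

```python
def num_ground_states(n_boxes, n_trucks, n_cities):
    """Number of ground fluents and the number of joint truth assignments."""
    n_fluents = (n_boxes * n_cities      # BoxIn(b, c)
                 + n_boxes * n_trucks    # BoxOn(b, t)
                 + n_trucks * n_cities)  # TruckIn(t, c)
    return n_fluents, 2 ** n_fluents

for boxes, trucks, cities in [(1, 1, 2), (2, 2, 3), (3, 3, 5)]:
    f, s = num_ground_states(boxes, trucks, cities)
    print(f"{boxes} boxes, {trucks} trucks, {cities} cities: {f} fluents, {s} assignments")
```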

Example [Sanner 07]: the resulting symbolic value function V*(s) and policy π*(s), one row per state partition (most V* values did not survive the transcript):
∃b, BoxIn(b,Paris) : π* = noop
else, ∃b,t, TruckIn(t,Paris) ∧ BoxOn(b,t) : π* = unload(b,t)
else, ∃b,c,t, BoxOn(b,t) ∧ TruckIn(t,c) : π* = drive(t,c,Paris)
else, ∃b,c,t, BoxIn(b,c) ∧ TruckIn(t,c) : V* = 72, π* = load(b,t)
else, ∃b,c1,c2,t, BoxIn(b,c1) ∧ TruckIn(t,c2) : π* = drive(t,c2,c1)
else : V* = 0, π* = noop
?? What did we gain by going through all of this?

Conclusion. Logic Programming: Situation Calculus, GOLOG. Planning: MDP review, Value Iteration. MDP + Logic + Programming: DT-GOLOG, Symbolic DP.

References
Levesque, H., Reiter, R., Lespérance, Y., Lin, F., and Scherl, R., "GOLOG: A Logic Programming Language for Dynamic Domains", Journal of Logic Programming, 31:59-84, 1997.
Richard S. Sutton and Andrew G. Barto, "Reinforcement Learning: An Introduction", MIT Press, Cambridge, 1998.
Craig Boutilier, Raymond Reiter, Mikhail Soutchanski, and Sebastian Thrun, "Decision-Theoretic, High-Level Agent Programming in the Situation Calculus", AAAI/IAAI 2000.
S. Sanner and K. Kersting, "Symbolic Dynamic Programming", chapter to appear in C. Sammut, editor, Encyclopedia of Machine Learning, Springer-Verlag, 2007.

Craig Boutilier, Ray Reiter, and Bob Price, "Symbolic Dynamic Programming for First-Order MDPs", Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI), Seattle, 2001.

Questions?