Goals, plans, and planning
Northwestern University CS 395 Behavior-Based Robotics
Ian Horswill

Modal logic

- Need to reason about:
  - States of knowledge
  - Goals
- These aren’t propositions about objects …
- … but rather about other propositions

  (define-signal front-sonar
    …
    (mode (know (< front-sonar 2000))))
  …
  (define-signal fspace
    (min front-sonar front-left-sonar front-right-sonar))
  (define-signal advance
    (behavior (know fspace)
              (rt-vector 0 fspace)))

Modalities in GRL

- In GRL, a modality is a special kind of signal procedure
- The signal it returns is just a default
  - You can override it with a mode declaration
- It’s memoized so that it always returns the same signal object when called on the same signal object

  (define-signal-modality (mymode x)
    … compute default …)

  (define-signal sig
    expr
    (mode (mymode expr)))

Simplified modality definitions

  (define-signal-modality (know x)
    ;; Default: x is known when all of its inputs are known
    (define inputs (signal-inputs x))
    (signal-expression (apply and (know inputs))))

  (define-signal-modality (goal x)
    ;; The goal mode collects (via or) any goal signals driven into it
    (define the-mode (signal-expression (accumulate or)))
    ;; Forward x's goal status to each of x's inputs (parallel goal reduction)
    (define (forward-goal y)
      (drive-signal! x y))
    (for-each forward-goal (signal-inputs x))
    the-mode)

GRL modal logic API

- (know x)
  Whether x’s value is known
- (goal x)
  True if x is a goal of achievement
  - Robot “wants” to make it true and move on
- (maintain-goal x)
  True if x is a maintenance goal
  - Robot “wants” to make it true and keep it true
- (know-goal x)
  True if x is a knowledge goal
  - Robot “wants” to determine the value of x

Built-in inference axioms

  (know (operator arg …))  ⇒  (and (know arg) …)
  (goal (know x))          ⇒  (know-goal x)
  (goal (maintain x))      ⇒  (maintain-goal x)
  (know (know x))          ⇒  true
  (know (goal x))          ⇒  true
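As a concrete instance of the first axiom, applied to the fspace signal from the modal-logic slide: knowing the min of the sonar readings defaults to knowing each individual reading.

  (define-signal fspace
    (min front-sonar front-left-sonar front-right-sonar))
  ;; (know fspace) defaults to:
  ;; (and (know front-sonar) (know front-left-sonar) (know front-right-sonar))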

Goal reduction API

- (define-signal s (and a b c …))
  (define-reduction s parallel)
  - When s is a goal, all its inputs are goals
  - This is what was shown three slides ago
- (define-signal s (and a b c …))
  (define-reduction s serial)
  - When s is a goal, a is a goal
  - When s is a goal and a is true, b is a goal
  - When s is a goal and both a and b are true, c is a goal (sketch below)
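A minimal sketch of serial reduction, using made-up signal names (reached-dock and latched are hypothetical): when docked becomes a goal, reached-dock becomes a goal first, and latched only becomes a goal once reached-dock is true.

  (define-signal docked (and reached-dock latched))
  (define-reduction docked serial)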

Useful functions

- (know-that x)
  True if (know x) and x
- (satisfied-goal x)
  True if x is a goal and is true
- (unsatisfied-goal x)
  True if x is a goal and is false
- (parallel-and a b c …)
  And gate with parallel goal reduction
- (serial-and a b c …)
  And gate with serial goal reduction
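These predicates are handy for gating behaviors, following the behavior pattern from the modal-logic slide. A minimal sketch, where at-target is a hypothetical goal signal: keep advancing only while that goal is still unsatisfied.

  (define-signal seek-target
    (behavior (unsatisfied-goal at-target)
              (rt-vector 0 fspace)))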

Planning

- Given:
  - Goal (desired state of the environment)
  - Current state of the environment
  - Set of actions
  - Descriptions of how actions change the state of the environment
    - Actions are essentially functions from states to states
- Find a series of actions (called a plan) that will result in the desired goal state
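In symbols (just restating the above): given an initial state s0, a goal state g, and actions a : States → States, a plan is a sequence a1, …, an such that an(… a2(a1(s0)) …) = g.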

A bad planning algorithm

- Key idea: simulate every possible series of actions until your simulation finds the goal

  Plan(s, g) {
    for each action a {
      let s' = a(s)                       // the state after running a
      if s' == g
        return [a]                        // a by itself reaches the goal
      else
        try { return a + Plan(s', g) }    // prepend a to a plan that finishes the job
        catch backtrack {}                // that branch failed; try another action
    }
    throw backtrack                       // no action from s leads to the goal
  }

Complexity

- Have to search a tree of plans
- If there are n possible actions, there are n^m possible m-step plans
- Naïve algorithm is exponential
- Clever optimizations are possible, but it’s still basically an exponential problem
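For a sense of scale (simple arithmetic, not from the slides): with n = 10 actions and m = 10 steps there are already 10^10, i.e. ten billion, candidate plans.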

Generalizations

- Conditional planning
  - Allow ifs inside of the plan to handle contingencies (example below)
  - More robust
  - More expensive to plan
- Automatic programming
  - Plans can be arbitrary programs
  - Fully undecidable
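For instance, a conditional plan can branch on something only observable at execution time. The action and sensor names below are made up purely for illustration:

  (go-to-door)
  (if door-open
      (go-through-door)
      (begin (open-door)
             (go-through-door)))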

Generalizations (2)

- Markov Decision Problems (MDPs)
  - Actions aren’t deterministic
    - Only know a probability distribution on the possible result states for each action
    - Actions are now functions from probability distributions to probability distributions
  - Plan can’t be a program anymore (how do you know what the output state is?)
  - Payoff function that tells you how good a state is
  - Find the policy that gives you the best expected (i.e. average over the state probability distribution) payoff
  - Really really expensive
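One standard way to make “best expected payoff” precise (the usual textbook formulation, not something specific to these slides) is the Bellman equation, where R is the payoff function, P(s' | s, a) is the outcome distribution of action a, and γ is a discount factor:

  V(s)  =  max_a    Σ_s' P(s' | s, a) [ R(s') + γ V(s') ]
  π(s)  =  argmax_a Σ_s' P(s' | s, a) [ R(s') + γ V(s') ]

Solving for V and π over the whole state space (e.g. by value iteration) is what makes MDP planning so expensive.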

Generalizations (3)

- Partially Observable MDPs (POMDPs)
  - Actions aren’t deterministic
  - Don’t know what state you’re in
    - Sensors only give us a probability distribution on states, not states
  - Policy has to map probability distributions (called “belief states”) to actions, not states to actions
  - Payoff function that tells you how good a state is
  - Find the policy that gives you the best expected (i.e. average over the state probability distribution) payoff
  - Really really really expensive
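The belief state itself is maintained with a standard Bayes filter (again the usual formulation, not slide-specific): after taking action a in belief state b and receiving observation o, the new belief is

  b'(s')  ∝  P(o | s') Σ_s P(s' | s, a) b(s)

normalized so the probabilities sum to 1. The policy then has to map these continuous belief states to actions, which is a big part of why POMDPs cost so much more to solve than MDPs.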

Generalizations (4)

- Can you detect a pattern here?
- How to get tenure:
  - Find a complicated instance of a problem that current technology can’t handle
  - Devise an elegant yet prohibitively expensive technology to solve it
  - Write a paper that starts with “To survive in complex dynamic worlds, an agent must …”
  - Add a description of your technique
  - Prove a lot of theorems about how your technique will solve all instances of the problem given more CPU time than the lifetime of the universe
  - Write: “Future work: make it fast”