1
Goals, plans, and planning
Northwestern University
CS 395 Behavior-Based Robotics
Ian Horswill
2
Modal logic

Need to reason about:
- States of knowledge
- Goals

These aren't propositions about objects, but rather about other propositions.

    (define-signal front-sonar
      …
      (mode (know (< front-sonar 2000))))
    …
    (define-signal fspace
      (min front-sonar front-left-sonar front-right-sonar))

    (define-signal advance
      (behavior (know fspace)
                (rt-vector 0 fspace)))
3
Modalities in GRL

In GRL, a modality is a special kind of signal procedure:
- The signal it returns is just a default
- You can override it with a mode declaration
- It's memoized, so it always returns the same signal object when called on the same signal

    (define-signal-modality (mymode x)
      … compute default …)

    (define-signal sig
      expr
      (mode (mymode expr)))
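The memoization idea can be sketched outside GRL in a few lines of Python. This is an illustrative sketch only; Signal, Modality, and declare are hypothetical names standing in for the real signal machinery, not GRL's API.

```python
# Sketch of a memoized modality: calling it twice on the same signal
# returns the same mode object, and an explicit declaration overrides
# the computed default.  All names are illustrative, not GRL's API.
class Signal:
    def __init__(self, name, inputs=()):
        self.name, self.inputs = name, list(inputs)

class Modality:
    def __init__(self, default_fn):
        self.default_fn = default_fn   # builds the default mode for a signal
        self.cache = {}                # signal -> mode (memoization table)
        self.overrides = {}            # signal -> explicitly declared mode

    def declare(self, signal, mode):
        """Analogue of a (mode ...) declaration overriding the default."""
        self.overrides[signal] = mode
        self.cache[signal] = mode

    def __call__(self, signal):
        if signal not in self.cache:
            self.cache[signal] = self.overrides.get(
                signal, self.default_fn(signal))
        return self.cache[signal]

# A "know"-like modality whose default is a fresh mode signal per input.
know = Modality(lambda sig: Signal('know-' + sig.name))
s = Signal('front-sonar')
assert know(s) is know(s)   # memoized: the same object both times
```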
4
Simplified modality definitions

    (define-signal-modality (know x)
      (define inputs (signal-inputs x))
      (signal-expression (apply and (know inputs))))

    (define-signal-modality (goal x)
      (define the-mode (signal-expression (accumulate or)))
      (define (forward-goal y)
        (drive-signal! x y))
      (for-each forward-goal (signal-inputs x))
      the-mode)
5
GRL modal logic API

(know x): whether x's value is known

(goal x): true if x is a goal of achievement
- Robot "wants" to make it true and move on

(maintain-goal x): true if x is a maintenance goal
- Robot "wants" to make it true and keep it true

(know-goal x): true if x is a knowledge goal
- Robot "wants" to determine the value of x
6
Built-in inference axioms

    (know (operator arg …))  →  (and (know arg) …)
    (goal (know x))          →  (know-goal x)
    (goal (maintain x))      →  (maintain-goal x)
    (know (know x))          →  true
    (know (goal x))          →  true
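Read as rewrite rules, these axioms can be applied mechanically to nested expressions. Below is a rough Python sketch over tuple-encoded expressions; the encoding and the function name are my own invention for illustration, not part of GRL.

```python
# Rewrite (know ...) and (goal ...) expressions using the built-in axioms.
# Expressions are nested tuples, e.g. ('know', ('and', 'a', 'b')).
def reduce_modal(expr):
    if not isinstance(expr, tuple):
        return expr
    head, *args = expr
    if head == 'know' and isinstance(args[0], tuple):
        op, *inner = args[0]
        if op in ('know', 'goal'):            # (know (know x)) / (know (goal x)) -> true
            return True
        # (know (operator arg ...)) -> (and (know arg) ...)
        return ('and', *[reduce_modal(('know', a)) for a in inner])
    if head == 'goal' and isinstance(args[0], tuple):
        op, *inner = args[0]
        if op == 'know':                       # (goal (know x)) -> (know-goal x)
            return ('know-goal', inner[0])
        if op == 'maintain':                   # (goal (maintain x)) -> (maintain-goal x)
            return ('maintain-goal', inner[0])
    return expr

print(reduce_modal(('know', ('min', 'a', 'b'))))  # ('and', ('know', 'a'), ('know', 'b'))
print(reduce_modal(('goal', ('know', 'x'))))      # ('know-goal', 'x')
```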
7
Goal reduction API

    (define-signal s (and a b c …))
    (define-reduction s parallel)

- When s is a goal, all its inputs are goals
- This is what was shown three slides ago

    (define-signal s (and a b c …))
    (define-reduction s serial)

- When s is a goal, a is a goal
- When s is a goal and a is true, b is a goal
- When s is a goal and both a and b are true, c is a goal
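The difference between the two reductions can be sketched outside GRL as follows. This is illustrative Python; is_goal and the value lookup stand in for the signal machinery, which the slides don't show.

```python
def parallel_reduction(inputs, is_goal):
    """Parallel: if the and-gate is a goal, every input is a goal."""
    return list(inputs) if is_goal else []

def serial_reduction(inputs, is_goal, value):
    """Serial: inputs become goals one at a time, each only after
    all earlier inputs are already true."""
    if not is_goal:
        return []
    goals = []
    for x in inputs:
        goals.append(x)
        if not value(x):      # stop at the first input that isn't true yet
            break
    return goals

# Example: with a true and b still false, only a and b are goals (not c).
values = {'a': True, 'b': False, 'c': False}
print(serial_reduction(['a', 'b', 'c'], True, values.get))   # ['a', 'b']
```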
8
Useful functions

(know-that x): true if (know x) and x
(satisfied-goal x): true if x is a goal and is true
(unsatisfied-goal x): true if x is a goal and is false
(parallel-and a b c …): and-gate with parallel goal reduction
(serial-and a b c …): and-gate with serial goal reduction
9
Planning

Given:
- A goal (desired state of the environment)
- The current state of the environment
- A set of actions
  - Descriptions of how actions change the state of the environment
  - Actions are essentially functions from states to states

Find a series of actions (called a plan) that will result in the desired goal state.
10
A bad planning algorithm

Key idea: simulate every possible series of actions until your simulation finds the goal.

    Plan(s, g) {
      if (s == g) return [];           // already at the goal: empty plan
      for each action a {
        let s' = a(s)                  // the state after running a
        try { return [a] + Plan(s', g) }
        catch backtrack {}             // try another action
      }
      throw backtrack;                 // no action from s leads to g
    }
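A runnable version of the same idea, as a minimal Python sketch; the depth bound, the toy integer states, and the action names are my additions, not from the slides.

```python
def plan(state, goal, actions, max_depth=10):
    """Depth-first search over action sequences.  Returns a list of action
    names that reaches goal, or None if none exists within max_depth steps."""
    if state == goal:
        return []
    if max_depth == 0:
        return None
    for name, act in actions.items():
        rest = plan(act(state), goal, actions, max_depth - 1)
        if rest is not None:
            return [name] + rest
    return None

# Toy domain: states are integers, actions increment or double.
actions = {'inc': lambda s: s + 1, 'double': lambda s: s * 2}
print(plan(1, 6, actions))
# -> ['inc', 'inc', 'inc', 'inc', 'inc']: depth-first finds *a* plan, not the
#    shorter ['inc', 'inc', 'double'], and the search blows up as plans get longer.
```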
11
Complexity

- Have to search a tree of plans
- If there are n possible actions, there are n^m possible m-step plans (e.g. 10 actions and 10-step plans already give 10^10 candidates)
- Naïve algorithm is exponential
- Clever optimizations are possible, but it's still basically an exponential problem
12
Generalizations

Conditional planning
- Allow ifs inside of the plan to handle contingencies
- More robust
- More expensive to plan

Automatic programming
- Plans can be arbitrary programs
- Fully undecidable
13
Generalizations (2)

Markov Decision Problems (MDPs)
- Actions aren't deterministic
  - Only know a probability distribution on the possible result states for each action
  - Actions are now functions from probability distributions to probability distributions
- Plan can't be a program anymore (how do you know what the output state is?)
- Payoff function that tells you how good a state is
- Find the policy that gives you the best expected (i.e. average over the state probability distribution) payoff
- Really, really expensive
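To make "find the policy with the best expected payoff" concrete, here is a minimal value-iteration sketch for a made-up three-state MDP. This is the standard dynamic-programming approach, not anything GRL-specific, and the transition table, rewards, and discount factor are invented for illustration.

```python
# Value iteration on a toy MDP.  P[s][a] lists (probability, next_state, reward).
P = {
    0: {'a': [(0.8, 1, 0.0), (0.2, 0, 0.0)], 'b': [(1.0, 0, 0.0)]},
    1: {'a': [(0.9, 2, 1.0), (0.1, 1, 0.0)], 'b': [(1.0, 0, 0.0)]},
    2: {'a': [(1.0, 2, 0.0)], 'b': [(1.0, 2, 0.0)]},
}
gamma, V = 0.9, {s: 0.0 for s in P}          # discount factor and initial values
for _ in range(100):                         # repeatedly back up expected payoffs
    V = {s: max(sum(p * (r + gamma * V[s2]) for p, s2, r in outs)
                for outs in P[s].values())
         for s in P}
# The policy picks, in each state, the action with the best expected payoff.
policy = {s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2])
                                         for p, s2, r in P[s][a]))
          for s in P}
print(V, policy)
```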
14
Generalizations (3)

Partially Observable MDPs (POMDPs)
- Actions aren't deterministic
- Don't know what state you're in
  - Sensors only give us a probability distribution on states, not the states themselves
- Policy has to map probability distributions (called "belief states") to actions, not states to actions
- Payoff function that tells you how good a state is
- Find the policy that gives you the best expected (i.e. average over the state probability distribution) payoff
- Really, really, really expensive
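The notion of a belief state can be made concrete with a Bayes update over a two-state example. This is a generic probabilistic-robotics sketch; the state names and the sensor model are invented for illustration.

```python
# Bayesian belief update: belief[s] is P(state = s).  After observing z,
# the new belief is proportional to P(z | s) * belief[s].
def update_belief(belief, likelihood, z):
    unnormalized = {s: likelihood(z, s) * p for s, p in belief.items()}
    total = sum(unnormalized.values())
    return {s: p / total for s, p in unnormalized.items()}

# Two states; a "wall" reading is much more likely when the robot is near one.
belief = {'near': 0.5, 'far': 0.5}
likelihood = lambda z, s: {('wall', 'near'): 0.9, ('wall', 'far'): 0.2}[(z, s)]
print(update_belief(belief, likelihood, 'wall'))   # belief shifts toward 'near'
```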
15
Generalizations (4)

Can you detect a pattern here?

How to get tenure:
- Find a complicated instance of a problem that current technology can't handle
- Devise an elegant yet prohibitively expensive technology to solve it
- Write a paper that starts with "To survive in complex dynamic worlds, an agent must …"
- Add a description of your technique
- Prove a lot of theorems about how your technique will solve all instances of the problem given more CPU time than the lifetime of the universe
- Write: "Future work: make it fast"