Slide 1 — Endgame Logistics
- Final project presentations: Tuesday, March 19, 3-5, KEC 2057
- PowerPoint suggested (email it to me before class); you can use your own laptop if necessary (e.g. for a demo)
- 10 minutes of presentation per project, not including questions
- Final project reports due: Friday, March 22, 12 noon
Slide 2 — Dimensions of Sequential Decision Making
[Diagram: agent-environment loop with Percepts, Actions, World, Objective, annotated with the following dimensions]
- perfect vs. noisy
- fully observable vs. partially observable
- instantaneous vs. durative
- deterministic vs. stochastic
- sole source of change vs. other sources
- concurrent actions vs. single action
- goal satisfaction vs. general reward
- known world model vs. unknown
- numeric vs. discrete
Slide 3 — [Dimensions diagram from Slide 2, labeled "STRIPS Planning"; two entries modified: "concurrent actions (but …) vs. single action" and "known world model vs. unknown vs. partial model"]
Slide 4 — [Dimensions diagram from Slide 2, labeled "MDP Planning"]
Slide 5 — [Dimensions diagram from Slide 2, labeled "Reinforcement Learning"]
Slide 6 — [Dimensions diagram from Slide 2, labeled "Simulation-Based Planning"; one entry modified: "known world model vs. unknown vs. simulator"]
Slide 7 — [Repeat of the dimensions diagram from Slide 2]
Slide 8 — Numeric States
- In many cases states are naturally described in terms of numeric quantities.
- Classical control theory studies MDPs with real-valued continuous state spaces, typically assuming linear dynamical systems.
- That assumption is quite limited for most applications of interest in AI, which often mix discrete and numeric quantities.
- We typically handle such states via feature encodings of the state space.
- Simulation-based methods are agnostic about whether the state is numeric or discrete.
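The feature-encoding idea above can be sketched as follows. This is a minimal illustration with an invented state layout (the `position`, `door_open`, and `room` fields are not from the slides): numeric quantities pass through directly, while discrete fields get mapped to 0/1 indicator features.

```python
def encode(state):
    """Map a mixed discrete/numeric state to a flat numeric feature vector."""
    x, y = state['position']            # numeric quantities pass through
    door = float(state['door_open'])    # boolean -> 0.0 / 1.0
    rooms = ['kitchen', 'hall', 'lab']  # one-hot encode a discrete field
    room_onehot = [1.0 if state['room'] == r else 0.0 for r in rooms]
    return [x, y, door] + room_onehot

f = encode({'position': (2.5, 1.0), 'door_open': True, 'room': 'hall'})
# f == [2.5, 1.0, 1.0, 0.0, 1.0, 0.0]
```

A vector like this is what RL methods with function approximation actually consume, regardless of whether the underlying state was numeric, discrete, or both.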
Slide 9 — [Repeat of the dimensions diagram from Slide 2]
Slide 10 — Partial Observability
- In reality we only observe percepts of the world, not the actual state.
- Partially Observable MDPs (POMDPs) extend MDPs to handle partial observability.
- Start with an MDP and add an observation distribution P(o | s): the probability of observation o given state s.
- We see a sequence of observations rather than a sequence of states.
- POMDP planning is much harder than MDP planning; scalability is poor.
- In practice we can often apply RL using features of the observations.
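Since the agent never sees the state directly, POMDP methods typically track a belief (a distribution over states) updated by Bayes' rule after each action and observation. Here is a hedged sketch for a tiny discrete model; the transition table `T`, observation table `O`, and the two-state example are invented for illustration, not taken from the slides.

```python
def belief_update(belief, action, observation, T, O):
    """Bayes filter: b'(s2) proportional to P(o | s2) * sum_s P(s2 | s, a) b(s)."""
    states = range(len(belief))
    new_belief = []
    for s2 in states:
        # predict: push the belief through the transition model
        pred = sum(T[s][action][s2] * belief[s] for s in states)
        # correct: weight by the likelihood of the observation
        new_belief.append(O[s2][observation] * pred)
    z = sum(new_belief)                  # normalizer P(o | b, a)
    return [p / z for p in new_belief]

# Two-state example with one action and a sensor that is right 80% of the time.
T = [[[0.9, 0.1]], [[0.2, 0.8]]]        # T[s][a][s2] = P(s2 | s, a)
O = [[0.8, 0.2], [0.2, 0.8]]            # O[s][o]     = P(o | s)
b = belief_update([0.5, 0.5], 0, 0, T, O)
```

Observing o = 0 shifts the belief toward state 0; this belief vector is exactly the kind of "feature of observations" an RL agent can condition on in practice.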
Slide 11 — [Repeat of the dimensions diagram from Slide 2]
Slide 12 — Other Sources of Change
- In many cases the environment changes even if the agent selects no action.
- Sometimes this is due to exogenous events, e.g. 911 calls coming in at random.
- Sometimes it is due to other agents:
  - Adversarial agents try to decrease our reward.
  - Cooperative agents may try to increase our reward, or may have their own objectives.
- Decision making in the context of other agents is studied in the area of game theory.
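The exogenous-event case can be made concrete with a toy simulator step in which the state changes whether or not the agent acts. Everything here (the pending-calls state, the `dispatch` action, the arrival probability) is an invented illustration of the 911-call example, not a model from the slides.

```python
import random

ARRIVAL_PROB = 0.3   # chance a new call arrives each step (illustrative)

def world_step(pending_calls, action, rng):
    """Environment transition: agent effect plus an exogenous arrival."""
    if action == 'dispatch' and pending_calls > 0:
        pending_calls -= 1               # effect of the agent's action
    if rng.random() < ARRIVAL_PROB:      # exogenous event, independent of action
        pending_calls += 1
    return pending_calls

rng = random.Random(0)
calls = 0
for _ in range(10):                      # even with pure no-ops, state drifts
    calls = world_step(calls, 'noop', rng)
```

The point is that the transition model P(s' | s, a) must fold in the exogenous dynamics; the agent is no longer the sole source of change.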
Slide 13 — [Repeat of the dimensions diagram from Slide 2]
Slide 14 — Durative Actions
- Generally, different actions have different durations, and often the durations are stochastic.
- Semi-Markov MDPs (SMDPs) extend MDPs to account for actions with probabilistic durations.
- The transition distribution becomes P(s', t | s, a): the probability of ending up in state s' in t time steps after taking action a in state s.
- Planning and learning algorithms are very similar to those for standard MDPs; the equations are just a bit more complex to account for time.
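One common way the equations "account for time" is to discount a duration-t transition by gamma**t instead of gamma. Below is a hedged sketch of a single SMDP Bellman backup under that convention; the dictionaries `P` and `R`, the reward placement, and the tiny model are all illustrative assumptions, not definitions from the slides.

```python
GAMMA = 0.9

def smdp_backup(V, s, actions, P, R):
    """One Bellman backup for an SMDP: discount by GAMMA**t for duration t.

    P[(s, a)] maps (s2, t) -> P(s2, t | s, a); R[(s, a)] is an immediate reward.
    """
    return max(
        R[(s, a)] + sum(p * GAMMA**t * V[s2]
                        for (s2, t), p in P[(s, a)].items())
        for a in actions
    )

V = {0: 0.0, 1: 10.0}
P = {(0, 'go'): {(1, 2): 1.0}}    # 'go' reaches state 1, taking 2 time steps
R = {(0, 'go'): 1.0}
v0 = smdp_backup(V, 0, ['go'], P, R)   # 1.0 + 0.9**2 * 10.0, i.e. about 9.1
```

Setting every duration to t = 1 recovers the ordinary MDP backup, which is why the planning and learning algorithms carry over so directly.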
Slide 15 — [Repeat of the dimensions diagram from Slide 2]
Slide 16 — [Repeat of Slide 14, Durative Actions]
Slide 17 — [Repeat of the dimensions diagram from Slide 2]
Slide 18 — Concurrent Durative Actions
- In many problems we need to form plans that direct the actions of a team of agents.
- This typically requires planning over the space of concurrent activities, where the different activities can have different durations.
- We can treat these problems as one huge MDP (or SMDP) whose action space is the cross-product of the individual agents' action spaces.
- Standard MDP algorithms will break at this scale.
- There are multi-agent or concurrent-action extensions of most of the formalisms we studied in class.
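The cross-product blow-up is easy to see directly. A minimal sketch, with an invented team of five agents sharing the same three-action set:

```python
from itertools import product

# Each agent's action set; 5 identical agents with 3 actions each (illustrative).
agent_actions = [['north', 'south', 'wait']] * 5

# The joint action space is the cross-product of the individual sets.
joint_actions = list(product(*agent_actions))

n = len(joint_actions)   # 3**5 = 243 joint actions
```

With k actions per agent and n agents the joint space has k**n actions, which is why standard MDP algorithms, whose cost scales with the number of actions, break down and specialized multi-agent extensions are needed.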
Slide 19 — [Repeat of the dimensions diagram from Slide 2]
Slide 20 — [Repeat of the dimensions diagram from Slide 2]
Slide 21 — [Dimensions diagram from Slide 2, labeled "AI Planning"; "numeric vs. discrete" omitted]
Slide 22 — [Repeat of the dimensions diagram from Slide 2; "numeric vs. discrete" omitted]
Slide 23 — [Dimensions diagram from Slide 2, labeled "AI Planning"; "numeric vs. discrete" omitted]
Slide 24 — [Repeat of Slide 23]