Slide 1: Reinforcement Learning [Outro]
Marco Loog
AI in Game Programming, IT University of Copenhagen
Slide 2: Rationale
How can an agent learn if there is no teacher around who tells it, with every action, what is right and what is wrong? For example, an agent can learn how to play chess by supervised learning, given that examples of states and their correct actions are provided. But what if these examples are not available?
Slide 3: Rationale
But what if these examples are not available? Through random moves, i.e., exploratory behavior, the agent may still be able to infer knowledge about the environment it is in. But what is good and what is bad? That is exactly the knowledge it needs to decide what to do in order to reach its goal.
Slide 4: Rationale
But what is good and what is bad? That is the knowledge needed to decide what to do in order to reach the goal. 'Rewarding' the agent when it does something good and 'punishing' it when it does something bad is called reinforcement. The task of reinforcement learning is to use the observed rewards to learn a [best] policy for the environment.
Slide 5: Reinforcement Learning
Use observed rewards to learn an [almost?] optimal policy for an environment.
- The reward R(s) assigns a number to every state s.
- The utility of an environment history is [as an example] the sum of the rewards received.
- A policy describes the agent's action from any state s in order to reach the goal.
- The optimal policy is the policy with the highest expected utility.
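As a minimal sketch of these definitions (all names are hypothetical; additive rewards with an optional discount factor are assumed):

```python
# Utility of a history under additive (optionally discounted) rewards.
def history_utility(history, R, gamma=1.0):
    """Sum of discounted rewards R(s) along a sequence of states."""
    return sum(gamma**t * R(s) for t, s in enumerate(history))

# Toy example (assumed values): a small step cost everywhere except the goal.
R = lambda s: {"goal": 1.0}.get(s, -0.04)
print(history_utility(["start", "a", "b", "goal"], R))  # 3 * -0.04 + 1.0
```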
Slide 6: Reinforcement Learning
Reinforcement learning might be considered to encompass all of AI: an agent is dropped off somewhere and should figure everything out by itself. We will concentrate on simple settings and agent designs to keep things manageable, e.g., a fully observable environment.
Slide 7: Typically in Games
- Offline / during development: episodic reinforcement learning. Multiple training instances, several runs from start to end.
- Online / during actual game playing: incremental reinforcement learning. One continuous sequence of states, possibly without a clear 'end'.
Slide 8: 3 Agent Designs
- Utility-based agent: learns a utility function on states, based on which it chooses its actions.
- Q-learning agent: learns an action-value function giving the expected utility of taking a given action in a given state.
- Reflex agent: learns a policy that maps directly from states to actions.
Slide 9: Passive Reinforcement Learning
The policy is fixed: state s always leads to the same action. The goal is simply to learn how good this policy is. [Of course, this can be extended 'easily' to policy learning...]
Slide 10: Direct Utility Estimation
Idea: the utility of a state is the expected total reward from that state onward. Each trial provides a sample of this value for each state visited. After a trial, the utility estimate for each observed state is updated by keeping a running average; in the limit, the sample average converges to the true expectation. Direct utility estimation thereby reduces the problem to standard supervised inductive learning.
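A minimal sketch of this running-average update (hypothetical names; a trial is assumed to be a list of (state, reward) pairs from start to end):

```python
# Direct utility estimation, kept as a running average per state.
from collections import defaultdict

utilities = defaultdict(float)  # current estimate U(s)
counts = defaultdict(int)       # number of samples seen for s

def update_from_trial(trial):
    # The sample "reward-to-go" for the state at position t is the
    # sum of rewards from t to the end of the trial.
    rewards = [r for _, r in trial]
    for t, (s, _) in enumerate(trial):
        sample = sum(rewards[t:])
        counts[s] += 1
        # Running average: U(s) <- U(s) + (sample - U(s)) / n
        utilities[s] += (sample - utilities[s]) / counts[s]
```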
Slide 11: More Direct Utility Estimation
The 'reduction' of the problem to 'standard learning' is nice [of course]. However, an important source of information is not used: the utilities of states are not independent. The utility of each state is its own reward plus the expected utility of its successor states; these constraints are the Bellman equations. Using this prior knowledge can improve [e.g., speed up] learning considerably, as is generally the case.
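For a fixed policy π, this constraint can be written in the standard fixed-policy form, with γ the discount factor and P the transition model (γ = 1 in the undiscounted case):

U^π(s) = R(s) + γ Σ_{s'} P(s' | s, π(s)) U^π(s')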
Slide 12: Adaptive Dynamic Programming
Takes the constraints between states into account. A passive learning agent learns based on the observed rewards and a transition model; the latter models the probability of reaching state s' from state s when performing the policy's action a(s). There are two possibilities for computing the utilities (see the sketch below):
- Solve the system of linear equations [for small systems]
- Update iteratively
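A sketch of the 'update iteratively' option, assuming a learned model P[s][a] mapping successor states to probabilities, a fixed policy pi, and observed rewards R (all names hypothetical):

```python
# Simplified iterative policy evaluation given a learned model.
def evaluate_policy(states, P, pi, R, gamma=0.9, sweeps=100):
    U = {s: 0.0 for s in states}
    for _ in range(sweeps):
        for s in states:
            # Bellman constraint for the fixed policy's action pi[s].
            U[s] = R[s] + gamma * sum(
                p * U[s2] for s2, p in P[s][pi[s]].items()
            )
    return U
```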
Slide 13: Temporal Difference
Also takes the constraints between states into account. Idea: use observed transitions to adjust the utility values of the observed states so that they agree [better] with the constraints.
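A minimal sketch of this temporal-difference update after observing a single transition from s to s_next (alpha is the learning rate; all names are hypothetical):

```python
# TD update: nudge U(s) toward what the Bellman constraint says it should be.
def td_update(U, s, s_next, reward, alpha=0.1, gamma=0.9):
    U[s] = U[s] + alpha * (reward + gamma * U[s_next] - U[s])
```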
Slide 14: Active Reinforcement Learning
A passive learning agent has a fixed policy... An active agent must decide [learn] what action to take, i.e., it should find the optimal policy. Such an agent must make a trade-off between exploitation and exploration.
Slide 15: Exploitation & Exploration
- Exploitation: use the best action [at that time] in order to obtain the highest reward.
- Exploration: attempt to get to all possible states by trying all possible actions [resulting in experience from which the agent can learn].
Slide 16: Exploitation & Exploration
An agent relying completely on exploitation is called greedy and is often very suboptimal. The trade-off between the greed and the curiosity of the agent is controlled by an exploration function.
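One simple choice of exploration function is the optimistic one from Russell & Norvig: pretend that little-tried actions are worth an optimistic reward until they have been tried often enough. A sketch (both constants are assumed values):

```python
R_PLUS = 2.0   # optimistic estimate of the best possible reward (assumed)
N_E = 5        # how often we want each action tried before trusting u (assumed)

def exploration_value(u, n):
    """u: current utility estimate; n: times this choice has been tried."""
    return R_PLUS if n < N_E else u
```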
Slide 17: Learning an Action-Value Function
Temporal-difference learning can also be used for active reinforcement learning. An action-value function gives the expected utility of taking a given action in a given state. Q-learning is an alternative to temporal-difference utility learning that learns an action-value function Q(a, s) instead of utilities. The important difference is that the former is 'model-free': no transition model has to be learned, nor the actual utilities.
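A minimal sketch of the Q-learning update after observing a transition (s, a, reward, s_next); Q is assumed to be a dict keyed by (state, action) pairs:

```python
# Model-free update: only Q values change, no transition model is kept.
def q_update(Q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.9):
    # Best achievable estimate from the successor state.
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (
        reward + gamma * best_next - Q.get((s, a), 0.0)
    )
```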
Slide 18: Of Course: Generalization
For large state spaces, exact inference of the utility and/or Q-function as a table becomes unrealistic. Function approximation is needed, i.e., a representation that is not in tabular form. This makes it possible to represent utility functions for very large state spaces; more importantly, it allows for generalization to states never visited. All this relates, of course, to decision trees, MAP, regression, density estimation, ML, hypothesis spaces, etc.
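As an illustration, a sketch of a linear approximation of the utility, with a TD-style update of the weights instead of a table entry (the feature functions and all names are assumptions):

```python
# Linear function approximation: U_hat(s) = sum_i w_i * f_i(s), so the
# estimate generalizes to states that were never visited.
def approx_utility(weights, features):
    return sum(w * f for w, f in zip(weights, features))

def gradient_td_update(weights, feats, reward, feats_next,
                       alpha=0.01, gamma=0.9):
    # TD error computed from the approximations, then pushed into the weights.
    delta = (reward + gamma * approx_utility(weights, feats_next)
             - approx_utility(weights, feats))
    return [w + alpha * delta * f for w, f in zip(weights, feats)]
```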
Slide 19: E.g. Inverted Pendulum
MPI Magdeburg, Germany
Slide 20: ...and Triple Inverted
MPI Magdeburg, Germany
Slide 21: E.g. Lee05a.pdf
Slide 22: Finally... a Summary
Reinforcement learning enables agents to become skilled in an unknown environment based only on percepts and occasional rewards. Three approaches:
- Direct utility estimation: treats observations as independent.
- Adaptive dynamic programming: learns a model and a reward function and uses these to determine the utilities or the optimal policy.
- Temporal difference: adjusts utility values so that they agree with the constraints.
Slide 23: More Summary...
The trade-off between exploitation and exploration is important. Large state spaces call for approximate methods, giving rise to function learning, regression, etc. Reinforcement learning is one of the most active areas of machine learning research, because of its potential for eliminating the hand coding of control strategies...
Slide 24: Next Week
Guest lecturer Peter Andreasen on... I don't know yet. Place: Auditorium 1. Start: ±09:00. [Next next week: the final lecture, probably including an hour's lecture on NERO, some words on the course evaluation, and the possibility to ask questions...]