1
Artificial Intelligence Ch 21: Reinforcement Learning
Rodney Nielsen
2
Reinforcement Learning
Introduction
Passive Reinforcement Learning
Active Reinforcement Learning
Generalization in Reinforcement Learning
Policy Search
Applications of Reinforcement Learning
3
Introduction
Supervised Learning requires labels for every example (percept).
What if we only know whether we were successful after a series of (state, action, percept) triples?
No a priori model of the environment or reward function.
For example, we receive feedback/a reward (positive: “you win”, or more likely negative: “you lose”) at the end of a new game we are learning. Or maybe a reward when a point is scored.
4
Example: Chess
Supervised Learning: Create labeled examples for numerous representative board positions. Each labeled example is a feature vector representing the state of the board, with a label indicating what move to make.
Reinforcement Learning: Play a game, receive a “reward” at the end for winning or losing, and adjust all executed policy actions accordingly.
5
Example: Robot Grasping
Supervised Learning: Create labeled examples for numerous representative states. State: location, orientation, temperature, ability, operating characteristics, etc. of body, arm, hand, legs, head, etc., and of the object.
Reinforcement Learning: Try to grasp the object, receive a positive (negative) reward at the terminal state for success (failure), and adjust all executed policy actions accordingly. Or partial rewards for getting closer.
6
Example: Helicopter Maneuver
Extremely difficult to program, but…
Reinforcement Learning Feedback:
Crashing (very negative)
Shaking (moderate negative)
Unstable (moderate negative)
Inconsistent with goal (modest negative)
7
Example: Humanoid Robot Soccer
Goal Kicking
Two state features:
x-coordinate of the ball in the camera
Number of mm the foot is shifted out from the hip
Three actions:
Shift leg out
Shift leg in
Kick
Rewards:
-1 per shift action
-2 for missing
-20 for falling
+20 for scoring
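As a minimal sketch (an assumed encoding; the slide lists only the features, actions, and rewards, not the transition dynamics), the goal-kicking problem could be written down for an RL agent roughly like this:

# Hypothetical encoding of the goal-kicking problem described above.
# State: (ball_x_in_camera, foot_shift_mm); dynamics are not specified on the slide.
ACTIONS = ["shift_out", "shift_in", "kick"]

def reward(event):
    """Reward signal as listed: -1 per shift, -2 for missing, -20 for falling, +20 for scoring."""
    return {"shift": -1, "miss": -2, "fall": -20, "score": +20}[event]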
8
Boston Dynamics https://www.youtube.com/watch?v=W1czBcnX1Ww
9
Passive Reinforcement Learning
π, the agent’s policy, is fixed: in state s the agent always executes π(s).
Does not know:
Transition model P(s’|s,a)
Reward function R(s)
Percepts:
Current state s
Reward R(s)
E.g., (1,1)-.04 ~ (1,2)-.04 ~ … ~ (4,3)+1
10
Passive Reinforcement Learning
π(s) is static
No P(s’|s,a) or R(s)
Percepts: s, R(s)
Goal: learn the expected utility Uπ(s)
Bellman equations for a fixed policy:
Uπ(s) = R(s) + γ Σs’ P(s’|s,π(s)) Uπ(s’)
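A minimal sketch of what solving these fixed-policy equations looks like, assuming the model has already been estimated (as an ADP-style agent would do from its percepts); the function name, dictionary layout, and default parameters below are illustrative, not from the slides:

def evaluate_policy(states, P, R, pi, gamma=0.9, tol=1e-6):
    """Iterate Uπ(s) = R(s) + γ Σs' P(s'|s,π(s)) Uπ(s') until convergence.
    P[(s, a)] is a dict mapping each successor state s' to its probability."""
    U = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            u_new = R[s] + gamma * sum(p * U[s2] for s2, p in P[(s, pi[s])].items())
            delta = max(delta, abs(u_new - U[s]))
            U[s] = u_new
        if delta < tol:
            return U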
11
Passive Reinforcement Learning
Temporal-Difference Learning
TD Equation: Uπ(s) ← Uπ(s) + α(R(s) + γ Uπ(s’) − Uπ(s))
α is the learning rate
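A minimal sketch of one TD update under these assumptions (utilities stored in a plain dictionary; the function name and default parameter values are illustrative):

def td_update(U, s, r, s_next, alpha=0.1, gamma=0.9):
    """One temporal-difference step: Uπ(s) ← Uπ(s) + α(R(s) + γ Uπ(s') − Uπ(s)).
    Only the observed transition s → s' is needed; no transition model."""
    U.setdefault(s, 0.0)
    U.setdefault(s_next, 0.0)
    U[s] += alpha * (r + gamma * U[s_next] - U[s])
    return U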
12
Active Reinforcement Learning
π, the agent’s policy, must be learned
Must learn a complete model (as in the Passive-ADP-Agent):
Transition model P(s’|s,a)
Learn the optimal action a
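A minimal sketch of the model-learning part under these assumptions (a maximum-likelihood estimate of P(s’|s,a) built from counts of observed transitions; the class name and data layout are illustrative):

from collections import Counter, defaultdict

class TransitionModel:
    """Maximum-likelihood estimate of P(s'|s,a) from observed (s, a, s') triples."""
    def __init__(self):
        self.counts = defaultdict(Counter)          # (s, a) -> Counter over s'
    def observe(self, s, a, s_next):
        self.counts[(s, a)][s_next] += 1            # record one observed transition
    def prob(self, s_next, s, a):
        n = sum(self.counts[(s, a)].values())
        return self.counts[(s, a)][s_next] / n if n else 0.0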
13
Active Reinforcement Learning
Learn the optimal action a
Exploration vs. exploitation
Exploitation (greedy agent): Maximize reward under the current policy. Likely to stick roughly to the first actions that eventually led to success. E.g., (1,1)-.04 ~ (2,1)-.04 ~ (3,1)-.04 ~ (3,2)-.04 ~ (3,3)-.04 ~~ (4,3)+1
Exploration: Test policies assumed to be suboptimal.
Stay in the comfort zone vs. seek a better life
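One standard way to mix a little exploration into an otherwise greedy agent is ε-greedy action selection; this particular scheme is an illustration, not taken from the slide, and the names and default ε are assumed:

import random

def epsilon_greedy(Q, s, actions, epsilon=0.1):
    """Mostly exploit the current value estimates, but explore a random action with probability ε."""
    if random.random() < epsilon:
        return random.choice(actions)                       # explore
    return max(actions, key=lambda a: Q.get((s, a), 0.0))   # exploit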
14
Active Reinforcement Learning
Learn the optimal action a
f(u,n): the exploration function
Greed f(u) traded off against curiosity f(n)
R+: optimistic estimate of the best possible reward
Ne: constant parameter
The agent will try each action–state pair at least Ne times
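A minimal sketch of a simple exploration function of this kind (the specific form below, returning R+ until a pair has been tried Ne times and the utility estimate afterwards, is the usual textbook choice; the default values are illustrative):

def exploration_fn(u, n, R_plus=2.0, Ne=5):
    """f(u, n): be optimistic (return R+) until the (state, action) pair has been
    tried Ne times, then fall back to the current utility estimate u."""
    return R_plus if n < Ne else u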
15
Active Reinforcement Learning
Learning an action-utility function: Q-Learning
Q(s,a): value of action a in state s
TD agents that learn a Q-function do not need a model of P(s’|s,a), either for learning or for action selection
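A minimal sketch of one Q-learning update under these assumptions (Q stored in a dictionary keyed by (state, action); the function name and default parameters are illustrative):

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One Q-learning step: Q(s,a) ← Q(s,a) + α(r + γ max_a' Q(s',a') − Q(s,a)).
    Uses only the observed transition; no model of P(s'|s,a) is required."""
    q_sa = Q.get((s, a), 0.0)
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    Q[(s, a)] = q_sa + alpha * (r + gamma * best_next - q_sa)
    return Q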