1 Artificial Intelligence Ch 21: Reinforcement Learning
Rodney Nielsen

2 Reinforcement Learning
Introduction
Passive Reinforcement Learning
Active Reinforcement Learning
Generalization in Reinforcement Learning
Policy Search
Applications of Reinforcement Learning

3 Introduction
Supervised Learning requires labels for every example (percept)
What if we only know whether we were successful after a series of (state, action, percept) triples?
No a priori model of the environment or reward function
For example, we receive feedback/a reward (positive: “you win”, or more likely negative: “you lose”) at the end of a new game we are learning
Or maybe a reward when a point is scored

4 Example: Chess
Supervised Learning:
Create labeled examples for numerous representative board states
Labeled example: a feature vector representing the state of the board, with a label indicating what move to make
Reinforcement Learning:
Play a game, receive a “reward” at the end for winning or losing, and adjust all executed policy actions accordingly

5 Example: Robot Grasping
Supervised Learning:
Create labeled examples for numerous representative states
State: location, orientation, temperature, ability, operating characteristics, etc. of body, arm, hand, legs, head, etc., and of the object
Reinforcement Learning:
Try to grasp the object, receive a positive (negative) reward at the terminal state for success (failure), and adjust all executed policy actions accordingly
Or partial rewards for getting closer

6 Example: Helicopter Maneuver
Extremely difficult to program, but…
Reinforcement Learning Feedback:
Crashing (very negative)
Shaking (moderately negative)
Unstable (moderately negative)
Inconsistent with goal (modestly negative)

7 Example: Humanoid Robot Soccer
Goal Kicking
Two State Features:
x-coordinate of the ball in the camera image
Number of mm the foot is shifted out from the hip
Three Actions:
Shift leg out
Shift leg in
Kick
Rewards:
-1 per shift action
-2 for missing
-20 for falling
+20 for scoring
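
The slide's reward structure could be encoded roughly as follows; only the numeric rewards come from the slide, while the action names, the outcome argument, and the function itself are hypothetical placeholders for illustration:

# Hypothetical sketch of the kicking task's reward signal (names invented for illustration).
SHIFT_OUT, SHIFT_IN, KICK = "shift_out", "shift_in", "kick"

def reward(action, outcome=None):
    """Reward values from the slide: -1 per shift, -2 for missing, -20 for falling, +20 for scoring."""
    if action in (SHIFT_OUT, SHIFT_IN):
        return -1
    if outcome == "score":
        return 20
    if outcome == "fall":
        return -20
    return -2  # kicked and missed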

8 Boston Dynamics https://www.youtube.com/watch?v=W1czBcnX1Ww

9 Passive Reinforcement Learning
π, the agent’s policy, does not change: π(s) is a fixed action for each state
Does not know:
Transition model P(s’|s,a)
Reward function R(s)
Percepts:
Current state s
Reward R(s)
E.g., a trial: (1,1)-.04 → (1,2)-.04 → … → (4,3)+1

10 Passive Reinforcement Learning
π(s) is static
Does not know P(s’|s,a) or R(s)
Percepts: s, R(s)
Goal: learn the expected utility Uπ(s)
Bellman equation for a fixed policy:
Uπ(s) = R(s) + γ Σs’ P(s’|s, π(s)) Uπ(s’)
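
Once an estimate of the transition model is available (as in an ADP agent), the fixed-policy Bellman equation can be iterated to convergence to obtain Uπ. A minimal sketch, assuming the states, the transition distributions under the policy, the rewards, and the discount are supplied as plain dictionaries; all names here are illustrative:

def evaluate_policy(states, P_pi, R, gamma=0.9, iters=100):
    """Iterate U(s) <- R(s) + gamma * sum_s' P(s'|s, pi(s)) * U(s').

    P_pi[s] is a dict {s_next: probability} for the action the fixed policy takes in s.
    """
    U = {s: 0.0 for s in states}
    for _ in range(iters):
        # Build the new table from the old one (synchronous update).
        U = {s: R[s] + gamma * sum(p * U[s2] for s2, p in P_pi[s].items())
             for s in states}
    return U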

11 Passive Reinforcement Learning
Temporal-Difference Learning
TD Equation:
Uπ(s) ← Uπ(s) + α ( R(s) + γ Uπ(s’) − Uπ(s) )
α is the learning rate
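
The TD equation nudges the utility estimate of the state just left toward the observed reward plus the discounted estimate of the successor state. A minimal sketch of one such update applied to an observed transition; the dictionary-based utility table and the default values of α and γ are illustrative:

def td_update(U, s, r, s_next, alpha=0.1, gamma=0.9):
    """Apply Uπ(s) <- Uπ(s) + alpha * (R(s) + gamma * Uπ(s') - Uπ(s))."""
    U.setdefault(s, 0.0)
    U.setdefault(s_next, 0.0)
    U[s] += alpha * (r + gamma * U[s_next] - U[s])
    return U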

12 Active Reinforcement Learning
π, the agent’s policy, must be learned
Must learn a complete model, as in the Passive-ADP-Agent:
Transition model P(s’|s,a)
Learn the optimal action a
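
An ADP-style agent can estimate the transition model from experience by counting how often each outcome follows each (state, action) pair. A minimal sketch of that bookkeeping; the class and method names are illustrative, not from the slides:

from collections import defaultdict

class TransitionModel:
    """Estimate P(s'|s,a) from observed (s, a, s') transitions by simple counting."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s_next: count}
        self.totals = defaultdict(int)                        # (s, a) -> total observations

    def observe(self, s, a, s_next):
        self.counts[(s, a)][s_next] += 1
        self.totals[(s, a)] += 1

    def prob(self, s, a, s_next):
        total = self.totals[(s, a)]
        return self.counts[(s, a)][s_next] / total if total else 0.0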

13 Active Reinforcement Learning
Learn the optimal action a
Exploration vs. exploitation
Exploitation (greedy agent):
Maximize reward under the current policy
Likely to stick roughly to the first actions that eventually led to success
E.g., (1,1)-.04 → (2,1)-.04 → (3,1)-.04 → (3,2)-.04 → (3,3)-.04 → (4,3)+1
Exploration:
Test policies assumed to be suboptimal
Stay in the comfort zone vs. seek a better life
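
One common heuristic for trading exploitation against exploration (not the exploration function introduced on the next slide) is ε-greedy action selection: usually take the action that currently looks best, but occasionally try a random one. A minimal sketch, assuming a dictionary-based Q-table and an illustrative ε:

import random

def epsilon_greedy(Q, s, actions, epsilon=0.1):
    """With probability epsilon explore a random action, otherwise exploit the best-looking one."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((s, a), 0.0))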

14 Active Reinforcement Learning
Learn the optimal action a
f(u,n): the exploration function
Trades off greed (preference for high utility estimates u) against curiosity (preference for little-tried actions, i.e., low n)
f(u,n) = R+ if n < Ne, otherwise u
R+: optimistic estimate of the best possible reward
Ne: constant parameter; the agent will try each action–state pair at least Ne times
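
The exploration function returns the optimistic value R+ until an action–state pair has been tried Ne times, after which it falls back to the current utility estimate u. A minimal sketch, with illustrative default values for R+ and Ne:

def exploration_f(u, n, R_plus=2.0, Ne=5):
    """f(u, n) = R+ if n < Ne else u: stay optimistic about under-tried (state, action) pairs."""
    return R_plus if n < Ne else u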

15 Active Reinforcement Learning
Learning an action-utility function: Q-Learning
Q(s,a): value of action a in state s
TD agents that learn a Q-function do not need a model of P(s’|s,a), either for learning or for action selection
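
The slide does not show the update rule itself; the standard Q-learning TD update is Q(s,a) ← Q(s,a) + α ( R(s) + γ maxa’ Q(s’,a’) − Q(s,a) ). A minimal dictionary-based sketch (all names and default parameters are illustrative):

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    q_sa = Q.get((s, a), 0.0)
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    Q[(s, a)] = q_sa + alpha * (r + gamma * best_next - q_sa)
    return Q

With an action-selection rule such as ε-greedy or the exploration function above, repeatedly applying this update along experienced transitions is the core of a Q-learning agent, and no transition model is ever needed.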

