1
Learning and Memory: Reinforcement Learning
Lyle Ungar, University of Pennsylvania
2
Learning Levels
Darwinian: trial -> death or children
Skinnerian: reinforcement learning
Popperian: our hypotheses die in our stead
Gregorian: tools and artifacts
3
Machine Learning
Unsupervised: cluster similar items; association (no "right" answer)
Supervised: for observations/features, a teacher gives the correct "answer" (e.g., learn to recognize categories)
Reinforcement: take an action, observe the consequence ("bad dog!")
4
Pavlovian Conditioning
Pavlov: food causes salivation
Sound before food -> sound causes salivation
The dog learns to associate sound with food
5
Operant Conditioning
6
Associative Memory
Hebbian Learning: when two connected neurons are both excited, the connection between them is strengthened
"Neurons that fire together, wire together"
7
Explanations of Pavlov
S-S (stimulus-stimulus): dogs learn to associate the sound with food (and salivate based on "thinking" of food)
S-R (stimulus-response): dogs learn to salivate based on the tone (and salivate directly, without "thinking" of food)
How to test? Do dogs think lights are food?
8
Conditioning in humans
Two pathways:
The "slow" pathway that dogs use
Cognitive (conscious) learning
How to test this hypothesis: learn to blink based on a stimulus associated with a puff of air.
9
Blocking
Tone -> Shock -> Fear
Tone -> Fear
Tone + Light -> Shock -> Fear
Light -> ?
10
Rescorla-Wagner Model
Hypothesis: learn from observations that are surprising
V_n <- V_n + c (V_max - V_n), i.e., the change in V_n is c (V_max - V_n)
V_n is the strength of the association between US and CS
c is the learning rate
Predictions: contingency
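A minimal Python sketch of this update; the learning rate, asymptote, and trial loop are illustrative assumptions, not values from the slides.

def rescorla_wagner(v, v_max, c):
    # One trial: the change in associative strength is proportional
    # to the surprise (v_max - v).
    return v + c * (v_max - v)

# Illustrative run: association strength rises quickly at first, then
# levels off as the outcome becomes less surprising.
v = 0.0
for trial in range(10):
    v = rescorla_wagner(v, v_max=1.0, c=0.3)
    print(trial, round(v, 3))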
11
Limitations of Rescorla-Wagner
Tone -> food
Light -> food
Tone + light -> ?
12
Reinforcement Learning
Often one takes a long sequence of actions and only discovers their result later (e.g., when you win or lose a game).
Q: How can one ascribe credit (or blame) to one action in a sequence of actions?
A: By noting surprises.
13
Consider a game
Estimate the probability of winning
Take an action, see how the opponent (or the world) responds
Re-estimate the probability of winning:
If it is unchanged, you learned nothing
If it is higher, the initial state was better than you thought
If it is lower, the state was worse than you thought
14
Tic-tac-toe example
Decision tree: alternate layers give the possible moves for each player
15
Reinforcement Learning
State: e.g., board position
Action: e.g., move
Policy: state -> action
Reward function: state -> utility
Model of the environment: state, action -> state
16
Definitions of key terms
State: what you need to know about the world to predict the effect of an action
Policy: what action to take in each state
Reward function: the cost or benefit of being in a state (e.g., points won or lost, happiness gained or lost)
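A minimal sketch of how these terms map onto data structures in Python, using an invented three-cell corridor world; all states, actions, and numbers are illustrative.

# Toy corridor world: three states, two actions.
states = ["left", "middle", "right"]
actions = ["go_left", "go_right"]

# Policy: what action to take in each state.
policy = {"left": "go_right", "middle": "go_right", "right": "go_right"}

# Reward function: the benefit of being in a state.
reward = {"left": 0.0, "middle": 0.0, "right": 1.0}

# Model of the environment: (state, action) -> next state.
model = {
    ("left", "go_right"): "middle",
    ("middle", "go_right"): "right",
    ("middle", "go_left"): "left",
    ("right", "go_left"): "middle",
    ("left", "go_left"): "left",
    ("right", "go_right"): "right",
}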
17
Value Iteration
Value function: expected value of a policy over time = sum of the expected rewards
V(s) <- V(s) + c[V(s') - V(s)]
s = state before the move, s' = state after the move
This is "temporal difference" learning
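A minimal Python sketch of this temporal-difference update; only the update rule comes from the slide, while the state names and learning rate are illustrative.

def td_update(V, s, s_next, c=0.1):
    # V(s) <- V(s) + c * (V(s') - V(s)): nudge the value of the state before
    # the move toward the value of the state actually reached.
    V[s] += c * (V[s_next] - V[s])

# Illustrative use: the move led to a better-looking position, so the
# estimated value of the starting state rises.
V = {"start": 0.5, "good_position": 0.8}
td_update(V, "start", "good_position")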
18
Mouse in Maze Example
(figures: the maze policy and the corresponding value function)
19
Dopamine & Reinforcement
20
Exploration - Exploitation
Exploration: always try a different route to work
Exploitation: always take the best route to work that you have found so far
Learning requires exploration (unless the environment is noisy)
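The slides do not name a specific rule for trading these off; one common choice is epsilon-greedy. A minimal Python sketch, with invented routes and value estimates:

import random

def epsilon_greedy(estimated_value, epsilon=0.1):
    # With probability epsilon explore a random option,
    # otherwise exploit the best option found so far.
    if random.random() < epsilon:
        return random.choice(list(estimated_value))
    return max(estimated_value, key=estimated_value.get)

# Illustrative: pick a route to work given current estimates of each route's value.
routes = {"highway": 0.7, "back_roads": 0.5, "bridge": 0.6}
print(epsilon_greedy(routes))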
21
RL can be very simple
A simple learning algorithm leads to an optimal policy:
Without predicting the effects of the agent's actions
Without predicting immediate payoffs
Without planning
Without an explicit model of the world
22
How to play chess
Computer: an evaluation function for board positions, plus fast search
Human (grandmaster): memorize tens of thousands of board positions and what to do in them; do a much smaller search!
23
AI and Games
Chess: deterministic; position evaluation + search
Backgammon: stochastic; learned policy
24
Scaling up value functions
For a small number of states, learn the value of each state
Not possible for Backgammon: 10^20 states
Instead, learn a mapping from features to value, then use reinforcement learning to get improved value estimates
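A minimal sketch of the idea, assuming a linear mapping from hand-chosen board features to a value and a TD-style weight update; the feature vectors, learning rate, and linear form are illustrative assumptions, not the actual Backgammon program.

import numpy as np

def value(weights, features):
    # Approximate value of a state from its feature vector.
    return float(np.dot(weights, features))

def td_weight_update(weights, feats, feats_next, r=0.0, c=0.01):
    # Move the weights so the predicted value of the previous state
    # approaches r + V(next state), i.e. learn from the surprise.
    error = r + value(weights, feats_next) - value(weights, feats)
    return weights + c * error * feats

# Illustrative: three hand-chosen board features, weights start at zero.
w = np.zeros(3)
w = td_weight_update(w, np.array([1.0, 0.0, 2.0]), np.array([0.0, 1.0, 1.0]))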
25
Q-learning
Instead of the value of a state, learn the value Q(s, a) of taking an action a from a state s.
Optimal policy: take the best action, max_a Q(s, a)
Learning rule: Q(s, a) <- Q(s, a) + c[r_t + max_b Q(s', b) - Q(s, a)]
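A minimal Python sketch of this learning rule; the table representation and learning rate are assumptions, and the discount factor usually included in Q-learning is omitted to match the slide's formula.

from collections import defaultdict

Q = defaultdict(float)          # Q[(state, action)] -> estimated value

def q_update(s, a, r, s_next, actions, c=0.1):
    # Q(s,a) <- Q(s,a) + c * (r + max_b Q(s',b) - Q(s,a))
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += c * (r + best_next - Q[(s, a)])

def greedy_action(s, actions):
    # Optimal policy once learned: take the action with the highest Q(s, a).
    return max(actions, key=lambda a: Q[(s, a)])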
26
Learning to Sing
A zebra finch hears its father's song
Memorizes it
Then practices for months to learn to reproduce it
What kind of learning is this?
27
Controversies?
Is conditioning good?
How much learning do people do?
Innateness, learning, and free will