1
Distributed Q-Learning
Lars Blackmore and Steve Block
2
Contents (note: to be removed)
What is Q-learning?
–MDP framework
–Q-learning per se
Distributed Q-learning – discuss how
Sharing Q-values – why interesting?
–Simple averaging (no good)
–Expertness-based distributed Q-learning
–Expertness with specialised agents (optional)
3
Markov Decision Processes
Framework: MDP
–States S
–Actions A
–Rewards R(s,a)
–Transition function T(s,a,s')
Goal: find the optimal policy π*(s)
[Slide figure: grid world with goal state G, reward 100 at the goal and 0 elsewhere]
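As a concrete illustration (not part of the original slides), a tabular MDP can be written down directly from these four ingredients; all names below are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class MDP:
    """Tabular MDP: states S, actions A, rewards R(s, a), transitions T(s, a, s')."""
    states: List[str]
    actions: List[str]
    reward: Dict[Tuple[str, str], float]           # R(s, a), default 0
    transition: Dict[Tuple[str, str, str], float]  # T(s, a, s') = P(s' | s, a)

# Tiny example in the spirit of the slide's grid: reaching the goal G pays 100.
mdp = MDP(
    states=["s0", "s1", "G"],
    actions=["left", "right"],
    reward={("s1", "right"): 100.0},
    transition={
        ("s0", "right", "s1"): 1.0,
        ("s1", "right", "G"): 1.0,
        ("s1", "left", "s0"): 1.0,
    },
)
```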
4
Reinforcement Learning
Want to find π* through experience
–Reinforcement learning
–Intuitively similar to human/animal learning
–Use some policy for motion
–Converge to the optimal policy π*
An algorithm for reinforcement learning…
5
Q-Learning
Define Q*(s,a):
–"Total reward if the agent is in state s, takes action a, then acts optimally forever"
Optimal policy: π*(s) = argmax_a Q*(s,a)
Q(s,a) is an estimate of Q*(s,a)
Q-learning motion policy: π(s) = argmax_a Q(s,a)
Update Q recursively (standard Q-learning update with learning rate α and discount γ):
Q(s,a) ← Q(s,a) + α [ R(s,a) + γ max_a' Q(s',a') - Q(s,a) ]
Optimality theorem:
–"If each (s,a) pair is updated an infinite number of times, Q converges to Q* with probability 1"
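A minimal tabular sketch of this update; ALPHA and GAMMA are assumed values, since the slide does not give constants.

```python
from collections import defaultdict

ALPHA = 0.1   # learning rate (assumed value, not given on the slide)
GAMMA = 0.9   # discount factor (assumed value, not given on the slide)

Q = defaultdict(float)   # Q[(s, a)] is the running estimate of Q*(s, a)

def motion_policy(s, actions):
    """pi(s) = argmax_a Q(s, a), as on the slide."""
    return max(actions, key=lambda a: Q[(s, a)])

def q_update(s, a, r, s_next, actions):
    """One recursive update: Q(s,a) <- Q(s,a) + alpha*(r + gamma*max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
```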
6
Distributed Q-Learning
Problem formulation
Different approaches:
–Expanding state (share sensor information)
–Sharing experiences
–Sharing Q-values
Experimental results? (if included, the setup has to be explained)
Sharing Q-values
–Explain why this is the most interesting approach
7
Sharing Q-values
First approach: simple averaging
Learning framework:
–Individual learning for t_i trials
–Each trial starts from a random state and ends when the robot reaches the goal
–Next, all robots switch to cooperative learning
Result: simple averaging is worse in general! (see the sketch below)
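A sketch of the simple-averaging step under the assumption that each agent keeps a tabular Q; after the individual phase, every agent replaces each Q(s,a) with the unweighted mean over all agents.

```python
def simple_average(q_tables):
    """Replace every agent's Q(s, a) with the unweighted mean over all agents."""
    keys = set().union(*(q.keys() for q in q_tables))
    n = len(q_tables)
    averaged = {k: sum(q.get(k, 0.0) for q in q_tables) / n for k in keys}
    for q in q_tables:
        q.clear()
        q.update(averaged)

# Example from the next slide: one robot has learned Q(s, a) = 100, three still have 0;
# after averaging, every robot holds 25, well below Q*(s, a) = 100.
q_tables = [{("s", "a"): 100.0}, {}, {}, {}]
simple_average(q_tables)
assert all(q[("s", "a")] == 25.0 for q in q_tables)
```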
8
Why is Simple Averaging Worse?
Slower learning rate:
–Example: the first robot to find the goal (at time t)

Robot | Q(s,a) at t | Q(s,a) at t+1 | Q*(s,a)
1     | 100         | 25            | 100
2     | 0           | 25            | 100
3     | 0           | 25            | 100
4     | 0           | 25            | 100

Averaging (100 + 0 + 0 + 0) / 4 = 25 drags the expert robot's estimate away from Q* = 100, so the correct value has to be re-learned.
Insensitive to environment changes:
–The first robot to find the change is outvoted by the others in the same way.
9
Expertness
Idea: pay more attention to agents who are 'experts'
–Expertness-based cooperative Q-learning
New Q-sharing equation (a weighted combination of the agents' Q-tables):
Q_i^new(s,a) = Σ_j W_ij Q_j(s,a)
Agent i weights agent j's Q-value based on their relative expertness e_i and e_j
10
Expertness Measures
Need to define the expertness of agent j
–Based on the reinforcement agent j has encountered
Alternative definitions:
–Simple sum
–Abs
–Positive
–Negative
Different interpretations (see the sketch below)
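A hedged sketch of how these four measures could be computed from an agent's reward history; the variants follow the names on the slide, and the exact definitions in the underlying paper may differ slightly.

```python
def expertness(rewards, measure="abs"):
    """Expertness of an agent, computed from the reinforcements it has received."""
    if measure == "simple_sum":   # algebraic sum of all rewards
        return sum(rewards)
    if measure == "abs":          # all experience counts, successes and failures alike
        return sum(abs(r) for r in rewards)
    if measure == "positive":     # only successes count
        return sum(r for r in rewards if r > 0)
    if measure == "negative":     # only failures count (as a positive magnitude)
        return sum(-r for r in rewards if r < 0)
    raise ValueError(f"unknown expertness measure: {measure}")

history = [100.0, -10.0, -10.0, 100.0]
print(expertness(history, "simple_sum"),   # 180.0
      expertness(history, "abs"),          # 220.0
      expertness(history, "positive"),     # 200.0
      expertness(history, "negative"))     # 20.0
```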
11
Weighting Strategies
How do we derive the weights W_ij from the expertness values?
Alternative strategies (see the sketch below):
–'Learn from all'
–'Learn from experts'
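The slide does not reproduce the formulas, so the following is only a sketch under assumptions: 'learn from all' weights every agent in proportion to its expertness, while 'learn from experts' gives weight only to agents strictly more expert than agent i and lets agent i keep a share of its own Q-values.

```python
def weights_learn_from_all(e, i):
    """'Learn from all': W_ij proportional to agent j's expertness (assumed form)."""
    total = sum(e)
    if total == 0.0:
        return [1.0 if j == i else 0.0 for j in range(len(e))]
    return [ej / total for ej in e]

def weights_learn_from_experts(e, i):
    """'Learn from experts': only agents more expert than agent i get weight (assumed form)."""
    surplus = [max(ej - e[i], 0.0) for ej in e]
    total = sum(surplus)
    if total == 0.0:
        return [1.0 if j == i else 0.0 for j in range(len(e))]
    w = [0.5 * s / total for s in surplus]   # half the weight spread over better agents
    w[i] += 0.5                              # half kept for agent i's own Q-values
    return w

def cooperative_update(q_tables, e, weight_fn):
    """Q_i(s, a) <- sum_j W_ij * Q_j(s, a) for every agent i."""
    keys = set().union(*(q.keys() for q in q_tables))
    new_tables = []
    for i in range(len(q_tables)):
        w = weight_fn(e, i)
        new_tables.append({k: sum(w[j] * q_tables[j].get(k, 0.0)
                                  for j in range(len(q_tables))) for k in keys})
    for q, new in zip(q_tables, new_tables):
        q.clear()
        q.update(new)
```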
12
Experimental Setup
Hunter-prey scenario
Individual trial phase as before
–Different number of trials for each agent
Then the cooperative phase
13
Results
Cooperative vs. individual learning
Different strategies
Interpretation
Conclusion – expertness-based methods help when the agents' expertness levels differ significantly.
14
Specialised Agents
Agent i may have explored area A a lot but area B very little
–What is agent i's expertness?
–Agent i is an expert in area A but not in area B
Idea:
–Agents can be specialised, i.e. experts in certain areas of the world
–Pay more attention to Q-values from agents that are experts in the area containing the current state (see the sketch below)
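One way to make this concrete (purely a sketch, not from the slides): keep expertness per region of the state space and weight each agent's Q-value by its expertness in the region that contains the current state.

```python
def regional_weights(regional_expertness, region, i):
    """Weights for combining Q-values of a state in `region`,
    proportional to each agent's expertness in that region alone.

    regional_expertness[j] maps region -> expertness of agent j there
    (illustrative bookkeeping, not from the slides).
    """
    e = [agent.get(region, 0.0) for agent in regional_expertness]
    total = sum(e)
    if total == 0.0:
        return [1.0 if j == i else 0.0 for j in range(len(e))]
    return [ej / total for ej in e]

# Agent 0 is the expert in area "A", agent 1 in area "B":
regional = [{"A": 50.0, "B": 1.0}, {"A": 1.0, "B": 40.0}]
print(regional_weights(regional, "A", 0))  # agent 0 dominates for states in A
print(regional_weights(regional, "B", 0))  # agent 1 dominates for states in B
```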
15
Specialised Agents Continued