1
Top-level learning
Example: Pass selection using TPOT-RL
2
Overview
Team with 3 players (Goalie, Midfield, Forward) + 1 enemy goal
- actions
- state generalization
- Q-Table: action selection, reduction, generation (value function learning by rewards)
3
actions
Scenario (state s): 3 players G, M, F; the opponents are invisible; E = enemy goal
3 actions (a) for the midfielder M: PassG (pass to the goalie), PassF (pass to the forward), KickE (kick towards the enemy goal)
Which action should M take?
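A minimal sketch of how this scenario could be encoded; the dictionary layout, player coordinates, and constant names are illustrative assumptions, not taken from the original implementation:

```python
# Illustrative encoding of the scenario (coordinates are assumptions).
ACTIONS = ("PassG", "PassF", "KickE")   # the three actions available to M

# state s: positions of the own players G, M, F (opponents are invisible to M)
state = {
    "G": (-40.0, 0.0),   # own goalie
    "M": (0.0, 0.0),     # midfielder, the decision maker
    "F": (30.0, 10.0),   # forward
}
```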
4
action-dependent features
e : S x A -> U, e(s,a) ... pass evaluation function (decision tree, heuristic, ...)
Example: e1 = 0.9 (PassG), e2 = 0.7 (PassF), e3 = 0.2 (KickE)
This measures only short-term efficiency; is that sufficient for achieving a goal?
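A hedged sketch of such an evaluation function; the slides leave its form open (decision tree, heuristic, ...), so this simple distance heuristic and the coordinates it assumes only illustrate the interface e : S x A -> U:

```python
import math

ENEMY_GOAL = (52.5, 0.0)   # assumed position of the enemy goal E


def evaluate(state: dict, action: str) -> float:
    """Heuristic e(s, a): estimated short-term success of `action`, in [0, 1]."""
    target = {
        "PassG": state["G"],
        "PassF": state["F"],
        "KickE": ENEMY_GOAL,
    }[action]
    mx, my = state["M"]
    distance = math.hypot(target[0] - mx, target[1] - my)
    # assumption: shorter passes/kicks succeed more often
    return max(0.0, 1.0 - distance / 60.0)


state = {"G": (-40.0, 0.0), "M": (0.0, 0.0), "F": (30.0, 10.0)}
print({a: round(evaluate(state, a), 2) for a in ("PassG", "PassF", "KickE")})
```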
5
state generalization
Create a small feature vector describing the situation: f : S -> V
f(s) = v = (e1, e2, e3) = (0.9, 0.7, 0.2)
6
state generalization II
Discretize the values, e.g. e_i >= 0.7 ... pass successful (v_i = True), e_i < 0.7 ... pass missed (v_i = False)
v = (T, T, F)
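The discretization step could look like this (the 0.7 threshold is the slide's number; the function name is mine):

```python
THRESHOLD = 0.7   # e_i >= 0.7 -> pass expected to succeed (True)


def generalize(evaluations: tuple) -> tuple:
    """f : S -> V, applied to the raw evaluations (e1, e2, e3)."""
    return tuple(e >= THRESHOLD for e in evaluations)


# e1 (PassG) = 0.9, e2 (PassF) = 0.7, e3 (KickE) = 0.2, as on the slide
v = generalize((0.9, 0.7, 0.2))
print(v)   # (True, True, False)
```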
7
state generalization III
Original state space: ca. 10^198 states (?)
Reduced state space: 2^3 * 3 = 24
("real" robosoccer: 2^(11+8) * 11 = ca. 5.7 million)
v = (T, T, F)
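A quick check of the slide's arithmetic (the exponents and factors are taken directly from the slide):

```python
# Toy example: 2 feature values, 3 features, 3 actions.
print(2 ** 3 * 3)          # 24

# "Real" robosoccer, numbers as on the slide: 2^(11+8) * 11 table entries.
print(2 ** (11 + 8) * 11)  # 5767168, i.e. ca. 5.7 million
```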
8
action selection
Which action is optimal for scoring a goal in the long term?
Assume a "wise" Q-Table: the maximal Q-value indicates the best action!
9
Q-Table action selection
Q-Table for player M (rows: state v = (v1, v2, v3), columns: actions a):

v1 v2 v3 | PassG PassF KickE
F  F  F  |   0     2     2
...      |  ...   ...   ...
T  T  F  |   4    12     8
...      |  ...   ...   ...
T  T  T  |   4    10    20

Current state v = (T, T, F): take the action with the maximal Q-value, here PassF with Q = 12 (Q_max = 100).
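A sketch of greedy action selection from the full table; the dictionary layout and function name are mine, the values are the slide's example entries (rows not shown on the slide are omitted):

```python
# Full Q-table for player M: one row per feature vector v = (v1, v2, v3).
q_table = {
    (False, False, False): {"PassG": 0, "PassF": 2, "KickE": 2},
    (True,  True,  False): {"PassG": 4, "PassF": 12, "KickE": 8},
    (True,  True,  True):  {"PassG": 4, "PassF": 10, "KickE": 20},
}


def select_action(v: tuple) -> str:
    """Greedy selection: take the action with the maximal Q-value in state v."""
    row = q_table[v]
    return max(row, key=row.get)


print(select_action((True, True, False)))   # PassF (Q = 12)
```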
10
Q-Table action selection II
Q-Table for player M:

v1 v2 v3 | PassG PassF KickE
F  F  F  |   0     2     2
...      |  ...   ...   ...
T  T  F  |   4    12     8
...      |  ...   ...   ...
T  T  T  |   8    10    20

Problem: size of the Q-Table * team size * formations
e.g. "real" robosoccer: #Q-values = 2^(11+8) * (11+8) * 11 * 9 = ca. 10^9
Therefore ASSUME independence: Q(v, a_i) depends only on v_i
11
Q-Table reduction
e.g. the Q-values for PassF depend "only" on the feature value for PassF: e2 = 0.7, i.e. v2 = T
(in the full Q-Table for player M above, only the PassF column and the value of v2 are needed for PassF)
12
Q-Table reduction II
Q-Table for player M, split into one small table per feature:

v1 | PassG PassF KickE     v2 | PassG PassF KickE     v3 | PassG PassF KickE
F  |  ...   ...   ...      F  |   0     2     2       F  |  ...   ...   ...
T  |  ...   ...   ...      T  |   4    12     8       T  |  ...   ...   ...

e.g. "real" robosoccer: #Q-values = 2 * (11+8) * (11+8) * 11 * 9 = 71478
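Under the independence assumption a single value per (feature value, action) pair suffices; a sketch of such a reduced lookup (only the PassF values are given on the slides, the other entries are placeholders of my own):

```python
# One tiny table per action, indexed only by the feature that belongs to it.
reduced_q = {
    "PassG": {False: 0.0, True: 4.0},    # indexed by v1 (placeholder values)
    "PassF": {False: 2.0, True: 12.0},   # indexed by v2 (values from the slide)
    "KickE": {False: 2.0, True: 8.0},    # indexed by v3 (placeholder values)
}
FEATURE_OF = {"PassG": 0, "PassF": 1, "KickE": 2}


def q_value(action: str, v: tuple) -> float:
    """Q(v, a_i) under the independence assumption: look up only v_i."""
    return reduced_q[action][v[FEATURE_OF[action]]]


print(q_value("PassF", (True, True, False)))   # 12.0
```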
13
Q-Table reduction III
Further reduction is possible with an action filter B(s): no Q-values are stored for useless actions
(e.g. in front of the enemy goal there is no Q-value for passing back to the own goalie).
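The filter B(s) can be a simple predicate applied before the Q-lookup; a sketch in which the "near the enemy goal" test and its threshold are assumptions:

```python
def action_filter(state: dict, actions: tuple) -> list:
    """B(s): keep only the actions that are not obviously useless in state s."""
    mx, _ = state["M"]
    near_enemy_goal = mx > 35.0          # assumed threshold on M's x-position
    useful = []
    for a in actions:
        if near_enemy_goal and a == "PassG":
            continue                     # no pass back to the own goalie here
        useful.append(a)
    return useful


state = {"G": (-40.0, 0.0), "M": (40.0, 0.0), "F": (45.0, 5.0)}
print(action_filter(state, ("PassG", "PassF", "KickE")))   # ['PassF', 'KickE']
```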
14
generating a Q-Table
Value function learning by rewards:
t = 7: M passes to F (PassF)
t = 15: F kicks towards the goal (KickE)
t = 17: goal scored!! Reward +100
15
generating a Q-Table II
Every agent remembers its last action.
t = 7: M passes to F (PassF)
t = 15: F kicks towards the goal (KickE)
t = 17: goal scored!! -> Q = 100
Rewards are discounted by the time t* that has passed since the agent's own action:
for F (KickE, t* = 2): r = Q = 100
for M (PassF, t* = 10): r = Q / (k * t*) = 100 / (0.5 * 10) = 20
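A sketch of this reward computation with the slide's numbers (k = 0.5, Q = 100); the function name is mine, and whether F's reward is computed by the same formula or simply set to Q is my reading of the slide, though with these numbers both give 100:

```python
def reward(q_goal: float, t_goal: int, t_action: int, k: float = 0.5) -> float:
    """Reward for an agent whose last action was at t_action, goal at t_goal."""
    t_star = t_goal - t_action           # time elapsed since the agent's action
    return q_goal / (k * t_star)


print(reward(100, t_goal=17, t_action=7))    # 20.0  -> M (PassF at t = 7)
print(reward(100, t_goal=17, t_action=15))   # 100.0 -> F (KickE at t = 15)
```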
16
generating a Q-Table III
Update the Q-Tables (here for M, action PassF, reward r = 20, learning rate α = 0.1):
Q(v,a) = Q(v,a) + α * (r - Q(v,a)) = 12 + 0.1 * (20 - 12) = 12.8
The entry for (v2 = T, PassF) changes from 12 to 12.8:

v2 | PassG PassF KickE
F  |   0     2     2
T  |   4   12.8    8
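The update itself is a one-liner; a sketch with the slide's numbers (α = 0.1 is inferred from the example arithmetic):

```python
def update(q_old: float, r: float, alpha: float = 0.1) -> float:
    """Q(v,a) <- Q(v,a) + alpha * (r - Q(v,a))."""
    return q_old + alpha * (r - q_old)


# Entry for (v2 = True, PassF): old value 12, reward 20 -> new value 12.8
print(update(12.0, 20.0))   # 12.8
```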
17
future value
PassF is sensible in itself, but it is very sensible if F can then kick at the goal. Therefore also take the value of the successor state-action pair into account:
Q(v,a) = Q(v,a) + α * (r + γ * Q(v_f, a_f) - Q(v,a))
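A sketch of this update with the future value included; γ = 0.9 and the successor value Q(v_f, a_f) = 20 (F's KickE entry from the earlier table) are illustrative choices of mine:

```python
def update_with_future_value(q_old: float, r: float, q_next: float,
                             alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Q(v,a) <- Q(v,a) + alpha * (r + gamma * Q(v_f, a_f) - Q(v,a))."""
    return q_old + alpha * (r + gamma * q_next - q_old)


# M's PassF entry gains extra value because F's follow-up KickE is valuable.
print(update_with_future_value(q_old=12.0, r=20.0, q_next=20.0))   # 14.6
```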