1
Learning a Multiagent Behavior: Decision Tree Learning for Pass Evaluation
2
Pass Evaluation
- Passing requires action by two agents. The receiver's task is identical to that of the defender in Chapter 5, so the receiver uses the learned ball-interception skill.
- It is easier to train a pass-evaluation function than to code such a function by hand.
- Data is collected and used to train the agents.
3
Decision Tree Learning
- Uses the C4.5 training algorithm
- Suited to problems where many features are available
- Determines the relevant features
- Handles missing features (e.g. a player that is not visible)
- Assesses the likelihood that a pass will succeed
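C4.5 ranks candidate split features by gain ratio, which is how it ends up "determining the relevant features". The sketch below illustrates that criterion for a discrete feature only. It is not the code used in the presentation: the function names are hypothetical, and the treatment of missing values (skip them, then scale the gain by the fraction of known values) is a simplification of what C4.5 does.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of a discrete feature.

    `values[i]` is the feature value of example i, or None if it is
    missing (for instance, the player was not visible).
    """
    known = [(v, y) for v, y in zip(values, labels) if v is not None]
    if not known:
        return 0.0

    # Group the class labels by feature value.
    groups = {}
    for v, y in known:
        groups.setdefault(v, []).append(y)

    n = len(known)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy([y for _, y in known]) - remainder
    split_info = entropy([v for v, _ in known])
    if split_info == 0:
        return 0.0  # the feature takes a single value: nothing to split on

    # Simplified missing-value handling: scale by the known fraction.
    return (n / len(values)) * info_gain / split_info
```

A feature that separates successes from failures perfectly scores 1.0, while a feature that is independent of the outcome scores 0.0.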
4
Training
- Constrained training scenario
- An omnipotent agent monitors the trials
- Training examples do not include full teams
- 5000 training examples
- 174 features (passer and receiver)
- The features from the receiver's perspective are communicated to the passer
5
The Training Procedure
1. The players are placed randomly within a region.
2. The passer announces its intention to pass.
3. The teammates reply with their views of the field when ready to receive.
4. The passer chooses a receiver randomly during training, or with a DT during testing.
5. The passer collects the features of the training instance.
6. The passer announces to whom it is passing.
7. The receiver and four opponents attempt to get the ball.
8. The training example is classified as a success if the receiver manages to advance the ball towards the opponent's goal; as a failure if one of the opponents clears the ball in the opposite direction; or as a miss if the receiver and the opponents all fail to intercept the ball.
6
The Features
7
The Trained Decision Tree
- Pruned tree with 87 nodes
- 51% successes, 42% failures, 7% misses
- 26% error rate on the training set

Function Φ(passer, receiver) → [-1, 1]: the DT predicts class κ with confidence γ ∈ [0, 1], and

$$
\Phi(\text{passer}, \text{receiver}) =
\begin{cases}
\gamma & \text{if } \kappa = S \text{ (success)} \\
0 & \text{if } \kappa = M \text{ (miss)} \\
-\gamma & \text{if } \kappa = F \text{ (failure)}
\end{cases}
$$
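The mapping from the tree's prediction to Φ can be written out directly. This is a minimal sketch: the function name `phi` and the single-letter class codes "S", "M" and "F" are illustrative choices, not identifiers from the original system.

```python
def phi(predicted_class, confidence):
    """Map a DT prediction (class, confidence) to a value in [-1, 1].

    predicted_class: "S" (success), "M" (miss) or "F" (failure)
    confidence:      the tree's confidence gamma, in [0, 1]
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if predicted_class == "S":
        return confidence
    if predicted_class == "M":
        return 0.0
    if predicted_class == "F":
        return -confidence
    raise ValueError(f"unknown class: {predicted_class!r}")
```

Under this mapping a confidently predicted success scores close to 1, a confidently predicted failure scores close to -1, and a predicted miss always scores 0 regardless of confidence.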
9
Testing
- During testing, the DT chooses the receiver
- The other steps are the same as during training
- If more than one teammate is classified as a success, the passer passes to the teammate with the maximum Φ(passer, teammate)
- The passer passes in every case
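The test-time selection rule can be sketched as follows, assuming the Φ values for all teammates have already been computed. The function name and the dictionary layout are hypothetical. Because the passer passes in every case, the sketch returns the teammate with the highest Φ even when no value is positive.

```python
def choose_receiver(phi_by_teammate):
    """Return the teammate with the maximum Phi(passer, teammate).

    phi_by_teammate: dict mapping a teammate identifier to its Phi value.
    The passer always passes, so the best-scoring teammate is returned
    even if every Phi value is zero or negative.
    """
    if not phi_by_teammate:
        raise ValueError("at least one teammate is required")
    return max(phi_by_teammate, key=phi_by_teammate.get)
```

For example, `choose_receiver({"t1": 0.4, "t2": 0.7, "t3": -0.2})` selects `"t2"`.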
10
Results
- Success rate without opponents is 86%
- Success rate when passing to the closest teammate is 64%
11
Using the Learned Behaviors
12
Scaling up to Full Games
- Extend the basic learned behaviors into a full multiagent behavior (designed for testing)
- The player needs a mechanism for deciding what to do when it does not have the ball
- Is there enough time to execute the ideal pass?
13
RCF: Receiver Choice Function
- What should I do if I have the ball?
- Input: the agent's perception of the current state
- Output: an action (dribble, kick or pass) and a direction (e.g. towards the goal)
- Function: the RCF identifies a set of candidate receivers, then either selects a receiver or chooses to dribble or kick
14
Three RCFs: PRW, RAND, DT
- PRW (prefer right wing RCF): uses a fixed ordering on the candidate receivers
- RAND (random RCF): chooses randomly from among all candidate receivers
- DT (decision tree RCF): if the DT does not predict that any pass will succeed, the agent with the ball should dribble or kick
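The two baseline RCFs are simple enough to sketch in a few lines. The preference ordering passed to `rcf_prw`, the position names, and the behavior on an empty candidate set (returning `None`) are assumptions made for illustration; the slides state only that PRW uses a fixed ordering and RAND chooses uniformly at random.

```python
import random


def rcf_prw(candidates, preference_order):
    """PRW sketch: return the first candidate in a fixed preference order.

    preference_order is a hypothetical list of receiver identifiers,
    most preferred first (for example, right-wing positions first).
    """
    candidate_set = set(candidates)
    for receiver in preference_order:
        if receiver in candidate_set:
            return receiver
    return None  # assumption: no candidate available


def rcf_rand(candidates, rng=random):
    """RAND sketch: choose uniformly at random among the candidates."""
    candidates = list(candidates)
    if not candidates:
        return None  # assumption: no candidate available
    return rng.choice(candidates)
```

Passing a seeded `random.Random` instance as `rng` makes the RAND choice reproducible in experiments.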
15
Player Positions
16
Specification of the RCF DT
1. Determine the set C of candidate receivers.
2. Eliminate receivers that are closer than 10 or farther than 40.
3. Eliminate receivers that are away from their home position.
4. When there is an opponent within 15, eliminate receivers to which the passer cannot kick directly (+/- 130°).
5. IF C = Ø THEN
   - IF opponent < 15 THEN return KICK
   - ELSE return DRIBBLE
6. ELSE eliminate receivers with Φ(passer, receiver) <= 0
   - IF C = Ø THEN return KICK or DRIBBLE (as in step 5)
   - ELSE return a pass to the receiver with the maximum Φ(passer, receiver)
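The six steps translate almost line for line into code. The sketch below makes several assumptions that the slide does not spell out: the `Candidate` fields are hypothetical, distances use the same (unstated) units as the thresholds 10, 40 and 15, and "+/- 130°" is read as the receiver's angle relative to the direction the passer is facing.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    distance: float   # distance from the passer
    at_home: bool     # True if the receiver is at its home position
    angle: float      # degrees relative to the passer's facing (assumption)
    phi: float        # Phi(passer, receiver) from the decision tree


def rcf_dt(candidates, nearest_opponent_distance):
    """Return ("pass", name), ("kick", None) or ("dribble", None)."""

    def kick_or_dribble():
        # Step 5: kick if an opponent is close, otherwise dribble.
        if nearest_opponent_distance < 15:
            return ("kick", None)
        return ("dribble", None)

    # Step 2: drop receivers that are too close or too far away.
    c = [r for r in candidates if 10 <= r.distance <= 40]
    # Step 3: drop receivers that are away from their home position.
    c = [r for r in c if r.at_home]
    # Step 4: under pressure, keep only receivers reachable by a direct kick.
    if nearest_opponent_distance < 15:
        c = [r for r in c if abs(r.angle) <= 130]
    if not c:
        return kick_or_dribble()

    # Step 6: keep only receivers for which the DT predicts success.
    c = [r for r in c if r.phi > 0]
    if not c:
        return kick_or_dribble()
    best = max(c, key=lambda r: r.phi)
    return ("pass", best.name)
```

Note that the filters in steps 2 to 4 run before the decision tree's output is consulted, so a receiver with a high Φ can still be excluded for being too close, too far, or out of position.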
17
Reasoning about Action Execution Time
- No turnball behavior
- It takes 5 to 15 simulator cycles to move out of the ball's path
- An opponent can steal the ball
- Consequence: the agent needs to reason about the available time
18
The RCF in a Behavior
- RCF: used only when the ball is within the kickable area
- 1. Find the ball's location (after 3 seconds without seeing the ball, the player does not know the ball's location)
- Use the NN to intercept the ball
- When not chasing the ball: ball-dependent flexible positioning
19
Complete Agent Behavior
- d_chase = 10
20
Testing
- Behaviors differ only in their RCFs
- 4-3-3 formation (makes passing useful)
- When only the ball-dependent player-positioning algorithm is used, every player is covered by one opponent
- In reality, some players are typically more open than others, so the RCFs are tested against the OPR (only play right) formation
21
Results
- 34 five-minute games
24
Action-Execution Time
- Assumption: there is never an opponent within d_min, which gives the "no rush" DT