Learning a Multiagent Behavior: Decision Tree Learning for Pass Evaluation


1 Learning a Multiagent Behavior: Decision Tree Learning for Pass Evaluation

2 Pass Evaluation
- Passing requires action by two agents: the passer and the receiver.
- The receiver's task is identical to that of the defender in Chapter 5 → use the learned ball-interception skill.
- It is easier to train a pass-evaluation function than to code such a function by hand.
- Approach: collect data and use it to train the agents.

3 Decision Tree Learning
- Uses the C4.5 training algorithm.
- Suited to problems where many features are available.
- Determines the relevant features automatically.
- Handles missing features (e.g. a player that is not visible).
- Task: assess the likelihood that a pass will succeed.
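C4.5 selects its splits by information gain (normalized as the gain ratio), which is also how it singles out the relevant features. A minimal sketch of the gain computation, on hypothetical toy features rather than the original 174-feature data:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, feature_index):
    """Entropy reduction from splitting on one discrete feature.
    C4.5 ranks candidate splits by this quantity to decide which
    features matter."""
    n = len(labels)
    partitions = {}
    for x, y in zip(examples, labels):
        partitions.setdefault(x[feature_index], []).append(y)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

# Toy data: feature 0 perfectly separates S from F, feature 1 is noise.
examples = [(0, "a"), (0, "b"), (1, "a"), (1, "b")]
labels = ["S", "S", "F", "F"]
```

An irrelevant feature yields a gain near zero, which is why C4.5 copes well when many features are available.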

4 Training
- Constrained training scenario.
- An omnipotent agent monitors the trials.
- Training examples do not include full teams.
- 5000 training examples.
- 174 features (passer and receiver).
- The features from the receiver's perspective are communicated to the passer.

5 The Training Procedure
1. The players are placed randomly within a region.
2. The passer announces its intention to pass.
3. The teammates reply with their views of the field when ready to receive.
4. The passer chooses a receiver randomly during training, or with a DT during testing.
5. The passer collects the features of the training instance.
6. The passer announces to whom it is passing.
7. The receiver and four opponents attempt to get the ball.
8. The training example is classified as a success if the receiver manages to advance the ball towards the opponent's goal; as a failure if one of the opponents clears the ball in the opposite direction; or as a miss if the receiver and the opponents all fail to intercept the ball.
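The procedure above can be sketched as a labeling rule (step 8) plus a collection loop. In this sketch, `run_trial` is a hypothetical stand-in for one monitored simulator episode (steps 1-7); it returns the collected feature vector and the two outcome flags:

```python
import random

def label_trial(receiver_advanced, opponent_cleared):
    """Step 8: success (S) if the receiver advances the ball, failure (F)
    if an opponent clears it, miss (M) if nobody intercepts."""
    if receiver_advanced:
        return "S"
    if opponent_cleared:
        return "F"
    return "M"

def collect_examples(n_trials, run_trial, seed=0):
    """Skeleton of the data-collection loop over constrained trials.
    `run_trial(rng) -> (features, receiver_advanced, opponent_cleared)`
    is assumed, not part of the original specification."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_trials):
        features, advanced, cleared = run_trial(rng)
        dataset.append((features, label_trial(advanced, cleared)))
    return dataset
```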

6 The Features

7 The Trained Decision Tree
- Pruned tree with 87 nodes.
- 51% successes, 42% failures, 7% misses.
- 26% error rate on the training set.

The tree yields a function Φ(passer, receiver) → [-1, 1]. When the DT predicts class κ with confidence γ ∈ [0, 1]:

    Φ(passer, receiver) =  γ  if κ = S (success)
                           0  if κ = M (miss)
                          -γ  if κ = F (failure)

8

9 Testing
- For testing, the DT chooses the receiver.
- The other steps are the same as during training.
- If more than one pass is classified as successful, the passer passes to the teammate with maximum Φ(passer, teammate).
- The passer passes in every case.
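Because the passer passes in every case, the selection rule reduces to taking the Φ-maximizing teammate. A minimal sketch, where `phi_of` stands in for the DT evaluation:

```python
def choose_receiver(teammates, phi_of):
    """Return the teammate with maximal phi(passer, teammate).
    Since the passer passes in every case, the maximum is taken even
    when no candidate is rated positively."""
    return max(teammates, key=phi_of)
```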

10 Results
- Success rate without opponents: 86%.
- Success rate when passing to the closest teammate: 64%.

11 Using the Learned Behaviors

12 Scaling up to Full Games
- Extend the basic learned behaviors into a full multiagent behavior (designed for testing).
- The player needs some mechanism for when it does not have the ball.
- Is there enough time to execute the ideal pass?

13 RCF: Receiver Choice Function
- What should I do if I have the ball?
- Input: the agent's perception of the current state.
- Output: an action (dribble, kick, or pass) and a direction (e.g. towards the goal).
- Function: the RCF identifies a set of candidate receivers, then selects a receiver or decides to dribble or kick.

14 Three RCFs: PRW, RAND, DT
- PRW (prefer right wing): uses a fixed ordering on the candidate receivers.
- RAND (random): chooses randomly from among all candidate receivers.
- DT (decision tree): if the DT does not predict that any pass will succeed, the agent with the ball should dribble or kick.
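The two baseline RCFs are simple enough to sketch directly; the candidate set and the preference list are hypothetical inputs standing in for the agent's world model:

```python
import random

def prw_rcf(candidates, preference_order):
    """PRW: scan a fixed preference list (right-wing players first) and
    return the first available candidate, or None if none is listed."""
    for r in preference_order:
        if r in candidates:
            return r
    return None

def rand_rcf(candidates, rng=random):
    """RAND: uniform random choice among all candidate receivers."""
    return rng.choice(list(candidates))
```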

15 Player Positions

16 Specification of the RCF DT
1. Determine the set C of candidate receivers.
2. Eliminate receivers that are closer than 10 or farther than 40.
3. Eliminate receivers that are away from their home position.
4. When there is an opponent within 15, eliminate receivers to which the passer cannot kick directly (within ±130°).
5. IF C = Ø THEN
   - IF opponent within 15 THEN return KICK
   - ELSE return DRIBBLE
6. ELSE eliminate receivers with Φ(passer, receiver) ≤ 0
   - IF C = Ø THEN return kick or dribble (as in step 5)
   - ELSE return pass to the receiver with maximum Φ(passer, receiver)
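The six steps can be sketched as follows. The predicates (`away_from_home`, `opponent_within`, `can_kick_directly`) are hypothetical stand-ins for the agent's world model; the distance bounds follow the slide:

```python
def rcf_dt(candidates, dist_to, away_from_home, opponent_within,
           can_kick_directly, phi):
    """Receiver choice with the decision tree; returns (action, receiver)."""
    # steps 1-3: candidate set, filtered by distance (10-40) and home position
    C = [r for r in candidates
         if 10 <= dist_to(r) <= 40 and not away_from_home(r)]
    # step 4: with an opponent within 15, require a direct (+/-130 deg) kick
    if opponent_within(15):
        C = [r for r in C if can_kick_directly(r)]
    # step 6 filter: keep only receivers the DT rates positively
    viable = [r for r in C if phi(r) > 0]
    if not viable:
        # step 5 (also reached from the empty branch of step 6):
        # clear the ball under pressure, otherwise hold it
        return ("KICK", None) if opponent_within(15) else ("DRIBBLE", None)
    # step 6: pass to the receiver with maximal phi
    return ("PASS", max(viable, key=phi))
```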

17 Reasoning about Action Execution Time
- No turnball behavior.
- 5 to 15 simulator cycles are needed to move out of the ball's path.
- An opponent can steal the ball in the meantime.
- → the agent must reason about the available time.

18 The RCF in a Behavior
- The RCF is used only when the ball is within the kickable area.
- First, find the ball's location (after 3 seconds without seeing the ball, the player no longer knows where it is).
- Use the learned NN to intercept the ball.
- When not chasing the ball → ball-dependent flexible positioning.

19 Complete Agent Behavior
- d_chase = 10

20 Testing
- The behaviors differ only in their RCFs.
- 4-3-3 formation (makes passing useful).
- Using only the ball-dependent player-positioning algorithm → every player is covered by one opponent.
- In reality some players are typically more open than others → test the RCFs against the OPR (only play right) formation.

21 Results
- 34 five-minute games

22

23

24 Action-Execution Time
- Assumption: there is never an opponent within d_min → the No-Rush DT

