Agent Technology: RoboCup
Gao Yang, AI Lab, NJU, October 2008
Outline
1. Overview of RoboCup simulation
2. Demo
3. Team member architecture overview
4. Multi-agent layered learning in RoboCup
   – building increasingly complex behaviors on top of one another
   – learning low-level behaviors before high-level ones
   – higher-level behaviors use lower-level ones as components
5. Individual learning
6. Multi-agent learning
7. Team learning
1. Background
– RoboCup initially started as the J-League (Japan Robot Soccer League).
– In 1993 several American researchers became interested, bringing about the Robot World Cup Initiative (RoboCup).
– The first games and conferences took place in 1997.
– The RoboCup Federation was founded to coordinate the research through workshops, conferences, and yearly competitions.
– Tsinghuaeolus: champion of RoboCup 2001 and 2002 (simulation league).
RoboCup Official Site (screenshot)
Motivation
– The goal was to provide a new standard problem for AI, jokingly called "the life of AI after Deep Blue".
– RoboCup differs from earlier benchmarks by focusing on a distributed rather than a centralized solution.
– RoboCup poses its problem in a complex environment: dynamic, noisy, semi-observable, with both cooperation and competition.
– Leagues range from simulation through the small-size and middle-size leagues to real robots.
RoboCup Soccer Simulator (screenshot)
Challenges
– Dynamic environment
– Limited communication: UDP, limited shared bandwidth, delays
– Real-time action
– Tremendous state space
– Noise
– Multi-agent: cooperation and competition
– Semi-observability and limited perception
2. Demo
RoboCup system overview
– Provides a platform for developing software techniques without the need to build physical robots.
– Consists of three main applications:
   – Soccer Server
   – Soccer Monitor
   – Soccer Players (the agents)
Soccer Server
– A system that allows several autonomous program agents to play a virtual soccer game.
– Games are carried out in a client/server style, where each client is one player.
– Communication uses UDP/IP, so the system can run over a network connection, or even the Internet (see the sketch below).
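A minimal sketch of the client side of this connection, assuming the default rcssserver port 6000 and the s-expression (init ...) registration command; the team name and all local identifiers are ours:

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <cstdio>
    #include <cstring>

    int main() {
        // Connect to the soccer server over UDP and register as a player.
        int sock = socket(AF_INET, SOCK_DGRAM, 0);
        sockaddr_in server{};
        server.sin_family = AF_INET;
        server.sin_port = htons(6000);                 // default server port
        inet_pton(AF_INET, "127.0.0.1", &server.sin_addr);

        const char* init = "(init NJU_Team (version 9))";
        sendto(sock, init, strlen(init), 0, (sockaddr*)&server, sizeof(server));

        // The server answers (init Side Unum PlayMode) from a fresh port;
        // later commands must be sent to that port.
        char buf[8192];
        sockaddr_in from{};
        socklen_t len = sizeof(from);
        ssize_t n = recvfrom(sock, buf, sizeof(buf) - 1, 0, (sockaddr*)&from, &len);
        if (n > 0) { buf[n] = '\0'; printf("server: %s\n", buf); }
        close(sock);
        return 0;
    }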
Soccer Server: objects (diagram)
Soccer Server: protocol
– Client command protocol: connecting, disconnecting, reconnecting
– Client control protocol: catch, change_view, dash, kick, move, say, sense_body, score, turn, turn_neck, error
– Client sensor protocol: hear, see, sense_body
– Sensor model: aural sensor model, vision sensor model, body sensor model
Soccer Server: protocol (cont.)
– Movement model: movement noise model, collision model
– Action model: catch, dash, kick, move, say, turn, turn_neck models
– Referee model
The control commands themselves are plain s-expressions; see the sample below.
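A hedged sample of how an agent might format the basic control commands (parameter ranges are set in the server configuration; send_cmd is our helper, and the values are arbitrary):

    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <cstring>

    // Our helper: every command is one s-expression sent as a UDP datagram.
    void send_cmd(int sock, const sockaddr_in& srv, const char* cmd) {
        sendto(sock, cmd, strlen(cmd), 0, (const sockaddr*)&srv, sizeof(srv));
    }

    void demo(int sock, const sockaddr_in& srv) {
        send_cmd(sock, srv, "(dash 80)");        // accelerate along body direction
        send_cmd(sock, srv, "(turn 30)");        // turn the body by 30 degrees
        send_cmd(sock, srv, "(kick 100 0)");     // kick: power 100, direction 0
        send_cmd(sock, srv, "(turn_neck -45)");  // turn the neck relative to body
        send_cmd(sock, srv, "(say hello)");      // broadcast to nearby players
    }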
Soccer Monitor
– Displays the visual progress of the game.
– Several monitors can be connected to the server at once.
– Can also interrupt game play for simple tasks such as dropping a ball.
– A log player can replay recorded games.
Autonomous players
– The "brains" of the players.
– Receive sensory information from the server, on which decisions are based.
– Commands are formatted and sent to the server over UDP sockets (a loop sketch follows).
– Coach players: a special client type with a global view.
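A minimal sketch of the resulting sense-act loop, reusing send_cmd from the earlier sample; update_world_model, decide, process_message, and update_body_state are hypothetical placeholders given trivial bodies here:

    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <cstring>

    void send_cmd(int sock, const sockaddr_in& srv, const char* cmd); // as before

    // Hypothetical placeholders, trivial in this sketch:
    void update_world_model(const char*) {}
    void process_message(const char*) {}
    void update_body_state(const char*) {}
    const char* decide() { return "(turn 10)"; }

    // Block on the next sensor message, dispatch on its type, and
    // answer a vision update with one action command.
    void run(int sock, const sockaddr_in& srv) {
        char buf[8192];
        for (;;) {
            ssize_t n = recv(sock, buf, sizeof(buf) - 1, 0);
            if (n <= 0) continue;
            buf[n] = '\0';
            if (strncmp(buf, "(see", 4) == 0) {
                update_world_model(buf);          // fold vision into state
                send_cmd(sock, srv, decide());    // pick dash/turn/kick/...
            } else if (strncmp(buf, "(hear", 5) == 0) {
                process_message(buf);             // teammate communication
            } else if (strncmp(buf, "(sense_body", 11) == 0) {
                update_body_state(buf);           // stamina, speed, head angle
            }
        }
    }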
3. MAS design problems
– Reactive vs. deliberative
– Local vs. global
– Stable vs. evolving
– Modeling and affecting other agents
– Distributed sensing
– Opponent modeling
MAS in RoboCup
– Deliberative and reactive layered architecture
– Global rather than purely local view
– Online incremental learning
– Hidden state and prediction
– Limited and unreliable communication
Architecture (diagram)
Architecture (cont.)
– Perception: receives information from the server.
– World state: external state, perception plus prediction.
– Internal state: BDI state.
– Predictor
– Analysis and decision
– Locker-room agreement: communication, formation, roles.
– Command parser: translates primitive actions into server commands.
A structural sketch follows.
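A minimal sketch of how these components might map onto types; all names, and the confidence-decay rule in the predictor, are our assumptions:

    #include <string>

    struct Perception { std::string raw; };          // parsed see/hear/sense_body
    struct WorldState { double confidence = 0.0; };  // perception + prediction
    struct InternalState { std::string goal; };      // BDI state
    struct Action { std::string primitive; };

    // Predictor: when nothing new is seen, keep the old estimate but
    // decay our confidence in it.
    void predict(WorldState& w, const Perception& p) {
        w.confidence = p.raw.empty() ? w.confidence * 0.95 : 1.0;
    }

    // Analysis & decision layer (trivial stub here).
    Action decide(const WorldState&, const InternalState&) {
        return {"turn 0"};
    }

    // Command parser: primitive action -> server command string.
    std::string toCommand(const Action& a) { return "(" + a.primitive + ")"; }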
Teamwork structure
– Periodic team synchronization:
   – a shared team goal
   – periodic synchronization with unlimited communication
   – otherwise only ordinary, unreliable communication
– Locker-room agreement
– Challenges:
   – how to represent and follow the locker-room agreement
   – how to determine when to change formation or role
   – how to ensure all agents use the same formation
   – how to ensure all roles are filled
Teamwork structure (cont.)
– Roles: soft constraints on behavior; flexible vs. rigid
– Formations: decompose the whole team task into roles; dynamic vs. static
– Set-plays: multi-step, multi-agent plans with role mapping
A data-structure sketch follows.
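A hedged sketch of how roles and formations could be represented so that the whole table is part of the locker-room agreement; the names and positions are invented:

    #include <string>
    #include <vector>

    struct Role {                       // a soft constraint on behavior
        std::string name;
        double homeX, homeY;            // strategic home position
    };
    struct Formation {                  // decomposes the team task into roles
        std::string name;
        std::vector<Role> roles;        // one entry per player
    };

    // Compiled into every agent, so no run-time negotiation is needed
    // to agree on what "4-3-3" means.
    Formation make433() {
        return {"4-3-3", {
            {"goalie", -50, 0}, {"left_back", -35, -20},
            {"center_forward", 30, 0} /* ... remaining eight roles ... */ }};
    }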
Communication paradigm
– Limited communication: a single channel, low bandwidth, unreliable
– Challenges:
   – how to identify messages
   – how to handle opponents mimicking our messages
   – how to prevent everyone "talking all at once"
   – how to make agents robust to lost messages
   – how to maximize the chance to communicate
Communication paradigm (cont.)
– Message targeting and distinguishing: target an agent or a role
– Robustness to active interference: time-stamps
– Multiple simultaneous responses: single vs. team; response vs. no response
– Robustness to lost messages
– Team coordination
A message-format sketch follows.
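A minimal sketch of such a message format: a shared secret from the locker-room agreement identifies teammates, and the time-stamp lets listeners drop stale or replayed messages. TEAM_KEY and the two-cycle window are invented:

    #include <cstddef>
    #include <cstdio>
    #include <cstring>

    const char* TEAM_KEY = "nju42";   // shared in the locker-room agreement

    void encode(char* out, size_t n, int cycle, int sender, const char* body) {
        snprintf(out, n, "%s %d %d %s", TEAM_KEY, cycle, sender, body);
    }

    bool accept(const char* msg, int nowCycle) {
        char key[16]; int cycle, sender;
        if (sscanf(msg, "%15s %d %d", key, &cycle, &sender) != 3) return false;
        if (strcmp(key, TEAM_KEY) != 0) return false;  // opponent or mimic
        return nowCycle - cycle <= 2;                  // drop stale replays
    }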
4. Layered learning
– Bottom-up, incremental task decomposition
– The task decomposition is domain-specific
– Machine learning occurs separately at each layer: off-line or on-line
– Layers affect each other
– Errors can propagate upward
Layered learning (cont.) (diagram)
Layered learning in RoboCup (overview)
– Like human players, skills come at several levels: kick, shoot, dribble, pass, positioning, etc.
– Different levels use different training methods: training, then adaptation.
– Instances in RoboCup:

   Layer  Behavior                   Scope
   1      Ball interception, shoot   single agent
   2      Pass evaluation            multi-agent
   3      Pass selection             team
   4      Role assignment            team
   5      Positioning                adversarial
Low-level skill: ball interception
– Intercepting a moving ball is a prerequisite for any kicking action.
– Learned through supervised empirical training with neural networks.
– Difficulties:
   – ball movement is unpredictable due to noise in the system
   – players have limited vision
Low-level skill: ball interception (cont.)
– The defender learns the turn angle (TA, the angle to turn after facing the ball) from the ball's distance and angle.
– NN training (sketched below):
   – randomized situations and defender actions
   – results recorded as:
      – SAVE: the ball is intercepted
      – GOAL: the ball is not intercepted and goes into the goal
      – MISS: the ball is not intercepted and no goal is made
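A hedged sketch of the data-collection side of that training; simulate stands in for one randomized trial on the soccer server, and the parameter ranges are invented:

    #include <cstdlib>
    #include <vector>

    enum Outcome { SAVE, GOAL, MISS };               // recorded trial results
    struct Sample { double dist, ang, ta; Outcome label; };

    double rand01() { return std::rand() / (double)RAND_MAX; }

    // Placeholder for one randomized trial run on the soccer server.
    Outcome simulate(double, double, double) { return MISS; }

    // Random situations, random turn angles, labeled outcomes.
    std::vector<Sample> collect(int n) {
        std::vector<Sample> data;
        for (int i = 0; i < n; ++i) {
            double dist = rand01() * 30.0;           // ball distance (m)
            double ang  = rand01() * 90.0 - 45.0;    // ball angle (deg)
            double ta   = rand01() * 180.0 - 90.0;   // candidate turn angle
            data.push_back({dist, ang, ta, simulate(dist, ang, ta)});
        }
        return data;
    }
    // A feed-forward network is then fit to predict, for each
    // (dist, ang), the TA whose trials ended in SAVE.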
5. Mid-level skill: passing evaluation
– Passing involves two team members: a passer and a receiver.
– The receiver's action in passing is identical to the defender's action in ball interception, so the same NN is reused.
– Decision trees (DTs) were used to decide whether the ball should be passed to a particular teammate.
Mid-level skill: passing evaluation (cont.)
– The passer uses the receivers' views of the field in addition to its own when deciding whether to pass.
– During training, the passer chose a random receiver in each trial; besides the intended receiver, 4 defenders also attempted to intercept the ball.
– Results:
   – SUCCESS: the intended receiver intercepts the ball
   – FAILURE: a defender intercepts the ball
   – MISS: no one intercepts the ball
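For illustration, a hand-written stand-in for the kind of tree such training induces; the features mirror the slide (passer's and receiver's views) but every threshold is invented:

    enum PassResult { SUCCESS, FAILURE, MISS };

    PassResult evaluate_pass(double recvDist,        // passer -> receiver
                             double nearestDefAng,   // defender off pass line
                             int defsSeenByReceiver) // from receiver's view
    {
        if (recvDist > 25.0) return MISS;            // unlikely to arrive
        if (nearestDefAng < 10.0) return FAILURE;    // defender on the line
        if (defsSeenByReceiver >= 3) return FAILURE; // receiver is crowded
        return SUCCESS;
    }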
High-level skill: pass selection
– Pass selection: choose the receiver according to its long-term effect.
– Rule-based selection is only locally optimal, missing long-term plays such as:
   – backward passes
   – multi-step passes
High-level skill: pass selection (cont.)
– Challenges:
   – tremendous amount of information: own state, teammates, opponents
   – long-term performance matters
   – the outcome depends on others
   – the problem is partitioned among teammates
– Reinforcement learning:
   – trial and error
   – online and unsupervised
   – deals with delayed rewards
   – deals with a large state space
High-level skill: pass selection (cont.)
– RL in RoboCup:
   – the value function is partitioned among the team
   – action-dependent features
   – long-term rewards come from the environment
– Steps (sketched below):
   – state generalization
   – value function learning
   – action selection
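A minimal table-based sketch of those three steps; the table sizes, the feature map, and the epsilon-greedy rule are our assumptions, not the team's exact scheme:

    #include <cstdlib>

    const int F = 16, A = 8;       // generalized features x pass targets
    double Q[F][A] = {};           // value function, partitioned by action

    // Step 1, state generalization: map the huge world state to [0, F).
    int generalize(/* world state */) { return 0; }   // placeholder

    // Step 3, action selection: epsilon-greedy over the learned values.
    int select_action(int f, double eps) {
        if (std::rand() / (double)RAND_MAX < eps)
            return std::rand() % A;                   // explore
        int best = 0;
        for (int a = 1; a < A; ++a)
            if (Q[f][a] > Q[f][best]) best = a;
        return best;                                  // exploit
    }

    // Step 2, value function learning: the long-term reward arrives
    // from the environment only after a delay.
    void learn(int f, int a, double reward, double alpha) {
        Q[f][a] += alpha * (reward - Q[f][a]);
    }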
Tactical-level skills: behavior tree (diagram)
Tactical-level skills (cont.)
– Assuming the individual behaviors have been learned, how does an agent choose a behavior in a given situation?
– For example, an attacker with the ball must decide among:
   – passing to one of 10 teammates
   – dribbling in one of 8 directions
   – shooting
   – holding the ball
Tactical-level skills (cont.)
– Observation: reinforcement learning fits, because agents do not know each other's behaviors, only their results, and the goal is long-term global performance.
– State generalization: use a feature space instead of the raw state space to reduce complexity, or use a neural network as a function approximator to store the value function.
– Value function learning: +1 for a goal, -1 for a loss, plus a per-step reward.
– Action selection.
The action set and reward signal are sketched below.
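A hedged sketch of the attacker's discrete action set and reward signal; the per-step reward is left at zero here because the slide does not specify it:

    enum {
        PASS_BASE    = 0,    // actions 0..9: pass to teammate i+1
        DRIBBLE_BASE = 10,   // actions 10..17: dribble in 8 directions
        SHOOT        = 18,
        HOLD         = 19,
        N_ACTIONS    = 20
    };

    double reward(bool goalScored, bool lost) {
        if (goalScored) return  1.0;   // +1 for a goal
        if (lost)       return -1.0;   // -1 for a loss
        return 0.0;                    // per-step reward unspecified on slide
    }
    // These actions would index the same kind of value table (or a
    // neural-network approximator) as in the pass-selection sketch.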
6. Team-level strategies I: set plays
– A set play was run several times to show that the higher-level learning (the passing decision) can be incorporated into a game situation.
– Goal: keep layering new, higher-level learned behaviors on top of old ones, resulting in a high-level functioning team that is robust and reliable in games.
– Challenges:
   – when to start a set play
   – how to ensure the set play is carried out
   – when to terminate a set play
Team-level strategies I: set plays (cont.)
– Pre-planned set plays:
   – part of the locker-room agreement
   – set-play roles
   – mapping players to fill the set-play roles
   – acting through the low-level actions
A set-play sketch follows.
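A minimal sketch of what a pre-planned set-play entry might look like; the trigger name, step encoding, and the plan itself are all invented:

    struct SetPlayStep {
        int role;             // set-play role, mapped to a player at run time
        const char* action;   // realized through the low-level behaviors
    };
    struct SetPlay {
        const char* trigger;  // e.g. our kick-in on the left wing
        SetPlayStep steps[3]; // a multi-step, multi-agent plan
    };

    SetPlay kickInLeft = {
        "our_kick_in_left",
        {{1, "pass to role 2"}, {2, "dribble 45 deg"}, {2, "pass to role 3"}}
    };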
Team-level strategies II: dynamic formations
– Dynamic formations: Situation Based Strategic Positioning (SBSP).
– The formation is chosen according to several factors:
   – score and time
   – the opponent's formation
   – statistical information
– Possible solutions: pre-defined rules, coach directives, or machine learning?
A positioning sketch follows.
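A hedged sketch of the SBSP idea: each player's target is its formation home attracted toward the ball, with the situation scaling the attraction; the weights are invented:

    struct Vec { double x, y; };

    // Home position pulled toward the ball; score, time, and the
    // opponent's formation would adjust attractX/attractY or even
    // switch the whole formation.
    Vec sbsp_target(Vec home, Vec ball, double attractX, double attractY) {
        return { home.x + attractX * ball.x,
                 home.y + attractY * ball.y };
    }
    // e.g. sbsp_target({-35, -20}, ball, 0.6, 0.3) for a left back.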
Team-level strategies III: the coach
– The coach has global information but can only send delayed messages to all players.
– It analyzes the strategic situation.
Other mechanisms not mentioned
– Dribble: advance in the field to a given position, keeping the ball under control at all times.
– Hold: keep the ball controlled, avoiding opponents, without moving.
– Passive interception: intercept the ball, not at the earliest point but at a more advantageous one, even though it takes more time.
– Mark pass line: mark a passing line from the opponent that is (or will be) in control of the ball to another, better-positioned opponent. A player marks when it detects a useful, uncovered passing line that it can cover better than any teammate, at a point near its own strategic position.
– Approach ball position: when interception is impossible, the player approaches the ball's position to reduce the opponent's options.
– Mark opponent: mark the opponent that has the ball, keeping him from advancing up the field.
– Cover goal: the player takes a good defensive position by staying between the ball and its own goal (sketched below).
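For the geometric behaviors, a minimal sketch of Cover Goal; the interpolation fraction is an invented tuning parameter:

    struct Point { double x, y; };

    // Stand on the segment between our goal and the ball, a fixed
    // fraction of the way out from the goal.
    Point cover_goal(Point ball, Point ownGoal, double frac /* 0..1 */) {
        return { ownGoal.x + frac * (ball.x - ownGoal.x),
                 ownGoal.y + frac * (ball.y - ownGoal.y) };
    }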
Implementation
– Synchronization (main loop): sense and act at the right times; use timer signals to maximize thinking time.
– World model: senses are noisy and limited; use confidence values to predict unseen state.
– Class hierarchy: organized by skill, not by the layered-learning levels:

   class Behaviors : public Interception, public VisualSystem,
                     public Pass, public Positioning,
                     public AuditoryProcess, public Dribble,
                     public Shoot, public Goalie,
                     public ClearBall, public Handleball
Implementation (cont.)

   int behave() {
       /* pre-process */
       mediator.ResetBuffer();
       situation.JudgeSituation();
       motion.Reset();

       if (Self.Is_goalie) {
           switch (situation.JudgeGameState()) {
           case GS_Before_Kick_Off:
               fm.beforekickoff.going();   /* move to position, face ball */
               break;
           case GS_Their_PlaceKick: motion.GoalieDefense();     break;
           case GS_My_PlaceKick:    fm.goalieplacekick.going(); break;
           case GS_Playing:         motion.Smartgoalie();       break;
           case GS_Other:
           default:                                             break;
           }
       } else {
           switch (situation.JudgeGameState()) {
           case GS_Before_Kick_Off: fm.beforekickoff.going(); break;
           case GS_Their_PlaceKick: motion.Defense();         break;
           case GS_My_PlaceKick:    fm.placekick.going();     break;
           case GS_Playing:
               if (!motion.Tryhandle()) {          /* try to act on the ball */
                   if (!motion.SmartInterception()) {
                       motion.Position();          /* else take up position */
                   }
               }
               break;
           case GS_Other:
           default: break;
           }
       }
       motion.DoVisualDecision();
       mediator.mediation();
       motion.Communication();
       return 1;
   }
Resources
– RoboCup Official Site
– Reinforcement Learning Repository