Reinforcement Learning in Simulated Soccer with Kohonen Networks
Chris White and David Brogan
University of Virginia, Department of Computer Science
Simulated Soccer
  - How does an agent decide what to do with the ball?
  - Complexities
      - Continuous inputs
      - High dimensionality
Reinforcement Learning (RL)
  - Learning to associate utility values with state-action pairs
  - The agent incrementally updates the value associated with each state-action pair based on its interaction with the environment (Russell & Norvig)
Problems
  - The state space grows exponentially with the number of dimensions
  - Current methods of managing state space explosion lack automation
  - RL does not scale well to problems with the complexities of simulated soccer…
Quantization
  - Divide the state space into regions of interest
  - Tile Coding (Sutton & Barto, 1998)
  - No automated method for choosing regions
      - Granularity
      - Heterogeneity
      - Location
  - Prefer a learned abstraction of the state space
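
To make the hand-tuning in tile coding concrete, here is a minimal Python sketch for a two-dimensional input. The function name `tile_indices` and the parameters `n_tilings`, `tiles_per_dim`, `lo`, and `hi` are illustrative assumptions, not taken from the slides. The thing to notice is that every region's size and position follows from constants the designer has to pick.

```python
import math

def tile_indices(x, y, n_tilings=4, tiles_per_dim=8, lo=0.0, hi=1.0):
    """Return one active tile index per tiling for a point in [lo, hi]^2."""
    width = (hi - lo) / tiles_per_dim
    side = tiles_per_dim + 1  # one extra tile per dimension absorbs the offset
    active = []
    for t in range(n_tilings):
        # Each tiling is the same grid, shifted by a fraction of a tile width.
        offset = t * width / n_tilings
        col = min(int(math.floor((x - lo + offset) / width)), side - 1)
        row = min(int(math.floor((y - lo + offset) / width)), side - 1)
        # Give every tiling its own block of indices so tiles never collide.
        active.append(t * side * side + row * side + col)
    return active

near_a = tile_indices(0.50, 0.50)
near_b = tile_indices(0.51, 0.50)
shared = len(set(near_a) & set(near_b))
print(f"{shared} of {len(near_a)} tiles shared by two nearby points")
```

Nearby points activate overlapping tiles, which is how tile coding generalizes. Nothing in the scheme adapts tile size or placement to the data, which is the gap the slide points out.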
Kohonen Networks
  - Clustering algorithm
  - Data driven
  - Clusters form a Voronoi diagram of the state space
  - Example region (figure labels): agent near opponent goal, teammate near opponent goal, no nearby opponents
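
A Kohonen network (self-organizing map) can be sketched in a few lines of plain Python. This is a simplified one-dimensional map under stated assumptions: Euclidean distance, a Gaussian neighborhood, and linearly decaying learning rate and radius. The names `train_som` and `nearest_unit` and all default parameter values are hypothetical and are not the authors' configuration.

```python
import math
import random

def nearest_unit(units, x):
    """Index of the prototype closest to x (squared Euclidean distance)."""
    return min(range(len(units)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(units[j], x)))

def train_som(samples, n_units=4, epochs=20, lr0=0.5, radius0=1.0, seed=0):
    """Fit a 1-D chain of prototypes to the samples."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Start each prototype at a randomly chosen training sample.
    units = [list(rng.choice(samples)) for _ in range(n_units)]
    total_steps = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        order = list(samples)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                        # decaying step size
            radius = max(radius0 * (1.0 - frac), 1e-3)     # shrinking neighborhood
            best = nearest_unit(units, x)
            for j, w in enumerate(units):
                # Units close to the winner on the chain move more.
                h = math.exp(-((j - best) ** 2) / (2.0 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return units

data = [(0.1, 0.1), (0.15, 0.05), (0.9, 0.9), (0.85, 0.95)]
prototypes = train_som(data, n_units=2, epochs=10)
print(nearest_unit(prototypes, (0.2, 0.1)), nearest_unit(prototypes, (0.8, 0.9)))
```

Assigning every input to its nearest prototype is what produces the Voronoi partition: each prototype owns the region of inputs that lie closer to it than to any other prototype.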
State Space Reduction
  - 90 continuous-valued inputs describe the state of a soccer game
  - Naïve discretization: 2^90 states
  - Filter out unnecessary inputs: still 2^18 states
  - Clustering algorithm: only 5000 states
  - Big win!
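
The counts on this slide can be checked with a short calculation. The sketch assumes that the naive discretization splits each input into two bins and that filtering leaves 18 inputs; both are inferred from the exponents on the slide rather than stated there. The helper name `naive_state_count` is hypothetical.

```python
def naive_state_count(n_inputs, bins_per_input=2):
    """Number of cells when every input is cut into the same number of bins."""
    return bins_per_input ** n_inputs

all_inputs = naive_state_count(90)   # 2**90
filtered = naive_state_count(18)     # 2**18 = 262144
clustered = 5000                     # figure quoted on the slide

print(f"all 90 inputs : {all_inputs:.3e} states")
print(f"18 inputs     : {filtered} states")
print(f"clustered     : {clustered} states, "
      f"{filtered / clustered:.1f}x fewer than the filtered grid")
```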
Two-Pass Algorithm
  - Pass 1: Use a Kohonen network and a large training set to learn the state space
  - Pass 2: Use reinforcement learning (SARSA) to learn utilities for the learned states
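
The two passes can be sketched as follows, assuming pass 1 has already produced a list of prototypes. Pass 2 then runs tabular SARSA over the prototype indices. The prototypes, the two-action table, the reward, and the values of `alpha`, `gamma`, and `epsilon` are made-up placeholders for illustration only, and the function names are hypothetical.

```python
import random

def quantize(prototypes, x):
    """Map a continuous input vector to the index of its nearest prototype."""
    return min(range(len(prototypes)),
               key=lambda j: sum((p - v) ** 2 for p, v in zip(prototypes[j], x)))

def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy selection over the actions of one discrete state."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    return max(range(len(q[state])), key=lambda a: q[state][a])

def sarsa_update(q, s, a, reward, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy TD update: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a))."""
    q[s][a] += alpha * (reward + gamma * q[s_next][a_next] - q[s][a])

# Placeholder pass-1 result: two prototypes in a 2-D input space.
prototypes = [(0.0, 0.0), (1.0, 1.0)]
q = [[0.0, 0.0] for _ in prototypes]   # one row per state, one column per action
rng = random.Random(0)

# One transition of pass 2: observe, act, observe again, pick the next action, update.
s = quantize(prototypes, (0.1, 0.2))
a = choose_action(q, s, epsilon=0.1, rng=rng)
s_next = quantize(prototypes, (0.9, 0.8))
a_next = choose_action(q, s_next, epsilon=0.1, rng=rng)
sarsa_update(q, s, a, reward=1.0, s_next=s_next, a_next=a_next)
print(q)
```

SARSA is on-policy: the update uses the action actually chosen in the next state, so the learned utilities reflect the exploration policy the agent follows.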
Fragility of Learned Actions
  - What happens to the attacker’s utility if the goalie crosses the dotted line?
Unresolved Issues
  - Increased generalization leads to frequency aliasing…
  - This becomes a sampling problem…
  - Example: Riemann sum, few samples vs. many samples
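
The Riemann sum analogy can be made concrete: approximating an integral with a handful of wide rectangles loses detail in the same way a coarse set of states loses detail in the utility function. In this sketch `math.sin` on [0, pi] is only a stand-in for the unknown utility function, chosen because its exact integral is 2. The function name `midpoint_riemann` is hypothetical.

```python
import math

def midpoint_riemann(f, a, b, n):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

exact = 2.0  # integral of sin(x) from 0 to pi
for n in (4, 16, 64):
    approx = midpoint_riemann(math.sin, 0.0, math.pi, n)
    print(f"n={n:3d}  estimate={approx:.6f}  error={abs(approx - exact):.6f}")
```

The error shrinks as the number of samples grows, but uniformly adding samples everywhere is costly when the function only varies sharply in a few places.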
Aliasing & Sampling
  - The utility function is not band-limited
  - How can we sample to reduce error?
      - Uniformly increase the sampling rate? (not the best idea)
      - Adaptively supersample?
      - Choose sample points based on special criteria?
Forcing Functions
  - Use a forcing function to sample an action in a state only when it is likely to be effective (valleys are ignored)
  - Reduces variance in the experienced reward for each state-action pair
  - How do we create such a forcing function?
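
The slides leave open how the forcing function is built, so the following is only one possible reading: a predicate that filters the candidate actions before the learner samples one. The rule in `example_forcing`, the `goal_distance` feature, the 20.0 threshold, the action names, and the fall-back to the full action set are all hypothetical choices made for illustration.

```python
def allowed_actions(actions, features, forcing):
    """Keep only the actions the forcing function judges worth sampling."""
    permitted = [a for a in actions if forcing(a, features)]
    # Assumed fall-back: if every action is filtered out, allow them all.
    return permitted or list(actions)

def example_forcing(action, features):
    """Hypothetical hand-written rule: only try a shot from close range."""
    if action == "shoot":
        return features["goal_distance"] <= 20.0
    return True

actions = ["shoot", "pass", "dribble"]
print(allowed_actions(actions, {"goal_distance": 35.0}, example_forcing))
print(allowed_actions(actions, {"goal_distance": 10.0}, example_forcing))
```

Because a shot is never sampled from situations where it almost always fails, the rewards recorded for the shoot action in a given state come from a narrower range of situations, which is the variance reduction the slide describes.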
Results
  - Evaluate three systems
      - Control: random action selection
      - SARSA
      - Forcing function
  - Evaluation criteria
      - Goals scored
      - Time of possession
Cumulative Score
Time of Possession
Team with Forcing Functions
With Forcing vs. Without
Summary
  - Two-pass learning algorithm for simulated soccer
  - State space abstraction is automated
  - Data-driven technique
  - Improved state of the art for simulated soccer
Future Work
  - Learned distance metric
  - Additional automation in the process
  - Better generalization