Machine Learning and Games
Simon M. Lucas
Centre for Computational Intelligence, University of Essex, UK


Overview
- Games: dynamic, uncertain, open-ended
  - Ready-made test environments
  - A 21-billion-dollar industry: space for more machine learning…
- Agent architectures
  - Where the computational intelligence fits
  - Interfacing the neural nets etc.
  - Choice of learning machine (WPC, neural network, N-Tuple systems)
- Training algorithms
  - Evolution / co-evolution
  - TDL
  - Hybrids
- Methodology: a strong belief in open competitions

My Angle
Machine learning:
- How well can systems learn,
- given a complex, semi-structured environment,
- with indirect reward schemes?

Sample Games
- Car racing
- Othello
- Ms Pac-Man (demo)

Agent Basics
Two main approaches:
- Action selector
- State evaluator
Each of these has strengths and weaknesses. For any given problem there are no hard and fast rules - experiment! Success or failure can hinge on small details.
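As a minimal sketch (the interface names are illustrative, not from the talk), the two approaches differ in what the learned function produces:

    // Action selector: the learned function maps state straight to action.
    public interface ActionSelector {
        int selectAction(double[] state);
    }

    // State evaluator: the learned function scores states; an outer loop
    // (e.g. one-ply lookahead or minimax) picks the best-scoring successor.
    public interface StateEvaluator {
        double evaluate(double[] state);
    }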

Co-evolution
An evolutionary algorithm in which fitness comes from the population itself: individuals are ranked by playing each other in a league.
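A minimal sketch of league-based fitness (the Player and Game types are assumed, not from the talk): every individual plays every other, and rank follows accumulated points.

    // Round-robin league: each player meets every other player once.
    // Scoring follows a later slide: +1 win, 0 draw, -1 loss.
    static double[] leagueFitness(Player[] pop, Game game) {
        double[] points = new double[pop.length];
        for (int i = 0; i < pop.length; i++) {
            for (int j = i + 1; j < pop.length; j++) {
                int result = game.play(pop[i], pop[j]); // +1 if i wins, 0 draw, -1 if j wins
                points[i] += result;
                points[j] -= result;
            }
        }
        return points; // higher points = fitter individual
    }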

(Co-)Evolution v. TDL
Temporal difference learning:
- Often learns much faster
- But less robust
- Learns during game-play
- Uses information readily available (i.e. the current observable game-state)
Evolution / co-evolution (vanilla form):
- Uses only information from the game result(s)
- Easier to apply
- But wasteful
Both can learn game strategy from scratch.

In Pictures…

Simple Example: Mountain Car
- Often used to test TD learning methods
- Accelerate a car to reach the goal at the top of an incline
- Engine force is weaker than gravity (demo)
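The standard mountain-car dynamics (Sutton and Barto's formulation; the talk's version may differ in detail) fit in a few lines:

    // Classic mountain-car update. The engine term (action in {-1,0,+1})
    // is weaker than the gravity term, so the car must rock back and
    // forth to build momentum. Goal reached when position >= 0.5.
    static void step(double[] s, int action) {
        double pos = s[0], vel = s[1];
        vel += 0.001 * action - 0.0025 * Math.cos(3 * pos);
        vel = Math.max(-0.07, Math.min(0.07, vel));   // velocity bounds
        pos += vel;
        if (pos < -1.2) { pos = -1.2; vel = 0; }      // inelastic left wall
        s[0] = pos; s[1] = vel;
    }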

State Value Function
- Actions are applied to the current state to generate the set of possible future states
- The state value function is used to rate these
- Choose the action that leads to the highest state value
- Discrete set of actions
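A sketch of the resulting policy, assuming the step forward model and a StateEvaluator as sketched above:

    // Greedy one-step lookahead over a learned state value function,
    // for the mountain car's three discrete actions (-1, 0, +1).
    static int greedyAction(double[] state, StateEvaluator value) {
        int best = 0;
        double bestV = Double.NEGATIVE_INFINITY;
        for (int a = -1; a <= 1; a++) {
            double[] next = state.clone();
            step(next, a);                   // forward model gives successor
            double v = value.evaluate(next);
            if (v > bestV) { bestV = v; best = a; }
        }
        return best;
    }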

Action Selector
- A decision function selects an output directly, based on the current state of the system
- The action may be a discrete choice, or continuous outputs

TDL – State Value Learned

Evolution: Learns Policy, not Value

Example Network Found by NEAT+Q (Whiteson and Stone, JMLR 2006)
- An Evo/TDL hybrid
- They used a different input coding, so the results are not directly comparable

~Optimal State Value / Policy Function
f = abs(v): rate a state by the magnitude of its velocity, regardless of direction.

Action Controller
- Directly connect velocity to output
- A simple network: one neuron, one connection
- Easy to interpret! (contrast with the state-value approach above)

Othello
With Thomas Runarsson, University of Iceland

[Figure: volatile piece difference plotted against move number]

Setup
- Use a weighted piece counter
  - Fast to compute (can play billions of games)
  - Easy to visualise
  - See if we can beat the 'standard' weights
- Limit search depth to 1-ply
  - Enables billions of games to be played, for a thorough comparison
  - Focus on machine learning rather than game-tree search
- Force random moves (with prob. 0.1)
  - Gives a more robust evaluation of playing ability
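A weighted piece counter is just a dot product between the board and a weight vector; a minimal sketch for Othello (assuming the board is encoded as +1 for the player's discs, -1 for the opponent's, 0 for empty):

    // Weighted piece counter: one weight per board square.
    static double wpc(double[] board, double[] weights) {
        double v = 0;
        for (int i = 0; i < 64; i++) v += weights[i] * board[i];
        return v;   // positive = good for the player to move
    }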

Standard “Heuristic” Weights (lighter = more advantageous)

CEL Algorithm
- Evolution Strategy (ES): (1, 10); non-elitist worked best
- Gaussian mutation
  - Fixed sigma (not adaptive); fixed works just as well here
- Fitness defined by full round-robin league performance (e.g. 1, 0, -1 for win/draw/loss)
- Parent-child averaging
  - Defeats the noise inherent in fitness evaluation
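A sketch of one generation under these settings (the fitness function is passed in; in the talk's setup it would be round-robin league points as above):

    // One generation of a non-elitist (1,10)-ES with fixed-sigma
    // Gaussian mutation and parent-child averaging.
    static double[] celGeneration(double[] parent, double sigma,
                                  java.util.function.ToDoubleFunction<double[]> fitness,
                                  java.util.Random rnd) {
        double[] best = null;
        double bestFit = Double.NEGATIVE_INFINITY;
        for (int k = 0; k < 10; k++) {
            double[] child = parent.clone();
            for (int i = 0; i < child.length; i++)
                child[i] += sigma * rnd.nextGaussian();   // Gaussian mutation
            double fit = fitness.applyAsDouble(child);    // e.g. league points
            if (fit > bestFit) { bestFit = fit; best = child; }
        }
        // Parent-child averaging: move halfway towards the best child
        // instead of replacing the parent, smoothing out noisy fitness.
        for (int i = 0; i < parent.length; i++)
            best[i] = 0.5 * (parent[i] + best[i]);
        return best;
    }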

TDL Algorithm
- Nearly as simple to apply as CEL:

    public interface TDLPlayer extends Player {
        void inGameUpdate(double[] prev, double[] next);
        void terminalUpdate(double[] prev, double tg);
    }

- Reward signal only given at game end
- Initial alpha and the alpha cooling rate tuned empirically

TDL in Java
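The code on this slide did not survive transcription; a minimal TD(0) sketch for a linear evaluator (e.g. a WPC), matching the shape of the TDLPlayer interface above, might look like this (not the original):

    // Mid-game reward is zero, so the in-game update pulls V(prev)
    // toward V(next); the terminal update pulls the final prediction
    // toward the game result tg.
    class LinearTDL {
        double alpha = 0.01;          // learning rate, cooled empirically
        double[] w = new double[64];  // one weight per board square

        double value(double[] board) {
            double v = 0;
            for (int i = 0; i < w.length; i++) v += w[i] * board[i];
            return v;
        }

        void inGameUpdate(double[] prev, double[] next) {
            double delta = value(next) - value(prev);
            for (int i = 0; i < w.length; i++) w[i] += alpha * delta * prev[i];
        }

        void terminalUpdate(double[] prev, double tg) {
            double delta = tg - value(prev);
            for (int i = 0; i < w.length; i++) w[i] += alpha * delta * prev[i];
        }
    }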

CEL (1,10) v. Heuristic

TDL v. Random and Heuristic

TDL + CEL v. Heuristic (1 run)

Can we do better?
- Enforce symmetry (this speeds up learning)
- Use a trusty old friend: the N-Tuple system

N-Tuple Systems
- W. Bledsoe and I. Browning, "Pattern recognition and reading by machine," Proceedings of the EJCC, December 1959
- Sample n-tuples of the input space
- Map the sampled values to memory indexes
  - Training: adjust the values stored there
  - Recognition / play: sum over the values
- Superfast
- Related to:
  - the kernel trick of the SVM (non-linear map to a high-dimensional space, then a linear model)
  - Kanerva's sparse memory model
  - Buro's look-up tables
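A sketch of the play-time evaluation (the base-3 indexing over {empty, white, black} is an assumption about the encoding, not stated on the slide):

    // N-tuple evaluation: each tuple samples a few board squares,
    // converts the sampled pattern into a table index, and the board's
    // value is the sum of the indexed entries across all tuples.
    static double ntupleValue(int[] board, int[][] tuples, double[][] luts) {
        double v = 0;
        for (int t = 0; t < tuples.length; t++) {
            int index = 0;
            for (int square : tuples[t])
                index = index * 3 + board[square];  // square values in {0,1,2}
            v += luts[t][index];                    // luts[t] has 3^n entries
        }
        return v;
    }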

Symmetric N-Tuple Sampling

3-tuple Example

N-Tuple System
- Results used 30 random n-tuples
- "Snakes" created by a random 6-step walk; duplicate squares deleted
- The system typically has tens of thousands of weights (30 tuples, each indexing a table of up to 3^7 entries)
- Simple training rule (sketched below)
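The training-rule equation on this slide did not transcribe; a plain delta-rule update on the indexed table entries, consistent with the TDL setup above, would look like this (a sketch, not the original):

    // Delta-rule training: nudge only the table entries that the
    // current board actually indexes, toward the training target.
    static void ntupleUpdate(int[] board, int[][] tuples, double[][] luts,
                             double target, double alpha) {
        double error = target - ntupleValue(board, tuples, luts);
        for (int t = 0; t < tuples.length; t++) {
            int index = 0;
            for (int square : tuples[t])
                index = index * 3 + board[square];
            luts[t][index] += alpha * error;
        }
    }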

N-Tuple System (TDL): total games = 1250

Learned strategy…

Web-based League (snapshot before CEC 2006 Competition)

Results versus CEC 2006 Champion (a manual Evo/TDL hybrid)

N-Tuple Summary
- Stunning results compared to other game-learning architectures such as the MLP
- How might this hold for other problems?
- How easy are N-Tuples to apply to other domains?

Screen Capture Mode: Ms Pac-Man Challenge

Robotic Car Racing

Conclusions
- Games are great for CI research
  - Intellectually challenging
  - Fun to work with
- Agent learning for games is still a black art
- Small details can make big differences! (e.g. which inputs to use)
- Big details also! (N-Tuple versus MLP)
- Grand challenge: how can we design more efficient game learners?
- Evo/TDL hybrids are the way forward

CIG 2008: Perth, WA