Applying reinforcement learning to Tetris Researcher : Donald Carr Supervisor : Philip Sterne

What? Creating an agent that learns to play Tetris from first principles

Why? We are interested in the learning process, and in unorthodox insights into sophisticated problems.

How? Reinforcement learning is a branch of AI that focuses on learning from interaction. When used in the conception of a digital backgammon player, TD-Gammon, it discovered tactics that have since been adopted by the world's greatest human players.

Game plan Tetris, reinforcement learning, project (implementing Tetris, Melax Tetris, contour Tetris, full Tetris), conclusion.

Tetris The well is initially empty. Each tetromino is selected from a uniform distribution and descends into the well. Filling the well results in death. Escape route: forming a complete row causes that row to vanish and the structure above it to shift down.

Reinforcement Learning A dynamic approach to learning. The agent has the means to discover for itself how the game is played, and how it wants to play it, based upon its own experiences. We reserve the right to punish it when it strays from the straight and narrow. Trial-and-error learning.

Reinforcement Learning Crux The agent perceives the state of the system; has a memory of previous experiences (the value function); operates under a pre-determined reward function; has a policy, which maps states to actions; constantly updates its value function to reflect perceived reality; and possibly holds a (conceptual) model of the system.

Life as an agent The agent has memory and a static policy (experiment, be greedy, etc.). It perceives the state; the policy determines an action after looking up the state in the value function (memory); it takes the action; it receives a reward (which may be zero); it adjusts the value entry corresponding to the state; repeat.
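A minimal sketch of this loop, assuming a tabular value function held in a HashMap and a greedy policy over the values of candidate successor states. The Environment interface and its methods are hypothetical stand-ins for the project's own Java classes, not its actual API:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of the agent loop described above. State and Environment are
// hypothetical stand-ins; constants are illustrative.
public class AgentLoop {
    static final double ALPHA = 0.1;          // learning rate
    static final double GAMMA = 0.9;          // discount factor
    static final double OPTIMISTIC = 100.0;   // optimistic initial value

    Map<String, Double> value = new HashMap<>();  // tabular value function (memory)

    double valueOf(String state) {
        return value.getOrDefault(state, OPTIMISTIC);
    }

    void episode(Environment env) {
        String state = env.reset();
        while (!env.isTerminal()) {
            // policy: greedily pick the action whose successor state looks best
            String bestAction = null;
            double bestValue = Double.NEGATIVE_INFINITY;
            for (String action : env.legalActions()) {
                double v = valueOf(env.peekSuccessor(state, action));
                if (v > bestValue) { bestValue = v; bestAction = action; }
            }
            double reward = env.apply(bestAction);    // take action, get reward (may be zero)
            String next = env.currentState();
            // adjust the value entry corresponding to the state, then repeat
            double updated = valueOf(state)
                    + ALPHA * (reward + GAMMA * valueOf(next) - valueOf(state));
            value.put(state, updated);
            state = next;
        }
    }

    // Hypothetical environment interface, for illustration only.
    interface Environment {
        String reset();
        boolean isTerminal();
        List<String> legalActions();
        String peekSuccessor(String state, String action);
        double apply(String action);
        String currentState();
    }
}
```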

Reward The rewards are set in the definition of the problem and are beyond the control of the agent. They can be negative or positive: punishment or reward.

Value function Represents the long-term value of a state and incorporates the discounted value of destination states. Two approaches we adopt. Afterstates: only considers destination states. Sarsa: considers actions in the current state.

Policies GREEDY: takes the best action. ε-GREEDY: takes a random action 5% of the time. SOFTMAX: associates with each action a selection probability proportional to its predicted value. Policies seek to balance exploration and exploitation. Optimistic rewards and GREEDY are used throughout this presentation.
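For illustration, a sketch of these three selection rules over an array of predicted action values. The 5% exploration rate is the figure quoted above; the softmax temperature parameter is an assumption of this sketch:

```java
import java.util.Random;

// Illustrative implementations of the policies listed above.
public class Policies {
    static final Random RNG = new Random();

    // GREEDY: index of the highest-valued action.
    static int greedy(double[] actionValues) {
        int best = 0;
        for (int a = 1; a < actionValues.length; a++)
            if (actionValues[a] > actionValues[best]) best = a;
        return best;
    }

    // ε-GREEDY: random action with probability epsilon (e.g. 0.05), greedy otherwise.
    static int epsilonGreedy(double[] actionValues, double epsilon) {
        if (RNG.nextDouble() < epsilon) return RNG.nextInt(actionValues.length);
        return greedy(actionValues);
    }

    // SOFTMAX: selection probability grows with predicted value (Boltzmann form;
    // the temperature controls how sharp the preference is).
    static int softmax(double[] actionValues, double temperature) {
        double[] weights = new double[actionValues.length];
        double total = 0.0;
        for (int a = 0; a < actionValues.length; a++) {
            weights[a] = Math.exp(actionValues[a] / temperature);
            total += weights[a];
        }
        double r = RNG.nextDouble() * total;
        for (int a = 0; a < actionValues.length; a++) {
            r -= weights[a];
            if (r <= 0) return a;
        }
        return actionValues.length - 1;  // numerical fallback
    }
}
```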

The agent’s memory Traditional reinforcement learning uses a tabular value function, which associates a value with every state

Tetris state space Since the Tetris well has dimensions twenty blocks deep by ten blocks wide, there are 200 block positions in the well that can be either occupied or empty. 2^200 states

Implications 2^200 values. 2^200 is vast beyond comprehension. The agent would have to hold an educated opinion about each state, and remember it. It would also have to explore each of these states repeatedly in order to form an accurate opinion. Pros: familiar. Cons: storage, exploration time, redundancy.

Solution: discard information Observe the state space, draw assumptions, adopt human optimisations, and reduce the game description.

Human experience Look at the top of the well (or the vicinity of the top). Look at vertical strips.

Assumption 1 The position of every block on screen is unimportant. We limit ourselves to merely considering the height of each column. 20^10 ≈ 2^43 states

Assumption 2 The importance lies in the relationship between successive columns, rather than their isolated heights. 20^9 ≈ 2^39 states

Assumption 3 Beyond a certain point, height differences between subsequent columns are indistinguishable. 7^9 ≈ 2^25 states

Assumption 4 At any point in placing the tetromino, the value of the placement can be considered in the context of a sub-well of width four. 7^3 = 343 states

Assumption 5 Since the game is stochastic, and the tetrominoes are uniformly selected from the tetromino set, the value of the well should be no different from its mirror image. 175 states
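The arithmetic behind the last two reductions can be checked by brute force. A small sketch, assuming the three height differences of a width-four sub-well are each clipped to the seven values −3…+3 (the exact clipping range is an assumption here) and that mirroring reverses the column order and negates the differences:

```java
// Sketch: counting the reduced state space under assumptions 3-5.
// A width-four sub-well gives 3 height differences, each clipped to 7 values
// (assumed here to be -3..+3). Mirroring reverses their order and negates them.
public class StateCount {
    public static void main(String[] args) {
        int raw = 0, canonical = 0;
        for (int d1 = -3; d1 <= 3; d1++)
            for (int d2 = -3; d2 <= 3; d2++)
                for (int d3 = -3; d3 <= 3; d3++) {
                    raw++;
                    // mirror image of the sub-well
                    int m1 = -d3, m2 = -d2, m3 = -d1;
                    // keep only the lexicographically smaller of (state, mirror)
                    if (compare(d1, d2, d3, m1, m2, m3) <= 0) canonical++;
                }
        System.out.println(raw + " raw states, " + canonical + " after mirror symmetry");
        // prints: 343 raw states, 175 after mirror symmetry
    }

    static int compare(int a1, int a2, int a3, int b1, int b2, int b3) {
        if (a1 != b1) return Integer.compare(a1, b1);
        if (a2 != b2) return Integer.compare(a2, b2);
        return Integer.compare(a3, b3);
    }
}
```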

You promised us an untainted, unprejudiced player, but you just removed information it may have used constructively. Collateral damage? Results will tell.

First Goal: Implement Tetris Implemented Tetris from first principles in Java. Tested the game by including human input. Bounds checking, rotations, translation. The agent plays an accurate version of Tetris; the game is played transparently by the agent.
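As an illustration of the well mechanics the implementation has to get right, here is a generic sketch of the row-clearing rule described earlier; it is not the project's actual code:

```java
// Sketch of the "complete row vanishes and the structure above shifts down" rule,
// for a well stored as well[row][col] with row 0 at the top.
public class Well {
    static final int DEPTH = 20, WIDTH = 10;
    boolean[][] well = new boolean[DEPTH][WIDTH];

    // Scans top-down; by the time a row is checked, every row above it has
    // already been cleared if it was full, so a single pass suffices.
    int clearCompleteRows() {
        int cleared = 0;
        for (int row = 0; row < DEPTH; row++) {
            boolean full = true;
            for (int col = 0; col < WIDTH; col++)
                if (!well[row][col]) { full = false; break; }
            if (full) {
                cleared++;
                // shift every row above this one down by one
                for (int r = row; r > 0; r--)
                    well[r] = well[r - 1].clone();
                well[0] = new boolean[WIDTH];  // fresh empty top row
            }
        }
        return cleared;
    }
}
```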

My Tetris / Research platform

Second Goal: Attain learning Stan Melax successfully applied reinforcement learning to a reduced form of Tetris.

Melax Tetris description 6 blocks wide with infinite height Limited to tetrominoes Punished for increasing height above working height of 2 Throws away any information 2 blocks below working height Used standard tabular approach

Following paw prints Implemented an agent according to Melax's specification. Afterstates: considers the value of the destination state; requires a real-time nudge to include the reward associated with the transition; this prevents the agent from "chasing" good states.
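A sketch of how such an afterstate update might look, on my reading of the slide: candidate placements are ranked by immediate reward plus discounted afterstate value (the real-time nudge), and the previous afterstate is then moved toward that target. Names, constants, and structure are illustrative, not the project's actual code:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a tabular afterstate update with the reward "nudge" described above.
public class AfterstateAgent {
    static final double ALPHA = 0.1;   // learning rate
    static final double GAMMA = 0.9;   // discount factor

    Map<String, Double> value = new HashMap<>();

    double valueOf(String afterstate) {
        return value.getOrDefault(afterstate, 0.0);
    }

    // Called once per placement decision. `candidates` maps each legal
    // placement's afterstate to the immediate reward it would earn.
    String chooseAndLearn(String previousAfterstate, Map<String, Double> candidates) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> c : candidates.entrySet()) {
            // the nudge: include the transition's reward when ranking afterstates
            double score = c.getValue() + GAMMA * valueOf(c.getKey());
            if (score > bestScore) { bestScore = score; best = c.getKey(); }
        }
        // TD update of the previous afterstate toward the chosen target
        double updated = valueOf(previousAfterstate)
                + ALPHA * (bestScore - valueOf(previousAfterstate));
        value.put(previousAfterstate, updated);
        return best;
    }
}
```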

Results (Small = good)

Mirror symmetry

Discussion Learning was evident. Experimented with exploration methods and with the constants in the learning algorithms. Familiarised myself with implementing reinforcement learning.

Third Goal: Introduce my representation Continued using the reduced tetromino set. Experimented with two distinct reinforcement learning approaches: afterstates and Sarsa(λ).

Afterstates Already introduced Uses 175 states

Sarsa(λ) Associates a value with every action in a state. Requires no real-time nudging of values. Uses eligibility traces, which accelerate the rate of learning. The state space is 100 times bigger than that of afterstates when using the reduced tetrominoes. State space: 175 × 100 = 17,500 states. Takes longer to train.
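For reference, a sketch of one tabular Sarsa(λ) step with accumulating eligibility traces, in the standard textbook form; the constants and the state-action key encoding are illustrative, not taken from the project:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of one tabular Sarsa(λ) update with accumulating eligibility traces.
public class SarsaLambda {
    static final double ALPHA = 0.1;   // learning rate
    static final double GAMMA = 0.9;   // discount factor
    static final double LAMBDA = 0.8;  // trace decay

    Map<String, Double> q = new HashMap<>();      // Q(s,a): one entry per state-action pair
    Map<String, Double> trace = new HashMap<>();  // eligibility traces e(s,a)

    static String key(String state, String action) { return state + "|" + action; }

    double qOf(String state, String action) {
        return q.getOrDefault(key(state, action), 0.0);
    }

    // One transition: (s, a) -> reward -> (s2, a2), where a2 was chosen by the policy.
    void update(String s, String a, double reward, String s2, String a2) {
        double delta = reward + GAMMA * qOf(s2, a2) - qOf(s, a);
        // bump the trace of the pair just visited
        trace.merge(key(s, a), 1.0, Double::sum);
        // credit every recently visited pair in proportion to its trace,
        // then decay all traces; this is what accelerates learning
        for (Map.Entry<String, Double> e : trace.entrySet()) {
            q.merge(e.getKey(), ALPHA * delta * e.getValue(), Double::sum);
            e.setValue(GAMMA * LAMBDA * e.getValue());
        }
    }
}
```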

Afterstates agent results (Big = good)

Sarsa agent results

Sarsa player at time of death

Final Step: Full Tetris Extending to full Tetris. We have an agent that is trained for a sub-well.

Approach Break the full game into overlapping sub-wells. Collect transitions. Adjust overlapping transitions to form a single transition: the average of the transitions, or the biggest transition.
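A sketch of one plausible reading of this decomposition, in which each overlapping width-four window is scored by the trained sub-well value function and the overlapping contributions are averaged (the alternative mentioned above would keep the biggest contribution instead). The SubWellValueFunction interface and the clipping range are assumptions of this sketch:

```java
// Sketch: slide a width-four window across the full well, ask the trained
// sub-well agent for the value of each window, and combine the overlapping answers.
public class SubWellDecomposition {
    interface SubWellValueFunction {
        // value of a width-four sub-well described by its three height differences
        double value(int[] heightDifferences);
    }

    static double scoreFullWell(int[] columnHeights, SubWellValueFunction subWell) {
        int windows = columnHeights.length - 3;  // overlapping width-four windows
        double total = 0.0;
        for (int start = 0; start < windows; start++) {
            int[] diffs = new int[3];
            for (int i = 0; i < 3; i++) {
                int d = columnHeights[start + i + 1] - columnHeights[start + i];
                // clip to the seven values the sub-well agent was trained on (assumed -3..+3)
                diffs[i] = Math.max(-3, Math.min(3, d));
            }
            total += subWell.value(diffs);
        }
        return total / windows;  // averaging variant; a max over windows is the other option
    }
}
```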

Tiling

Sarsa results with reduced tetrominoes

Afterstates results with reduced tetrominoes

Sarsa results with full Tetris

In conclusion Thoroughly investigated reinforcement learning theory. Achieved learning in 2 distinct reinforcement learning problems: Melax Tetris and my reduced Tetris. Successfully implemented 2 different agents, afterstates and Sarsa. Successfully extended my Sarsa agent to the full Tetris game, although professional Tetris players are in no danger of losing their jobs.

Departing comments Thanks to Philip Sterne for prolonged patience Thanks to you for 20 minutes of patience