Applying reinforcement learning to Tetris A reduction in state space Underling : Donald Carr Supervisor : Philip Sterne

Reinforcement learning  Branch of AI  Characterised by a lack of direct interaction between the programmer and the artificial agent.  The agent is given access to a simulated environment and develops its own tactics through trial and error.

Reinforcement learning  Characterised by four components  Policy : a mapping from state to action  Value function : a description of long-term reward  Reward function : a numerical response to progress toward or away from the goal  System model : an internal representation of the system
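The slides do not fix an implementation, but a minimal Python sketch of how the four components might be organised for a Tetris agent could look like this (the class, the tabular value store and the placeholder model are all hypothetical names, not the project's actual code):

```python
from collections import defaultdict

class TetrisAgent:
    """Illustrative container for the four reinforcement-learning components."""

    def __init__(self):
        # Value function : long-term reward estimate for each state (tabular here).
        self.value = defaultdict(float)

    def reward(self, rows_cleared, game_over):
        """Reward function : numerical response to the outcome of a placement."""
        if game_over:
            return -100.0           # punishment for topping out
        return float(rows_cleared)  # reward grows with the rows completed

    def policy(self, state, legal_moves):
        """Policy : map the current state to an action, using the system model
        to predict where each legal move would lead."""
        return max(legal_moves, key=lambda m: self.value[system_model(state, m)])

def system_model(state, move):
    """System model : internal prediction of the successor state after a move.
    A placeholder here; a real model would simulate the piece landing."""
    return (state, move)
```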

Intricacies  No initial assumptions on the part of the program  Many established weighting schemes can be used to develop the value function; these encourage either persistent learning or convergence to an optimal solution  Exploration vs. exploitation
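A common way to manage the exploration vs. exploitation trade-off is an epsilon-greedy rule; this is a generic sketch rather than the scheme chosen for the project:

```python
import random

def epsilon_greedy(state, legal_moves, value_of, epsilon=0.1):
    """With probability epsilon explore a random move (keep learning);
    otherwise exploit the move currently estimated to be best."""
    if random.random() < epsilon:
        return random.choice(legal_moves)                      # explore
    return max(legal_moves, key=lambda m: value_of(state, m))  # exploit
```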

It's all been half-done before  Yael Bdolah & Dror Livnat  S. Melax

Dimensionality  “the curse of dimensionality” – Richard Bellman  Using a binary description of the blocks, each additional block doubles the memory requirements  Exponential complexity

Consequence  Successfully applying reinforcement learning to a hobbled version of Tetris

Redefine your enemy  Resulting environment is tiny : 2 by 8 blocks = 2^16 possible states  Blocks fall from an infinite height : there is infinite time for each decision, placement options do not decrease as time progresses, and goals remain constant over time  Linear risk vs. reward response

Reality in contrast

The human lot  Environment is massive : 13*20 blocks = 2^260 possible states  There are very real time constraints, with the number of options decreasing as the block descends  Successfully completing 4 rows at once carries 16 times the reward of completing 1 row, but also carries much higher risk  Logical tactics change as the finite stage fills up, e.g. don’t risk 4-row completion with only 2 empty rows remaining
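Both state-space figures follow from treating every cell of the well as a binary (empty/full) value; a quick check of the arithmetic:

```python
# Reduced game: a 2 x 8 board, every cell either empty or full.
reduced_states = 2 ** (2 * 8)    # 65,536 states
# Full game as described above: a 13 x 20 board.
full_states = 2 ** (13 * 20)     # 2**260, roughly 1.9 * 10**78 states
print(reduced_states, full_states)
```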

No hand : Just boot or sweetie  No explicit tactics are handed to the computer (digital virgin)  Given sensory perception via our description of the system  Given the ability to rotate and manoeuvre the Tetris piece  Receives the external reward or punishment we associate with state transitions  Given long-term memory

School of hard knocks  Iterative training : the agent goes from a completely ignorant entity to a veritable veteran through an iterative process  Rate of learning  Depth of learning  Flexibility of learning  A balance is struck between these through common parameters
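These common parameters typically surface as a learning rate, a discount factor and an exploration rate inside the training loop. The following is a generic temporal-difference sketch assuming a hypothetical env/agent interface; the slides do not commit to a specific update rule:

```python
def train(env, agent, episodes, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Generic temporal-difference training loop (illustrative only).
    alpha   : learning rate    - rate of learning
    gamma   : discount factor  - depth of learning (how far ahead reward counts)
    epsilon : exploration rate - flexibility of learning"""
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            action = agent.choose(state, epsilon)
            next_state, reward, done = env.step(action)
            # TD(0) update of the tabular value estimate for the visited state.
            target = reward + (0.0 if done else gamma * agent.value[next_state])
            agent.value[state] += alpha * (target - agent.value[state])
            state = next_state
```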

Refocus  The focus of the project is on minimising the state space  Implementing Tetris-specific solutions : mirror symmetry (roughly halves the state space), focusing on a restricted section of the formation (e.g. the top 4 rows), considering several substates  Researching and implementing general optimisations  Possibly utilising other numeric methods to find the best option in the state space (the standard description involves a linear iterative search for the alternative with maximum value)
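A sketch of the two Tetris-specific reductions mentioned above, assuming the formation is stored as a tuple of rows (topmost row first), each row a tuple of 0/1 cells; the project's actual representation is still being decided:

```python
def canonical(formation):
    """Mirror symmetry: treat a formation and its left-right reflection as the
    same state by always storing the smaller of the two representations."""
    mirrored = tuple(tuple(reversed(row)) for row in formation)
    return min(formation, mirrored)

def top_section(formation, rows=4):
    """Restricted view: keep only the top few non-empty rows of the formation,
    since they are what constrains where the current piece can land."""
    occupied = [row for row in formation if any(row)]
    return tuple(occupied[:rows])
```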

Strategic planning  Toying with methods of representation (ongoing)  Code / hijack Tetris  Basic learning  Increasing complexity of the system  Increasing complexity of the agent  Noting shortcomings and countering flaws  Looking for generality in optimisations  Looking for direct application to external problems  Looking for similarities in external problems

Fuzzy outline  4 weeks : Research period  1 week : Code Tetris and select structures  3 weeks : Achieve basic learning with agent  5 weeks : Optimisation of state space  3 weeks : Testing

Possible outcomes  Optimisations capable of extending reinforcement learning to problems previously considered outside of its sphere of application  Unbiased flexibility of reinforcement learning applied to a problem it is ideal for  A possible contender for the Tetris world record (algorithmic)