Reinforcement Learning and Tetris
Jared Christen
Tetris
- Markov decision process
- Large state space
- Long-term strategy without long-term knowledge
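To make the MDP framing concrete, the sketch below shows one way the state, actions, and reward could be represented in Python. The board dimensions, piece encoding, and penalty value are illustrative assumptions, not details taken from this project.

    from dataclasses import dataclass
    from typing import Tuple

    # Minimal sketch of Tetris as a Markov decision process (illustrative,
    # not the project's actual representation).

    @dataclass(frozen=True)
    class State:
        board: Tuple[Tuple[bool, ...], ...]  # 20 rows x 10 columns, True = filled
        active_piece: str                    # one of "I", "O", "T", "S", "Z", "J", "L"

    @dataclass(frozen=True)
    class Action:
        column: int    # leftmost column of the placement
        rotation: int  # number of quarter turns (0-3)

    def reward(lines_cleared: int, game_over: bool) -> float:
        # Reward the lines cleared by a placement; penalize topping out.
        return -100.0 if game_over else float(lines_cleared)

Even on a standard 10 x 20 board there are 2^200 possible cell configurations before reachability is considered, so the agent has to generalize across states rather than enumerate them.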
Background
- Hand-coded algorithms can clear > 1,000,000 lines
- Genetic algorithm by Roger Llima averages 42,000 lines
- Reinforcement learning algorithm by Kurt Driessens averages lines
Goals
- Develop a Tetris agent that improves on previous reinforcement learning implementations
- Secondary goals
  - Use as few handpicked features as possible
  - Encourage risk-taking
  - Include rarely studied features of Tetris
Approach
Neural Net Control
- Inputs
  - Raw state – filled & empty blocks
  - Handpicked features
- Outputs
  - Movements
  - Placements
Contour Matching
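Contour matching compares the surface profile of the board with the underside of a candidate piece placement; a longer match means the piece sits flush and leaves fewer holes. The sketch below illustrates the general idea, assuming the board is stored as rows of booleans; the project's exact feature definitions may differ.

    # Rough sketch of contour matching (illustrative feature definitions).

    def column_heights(board):
        # Height of the topmost filled cell in each column (0 = empty column).
        rows, cols = len(board), len(board[0])
        heights = []
        for c in range(cols):
            h = 0
            for r in range(rows):
                if board[r][c]:
                    h = rows - r
                    break
            heights.append(h)
        return heights

    def contour(heights):
        # Steps between adjacent column heights describe the board's surface.
        return [heights[i + 1] - heights[i] for i in range(len(heights) - 1)]

    def match_length(board_contour, piece_contour, column):
        # Count consecutive surface steps the piece's underside fits flush against.
        length = 0
        for i, step in enumerate(piece_contour):
            if column + i >= len(board_contour) or board_contour[column + i] != step:
                break
            length += 1
        return length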
Structure
- Inputs
  - Active tetromino
  - Next tetromino
  - Held tetromino
  - Placement 1 match length, placement 1 value … placement n match length, placement n value
- Outputs
  - Placement 1 score … placement n score
  - Hold value
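One plausible wiring of the structure above is sketched below: one-hot encodings of the active, next, and held tetrominoes plus a match length and value for each candidate placement go in, and a score per placement plus a hold value come out. The layer sizes, the cap on candidate placements, and the piece encoding are assumptions, not the project's actual network.

    import numpy as np

    PIECES = "IOTSZJL"
    N_PLACEMENTS = 34   # assumed cap on candidate placements per turn

    def encode(active, nxt, held, match_lengths, values):
        # Build the input vector: one-hot pieces plus per-placement features.
        def one_hot(p):
            return [1.0 if p == q else 0.0 for q in PIECES]
        x = one_hot(active) + one_hot(nxt) + one_hot(held)
        for i in range(N_PLACEMENTS):
            x.append(match_lengths[i] if i < len(match_lengths) else 0.0)
            x.append(values[i] if i < len(values) else 0.0)
        return np.array(x)

    class PlacementNet:
        def __init__(self, n_in, n_hidden=50, n_out=N_PLACEMENTS + 1, seed=0):
            rng = np.random.default_rng(seed)
            self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_in))
            self.W2 = rng.normal(0.0, 0.1, (n_out, n_hidden))

        def forward(self, x):
            # Returns one score per candidate placement and a hold value.
            h = np.tanh(self.W1 @ x)
            out = self.W2 @ h
            return out[:-1], out[-1]

With outputs arranged this way, the agent would pick the placement with the highest score, or hold the active piece if the hold value exceeds every placement score.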
Experiments
- 200 learning games
- Averaged over 30 runs
- Two-piece and six-piece configurations
- Compare to benchmark contour-matching agent
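The evaluation protocol can be summarized by a small driver like the sketch below; make_agent and play_game are hypothetical stand-ins for the project's training and game code.

    # Hypothetical driver for the protocol: 30 runs of 200 learning games,
    # reporting the mean lines cleared at each point in training.

    def run_experiment(make_agent, play_game, n_runs=30, n_games=200):
        totals = [0.0] * n_games
        for _ in range(n_runs):
            agent = make_agent()                 # fresh agent per run
            for g in range(n_games):
                totals[g] += play_game(agent)    # lines cleared; agent learns in place
        return [t / n_runs for t in totals]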
Results
Charts: two-piece and six-piece configurations
Results
Chart: score and lines cleared for best match, two-piece, six-piece, six-piece with height differences, and six-piece with placement heights
Conclusions
- Accidentally developed a heuristic that beats previous reinforcement learning techniques
- Six-piece outperforming two-piece suggests some pseudo-planning is going on
- A better way to generalize the board state may be necessary