Slide 1: Reinforcement Learning and Tetris
Jared Christen
Slide 2: Tetris
- Tetris can be framed as a Markov decision process
- Large state space: a standard 10 x 20 board alone admits up to 2^200 filled/empty configurations
- Requires long-term strategy without long-term knowledge of the piece sequence
Slide 3: Background
- Hand-coded algorithms can clear more than 1,000,000 lines
- A genetic algorithm by Roger Llima averages 42,000 lines
- A reinforcement learning algorithm by Kurt Driessens averages 30-40 lines
Slide 4: Goals
- Develop a Tetris agent that improves on previous reinforcement learning implementations
- Secondary goals:
  - Use as few handpicked features as possible
  - Encourage risk-taking
  - Include rarely-studied features of Tetris
Slide 5: Approach
Slide 6: Neural Net Control
- Inputs:
  - Raw state (filled and empty blocks)
  - Handpicked features
- Outputs:
  - Movements
  - Placements
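The deck only names the network's inputs and outputs, so the following is a minimal sketch of what such a control net could look like, scoring candidate placements (one of the two output types above). The layer sizes, the 10 x 20 board, the count of handpicked features, and the placement count are all assumptions, not details from the slides:

    import numpy as np

    BOARD_W, BOARD_H = 10, 20     # standard Tetris board (assumed)
    N_RAW = BOARD_W * BOARD_H     # raw state: one input per filled/empty cell
    N_HAND = 4                    # handpicked features, e.g. max height, holes (assumed)
    N_PLACEMENTS = 34             # candidate placements per piece (assumed upper bound)
    N_HIDDEN = 64                 # hidden layer width (assumed)

    rng = np.random.default_rng(0)
    W1 = rng.normal(0.0, 0.1, (N_HIDDEN, N_RAW + N_HAND))
    W2 = rng.normal(0.0, 0.1, (N_PLACEMENTS, N_HIDDEN))

    def evaluate_placements(board, handpicked):
        """Map the current state to one value per candidate placement."""
        x = np.concatenate([board.ravel().astype(float), handpicked])
        hidden = np.tanh(W1 @ x)
        return W2 @ hidden        # the agent plays argmax over these values

    # Example: empty board, zeroed handpicked features
    values = evaluate_placements(np.zeros((BOARD_H, BOARD_W)), np.zeros(N_HAND))
    best = int(np.argmax(values))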
Slide 7: Contour Matching
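The slide does not spell the heuristic out, but the usual idea behind contour matching is to describe the board surface by the height differences between adjacent columns and count how many of those steps a tetromino's underside profile reproduces. A rough sketch under that assumption:

    import numpy as np

    def column_heights(board):
        """Height of the topmost filled cell per column (board is a
        boolean array with row 0 at the top)."""
        heights = board.shape[0] - np.argmax(board, axis=0)
        heights[~board.any(axis=0)] = 0   # empty columns have height 0
        return heights

    def contour(heights):
        """Surface contour: height change between adjacent columns."""
        return np.diff(heights)

    def match_length(piece_profile, board_contour, col):
        """Consecutive contour steps, starting at `col`, that the
        piece's underside profile lines up with."""
        n = 0
        for p, b in zip(piece_profile, board_contour[col:]):
            if p != b:
                break
            n += 1
        return n

    # Example: a one-high ledge in columns 0-2, then a drop
    board = np.zeros((20, 10), dtype=bool)
    board[19, :3] = True
    steps = contour(column_heights(board))   # [0, 0, -1, 0, ...]
    flat = [0, 0, 0]                          # underside of a horizontal I-piece
    print(match_length(flat, steps, 0))       # -> 2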
Slide 8: Structure
- Active tetromino
- Next tetromino
- Held tetromino
- Placement 1 score
- Placement 1 match length
- Placement 1 value
- ...
- Placement n score
- Placement n match length
- Placement n value
- Hold value
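Reading this slide as a network diagram, the three tetromino identities plus the per-placement (score, match length) pairs look like inputs, with one value per placement and a hold value as outputs. A sketch of assembling that input vector; the one-hot piece encoding and the input/output split are assumptions:

    N_PIECES = 7   # I, O, T, S, Z, J, L

    def build_input(active, nxt, held, placements):
        """Input vector: one-hot encodings of the active, next, and held
        tetrominoes, followed by (score, match length) per candidate
        placement. The network would then emit one value per placement
        plus a hold value."""
        x = []
        for piece in (active, nxt, held):
            onehot = [0.0] * N_PIECES
            if piece is not None:            # hold slot may be empty
                onehot[piece] = 1.0
            x.extend(onehot)
        for score, mlen in placements:
            x.extend([float(score), float(mlen)])
        return x

    # Example: active I-piece (0), next T-piece (2), nothing held,
    # two candidate placements
    x = build_input(0, 2, None, [(120, 3), (80, 1)])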
Slide 9: Experiments
- 200 learning games
- Averaged over 30 runs
- Two-piece and six-piece configurations
- Compared to a benchmark contour matching agent
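A sketch of the evaluation protocol on this slide; make_agent and play_game are hypothetical hooks standing in for the learner and a Tetris simulator, neither of which the deck specifies:

    import numpy as np

    def run_experiment(make_agent, play_game, n_runs=30, n_games=200):
        """200 learning games per run, averaged over 30 independent runs."""
        lines = np.zeros((n_runs, n_games))
        for r in range(n_runs):
            agent = make_agent()                # fresh learner each run
            for g in range(n_games):
                lines[r, g] = play_game(agent)  # agent updates while playing
        return lines.mean(axis=0)               # averaged learning curve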
Slide 10: Results
[Learning-curve plots for the two-piece and six-piece configurations]
Slide 11: Results

                                         Score   Lines cleared
    Best match                            7242        55
    Two-piece                             5996        45
    Six-piece                             7085        53
    Six-piece with height differences     7144        53
    Six-piece with placement heights      6981        52
Slide 12: Conclusions
- Accidentally developed a heuristic that beats previous reinforcement learning techniques
- The six-piece configuration's outperformance of the two-piece one suggests some pseudo-planning is going on
- A better way to generalize the board state may be necessary