Backgammon project Oren Salzman Guy Levit Instructors:

Part a: Ishai Menashe Part b: Yaki Engel

Agenda
Project's Objectives
The Learning Algorithm
TDGammon Problematic Points
The Race Problem
Experimental Results
Future Development

Objectives
Developing an agent that learns to play backgammon by playing against itself, using reinforcement learning techniques
Inspired by Tesauro's TDGammon version 0.0

Learning Algorithm - general
Evaluating positions using a neural network
Greedy policy
When the game ends, the agent gets a reward according to the result (+2, +1, -1, -2)
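The greedy policy can be sketched as follows. This is a minimal illustration, not the project's code: `choose_greedy`, `evaluate`, and `terminal_reward` are hypothetical names, the evaluator is assumed to score positions from the mover's point of view, and reading the reward magnitude 2 as a gammon is an assumption (the slides only say the reward depends on the result).

```python
from typing import Callable, Sequence, TypeVar

Position = TypeVar("Position")


def choose_greedy(
    candidates: Sequence[Position],
    evaluate: Callable[[Position], float],
) -> Position:
    """Return the candidate position with the highest evaluation.

    `candidates` are the positions reachable by each legal move for the
    current dice roll (1-ply lookahead); `evaluate` stands in for the
    network and is assumed to score from the mover's point of view.
    """
    if not candidates:
        raise ValueError("no legal moves to choose from")
    return max(candidates, key=evaluate)


def terminal_reward(won: bool, gammon: bool) -> int:
    """Map a finished game to one of +2, +1, -1, -2.

    Assumption: magnitude 2 means the game ended in a gammon.
    """
    magnitude = 2 if gammon else 1
    return magnitude if won else -magnitude
```

Because the policy is greedy, all exploration has to come from the dice rolls rather than from randomized move selection.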

TDGammon Problematic points
Non-linear neural network
Policy is changing during training
Environment is changing during training
Solutions:
Linear network
Learning in alternations
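One possible reading of "learning in alternations" is that only one agent updates its weights at a time while its opponent stays frozen, so the learner faces a fixed environment within each phase. The sketch below follows that reading; the slides do not spell out the scheme, and `train_in_alternations` and `train_phase` are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple

Weights = List[float]
# Hypothetical signature: (learner weights, frozen opponent weights) -> updated learner weights
TrainPhase = Callable[[Weights, Sequence[float]], Weights]


def train_in_alternations(
    weights_a: Sequence[float],
    weights_b: Sequence[float],
    train_phase: TrainPhase,
    num_phases: int,
) -> Tuple[Weights, Weights]:
    """Alternate which agent learns; the other is held fixed as the opponent."""
    a, b = list(weights_a), list(weights_b)
    for phase in range(num_phases):
        if phase % 2 == 0:
            a = train_phase(a, tuple(b))  # A learns against a frozen B
        else:
            b = train_phase(b, tuple(a))  # B learns against a frozen A
    return a, b
```

Passing the opponent's weights as a tuple makes it harder for a training phase to modify them by accident.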

The Race Problem
In a race, a more algorithmic approach is required for choosing a move.
Three solutions were considered:
Designing a manual algorithm
Using a different network for races
Using the same network, but with each feature dedicated either to race positions or to non-race positions
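The third option can be illustrated by duplicating the feature vector into a race half and a non-race half, so that each weight is trained only on one kind of position. The sketch below rests on assumptions not found in the slides: the board is a list of 24 signed counts (positive for player A, negative for player B), A moves toward index 0 and B toward index 23, and `is_race` and `split_features` are hypothetical helpers.

```python
from typing import List, Sequence


def is_race(board: Sequence[int], bar_a: int = 0, bar_b: int = 0) -> bool:
    """Return True when the two sides have passed each other (no contact).

    board[i] > 0 means that many checkers of player A on point i,
    board[i] < 0 means player B. A moves toward index 0, B toward index 23.
    """
    if bar_a or bar_b:
        return False  # a checker on the bar is treated as contact in this sketch
    a_points = [i for i, n in enumerate(board) if n > 0]
    b_points = [i for i, n in enumerate(board) if n < 0]
    if not a_points or not b_points:
        return True  # one side has no checkers left on the board
    # No A checker still has to pass a B checker.
    return max(a_points) < min(b_points)


def split_features(features: Sequence[float], race: bool) -> List[float]:
    """Place the features in the race slots or the non-race slots, zeros elsewhere."""
    values = [float(x) for x in features]
    zeros = [0.0] * len(values)
    return values + zeros if race else zeros + values
```

With a linear evaluator this behaves like two separate weight vectors sharing one network, at the cost of doubling the number of weights.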

Experiments
Various parameter settings were checked:
Learning step (0.1, 0.3, 0.8)
Lambda (0.1, 0.3, 0.5, 0.7, 0.9)
Discount factor (0.95, 0.97, 0.98, 0.999)
For each setting, the agent played between half a million and five million games.
All versions were compared to one golden version.
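For a sense of scale, the listed values can be enumerated as a grid. The slides do not say whether every combination was actually run, so the full cross product below is an assumption, and `parameter_grid` and the dictionary keys are hypothetical names.

```python
from itertools import product
from typing import Dict, List

LEARNING_STEPS = (0.1, 0.3, 0.8)
LAMBDAS = (0.1, 0.3, 0.5, 0.7, 0.9)
DISCOUNT_FACTORS = (0.95, 0.97, 0.98, 0.999)


def parameter_grid() -> List[Dict[str, float]]:
    """Enumerate every combination of the listed parameter values."""
    return [
        {"learning_step": step, "lambda": lam, "discount": gamma}
        for step, lam, gamma in product(LEARNING_STEPS, LAMBDAS, DISCOUNT_FACTORS)
    ]
```

The full grid has 3 x 5 x 4 = 60 settings, which at half a million to five million games each suggests why only some settings may have been explored.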

Experiments’ results

Conclusions
A learning step of 0.1 yielded the best results.
High discount factors (0.98, 0.999) were better than lower ones.
Lambda values of 0.1 and 0.9 were inferior to the others. Among 0.3, 0.5, and 0.7, 0.5 seemed the best.
None of the versions outperformed the golden version.

Future development
More than 1-ply search
Adding features
Going back to a non-linear network
Letting both agents learn simultaneously
Connecting the player to the internet
Graphical User Interface

Learning Algorithm - general
The agent plays against itself and gets a reward (-2, -1, +1, +2) when the game ends.
The network weights are updated using the following formulas:
[formulas not preserved in this transcript]
The eligibility trace is updated by:
[formula not preserved in this transcript]
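Since the slide's formulas did not survive, the sketch below shows the textbook TD(lambda) update for a linear evaluator, using the learning step, lambda, and discount factor named in the experiments. It is an assumption that the project used exactly this form: the trace is updated as e <- gamma * lambda * e + x, the TD error is delta = r + gamma * V(s') - V(s), and the weights as w <- w + alpha * delta * e. The function names are hypothetical.

```python
from typing import List, Sequence


def value(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear evaluation: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def td_lambda_step(
    weights: List[float],
    trace: List[float],
    features: Sequence[float],
    reward: float,
    next_value: float,
    alpha: float,
    gamma: float,
    lam: float,
) -> float:
    """Apply one TD(lambda) update in place and return the TD error.

    `features` describe the current position; `next_value` is the
    evaluation of the following position (0.0 at the end of a game).
    """
    # For a linear evaluator the gradient of V is the feature vector itself.
    for i, x in enumerate(features):
        trace[i] = gamma * lam * trace[i] + x
    delta = reward + gamma * next_value - value(weights, features)
    for i in range(len(weights)):
        weights[i] += alpha * delta * trace[i]
    return delta
```

The trace would be reset to zeros at the start of each game, with `reward` equal to 0 on every step except the last.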

The Features

Backgammon Board Definitions
