Published by Martin McCarthy, modified over 6 years ago
2
Backgammon Project
Oren Salzman, Guy Levit
Instructors: Ishai Menashe (Part a), Yaki Engel (Part b)
3
Agenda
- Project's Objectives
- The Learning Algorithm
- TDGammon Problematic Points
- The Race Problem
- Experimental Results
- Future Development
4
Objectives
- Developing an agent that learns to play backgammon by playing against itself, using reinforcement learning techniques
- Inspired by Tesauro's TDGammon version 0.0
5
Learning Algorithm - general
- Evaluating positions using a neural network
- Greedy policy
- When the game ends, the agent gets a reward according to the result (+2, +1, -1, -2)
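A minimal sketch of the greedy policy and the terminal reward, assuming the value function scores each candidate position from the mover's point of view and that +2/-2 correspond to a gammon won or lost. The outcome labels and the helper names `terminal_reward` and `greedy_move` are hypothetical; the network itself is stood in for by any callable `value`.

```python
from typing import Callable, Sequence, TypeVar

State = TypeVar("State")

# Hypothetical outcome labels; the slides only list the reward values.
REWARDS = {
    "gammon_win": 2,
    "win": 1,
    "loss": -1,
    "gammon_loss": -2,
}


def terminal_reward(outcome: str) -> int:
    """Reward handed to the agent when the game ends."""
    return REWARDS[outcome]


def greedy_move(afterstates: Sequence[State],
                value: Callable[[State], float]) -> int:
    """Return the index of the afterstate with the highest estimated value.

    `afterstates` holds the positions reachable with the current dice roll;
    `value` stands in for the neural network evaluation.
    """
    if not afterstates:
        raise ValueError("no legal moves to choose from")
    return max(range(len(afterstates)), key=lambda i: value(afterstates[i]))
```

Because the policy is purely greedy, all exploration comes from the randomness of the dice and from the changing value estimates during self-play.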
6
TDGammon Problematic points
- Non-linear neural network
- Policy is changing during training
- Environment is changing during training
Solutions:
- Linear network
- Learning in alternations
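A sketch of the two proposed solutions, under stated assumptions. A linear evaluator V(s) = w · φ(s) is one common motivation for dropping the non-linear network, since TD learning with a linear approximator has better-understood convergence behaviour. The slides do not define "learning in alternations"; here it is assumed to mean that only one of the two self-play agents updates its weights at a time, while the other stays frozen, and the roles swap every `block_size` games. Both the schedule and the names below are hypothetical.

```python
from typing import List, Sequence


class LinearEvaluator:
    """Linear value function V(s) = w . phi(s)."""

    def __init__(self, n_features: int) -> None:
        self.w: List[float] = [0.0] * n_features

    def value(self, phi: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, phi))

    def gradient(self, phi: Sequence[float]) -> List[float]:
        # For a linear model, dV/dw is simply the feature vector phi(s).
        return list(phi)


def learner_for_game(game_index: int, block_size: int) -> int:
    """Hypothetical alternation schedule: return 0 or 1, the agent that learns.

    The two agents swap roles every `block_size` games; the agent that is
    not learning keeps its weights frozen during that block.
    """
    return (game_index // block_size) % 2
```

Freezing one side for a block keeps the learner's "environment" (its opponent) fixed for a while, which is the problem the slide lists.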
7
The Race Problem
In a race (when the two sides' checkers can no longer come into contact), a more algorithmic approach is required for choosing a move. Three solutions were considered:
- Designing a manual algorithm
- Using a different network for races
- Using the same network, but dedicating each feature either to race or to non-race positions
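The third option can be sketched as a feature transformation: every feature gets a race copy and a non-race copy, and only the copy matching the current phase is non-zero, so each weight is trained on one phase only. The board convention used by `is_race` is an assumption for illustration (points 1-24 in a shared frame, this player moving from 24 toward 1, the opponent moving from 1 toward 24, bars at 25 and 0); both function names are hypothetical.

```python
from typing import List, Sequence


def is_race(my_points: Sequence[int], opp_points: Sequence[int]) -> bool:
    """True when no further contact is possible.

    Assumed convention: one entry per checker still on the board (borne-off
    checkers omitted), my bar is point 25 and the opponent's bar is point 0.
    """
    if not my_points or not opp_points:
        return True
    # My rearmost checker is my highest point; the opponent's is its lowest.
    return max(my_points) < min(opp_points)


def split_features(features: Sequence[float], race: bool) -> List[float]:
    """Route each feature to a non-race copy (first half) or a race copy (second half).

    The copy that does not match the current phase is zero, so its weights
    receive no gradient from this position.
    """
    zeros = [0.0] * len(features)
    if race:
        return zeros + list(features)
    return list(features) + zeros
```

With a linear network, this is equivalent to keeping two separate weight vectors inside a single network, which sits between the second and third options on the slide.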
8
Experiments
Various parameter settings were checked:
- Learning step: 0.1, 0.3, 0.8
- Lambda: 0.1, 0.3, 0.5, 0.7, 0.9
- Discount factor: 0.95, 0.97, 0.98, 0.999
For each setting, the agent played between half a million and five million games. All versions were compared against one golden version.
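If every combination of the listed values were run, the sweep would contain 3 × 5 × 4 = 60 settings. The slides do not say whether the full grid was tested, so the enumeration below is only a sketch of that upper bound; the dictionary keys are hypothetical.

```python
from itertools import product

LEARNING_STEPS = (0.1, 0.3, 0.8)
LAMBDAS = (0.1, 0.3, 0.5, 0.7, 0.9)
DISCOUNTS = (0.95, 0.97, 0.98, 0.999)


def parameter_grid():
    """Yield every (learning step, lambda, discount) combination of the listed values."""
    for alpha, lam, gamma in product(LEARNING_STEPS, LAMBDAS, DISCOUNTS):
        yield {"alpha": alpha, "lambda": lam, "gamma": gamma}


settings = list(parameter_grid())
print(len(settings))  # 60
```

At half a million to five million games per setting, a full grid would amount to 30 to 300 million training games.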
9
Experiments’ results
10
Experiments’ results
11
Conclusions
- A learning step of 0.1 yielded the best results
- High discount factors (0.98, 0.999) were better than lower ones
- Lambda values of 0.1 and 0.9 were inferior to the others; among 0.3, 0.5 and 0.7, 0.5 seemed the best
- None of the versions outperformed the golden version
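The slides do not state how "outperformed" was measured. One reasonable way to judge a version against the golden version is the mean points per game over a match, with a normal-approximation confidence bound. The function names and the z = 1.96 threshold (roughly 95% one-sided at the bound) are assumptions for illustration.

```python
import math
from typing import Sequence, Tuple


def compare_to_golden(points: Sequence[int]) -> Tuple[float, float]:
    """Summarise a match against the golden version.

    `points` holds the result of each game from the tested version's side
    (+2, +1, -1 or -2). Returns (mean points per game, standard error).
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two games")
    mean = sum(points) / n
    var = sum((p - mean) ** 2 for p in points) / (n - 1)
    return mean, math.sqrt(var / n)


def outperforms(points: Sequence[int], z: float = 1.96) -> bool:
    """True if the mean is positive even after subtracting z standard errors."""
    mean, se = compare_to_golden(points)
    return mean - z * se > 0
```

Because per-game results vary widely in backgammon, small differences between versions need many evaluation games before they clear such a bound.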
12
Future Development
- More than 1-ply search
- Adding features
- Going back to a non-linear network
- Letting both agents learn simultaneously
- Connecting the player to the internet
- Graphical user interface
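One way to go beyond a 1-ply search is to score each candidate move by averaging over the opponent's 21 distinct dice rolls (doubles with probability 1/36, other rolls 2/36) and assuming the opponent picks the reply that is worst for us. This is a sketch; `moves_after` and `evaluate` are hypothetical callbacks (opponent move generation and a value from our point of view), not part of the project.

```python
from fractions import Fraction
from typing import List, Tuple

Roll = Tuple[int, int]


def dice_rolls() -> List[Tuple[Roll, Fraction]]:
    """The 21 distinct rolls with their probabilities (doubles 1/36, others 2/36)."""
    rolls = []
    for a in range(1, 7):
        for b in range(a, 7):
            rolls.append(((a, b), Fraction(1 if a == b else 2, 36)))
    return rolls


def two_ply_value(state, moves_after, evaluate) -> float:
    """Expected value of `state` (our afterstate) after the opponent's reply.

    moves_after(state, roll) -> list of opponent afterstates (hypothetical);
    evaluate(state) -> value from OUR point of view.
    """
    total = 0.0
    for roll, prob in dice_rolls():
        replies = moves_after(state, roll) or [state]  # no legal move: pass
        total += float(prob) * min(evaluate(r) for r in replies)
    return total


def choose_move_two_ply(afterstates, moves_after, evaluate) -> int:
    """Pick our afterstate with the best expected value after the reply."""
    return max(range(len(afterstates)),
               key=lambda i: two_ply_value(afterstates[i], moves_after, evaluate))
```

The cost grows quickly: each candidate move now requires 21 × (number of opponent replies) evaluations instead of one.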
13
END
14
Learning Algorithm - general
The agent plays against itself and receives a reward (-2, -1, +1, +2) when the game ends.
The network weights are updated using the following formulas:
The eligibility trace is updated by:
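The update formulas themselves were not preserved. For reference, a sketch of the standard linear TD(λ) update with accumulating traces, which uses the same three parameters as the experiments (learning step α, λ and discount γ): δ = r + γ·V(s') − V(s), then e ← γλ·e + ∇V(s), then w ← w + α·δ·e, where ∇V(s) = φ(s) for a linear network. Whether the project used exactly this form is an assumption; values are assumed to be from one fixed player's perspective, and the class name is hypothetical.

```python
from typing import List, Sequence


class LinearTDLambda:
    """Linear TD(lambda) with accumulating eligibility traces (a sketch)."""

    def __init__(self, n_features: int, alpha: float, gamma: float, lam: float):
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.w: List[float] = [0.0] * n_features
        self.e: List[float] = [0.0] * n_features

    def value(self, phi: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, phi))

    def start_game(self) -> None:
        # Traces are reset at the beginning of every game.
        self.e = [0.0] * len(self.w)

    def step(self, phi: Sequence[float], reward: float,
             phi_next: Sequence[float], terminal: bool) -> float:
        """One update from s_t (features phi) to s_{t+1} (features phi_next).

        Returns the TD error delta.
        """
        # The value of a terminal position is 0; the game result arrives as `reward`.
        next_value = 0.0 if terminal else self.value(phi_next)
        delta = reward + self.gamma * next_value - self.value(phi)
        # e <- gamma * lambda * e + grad_w V(s_t); for a linear V the gradient is phi.
        self.e = [self.gamma * self.lam * ei + xi for ei, xi in zip(self.e, phi)]
        # w <- w + alpha * delta * e
        self.w = [wi + self.alpha * delta * ei for wi, ei in zip(self.w, self.e)]
        return delta
```

Since the reward arrives only at the end of the game, intermediate steps pass `reward=0.0`, and the trace spreads the final result back over the earlier positions with weight (γλ)^k.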
15
The Features
16
Backgammon Board Definitions