The Pentium Goes to Vegas Training a Neural Network to Play BlackJack Paul Ruvolo and Christine Spritke.

Goals
Investigate result based learning
Develop strategy for a highly random game
Train network to play effectively without explicitly teaching the rules of the game

Strategy
Simplify game to only allow for HIT or STAY
Feedforward 3-layer backpropagation network
  – Give input units information about the hand and the dealer’s up card
  – 2 output units for HIT and STAY
  – 1 hidden layer
Measure performance with Efficiency
  – Efficiency = (win % * 2) + (tie %)
  – This is the return on a dollar bet
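The efficiency measure can be sketched as a small function. This is a minimal illustration, assuming that a win pays back two dollars on a one-dollar bet, a tie pays back one, and a loss pays back nothing, which is how we read "return on a dollar". The function name and argument names are hypothetical.

```python
def efficiency(wins: int, ties: int, losses: int) -> float:
    """Return efficiency as a percentage: (win % * 2) + (tie %)."""
    total = wins + ties + losses
    if total == 0:
        raise ValueError("no hands played")
    win_pct = 100.0 * wins / total
    tie_pct = 100.0 * ties / total
    return 2.0 * win_pct + tie_pct


# A player who wins 40 hands, ties 10, and loses 50 out of 100
# gets back 90 cents for every dollar bet.
print(efficiency(wins=40, ties=10, losses=50))  # 90.0
```

Under this reading, any efficiency below 100% means the player loses money on average.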

Background

To form a basis of comparison we measured efficiency on a player using:
  – Random Guessing
    Efficiency = 60.3%
  – Dealer’s Algorithm
    Hit when below 17, otherwise Stay
    Efficiency = 92.2%
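The two baselines can be sketched as policies in a simplified simulator. This is not the authors' code: it assumes an infinite deck, no special payout for a natural blackjack, and a dealer who follows the same "hit below 17" rule. All function names are hypothetical, and the numbers it produces need not match the slide.

```python
import random

CARD_VALUES = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]  # ace = 1, faces = 10


def best_sum(cards):
    """Best hand total: count one ace as 11 when that does not bust."""
    total = sum(cards)
    if 1 in cards and total + 10 <= 21:
        total += 10
    return total


def dealer_policy(player_cards, dealer_up=None):
    """Dealer's algorithm from the slide: hit below 17, otherwise stay."""
    return "HIT" if best_sum(player_cards) < 17 else "STAY"


def make_random_policy(rng):
    """Random guessing: pick HIT or STAY with equal probability."""
    return lambda player_cards, dealer_up=None: rng.choice(["HIT", "STAY"])


def play_hand(policy, rng):
    """Play one hand; return +1 for a win, 0 for a tie, -1 for a loss."""
    draw = lambda: rng.choice(CARD_VALUES)  # assumption: infinite deck
    player, dealer = [draw(), draw()], [draw(), draw()]
    while best_sum(player) <= 21 and policy(player, dealer[0]) == "HIT":
        player.append(draw())
    if best_sum(player) > 21:
        return -1  # player busts before the dealer plays
    while dealer_policy(dealer) == "HIT":
        dealer.append(draw())
    p, d = best_sum(player), best_sum(dealer)
    if d > 21 or p > d:
        return 1
    return 0 if p == d else -1


def measure_efficiency(policy, hands=10000, seed=0):
    """Efficiency = (win % * 2) + (tie %) over simulated hands."""
    rng = random.Random(seed)
    results = [play_hand(policy, rng) for _ in range(hands)]
    win_pct = 100.0 * results.count(1) / hands
    tie_pct = 100.0 * results.count(0) / hands
    return 2.0 * win_pct + tie_pct


if __name__ == "__main__":
    print("dealer's algorithm:", measure_efficiency(dealer_policy))
    print("random guessing:", measure_efficiency(make_random_policy(random.Random(1))))
```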

PHASE I
Input Specific Cards Showing

PHASE I – Network Setup
104 Input Units
  – 52 input units for possible cards in player’s hand
  – 52 input units for possible dealer’s up card
20 Hidden Units
2 Output Units
  – HIT and STAY
Learning Rate = 0.3; Momentum = 0.3
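One way to picture the 104 input units is as two 52-element indicator vectors joined together. The sketch below is an assumption about the layout: the slide does not say how cards are numbered, so the `card_index` scheme (suit * 13 + rank) and the function names are hypothetical.

```python
def card_index(rank, suit):
    """Map rank 0..12 and suit 0..3 to a card number 0..51 (hypothetical numbering)."""
    if not (0 <= rank < 13 and 0 <= suit < 4):
        raise ValueError("rank must be 0..12 and suit 0..3")
    return suit * 13 + rank


def encode_phase1(player_cards, dealer_up):
    """Build the 104-element input vector.

    Units 0..51 flag the cards in the player's hand;
    units 52..103 flag the dealer's up card.
    """
    x = [0.0] * 104
    for card in player_cards:
        x[card] = 1.0
    x[52 + dealer_up] = 1.0
    return x


# Player holds two cards, dealer shows one: exactly three units are active.
hand = [card_index(4, 0), card_index(6, 1)]
x = encode_phase1(hand, card_index(9, 2))
print(sum(x))  # 3.0
```

With this encoding, the network sees card identities but never the hand total, so it has to discover card values on its own.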

PHASE I – Network Setup
Target High = 0.9
Target Low = 0.1
Target Mid = 0.5
If hitting and staying yield same result
  – HIT = STAY = Target Mid
If hitting produces a win while staying produces a loss
  – HIT = Target High
  – STAY = Target Low
Vice versa when staying produces a win while hitting produces a loss
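The target rule can be written as a short function. The slide only covers equal results and win versus loss; treating a tie as better than a loss and worse than a win is an assumption made here so the function handles every case. The names are hypothetical.

```python
TARGET_HIGH = 0.9
TARGET_LOW = 0.1
TARGET_MID = 0.5


def training_targets(hit_result, stay_result):
    """Return (HIT target, STAY target) for one training pattern.

    Results are +1 for a win, 0 for a tie, -1 for a loss.
    Assumption: whichever action gives the better result gets Target High.
    """
    if hit_result == stay_result:
        return TARGET_MID, TARGET_MID
    if hit_result > stay_result:
        return TARGET_HIGH, TARGET_LOW
    return TARGET_LOW, TARGET_HIGH


print(training_targets(1, -1))   # (0.9, 0.1): hitting wins, staying loses
print(training_targets(-1, 1))   # (0.1, 0.9): the reverse
print(training_targets(-1, -1))  # (0.5, 0.5): same result either way
```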

PHASE I – Results
Efficiency peaks at about 88% but never settles

PHASE I – Modifications
Tried multiple variations on initial network
  – Hidden units ranging from 1 to 20
  – Learning rate and momentum adjustments
    Aging algorithm for learning rate
  – 20 Input Units
    10 possible values for player’s cards
    10 possible values for dealer’s up card
No significant changes in performance

PHASE I – Analysis
Network hits on a hand summing to 21
Analyzed why the network can’t improve, or even learn the dealer’s algorithm

PHASE II
Input “best” sum of current hand

PHASE II – Strategy
4 types of inputs
  – No dealer card, no ace differentiation
  – No dealer card, with ace differentiation
  – Include dealer card, no ace differentiation
  – Include dealer card, with ace differentiation
All use 2 output units and 4 hidden units

PHASE II – No dealer, no aces
18 input units
  – Represent all possible hand values when making a decision (ranging from 4 to 21)
Results:
  – Develops the dealer’s algorithm
    Hits on sum < 17
    Stays on sum > 16
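The 18-unit input can be sketched as a one-hot vector over the hand sum. This assumes one unit per sum from 4 to 21, with unit 0 standing for a sum of 4. The function name is hypothetical.

```python
def encode_sum(hand_sum):
    """One-hot encode a hand sum in 4..21 as an 18-element vector."""
    if not 4 <= hand_sum <= 21:
        raise ValueError("a decision is only made for sums from 4 to 21")
    x = [0.0] * 18
    x[hand_sum - 4] = 1.0
    return x


# A sum of 16 activates unit 12; a sum of 17 activates unit 13.
print(encode_sum(16).index(1.0))  # 12
print(encode_sum(17).index(1.0))  # 13
```

Because each sum has its own unit, learning the dealer's algorithm only requires separating units 0 to 12 (hit) from units 13 to 17 (stay).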

PHASE II – No dealer, aces

PHASE II – Dealer, no aces
28 input units
  – 18 possible player hand values
  – 10 possible values for dealer’s up card
Results:
  – High efficiency
  – Good at accounting for dealer’s card in boundary cases

PHASE II – Dealer, no aces
Network is more likely to stay when the dealer has a bust card

PHASE II – Dealer, aces
38 input units
  – 28 units for player’s hand
    18 possible hard hand values
    10 possible soft hand values
  – 10 units for the dealer’s up card
Results:
  – Good at adjusting strategy for hard vs. soft hands
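The 38-unit layout might look like the sketch below. The slide gives only the unit counts, so the ranges are assumptions: hard totals 4 to 21 (18 units), soft totals 12 to 21 (10 units), and dealer up-card values 1 to 10 with the ace counted as 1 (10 units). The function names are hypothetical.

```python
def hand_state(cards):
    """Return (total, is_soft) for card values 1..10, where ace = 1.

    A hand is soft when one ace can count as 11 without busting.
    """
    total = sum(cards)
    if 1 in cards and total + 10 <= 21:
        return total + 10, True
    return total, False


def encode_phase2(cards, dealer_up):
    """Build the 38-element input vector.

    Units 0..17: hard totals 4..21; units 18..27: soft totals 12..21;
    units 28..37: dealer's up card 1..10.
    """
    total, soft = hand_state(cards)
    if not 4 <= total <= 21:
        raise ValueError("hand total must be between 4 and 21")
    if not 1 <= dealer_up <= 10:
        raise ValueError("dealer up card must be between 1 and 10")
    x = [0.0] * 38
    if soft:
        x[18 + (total - 12)] = 1.0
    else:
        x[total - 4] = 1.0
    x[28 + (dealer_up - 1)] = 1.0
    return x


# Soft 17 (ace + 6) and hard 17 (10 + 7) activate different hand units.
print(encode_phase2([1, 6], 10).index(1.0))  # 23
print(encode_phase2([10, 7], 10).index(1.0))  # 13
```

Giving soft and hard totals separate units is what lets a network treat a soft 17 differently from a hard 17.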

PHASE II – Dealer, aces
Network always hits a soft 17 and stays on a hard 17

Conclusion
Neural networks are not magical!
Require the teacher to eliminate duplicate patterns
  – 5 of diamonds + 7 of clubs is equivalent to 8 of hearts + 4 of spades
Result based training is inherently more difficult
2 hidden layers might help
  – We’re not optimistic!
