Published by Adrian Powers. Modified over 9 years ago.
1
The Pentium Goes to Vegas
Training a Neural Network to Play BlackJack
Paul Ruvolo and Christine Spritke
2
Goals
Investigate result-based learning
Develop a strategy for a highly random game
Train a network to play effectively without explicitly teaching it the rules of the game
3
Strategy
Simplify the game to allow only HIT or STAY
Feedforward 3-layer backpropagation network
– Give input units information about the hand and the dealer’s up card
– 2 output units for HIT and STAY
– 1 hidden layer
Measure performance with Efficiency
– Efficiency = (win % * 2) + (tie %)
– i.e., the return on a dollar
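The Efficiency formula above can be sketched in a few lines of Python. The function name and the win/tie/loss counts are illustrative assumptions, not taken from the slides.

```python
def efficiency(wins: int, ties: int, losses: int) -> float:
    """Efficiency = (win % * 2) + (tie %), expressed as a percentage."""
    total = wins + ties + losses
    if total == 0:
        raise ValueError("no hands played")
    win_pct = 100.0 * wins / total
    tie_pct = 100.0 * ties / total
    return 2.0 * win_pct + tie_pct


# Hypothetical tally: 40 wins, 10 ties, 50 losses out of 100 hands.
print(efficiency(40, 10, 50))  # 90.0
```

An efficiency of 100% means the player breaks even on average; values below 100% mean the player loses money over time.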
4
Background
5
To form a basis of comparison, we measured efficiency for a player using:
– Random guessing
  Efficiency = 60.3%
– Dealer’s algorithm
  Hit when below 17, otherwise stay
  Efficiency = 92.2%
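The two baseline players can be written as simple policy functions. This is a minimal sketch: the function names and the "HIT"/"STAY" string labels are assumptions for illustration, and the game simulation that produced the efficiencies above is not reproduced here.

```python
import random


def dealer_policy(hand_total: int) -> str:
    """Dealer's algorithm: hit when below 17, otherwise stay."""
    return "HIT" if hand_total < 17 else "STAY"


def random_policy(hand_total: int, rng=random) -> str:
    """Random guessing: ignore the hand and pick an action uniformly."""
    return rng.choice(["HIT", "STAY"])


print(dealer_policy(16))  # HIT
print(dealer_policy(17))  # STAY
```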
6
PHASE I
Input: specific cards showing
7
PHASE I – Network Setup
104 Input Units
– 52 input units for possible cards in player’s hand
– 52 input units for possible dealer’s up card
20 Hidden Units
2 Output Units
– HIT and STAY
Learning Rate = 0.3; Momentum = 0.3
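One way to build the 104-unit input vector is sketched below. The slides do not say how cards are numbered or whether a single deck is used, so the card indices 0 to 51, the single-deck assumption, and the name `encode_phase1` are all hypothetical.

```python
NUM_CARDS = 52  # one unit per distinct card in a single deck (assumption)


def encode_phase1(player_cards, dealer_up_card):
    """Return a 104-element vector: units 0-51 mark the player's cards,
    units 52-103 mark the dealer's up card. Cards are indices 0..51."""
    vec = [0.0] * (2 * NUM_CARDS)
    for card in player_cards:
        vec[card] = 1.0
    vec[NUM_CARDS + dealer_up_card] = 1.0
    return vec


# Hypothetical hand: player holds cards 4 and 19, dealer shows card 30.
inputs = encode_phase1([4, 19], 30)
print(len(inputs), sum(inputs))  # 104 3.0
```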
8
PHASE I – Network Setup
Target High = 0.9
Target Low = 0.1
Target Mid = 0.5
If hitting and staying yield the same result
– HIT = STAY = Target Mid
If hitting produces a win while staying produces a loss
– HIT = Target High
– STAY = Target Low
Vice versa when staying produces a win while hitting produces a loss
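The target rule can be sketched as a small function. The slides only cover the equal case and the win-versus-loss case; encoding outcomes as -1 (loss), 0 (tie), 1 (win) and treating any strictly better outcome like a win-versus-loss are assumptions made here for illustration.

```python
TARGET_HIGH = 0.9
TARGET_LOW = 0.1
TARGET_MID = 0.5


def make_targets(hit_outcome: int, stay_outcome: int):
    """Return (HIT target, STAY target) from the two simulated outcomes.
    Outcomes: -1 loss, 0 tie, 1 win (an assumed encoding)."""
    if hit_outcome == stay_outcome:
        return TARGET_MID, TARGET_MID
    if hit_outcome > stay_outcome:
        return TARGET_HIGH, TARGET_LOW
    return TARGET_LOW, TARGET_HIGH


print(make_targets(1, -1))   # (0.9, 0.1)
print(make_targets(-1, -1))  # (0.5, 0.5)
```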
9
PHASE I – Results
Efficiency peaks at about 88% but never settles
10
PHASE I – Modifications
Tried multiple variations on initial network
– Hidden units ranging from 1 to 20
– Learning rate and momentum adjustments
  Aging algorithm for learning rate
– 20 Input Units
  10 possible values for player’s cards
  10 possible values for dealer’s up card
No significant changes in performance
11
PHASE I – Analysis
Network hits on a hand summing to 21
Analyzed why the network can’t improve, or even learn the dealer’s algorithm
12
PHASE II
Input: “best” sum of current hand
13
PHASE II – Strategy
4 types of inputs
– No dealer card, no ace differentiation
– No dealer card, with ace differentiation
– Include dealer card, no ace differentiation
– Include dealer card, with ace differentiation
All use 2 output units and 4 hidden units
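A forward pass through a network of this shape might look like the sketch below. The slides give only the layer sizes; the sigmoid activation, bias terms, random weight range, the tie-breaking rule, and the 18-unit input (the simplest of the four input types) are assumptions. Training by backpropagation is omitted, so the weights are untrained.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in: int, n_out: int, rng: random.Random):
    # One row of weights per unit; the last entry in each row is the bias.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]


def forward_layer(weights, inputs):
    extended = list(inputs) + [1.0]  # append constant input for the bias
    return [sigmoid(sum(w * x for w, x in zip(row, extended))) for row in weights]


def decide(hidden_w, output_w, inputs):
    """Return the chosen action and the raw (HIT, STAY) output activations."""
    hidden = forward_layer(hidden_w, inputs)
    hit, stay = forward_layer(output_w, hidden)
    return ("HIT" if hit > stay else "STAY"), (hit, stay)


rng = random.Random(0)
hidden_w = init_layer(18, 4, rng)  # 18 inputs -> 4 hidden units
output_w = init_layer(4, 2, rng)   # 4 hidden units -> HIT and STAY

inputs = [0.0] * 18
inputs[12 - 4] = 1.0               # hand value 12; unit 0 stands for value 4
action, outputs = decide(hidden_w, output_w, inputs)
print(action, outputs)
```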
14
PHASE II – No dealer, no aces
18 input units
– Represent all possible hand values when making a decision (ranging from 4 to 21)
Results:
– Develops the dealer’s algorithm
  Hits on sum < 17
  Stays on sum > 16
15
PHASE II – No dealer, aces
16
PHASE II – Dealer, no aces
28 input units
– 18 possible player hand values
– 10 possible values for dealer’s up card
Results:
– High efficiency
– Good at accounting for dealer’s card in boundary cases
17
PHASE II – Dealer, no aces
19
Network is more likely to stay when the dealer has a bust card
20
PHASE II – Dealer, aces
38 input units
– 28 units for player’s hand
  18 possible hard hand values
  10 possible soft hand values
– 10 units for the dealer’s up card
Results:
– Good at adjusting strategy for hard vs. soft hands
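The 38-unit encoding can be sketched as follows. The unit ordering, the choice of hard values 4 to 21 and soft values 12 to 21, and the names `best_sum` and `encode_phase2` are assumptions; the slides give only the unit counts. Ranks are 1 for an ace and 2 to 10 otherwise, with face cards counted as 10.

```python
def best_sum(ranks):
    """Return (total, is_soft). A hand is soft when one ace counts as 11
    without busting."""
    total = sum(ranks)
    if 1 in ranks and total + 10 <= 21:
        return total + 10, True
    return total, False


def encode_phase2(ranks, dealer_rank):
    """38 units: 0-17 hard totals 4..21, 18-27 soft totals 12..21,
    28-37 dealer up card ranks 1..10 (an assumed layout)."""
    total, is_soft = best_sum(ranks)
    if total < 4 or total > 21:
        raise ValueError("no decision to make for this hand")
    vec = [0.0] * 38
    if is_soft:
        vec[18 + (total - 12)] = 1.0
    else:
        vec[total - 4] = 1.0
    vec[28 + (dealer_rank - 1)] = 1.0
    return vec


print(best_sum([1, 6]))   # (17, True)
print(best_sum([10, 7]))  # (17, False)
```

With this layout, a soft 17 and a hard 17 activate different input units, which is what lets a network treat them differently.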
21
PHASE II – Dealer, aces
Network always hits a soft 17 and stays on a hard 17
22
Conclusion
Neural networks are not magical!
Require the teacher to eliminate duplicate patterns
– 5 of diamonds + 7 of clubs is equivalent to 8 of hearts + 4 of spades
Result-based training is inherently more difficult
2 hidden layers might help
– We’re not optimistic!
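The duplicate-pattern point can be illustrated with a short sketch: two hands that look unrelated at the card level collapse to the same input once suits are dropped and ranks are summed. The `(rank, suit)` tuple representation and the name `hand_value` are assumptions, and aces are left out for simplicity.

```python
def hand_value(cards):
    """Sum the ranks of (rank, suit) cards, ignoring suits.
    Ace handling is omitted in this simplified sketch."""
    return sum(rank for rank, _suit in cards)


hand_a = [(5, "diamonds"), (7, "clubs")]
hand_b = [(8, "hearts"), (4, "spades")]

print(hand_value(hand_a), hand_value(hand_b))  # 12 12
```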