Blackjack as a Test Bed for Learning Strategies in Neural Networks A. Perez-Uribe and E. Sanchez Swiss Federal Institute of Technology IEEE IJCNN'98.


1 Blackjack as a Test Bed for Learning Strategies in Neural Networks A. Perez-Uribe and E. Sanchez Swiss Federal Institute of Technology IEEE IJCNN'98

2 Blackjack Against the dealer Dealer hits until 17 Limited Implementation
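
The fixed dealer rule on this slide can be sketched in a few lines. This is only an illustration, not the authors' code: the card encoding (ranks 1..10, with 1 as the ace), the helper names `hand_value` and `dealer_play`, and the choice to stand on every 17 are assumptions.

```python
def hand_value(cards):
    """Best blackjack total for ranks 1..10 (1 = ace, face cards count as 10)."""
    total = sum(cards)
    # Count one ace as 11 when doing so does not bust the hand.
    if 1 in cards and total + 10 <= 21:
        total += 10
    return total


def dealer_play(cards, draw):
    """Dealer keeps hitting until the hand is worth at least 17.

    `draw` is a hypothetical zero-argument callable that returns the next card.
    Standing on every 17 (including soft 17) is an assumption.
    """
    cards = list(cards)
    while hand_value(cards) < 17:
        cards.append(draw())
    return cards
```

For example, `dealer_play([10, 6], draw)` takes one more card, while `dealer_play([10, 7], draw)` stands immediately.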

3 Reinforcement Learning Learning by Interaction Similar to MDP Delayed Reward Trial and Error Search Exploitation vs. Exploration Temporal Difference (TD)

4 Neural Network Basics Neurons fire when the weighted sum of their inputs is above a threshold. Hidden layers of perceptrons.

5 Implementation SARSA –TD –Bootstrapping Mechanism –Updates Q(s,a) using quintuple (s,a,r,s’,a’) Q-Learning –Directly approximates optimal Q* –TD err = r + γ max_a’ Q(s’,a’) – Q(s,a)
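
The difference between the two TD errors on this slide is easiest to see side by side. The sketch below assumes a tabular Q stored in a dictionary keyed by (state, action); the presentation uses a neural network as the function approximator, so the table, the function names, and the action labels are illustrative assumptions only.

```python
from collections import defaultdict


def sarsa_td_error(Q, s, a, r, s_next, a_next, gamma):
    """On-policy error: bootstraps from the action a_next actually chosen in s_next."""
    return r + gamma * Q[(s_next, a_next)] - Q[(s, a)]


def q_learning_td_error(Q, s, a, r, s_next, actions, gamma):
    """Off-policy error: bootstraps from the best action available in s_next."""
    best_next = max(Q[(s_next, b)] for b in actions)
    return r + gamma * best_next - Q[(s, a)]


def td_update(Q, s, a, td_error, alpha):
    """Move Q(s, a) a step of size alpha toward the TD target."""
    Q[(s, a)] += alpha * td_error


# Hypothetical values, chosen only to show that the two errors can differ.
Q = defaultdict(float)
Q[(15, "hit")] = 0.2
Q[(18, "hit")] = -0.5
Q[(18, "stand")] = 0.4
```

With these values, SARSA following an exploratory `"hit"` in state 18 sees a negative error, while Q-learning uses the `"stand"` value and sees a positive one.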

6 Implementation

7 Setup No Ace: {4, 5, 6,…, 20} Ace: {22, 23,…, 31} Terminals: 21 and -1 (bust) ε-greedy policy
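
A minimal sketch of the greedy-with-exploration action choice named on this slide: with a small probability the agent picks a random action, otherwise the one with the highest estimated value. The function name, the dictionary of action values, and the action labels are assumptions made for illustration.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action from a dict mapping action -> estimated value.

    With probability epsilon explore uniformly at random;
    otherwise exploit the action with the highest estimate.
    """
    if rng.random() < epsilon:
        return rng.choice(list(q_values))
    return max(q_values, key=q_values.get)
```

For example, `epsilon_greedy({"hit": -0.1, "stand": 0.3}, 0.01)` returns `"stand"` most of the time and occasionally explores.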

8 Learning Constants Reward r = -1 if loss, +1 if win, 0 after every hit Discount factor γ = 0.9 Step size α = 0.01 ε = 0.01 Strategy Learned: –Only hit if score < 11 or ace held. –Very conservative –Double value of ace.
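
To make the reward scheme and the discount factor concrete, the sketch below computes the discounted return of one game and encodes the learned rule as stated on the slide. The function names and the `"hit"`/`"stand"` labels are assumptions; the learned rule is simply transcribed, not derived.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards discounted by gamma per step, computed back to front."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def learned_policy(score, has_ace):
    """Rule reported on the slide: only hit if score < 11 or an ace is held."""
    return "hit" if score < 11 or has_ace else "stand"
```

A win that follows two hits yields the reward sequence `[0, 0, 1]`, whose return with a discount of 0.9 is about 0.81, so earlier decisions receive a smaller share of the final outcome.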

9 Fixed Strategies vs. Learned Over 1000 trials of 100 games each Thorp’s Strategy can approach 49%

Strategy   Avg(%)  Max(%)  Min(%)
dealer’s   40.7    57      25
hold       38.3    51      24
random     31.5    46      18

ε              Avg(%)  Max(%)  Min(%)
0.1            39.9    54      26
0.01           41.9    56      26
0.1 * 0.99^r   40.9    53      26
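
The Avg/Max/Min figures on this slide are win percentages per trial, aggregated over many trials. A rough sketch of that bookkeeping follows, assuming a hypothetical `play_game` callable that returns `True` for a win; the stand-in game at the bottom is a biased coin, not a blackjack simulation, and reproduces none of the reported numbers.

```python
import random
from statistics import mean


def summarize_trials(play_game, n_trials=1000, games_per_trial=100):
    """Return (average, maximum, minimum) win percentage across trials."""
    percentages = []
    for _ in range(n_trials):
        wins = sum(1 for _ in range(games_per_trial) if play_game())
        percentages.append(100.0 * wins / games_per_trial)
    return mean(percentages), max(percentages), min(percentages)


# Stand-in for a real game: wins with probability 0.4 (an arbitrary assumption).
_rng = random.Random(1)
avg, best, worst = summarize_trials(lambda: _rng.random() < 0.4)
```

Even with a fixed win probability, the best and worst trials of 100 games can differ widely, which is why the slide reports a spread rather than a single number.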

10 Other Experiments Probabilistic Strategies –Nonstationary tasks –r = -1 for loss, 0 otherwise –Achieved 49.14% win while learning Three Players –Second player had option to watch first. –Stayed Conservative –Very low win percentage

11 References
A. Perez-Uribe and E. Sanchez, "Blackjack as a Test Bed for Learning Strategies in Neural Networks", Proceedings of the IEEE International Joint Conference on Neural Networks IJCNN'98
Frederic Meyer, "Java Blackjack and Reinforcement Learning", http://lslwww.epfl.ch/~anperez/BlackJack/classes/RLJavaBJ.html
Wikipedia
–http://en.wikipedia.org/wiki/Artificial_neural_network
–http://en.wikipedia.org/wiki/Blackjack
–http://en.wikipedia.org/wiki/Edward_O._Thorp

