Blackjack as a Test Bed for Learning Strategies in Neural Networks
A. Perez-Uribe and E. Sanchez
Swiss Federal Institute of Technology
IEEE IJCNN'98


Blackjack
–Against the dealer
–Dealer hits until 17
–Limited implementation

Reinforcement Learning
–Learning by interaction
–Similar to a Markov decision process (MDP)
–Delayed reward
–Trial-and-error search
–Exploitation vs. exploration
–Temporal Difference (TD)

Neural Network Basics
–Neurons fire when the weighted sum of their inputs is above a threshold.
–Hidden layers of perceptrons.
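The threshold rule above can be sketched in a few lines. This is only an illustration of a single threshold unit, not the network used in the paper; the function name `neuron_fires` and the AND-gate weights are assumptions chosen for the example.

```python
def neuron_fires(inputs, weights, threshold):
    """Return True when the weighted sum of the inputs exceeds the threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return weighted_sum > threshold


# Hypothetical example: weights of 1 and a threshold of 1.5 make the unit
# behave like a logical AND of two binary inputs.
and_weights = [1.0, 1.0]
and_threshold = 1.5
both_on = neuron_fires([1, 1], and_weights, and_threshold)
one_on = neuron_fires([1, 0], and_weights, and_threshold)
```

A single unit like this can only separate inputs with a straight line, which is why hidden layers are stacked on top of it.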

Implementation
SARSA
–TD
–Bootstrapping mechanism
–Updates Q(s,a) using the quintuple (s,a,r,s’,a’)
Q-Learning
–Directly approximates the optimal Q*
–TD err = r + γ max_a’ Q(s’,a’) - Q(s,a)
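To make the difference between the two TD errors concrete, here is a minimal tabular sketch. It is an assumption-laden illustration: the paper approximates Q with a neural network, whereas this uses a plain dictionary, and the function names are hypothetical. The defaults of 0.9 and 0.01 are the discount factor and step size quoted in the slides.

```python
from collections import defaultdict

GAMMA = 0.9   # discount factor quoted in the slides
ALPHA = 0.01  # step size quoted in the slides


def sarsa_td_error(Q, s, a, r, s_next, a_next, gamma=GAMMA):
    """On-policy error: uses the action a_next actually chosen in s_next."""
    return r + gamma * Q[(s_next, a_next)] - Q[(s, a)]


def q_learning_td_error(Q, s, a, r, s_next, actions, gamma=GAMMA):
    """Off-policy error: uses the best available action in s_next."""
    best_next = max(Q[(s_next, b)] for b in actions)
    return r + gamma * best_next - Q[(s, a)]


def td_update(Q, s, a, td_error, alpha=ALPHA):
    """Move Q(s,a) a small step in the direction of the TD error."""
    Q[(s, a)] += alpha * td_error


# Unvisited (including terminal) entries default to 0.0.
Q = defaultdict(float)
Q[("s1", "hit")] = 1.0
Q[("s1", "stand")] = 0.5
sarsa_err = sarsa_td_error(Q, "s0", "hit", 0, "s1", "stand")
qlearn_err = q_learning_td_error(Q, "s0", "hit", 0, "s1", ["hit", "stand"])
```

With these made-up values the SARSA error follows the action that was taken ("stand"), while the Q-Learning error follows the larger of the two next-state values.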

Implementation

Setup
–No ace: {4, 5, 6, …, 20}
–Ace: {22, 23, …, 31}
–Terminals: 21 and -1 (bust)
–ε-greedy policy
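A rough sketch of this setup follows. The slide does not say how a hand maps onto these state numbers, so the states are treated as opaque labels here; the action names "hit" and "stand" and the dictionary-based Q table are assumptions made for illustration.

```python
import random

NO_ACE_STATES = list(range(4, 21))   # 4..20, as listed on the slide
ACE_STATES = list(range(22, 32))     # 22..31, as listed on the slide
TERMINAL_STATES = [21, -1]           # 21 and bust
ACTIONS = ["hit", "stand"]           # assumed action names


def epsilon_greedy(Q, state, epsilon, rng=random):
    """With probability epsilon explore at random, otherwise act greedily.

    Q is a dict keyed by (state, action); missing entries count as 0.0.
    On ties the first action in ACTIONS is chosen.
    """
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))


# Usage: a purely greedy choice (epsilon = 0) in state 12.
example_Q = {(12, "hit"): 1.0, (12, "stand"): 0.2}
greedy_action = epsilon_greedy(example_Q, 12, epsilon=0.0)
```

A small epsilon keeps the agent mostly exploiting what it has learned while still occasionally trying the other action.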

Learning Constants
–Reward r = -1 if loss, +1 if win, 0 after every hit
–Discount factor γ = 0.9
–Step size α = 0.01
–ε = 0.01
Strategy Learned:
–Only hit if score < 11 or ace held.
–Very conservative
–Double value of ace.
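The reward scheme and the learned rule can be written down directly. This is a literal reading of the slide, not the authors' code: the outcome labels, the function names, and the omission of draws (which the slide does not mention) are all assumptions.

```python
def reward(outcome):
    """Reward from the slide: -1 for a loss, +1 for a win, 0 after a hit.

    `outcome` is one of the hypothetical labels "win", "loss", or "hit"
    (a hit that does not end the game). Draws are not covered by the slide.
    """
    return {"win": 1, "loss": -1, "hit": 0}[outcome]


def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, each discounted by gamma per step of delay."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def learned_policy(score, holds_ace):
    """Literal transcription: hit only if score < 11 or an ace is held."""
    return "hit" if score < 11 or holds_ace else "stand"


# A win reached after two hits is worth 0.9 * 0.9 * 1 at the first decision.
two_hits_then_win = discounted_return(
    [reward("hit"), reward("hit"), reward("win")]
)
```

Because the only nonzero reward arrives at the end of a game, the discount factor is what makes earlier decisions see a smaller share of the final outcome.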

Fixed Strategies vs. Learned
–Over 1000 trials of 100 games each
–Thorp’s strategy can approach 49%

Strategy       Avg(%)  Max(%)  Min(%)
dealer’s       40.7    57      25
hold           38.3    51      24
random         31.5    46      18

ε              Avg(%)  Max(%)  Min(%)
0.1            39.9    54      26
0.01           41.9    56      26
0.1 * 0.99^r   40.9    53      26
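The evaluation protocol (many trials of 100 games, reporting average, maximum, and minimum win percentage) can be sketched as below. This is a simplified stand-in, not the simulator behind the reported numbers: it assumes an infinite deck, a dealer who stands on any 17, ties counted as non-wins, and a coin-flip "random" strategy, so its output should not be expected to match the table.

```python
import random


def draw(rng):
    """Infinite-deck assumption: ranks 1-13, face cards count 10, ace is 1."""
    return min(rng.randint(1, 13), 10)


def value(cards):
    """Best total, counting one ace as 11 when that does not bust."""
    total = sum(cards)
    return total + 10 if 1 in cards and total + 10 <= 21 else total


def dealer_strategy(cards, rng):
    return value(cards) < 17          # True means "hit"


def hold_strategy(cards, rng):
    return False                      # never hit


def random_strategy(cards, rng):
    return rng.random() < 0.5         # hit or hold with equal probability


def play_game(strategy, rng):
    """Play one game; return True if the player wins."""
    player = [draw(rng), draw(rng)]
    dealer = [draw(rng), draw(rng)]
    while value(player) < 21 and strategy(player, rng):
        player.append(draw(rng))
    if value(player) > 21:
        return False                  # player busts
    while value(dealer) < 17:         # dealer hits until 17
        dealer.append(draw(rng))
    return value(dealer) > 21 or value(player) > value(dealer)


def evaluate(strategy, trials=1000, games=100, seed=0):
    """Return (avg, max, min) win percentage over the given trials."""
    rng = random.Random(seed)
    pcts = [
        100.0 * sum(play_game(strategy, rng) for _ in range(games)) / games
        for _ in range(trials)
    ]
    return sum(pcts) / len(pcts), max(pcts), min(pcts)
```

Calling `evaluate(dealer_strategy)`, `evaluate(hold_strategy)`, and `evaluate(random_strategy)` gives one row each in the same Avg/Max/Min layout as the first table.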

Other Experiments
Probabilistic Strategies
–Nonstationary tasks
–r = -1 for loss, 0 otherwise
–Achieved 49.14% win while learning
Three Players
–Second player had option to watch first.
–Stayed conservative
–Very low win percentage

References
–A. Perez-Uribe and E. Sanchez, "Blackjack as a Test Bed for Learning Strategies in Neural Networks", Proceedings of the IEEE International Joint Conference on Neural Networks IJCNN'98
–Frederic Meyer, "Java Blackjack and Reinforcement Learning", http://lslwww.epfl.ch/~anperez/BlackJack/classes/RLJavaBJ.html
–Wikipedia: http://en.wikipedia.org/wiki/Artificial_neural_network
–Wikipedia: http://en.wikipedia.org/wiki/Blackjack
–Wikipedia: http://en.wikipedia.org/wiki/Edward_O._Thorp
