Human-level control through deep reinforcement learning
Naiyan Wang
Part 1 Q Learning
Q Learning: State, Action, Reward
Q Learning update rule (labeled terms: learning rate, discount factor, new state, old state, reward)
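The labeled terms on this slide match the standard tabular Q-learning update, in which the old value for a state-action pair moves toward the reward plus the discounted best value of the new state. The sketch below is a minimal illustration under that assumption; the names `q_update`, `alpha` (learning rate), `gamma` (discount factor), and the toy states and actions are hypothetical and do not come from the slides.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    alpha is the learning rate and gamma is the discount factor
    (both names are illustrative assumptions).
    """
    old_value = q[(state, action)]
    # Best value obtainable from the new state under the current estimates.
    best_next = max(q[(next_state, a)] for a in actions)
    # Target: immediate reward plus discounted value of the new state.
    td_target = reward + gamma * best_next
    # Move the old estimate a fraction alpha toward the target.
    q[(state, action)] = old_value + alpha * (td_target - old_value)
    return q[(state, action)]


# Toy usage: unseen state-action pairs default to a value of 0.0.
q = defaultdict(float)
actions = ["left", "right"]
q[("s1", "right")] = 1.0
new_value = q_update(q, "s0", "right", reward=0.5, next_state="s1", actions=actions)
print(new_value)
```

With a learning rate of 0.1, the estimate moves only one tenth of the way toward the target, which is why repeated visits are needed before the values settle.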
Part 2 Deep Q Learning
Traditional Cooking
End to End Cooking
End to End Learning
Formulation (equations 1 to 3; target variable)
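The equations on this slide did not survive extraction, so the sketch below is not a reconstruction of them. It illustrates the target variable as it is usually defined in deep Q-learning: the reward plus the discounted maximum Q-value of the next state, with no bootstrap term when the episode has ended. It assumes the next-state Q-values come from a separate, periodically updated target network. The function names, the `dones` flag, and the toy numbers are all hypothetical.

```python
def dqn_targets(rewards, next_q_values, dones, gamma=0.99):
    """Compute one regression target per transition.

    next_q_values[i] holds the Q-values of every action in the next state,
    assumed here to come from a separate target network.
    """
    targets = []
    for reward, next_q, done in zip(rewards, next_q_values, dones):
        # Terminal transitions have no future value to bootstrap from.
        bootstrap = 0.0 if done else gamma * max(next_q)
        targets.append(reward + bootstrap)
    return targets


def squared_td_loss(predicted_q, targets):
    """Mean squared difference between predicted Q-values and their targets."""
    errors = [(y - q) ** 2 for q, y in zip(predicted_q, targets)]
    return sum(errors) / len(errors)


# Toy batch of two transitions; the second one ends the episode.
targets = dqn_targets(
    rewards=[1.0, 0.0],
    next_q_values=[[0.5, 2.0], [1.0, 3.0]],
    dones=[False, True],
    gamma=0.5,
)
loss = squared_td_loss(predicted_q=[1.0, 1.0], targets=targets)
print(targets, loss)
```

In a real training loop the targets are treated as constants, and only the network that produces `predicted_q` receives gradients.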
Results Analysis
DQN is good at …
DQN is bad at …
Part 3 Discussion
Discussion
Q: What is the key contributing factor?
A: Almost unlimited training data.
Q: How to account for long-term dependencies?
A: Long short-term memory (LSTM) may be the solution.
Thank You