Reinforcement Learning
Hien Van Nguyen, University of Houston
2/4/2019

Slides adapted from:
[1] https://edge.edx.org/courses/course-v1:BerkeleyX+CS188x-SP16+SP16/20021a0a32d14a31b087db8d4bb582fd/
[2] http://icml.cc/2016/tutorials/deep_rl_tutorial.pdf
Deep Q-learning
- You don’t know the transitions T(s,a,s’)
- You don’t know the rewards R(s,a,s’)
- You choose the actions as you go
- The state space is large
- Goal: learn the optimal policy / values
- Idea: represent the Q-function Q(s,a; θ) by a deep network
2/4/2019 Machine Learning
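Before the deep version, the model-free idea itself can be seen in a tiny tabular sketch: learn Q-values purely from sampled transitions, never touching T(s,a,s’) or R(s,a,s’). The chain MDP, ε-greedy schedule, and constants below are illustrative assumptions, not from the slides.

```python
import random

# Toy deterministic chain MDP (an assumption for illustration, not from the
# slides): states 0..3, actions 0 = left / 1 = right, reward 1 on reaching
# state 3. The agent never reads this model; it only samples from it.
def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(3, s + 1)
    return s2, (1.0 if s2 == 3 else 0.0)

Q = {(s, a): 0.0 for s in range(4) for a in range(2)}
alpha, gamma, eps = 0.5, 0.9, 0.1
random.seed(0)

for episode in range(200):
    s = 0
    while s != 3:
        # we choose the actions now: epsilon-greedy over current Q-values
        if random.random() < eps:
            a = random.randrange(2)
        else:
            a = max((0, 1), key=lambda b: Q[(s, b)])
        s2, r = step(s, a)
        # sample-based Q-learning update: no T(s,a,s') or R(s,a,s') needed
        target = r + gamma * max(Q[(s2, b)] for b in range(2))
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2
```

With a large state space this table becomes infeasible, which is exactly why the slides replace it with a deep network Q(s,a; θ).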
Deep Q-learning
- Represent the Q-function by a deep network Q(s,a; θ)
- Define the objective function as the mean-squared error in Q-values:
    L(θ) = E[ ( r + γ max_{a’} Q(s’,a’; θ) − Q(s,a; θ) )² ]
  where the target is r + γ max_{a’} Q(s’,a’; θ)
- Take the derivative (treating the target as a fixed constant):
    ∂L(θ)/∂θ = −2 E[ ( r + γ max_{a’} Q(s’,a’; θ) − Q(s,a; θ) ) ∂Q(s,a; θ)/∂θ ]
- Train end-to-end via SGD
- Can use raw data (e.g., pixels) to represent the state
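One SGD step on this objective can be sketched with a linear function approximator standing in for the deep network (an assumption made so that ∂Q/∂θ is simply the feature vector; the feature encoding and constants are hypothetical):

```python
import numpy as np

# Linear stand-in for the deep Q-network: Q(s, a; theta) = phi(s, a) . theta,
# so grad_theta Q(s, a; theta) = phi(s, a).
n_features, n_actions = 4, 2
theta = np.zeros(n_features)

def features(s, a):
    # hypothetical one-hot encoding, purely for illustration
    phi = np.zeros(n_features)
    phi[(2 * s + a) % n_features] = 1.0
    return phi

def q(s, a, th):
    return features(s, a) @ th

alpha, gamma = 0.1, 0.9
s, a, r, s2 = 0, 1, 1.0, 1   # one sampled transition

# target uses the max over next actions and is held fixed when differentiating
target = r + gamma * max(q(s2, b, theta) for b in range(n_actions))
td_error = target - q(s, a, theta)
# SGD step down the gradient of the MSE objective:
# theta <- theta + alpha * (target - Q(s,a;theta)) * grad_theta Q(s,a;theta)
theta += alpha * td_error * features(s, a)
```

With a real deep network the only change is that ∂Q/∂θ comes from backpropagation instead of being the raw feature vector.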
Policy gradient for continuous actions
- Challenge: the action space can be continuous, and maximizing the Q-function over actions is then difficult.
Deterministic policy gradient
- Use a deterministic policy a = μ(s; θ) instead of maximizing over actions
- Update the policy in the direction that increases Q, via the chain rule:
    ∇_θ J ≈ E[ ∇_θ μ(s; θ) · ∇_a Q(s,a) |_{a = μ(s; θ)} ]
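The chain-rule update can be seen in a minimal scalar sketch, assuming a hypothetical linear policy a = μ(s; θ) = θ·s and a known critic Q(s,a) = −(a − 2s)², whose maximizing action is a = 2s:

```python
# Deterministic policy gradient sketch (all functions and constants are
# illustrative assumptions, not from the slides).
def dq_da(s, a):
    # gradient of the assumed critic Q(s, a) = -(a - 2s)^2 w.r.t. the action
    return -2.0 * (a - 2.0 * s)

def dmu_dtheta(s):
    # gradient of the linear policy mu(s; theta) = theta * s w.r.t. theta
    return s

theta, lr = 0.0, 0.05
states = [0.5, 1.0, 1.5, 2.0]
for _ in range(500):
    for s in states:
        a = theta * s
        # chain rule through the action: grad_theta J = dmu/dtheta * dQ/da
        theta += lr * dq_da(s, a) * dmu_dtheta(s)

print(round(theta, 3))  # approaches 2.0, the action-maximizing slope
```

No max over actions is ever computed; the policy is nudged uphill on Q, which is the whole point for continuous action spaces.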
Deterministic actor-critic
- Critic: estimate the action-value function Q(s,a; w) by Q-learning
- Actor: update the policy parameters θ in the direction of the critic’s action-gradient ∇_a Q(s,a; w)
Deterministic actor-critic learning rule
- TD error: δ = r + γ Q(s’, μ(s’; θ); w) − Q(s,a; w)
- Critic update: w ← w + α_w δ ∇_w Q(s,a; w)
- Actor update: θ ← θ + α_θ ∇_θ μ(s; θ) ∇_a Q(s,a; w) |_{a = μ(s; θ)}
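A single step of this learning rule, with linear stand-ins for both networks (the parameterizations, transition, and step sizes are assumptions for illustration):

```python
import numpy as np

# Linear critic Q(s, a; w) = w0*s + w1*a and linear actor mu(s; theta) = theta*s
# stand in for the deep networks on the earlier slides.
w = np.array([0.2, 0.3])
theta = 0.1
alpha_w, alpha_t, gamma = 0.1, 0.01, 0.9

def q(s, a):
    return w[0] * s + w[1] * a

def mu(s):
    return theta * s

s, a, r, s2 = 1.0, 0.5, 1.0, 2.0   # one sampled transition

# TD error: delta = r + gamma * Q(s', mu(s')) - Q(s, a)
delta = r + gamma * q(s2, mu(s2)) - q(s, a)
# critic update: grad_w Q(s, a; w) = (s, a) for this linear critic
w = w + alpha_w * delta * np.array([s, a])
# actor update (using the freshly updated critic):
# grad_theta mu(s) = s, and grad_a Q(s, a; w) = w1 for this linear critic
theta = theta + alpha_t * s * w[1]
```

Each transition thus drives two coupled updates: the critic chases the Q-learning target, and the actor climbs the critic's action-gradient.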
Stability issues with Deep RL
Naive Q-learning with a neural network can oscillate or diverge:
- Successive states are strongly correlated, violating the i.i.d. assumption behind SGD
- The policy changes rapidly with small changes in Q-values, so both the data distribution and the targets keep moving
- The scale of rewards and Q-values is unknown, so gradients can be large and unstable
Strategies for improving stability
DQN addresses each issue:
- Experience replay: break correlations by sampling from past transitions
- Fixed target Q-network: freeze the target parameters to avoid oscillations
- Clip rewards (or normalize values) to keep gradients well-conditioned
Experience replay
- Store each transition (s, a, r, s’) in a replay memory D
- Sample random mini-batches of transitions from D instead of learning from consecutive steps
- This decorrelates the data and reuses past experience
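A minimal replay memory can be sketched in a few lines (the class name, capacity, and batch size are illustrative choices, not from the slides):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of transitions (s, a, r, s2, done)."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions fall out

    def push(self, s, a, r, s2, done):
        self.buffer.append((s, a, r, s2, done))

    def sample(self, batch_size):
        # uniform random sampling breaks the temporal correlation
        # between consecutive transitions
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=1000)
for t in range(50):                      # pretend 50 environment steps
    buf.push(t, t % 2, 0.0, t + 1, False)
batch = buf.sample(8)                    # decorrelated mini-batch for SGD
```

Each SGD step then trains on `batch` rather than on the latest transition, so one interaction can contribute to many updates.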
Fixed target Q-network
- Compute targets with an older, fixed set of parameters θ⁻: r + γ max_{a’} Q(s’,a’; θ⁻)
- Optimize the online network Q(s,a; θ) toward these fixed targets
- Periodically update θ⁻ ← θ
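The bookkeeping for the fixed target can be sketched as follows, with a dict standing in for the network parameters and hypothetical constants (the sync period and the fake "SGD" increment are assumptions for illustration):

```python
import copy

theta = {"w": [0.0]}                 # stand-in for online network parameters
theta_minus = copy.deepcopy(theta)   # target network parameters, held fixed

SYNC_EVERY = 100                     # steps between hard updates (assumed)
for step in range(1, 301):
    theta["w"][0] += 0.01            # pretend SGD updates the online network
    # targets during these steps would be computed with theta_minus:
    #   r + gamma * max_a' Q(s', a'; theta_minus)
    if step % SYNC_EVERY == 0:
        theta_minus = copy.deepcopy(theta)   # periodic hard update
```

Between syncs the regression target stays put, so the online network chases a stationary objective instead of its own moving output.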
How much does DQN help?
Thank you for taking my class!