Reinforcement Learning


1 Reinforcement Learning
Developing a self-learning snake game using Reinforcement Learning and pygame.

2 About me
Student, pursuing my Bachelor’s in Software Engineering
Freelance Software Developer
A FOSS enthusiast, currently contributing to coala
Pythonista who loves to develop automation and Machine Learning projects and occasionally writes blog posts about Python
Github: Linkedin: Website: Blog:

3 Do you remember these?

4 Contents
Quick intro to game development: common concepts
Designing the gameplay
Events and control, implementing game logic
Some RL concepts: Agent, State, Reward, Policy, MDP and a few more
Q-Learning to the rescue
Other Reinforcement Learning techniques
Self-driving car in action
Current applications and future scope of RL
Available open source frameworks and libraries
The code for the workshop is available at

5 Some Game Development concepts
Coordinates: The screen is a 2D grid with (0,0) in the top left
Colors: RGB and alpha values
Drawing: Plotting pixels, Surface object, blitting
Rendering: Animation, frame/refresh rate
The game loop:
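A game loop repeats three steps once per frame: read input, update the game state, draw. The sketch below shows only that structure in plain Python. The callback names (`handle_events`, `update`, `render`) and the `max_frames` parameter are hypothetical; in pygame the same roles are usually filled by `pygame.event.get()`, the game's own update code, `pygame.display.flip()`, and a `pygame.time.Clock` for frame-rate capping.

```python
import time


def run_game_loop(handle_events, update, render, fps=30, max_frames=None):
    """Run input -> update -> render once per frame; return frames drawn.

    handle_events() should return False when the player wants to quit.
    max_frames is an illustrative safety limit, not part of any library.
    """
    frame_time = 1.0 / fps
    frames = 0
    while max_frames is None or frames < max_frames:
        start = time.perf_counter()
        if not handle_events():      # e.g. the window was closed
            break
        update()                     # advance the game state by one tick
        render()                     # draw the current state
        frames += 1
        # Sleep for the rest of the frame so the loop runs at most `fps` times per second.
        elapsed = time.perf_counter() - start
        if elapsed < frame_time:
            time.sleep(frame_time - elapsed)
    return frames
```

Capping the frame rate keeps the snake's speed independent of how fast the machine can execute the loop.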

6 Designing the Gameplay
Objects: a snake, apples, walls
The snake eats an apple and grows 1 unit longer.
The snake dies when it hits a wall or runs over itself.
Objective: Eat as many apples as possible without dying.
What happens when the snake gets killed?
How to start the game?
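The rules above can be sketched on a grid of cells. This is an illustrative version, not the workshop code: the board size, the `step` function, and the head-first `deque` representation are all assumptions.

```python
from collections import deque

GRID_W, GRID_H = 20, 15  # hypothetical board size, in cells


def step(snake, direction, apple):
    """Move the snake one cell and apply the rules.

    snake: deque of (x, y) cells with the head at index 0.
    direction: (dx, dy); y grows downward because (0, 0) is the top left.
    Returns (alive, ate_apple). The snake is modified in place.
    """
    head_x, head_y = snake[0]
    new_head = (head_x + direction[0], head_y + direction[1])

    hit_wall = not (0 <= new_head[0] < GRID_W and 0 <= new_head[1] < GRID_H)
    # Simplification: moving onto the current tail cell also counts as a hit.
    if hit_wall or new_head in snake:
        return False, False

    snake.appendleft(new_head)
    if new_head == apple:
        return True, True   # tail is kept, so the snake grows by one unit
    snake.pop()             # no apple: drop the tail, length stays the same
    return True, False
```

Placing a new apple and restarting after death are left to the caller.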

7 Code Implementation: Drawing, Displaying and Moving the game objects.

8 User Interaction & Game Logic
Arrow keys to move the head.
Do we want our snake to keep moving?
Detecting overlaps and collisions of the snake’s head with other objects: boundaries, apples and its own body.
Scoring
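One common answer to the "keep moving" question is yes: the snake keeps its current direction until a key changes it, and a key that would reverse it straight into its own body is ignored. A minimal sketch, assuming key presses arrive as the hypothetical strings "up", "down", "left", "right" (in pygame they would be key constants such as `pygame.K_UP`):

```python
UP, DOWN, LEFT, RIGHT = (0, -1), (0, 1), (-1, 0), (1, 0)

# Hypothetical key names mapped to grid directions.
KEY_TO_DIRECTION = {"up": UP, "down": DOWN, "left": LEFT, "right": RIGHT}


def next_direction(current, key):
    """Return the snake's direction after an optional key press."""
    if key not in KEY_TO_DIRECTION:
        return current              # no arrow key pressed: keep moving
    requested = KEY_TO_DIRECTION[key]
    if requested == (-current[0], -current[1]):
        return current              # ignore a 180-degree turn
    return requested
```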

9 Code Implementation: Adding the controls and the score to make a fully functional snake game.

10 Okay, let’s make our dumb computer control the snake.

11 Code Implementation: Wait, let’s add some intelligence to our agent. (Provide vision to the CPU, i.e. the game rules)
Next Section: Or better, let’s make the CPU discover knowledge. (Make our snake learn from experience)

12 Time to introduce Reinforcement Learning!

13 A few things to know
State, History and Episode
Action
Reward
Policy, value function, and model
Environment
Agent
Markov states and MDP
Long story short: The environment is everything that surrounds the agent. A state represents the situation of the agent at a particular time in the environment. The agent performs an action to transition from one state to another and may receive a reward in return. The policy is the strategy for choosing an action given a state, and the agent tries to choose a policy that maximizes the expected cumulative reward.
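The "cumulative reward" of one episode is often computed with a discount factor, so that later rewards count less: G = r_0 + gamma * r_1 + gamma^2 * r_2 + ... The slide does not mention discounting, so `gamma` and the function name below are assumptions made for illustration.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum the rewards of one episode, discounting later ones by gamma."""
    total = 0.0
    # Work backwards so each step is: reward now + gamma * everything after.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

With `gamma=1.0` this is the plain sum of rewards; smaller values make the agent prefer rewards that arrive sooner.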

14 Implementation: Refactoring the game’s code

15 Q-learning to the rescue!
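Popular, simple, model-free RL technique (a model of the environment is not required)
Can find an optimal action-selection policy for any finite MDP.
Learns the action-value function Q(s, a): the expected cumulative reward of taking action a in state s and acting optimally afterwards.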
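The core of tabular Q-learning is a single update applied after every step: move Q(s, a) a little towards reward + gamma * max over a' of Q(s', a'). The sketch below is a generic version, not the workshop code; the action names, the dictionary-based table, and the default `alpha` and `gamma` values are assumptions.

```python
from collections import defaultdict

ACTIONS = ["up", "down", "left", "right"]  # assumed action set for the snake


def make_q_table():
    """Q-table that starts every unseen state with 0.0 for each action."""
    return defaultdict(lambda: {action: 0.0 for action in ACTIONS})


def q_learning_update(q, state, action, reward, next_state, done,
                      alpha=0.1, gamma=0.9):
    """Apply one Q-learning update in place.

    alpha is the learning rate, gamma the discount factor.
    done marks a terminal transition (e.g. the snake died).
    """
    # Value of the best action in the next state; nothing follows a terminal state.
    best_next = 0.0 if done else max(q[next_state].values())
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
```

States only need to be hashable, so a tuple describing what the snake sees can be used directly as a key.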

16 Code Implementation: Using Q-learning to choose actions for the agent.

17 Our agent in action
Note: Currently our rules don’t penalize the snake for running over itself.

18 Possible Improvements to our agent
Optimizing the state space
Adding time-based rewards
Balancing the exploration vs. exploitation tradeoff
Optimizing the hyperparameters using techniques like grid search or genetic algorithms
Using state-of-the-art RL techniques
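One widely used way to handle the exploration vs. exploitation tradeoff is epsilon-greedy action selection with a decaying epsilon: explore a lot early on, then rely more on what has been learned. This is a sketch under assumed parameter values (`start`, `end`, `decay`), not the schedule used in the workshop.

```python
import random


def choose_action(q_values, epsilon, rng=random):
    """Pick an action from a dict mapping action -> estimated value."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))       # explore: random action
    return max(q_values, key=q_values.get)      # exploit: best known action


def decayed_epsilon(episode, start=1.0, end=0.05, decay=0.99):
    """Shrink epsilon geometrically per episode, never going below `end`."""
    return max(end, start * decay ** episode)
```

The decay rate is itself a hyperparameter, so it is a natural candidate for the grid search mentioned above.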

19 Other interesting techniques
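SARSA: An on-policy relative of Q-learning. Its update uses the value of the action the agent actually takes next (chosen by the current policy, for example randomly with a predefined probability) instead of the maximum over all next actions. Since no maximum over actions is needed, each update can be cheaper than a Q-learning update when the number of actions is high.
Deep Q-Networks: Combine RL with deep neural networks such as CNNs. The non-linear action-value function is approximated by a network trained on past transitions sampled through experience replay.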
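The difference between SARSA and Q-learning is easiest to see in the update target. SARSA uses the value of the next action the agent has actually chosen, Q(s', a'), where Q-learning would use the maximum over all actions in s'. A minimal tabular sketch, with the nested-dictionary table and default parameters as assumptions:

```python
def sarsa_update(q, state, action, reward, next_state, next_action, done,
                 alpha=0.1, gamma=0.9):
    """Apply one SARSA update in place.

    q is a nested dict: q[state][action] -> estimated value.
    next_action is the action the current policy picked for next_state.
    """
    # On-policy target: value of the action that will really be taken next.
    next_value = 0.0 if done else q[next_state][next_action]
    td_target = reward + gamma * next_value
    q[state][action] += alpha * (td_target - q[state][action])
```

The name comes from the tuple the update consumes: state, action, reward, next state, next action.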

20 The self-driving car simulation design
State: Car on left, right, ahead? Traffic light green or red? Next waypoint (from GPS)
Actions: Steer left, steer right, accelerate, brake
Rewards: Violating the traffic laws; hitting the obstacles; reaching the destination; time taken to reach destination (any thoughts on this?)
Code Sample available at:
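The state, actions, and rewards listed on this slide could be encoded roughly as follows. This is an illustrative sketch, not the code sample the slide refers to: the field names and every reward magnitude are assumptions. A small penalty per time step is one possible answer to the "time taken" question, since it makes slower routes score lower.

```python
from typing import NamedTuple


class CarState(NamedTuple):
    """Hashable state, so it can be used directly as a Q-table key."""
    car_left: bool
    car_right: bool
    car_ahead: bool
    light: str           # "green" or "red"
    next_waypoint: str   # e.g. "left", "right", "forward"


ACTIONS = ["steer_left", "steer_right", "accelerate", "brake"]

# Hypothetical reward magnitudes, chosen only for illustration.
STEP_PENALTY = -1
VIOLATION_PENALTY = -10
COLLISION_PENALTY = -20
DESTINATION_BONUS = 100


def reward(violated_law, hit_obstacle, reached_destination):
    """Reward for one time step of the simulation."""
    total = STEP_PENALTY                 # every step costs a little time
    if violated_law:
        total += VIOLATION_PENALTY
    if hit_obstacle:
        total += COLLISION_PENALTY
    if reached_destination:
        total += DESTINATION_BONUS
    return total
```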

21 Applications of Reinforcement Learning
Playing games like chess (reward is not instantaneous; feedback is delayed)
Managing portfolios and finances (the reward here is money)
Robotics (humanoid robots)
Manufacturing and inventory management
General AI agents: agents that can perform multiple tasks with a single algorithm. For example, an agent playing all the Atari games.

22 Open source frameworks and libraries for RL
OpenAI Gym - A toolkit for developing and comparing reinforcement learning algorithms.
OpenAI Universe - A software platform for measuring and training an AI's general intelligence across the world's supply of games, websites and other applications.
DeepMind Lab - A customisable 3D platform for agent-based AI research.

23 Some nice links
YouTube lectures and tutorials:
UCL course on RL by D. Silver -
Sentdex pygame tutorial -
Python code samples:
Reinforcement Learning: An Introduction -
Online demo:
ConvNetJS -
