Reinforcement Learning

Slides:

Advertisements

Similar presentations

Lirong Xia Reinforcement Learning (1) Tue, March 18, 2014.

Advertisements

Reinforcement Learning Peter Bodík. Previous Lectures Supervised learning –classification, regression Unsupervised learning –clustering, dimensionality.

© Jude Shavlik 2006, David Page 2007 CS 760 – Machine Learning (UW-Madison)RL Lecture, Slide 1 Reinforcement Learning (RL) Consider an “agent” embedded.

Lirong Xia Reinforcement Learning (2) Tue, March 21, 2014.

Markov Decision Process

Value Iteration & Q-learning CS 5368 Song Cui. Outline Recap Value Iteration Q-learning.

RL for Large State Spaces: Value Function Approximation

Partially Observable Markov Decision Process (POMDP)

Ai in game programming it university of copenhagen Reinforcement Learning [Outro] Marco Loog.

INTRODUCTION TO MACHINE LEARNING 3RD EDITION ETHEM ALPAYDIN © The MIT Press, Lecture.

Markov Decision Processes

Lehrstuhl für Informatik 2 Gabriella Kókai: Maschine Learning Reinforcement Learning.

ETHEM ALPAYDIN © The MIT Press, Lecture Slides for.

Reinforcement Learning

Reinforcement Learning Mitchell, Ch. 13 (see also Barto & Sutton book on-line)

1 Hybrid Agent-Based Modeling: Architectures,Analyses and Applications (Stage One) Li, Hailin.

1 Kunstmatige Intelligentie / RuG KI Reinforcement Learning Johan Everts.

Reinforcement Learning: Learning algorithms Yishay Mansour Tel-Aviv University.

Reinforcement Learning (1)

INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, Lecture Slides for.

Reinforcement Learning Russell and Norvig: Chapter 21 CMSC 421 – Fall 2006.

CS Reinforcement Learning1 Reinforcement Learning Variation on Supervised Learning Exact target outputs are not given Some variation of reward is.

MDP Reinforcement Learning. Markov Decision Process “Should you give money to charity?” “Would you contribute?” “Should you give money to charity?” $

Reinforcement Learning

REINFORCEMENT LEARNING LEARNING TO PERFORM BEST ACTIONS BY REWARDS Tayfun Gürel.

Reinforcement Learning

Reinforcement Learning 主講人：虞台文 Content Introduction Main Elements Markov Decision Process (MDP) Value Functions.

Bayesian Reinforcement Learning Machine Learning RCC 16 th June 2011.

Reinforcement Learning Ata Kaban School of Computer Science University of Birmingham.

© D. Weld and D. Fox 1 Reinforcement Learning CSE 473.

Reinforcement Learning

INTRODUCTION TO Machine Learning

CHAPTER 16: Reinforcement Learning. Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) 2 Introduction Game-playing:

Reinforcement Learning

Reinforcement Learning Based on slides by Avi Pfeffer and David Parkes.

Reinforcement Learning: Learning algorithms Yishay Mansour Tel-Aviv University.

Reinforcement Learning Guest Lecturer: Chengxiang Zhai Machine Learning December 6, 2001.

Reinforcement Learning. Overview Supervised Learning: Immediate feedback (labels provided for every input). Unsupervised Learning: No feedback (no labels.

REINFORCEMENT LEARNING Unsupervised learning 1. 2 So far ….  Supervised machine learning: given a set of annotated istances and a set of categories,

CS 5751 Machine Learning Chapter 13 Reinforcement Learning1 Reinforcement Learning Control learning Control polices that choose optimal actions Q learning.

Reinforcement Learning

Reinforcement Learning

Reinforcement Learning

Reinforcement Learning (RL)

Reinforcement learning

Reinforcement Learning

Reinforcement Learning (1)

Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 7

Reinforcement learning (Chapter 21)

Reinforcement Learning

Markov Decision Processes

"Playing Atari with deep reinforcement learning."

Announcements Homework 3 due today (grace period through Friday)

Reinforcement learning

Instructors: Fei Fang (This Lecture) and Dave Touretzky

Dr. Unnikrishnan P.C. Professor, EEE

Reinforcement Learning in MDPs by Lease-Square Policy Iteration

یادگیری تقویتی Reinforcement Learning

Reinforcement Learning

Reinforcement Learning

CS 188: Artificial Intelligence Spring 2006

CS 188: Artificial Intelligence Fall 2008

Designing Neural Network Architectures Using Reinforcement Learning

Reinforcement learning

Reinforcement Nisheeth 18th January 2019.

Reinforcement Learning (2)

Markov Decision Processes

Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 7

Markov Decision Processes

Reinforcement Learning (2)

Presentation transcript:

Reinforcement Learning Srinidhi Krishnamurthy

Basics

Reinforcement Learning RL = feedback based system Epsilon - ε Every RL System has states, actions, and rewards

MDPs Markov Decision Process Markov Assumption Similar to Markov Chains Agent Environment Actions Reward Policy Episode Markov Assumption the probability of the next state si+1 depends only on current state si and performed action ai but not on preceding states or actions.

Discounted Future Reward Total Reward for One Episode The Discount Factor γ (gamma) Good ideas for gamma γ = 0 for only immediate awards γ = 0.9 for good balance γ = 1 for deterministic problems

Q-Learning Q-Value Network Pseudocode Q-Value Matrix

Proof of Q-Learning’s Convergence http://users.isr.ist.utl.pt/~mtjspaan/readingGroup/ProofQlearning.pdf

SARSA State-Action-Reward-State-Action starts in state 1 performs action 1 gets a reward (reward 1) Now, it’s in state 2 performs another action (action 2) gets the reward from this state (reward 2) before it goes back and updates the value of action 1 performed in state 1.

Q-learning vs SARSA Q-learning SARSA https://studywolf.wordpress.com/2013/07/01/reinforcement-learning-sarsa-vs-q-learning/

Experience Replay approximation of Q-values using non-linear functions is not very stable During gameplay all the experiences <s,a,r,s′> are stored in a replay memory. When training the network, random samples from the replay memory are used instead of the most recent transition makes the training task more similar to supervised learning

Shortcoming and Solutions credit assignment problem it propagates rewards back in time Exploration vs Exploitation ε choose a random action decreases ε over time from 1 to 0.1 So we continually get down to a fixed exploration space

Flappy Bird RL https://sarvagyavaish.github.io/FlappyBirdRL/

Project/Problem Implement Q-Learning in robot world such that it takes the optimal path every time. It should not take that long for the bot to learn what is right and what is wrong. Also, I know that the solution can be found online so don’t cheat. Have fun! You can probably get a solid amount of candy if you get it right.