Reinforcement Learning AI – Week 22 Sub-symbolic AI Two: An Introduction to Reinforcement Learning Lee McCluskey, room 3/10


Reinforcement Learning Resources Support resources: a 5-minute introduction. See the first few minutes of this one: Longer video sequence (homework) To read:

Reinforcement Learning Reinforcement learning is defined by characterizing a learning problem, not by characterizing learning methods. Reinforcement learning differs from supervised learning, the kind of learning studied in most current research in machine learning, statistical pattern recognition, and artificial neural networks.

Definition of Terms Policy, Reward Function, Value Function, and (optionally) a Model of the Environment.

Policy A policy defines the learning agent's way of behaving at a given time. It is a mapping from perceived states of the environment to actions to be taken when in those states. It corresponds to what in psychology would be called a set of stimulus-response rules or associations. The policy is the core of any reinforcement learning agent.
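In code, a policy in its simplest (tabular, deterministic) form is just a lookup from states to actions. A minimal sketch, using a hypothetical one-dimensional corridor world with states 0 to 4 (the states and actions here are illustrative, not from any slide):

```python
# A policy maps perceived states to actions.
# Hypothetical corridor world: move right until the goal (state 4) is reached.
policy = {
    0: "right",
    1: "right",
    2: "right",
    3: "right",
    4: "stay",   # goal state: no movement needed
}

def act(state):
    """Return the action the current policy prescribes for a state."""
    return policy[state]

print(act(2))  # -> right
```

Learning, in this view, means changing the entries of this mapping in response to experience.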

Reward Function A reward function defines the goal in a reinforcement learning problem. It maps each perceived state (or state-action pair) of the environment to a single number, a reward, indicating the intrinsic desirability of that state. A reinforcement learning agent's sole objective is to maximize the reward it accumulates over time.
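A reward function is therefore nothing more than a function from states to numbers. A minimal sketch for a hypothetical maze world (the states and the small step cost are illustrative assumptions, not taken from the slides):

```python
# A reward function maps each state to a single number.
# Hypothetical maze: reaching the goal pays 1, falling into a pit costs 1,
# and every other step costs a little, which encourages short paths.
def reward(state):
    if state == "goal":
        return 1.0
    if state == "pit":
        return -1.0
    return -0.04  # small per-step cost (an assumed value)

print(reward("goal"))  # -> 1.0
```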

Value Function The expected reward accumulated over time, starting from a state, is the value of that state; the mapping from states to values is the value function. Whereas a reward function indicates what is good in an immediate sense, a value function specifies what is good in the long run. Whereas rewards determine the immediate, intrinsic desirability of environmental states, values indicate the long-term desirability of states after taking into account the states that are likely to follow and the rewards available in those states. Rewards are in a sense primary, whereas values, as predictions of rewards, are secondary. Without rewards there could be no values, and the only purpose of estimating values is to achieve more reward.
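The "reward accumulated over time" is usually computed as a discounted sum, so that nearer rewards count for more. A minimal sketch (the discount factor 0.9 and the reward sequence are illustrative assumptions):

```python
# The return from a trajectory: r_0 + gamma*r_1 + gamma^2*r_2 + ...
# The value of a state is the expected return starting from that state.
def discounted_return(rewards, gamma=0.9):
    """Sum the rewards of one trajectory, discounting later ones."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total

# The immediate rewards are zero, but the trajectory ends in a big reward:
print(discounted_return([0, 0, 10]))  # -> 8.1  (0 + 0 + 0.81 * 10)
```

This illustrates the slide's point: a state with zero immediate reward can still have high value because of the rewards that follow it.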

Model of the Environment A model represents and mimics the behaviour of the environment. Models are used for planning, by which we mean any way of deciding on a course of action by considering possible future situations before they are actually experienced. Models are optional.
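As a sketch of "considering possible future situations before they are experienced", one-step lookahead with a deterministic model: simulate each action's outcome and pick the best. The model entries and value estimates below are hypothetical, purely for illustration.

```python
# Hypothetical deterministic model: (state, action) -> (next_state, reward).
model = {
    (0, "left"):  (0, -1.0),
    (0, "right"): (1,  1.0),
}

def one_step_lookahead(state, actions, value):
    """Pick the action whose simulated outcome looks best: reward plus the value of the next state."""
    return max(actions,
               key=lambda a: model[(state, a)][1] + value[model[(state, a)][0]])

value = {0: 0.0, 1: 5.0}  # assumed value estimates for the two states
print(one_step_lookahead(0, ["left", "right"], value))  # -> right
```

Deeper planning repeats this simulation several steps ahead; without a model, the agent can only learn from outcomes it actually experiences.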

Reinforcement Learning General Idea Represent the world as an agent interacting with its environment: Conduct "trial and error" or "sampling" experiments in order to solve some goal. Sense a reward (or negative reinforcement) as a result of some behaviour that moves towards the goal. Add more weight to using that behaviour (or less weight) and continue the trials. [Diagram: the agent SENSEs the Environment and EFFECTs actions upon it.]
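The trial-and-error loop above can be sketched in a few lines. The "environment" here is a stand-in in which one behaviour pays off more often than the other; all names and probabilities are assumptions for illustration only.

```python
import random

# Trial-and-error sketch: try behaviours, sense the reward, and shift
# weight toward behaviours that earned reward.
weights = {"a": 1.0, "b": 1.0}

def environment(behaviour):
    """Hypothetical reward signal: behaviour 'b' usually pays off, 'a' never does."""
    return 1.0 if behaviour == "b" and random.random() < 0.8 else 0.0

random.seed(0)  # make the run repeatable
for trial in range(100):
    # sample a behaviour in proportion to its current weight
    behaviour = random.choices(list(weights), weights=list(weights.values()))[0]
    reward = environment(behaviour)   # sense the reward
    weights[behaviour] += reward      # add weight to the rewarded behaviour

print(weights["b"] > weights["a"])  # -> True
```

After the trials the agent samples "b" far more often, purely because rewards reinforced it.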

Reinforcement Learning RL: the idea is pervasive. [Diagram: agent-environment loop, with ACTION going out to the environment and REWARD coming back.]

Reinforcement Learning Output of RL The goal of RL is to learn a mapping SITUATION => ACTION which optimises the rewards obtained. Note the connection with AI planning: Situation = goal, state, actions. Input these to Metric-FF and it outputs a solution plan; the action to take is head(solution). Assuming the solution is optimal, this is the best action to take. RL "comes into its own" when the conditions for using a planner are not met, e.g. a partially observable state, or actions that are not well specified.
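The learned SITUATION => ACTION mapping is often stored as a table of Q-values (estimated reward for each state-action pair); acting is then just an argmax over the actions available in the current state. The states and numbers below are illustrative assumptions:

```python
# Hypothetical learned Q-table: (state, action) -> estimated reward.
Q = {
    ("low_battery",  "recharge"): 10.0,
    ("low_battery",  "explore"):  -5.0,
    ("full_battery", "recharge"):  0.0,
    ("full_battery", "explore"):   8.0,
}

def best_action(state, actions):
    """The SITUATION => ACTION mapping: pick the highest-valued action."""
    return max(actions, key=lambda a: Q[(state, a)])

print(best_action("low_battery", ["recharge", "explore"]))  # -> recharge
```

Unlike a planner, this lookup needs no action model at decision time: the table itself was learned from experience.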

Challenges of RL One of the challenges that arise in reinforcement learning is the trade-off between exploration and exploitation.
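A standard way to manage this trade-off is the epsilon-greedy rule: mostly take the best-known action (exploit), but occasionally take a random one (explore). A minimal sketch with assumed action values:

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """q_values: dict mapping each action to its estimated value in the current state."""
    if random.random() < epsilon:
        return random.choice(list(q_values))   # explore: random action
    return max(q_values, key=q_values.get)     # exploit: best-known action

random.seed(1)  # make the run repeatable
counts = {"a": 0, "b": 0}
for _ in range(1000):
    counts[epsilon_greedy({"a": 1.0, "b": 0.0})] += 1
print(counts["a"] > counts["b"])  # -> True
```

With epsilon = 0.1 the agent still tries the apparently worse action roughly 5% of the time, so it can discover when its value estimates are wrong.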

Tic-Tac-Toe

Although tic-tac-toe might look like a simple problem, it cannot readily be solved in a satisfactory way through classical techniques. For example, the classical "minimax" solution from game theory is not appropriate here because it assumes a particular way of playing by the opponent.

Tic-Tac-Toe This example has a relatively small, finite state set, whereas reinforcement learning can be used when the state set is very large, or even infinite. For example, Gerry Tesauro (1992, 1995) combined the algorithm described above with an artificial neural network to learn to play backgammon, which has approximately 10^20 states. With this many states it is impossible ever to experience more than a small fraction of them.
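The value-update at the heart of the tic-tac-toe approach (the one Tesauro's neural network generalises) nudges the value of an earlier position toward the value of the position that followed: V(s) <- V(s) + alpha * (V(s') - V(s)). A minimal tabular sketch, with hypothetical board encodings and an assumed step size:

```python
def td_update(values, s, s_next, alpha=0.5):
    """Move V(s) a fraction alpha of the way toward V(s')."""
    values[s] += alpha * (values[s_next] - values[s])

# Hypothetical board positions encoded as strings; a won position is worth 1.0,
# and unknown positions start at a neutral 0.5.
values = {
    "X.O|.X.|...": 0.5,   # mid-game position
    "X.O|.X.|..X": 1.0,   # X has completed a diagonal: a win
}
td_update(values, "X.O|.X.|...", "X.O|.X.|..X")
print(values["X.O|.X.|..."])  # -> 0.75
```

With a table this works only while the state set is small enough to enumerate; replacing the table with a neural network that generalises across positions is exactly what made the backgammon case feasible.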

Summary Reinforcement learning uses a formal framework in terms of states, actions, and rewards. The concepts of maximising value, and of value functions, are the key features of reinforcement learning methods. Reinforcement learning is a computational approach to understanding and automating goal-directed learning and decision-making.