Neural Networks Chapter 7


Neural Networks Chapter 7 Joost N. Kok Universiteit Leiden

Recurrent Networks
Learning Time Sequences:
- Sequence Recognition
- Sequence Reproduction
- Temporal Association

Recurrent Networks
Tapped Delay Lines: keep several old input values in a buffer and present them to the network as additional inputs (see the sketch below)
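A minimal Python sketch of the buffering idea (the helper name and window length d are illustrative, not from the slides): each window of the d most recent samples becomes one input pattern for an ordinary feedforward network.

import numpy as np

def tapped_delay_inputs(signal, d):
    # Row t is [x(t-d+1), ..., x(t)]: the d most recent values.
    return np.array([signal[t - d + 1:t + 1]
                     for t in range(d - 1, len(signal))])

x = np.sin(np.linspace(0, 10, 100))
patterns = tapped_delay_inputs(x, d=5)   # shape (96, 5)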

Recurrent Networks
Drawbacks of tapped delay lines: the buffer length must be chosen in advance, which leads to a large number of input units, a large number of training patterns, etc.
Remedy: replace the fixed time delays by filters

Recurrent Networks
Partially recurrent networks (figure: input nodes and context nodes feed the hidden nodes, which feed the output nodes)

Recurrent Networks
Jordan Network (figure: the context nodes hold a copy of the previous output values, fed back to the hidden nodes as additional input)

Recurrent Networks
Elman Network (figure: input nodes and context nodes feed the hidden nodes, which feed the output nodes; the context nodes hold a copy of the previous hidden-node activations)

Recurrent Networks
Expanded Hierarchical Elman Network (figure: input layer and context layer feed the hidden layer, which feeds the output units)

Recurrent Networks
Back-Propagation Through Time: unfold the network over time into an equivalent feedforward network, one copy per time step, and apply standard back-propagation to the unfolded network (sketched below)
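A minimal sketch of BPTT for a single linear recurrent unit h_t = w h_{t-1} + u x_t trained with squared error against targets y_t; the function name, learning rate, and loss are illustrative assumptions, not from the slides.

import numpy as np

def bptt_step(xs, ys, w, u, lr=0.01):
    T = len(xs)
    hs = np.zeros(T + 1)
    for t in range(T):               # forward pass: unroll over time
        hs[t + 1] = w * hs[t] + u * xs[t]
    dw = du = dh = 0.0
    for t in reversed(range(T)):     # backward pass through time
        dh += hs[t + 1] - ys[t]      # error gradient entering h at step t
        dw += dh * hs[t]             # accumulate grad for recurrent weight
        du += dh * xs[t]             # accumulate grad for input weight
        dh *= w                      # propagate the gradient to h_{t-1}
    return w - lr * dw, u - lr * du  # one gradient-descent step

Repeated calls move w and u toward weights that reproduce the target sequence.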

Reinforcement Learning
Supervised learning in which the feedback is only a scalar reinforcement signal rather than explicit target outputs.
Reinforcement learning problem classes:
- Class I: the reinforcement signal is always the same for a given input-output pair
- Class II: stochastic environment; a fixed reinforcement probability for each input-output pair
- Class III: reinforcement and input patterns depend on the past history of the network output

Associative Reward-Penalty (figure: a network with stochastic output units; the environment returns a reinforcement signal, from which a target and an error signal are derived)

Associative Reward-Penalty Learning Rule:
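The slide's equation was an image that did not survive extraction. The standard associative reward-penalty (A_RP) rule for a stochastic output unit S_i with inputs x_j, as in Hertz, Krogh & Palmer, has the form:

\Delta w_{ij} =
\begin{cases}
\eta^{+} \, (S_i - \langle S_i \rangle) \, x_j & \text{if } r = +1 \text{ (reward)} \\
\eta^{-} \, (-S_i - \langle S_i \rangle) \, x_j & \text{if } r = -1 \text{ (penalty)}
\end{cases}

where \langle S_i \rangle is the unit's mean output and typically \eta^{-} \ll \eta^{+}.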

Models and Critics (figure: network-environment interaction diagram)

Reinforcement Comparison (figure: a critic predicts the reinforcement, and the prediction is compared with the actual signal from the environment)

Reinforcement Learning
Reinforcement-learning model:
- The agent receives an input I, which is some indication of the current state s of the environment
- The agent then chooses an action a
- The action changes the state of the environment, and the value of this change is communicated through a scalar reinforcement signal r

Reinforcement Learning
Environment: You are in state 65. You have four possible actions.
Agent: I'll take action 2.
Environment: You received a reinforcement of 7 units. You are now in state 15. You have two possible actions.
Agent: I'll take action 1.
Environment: You received a reinforcement of -4 units. You are now in state 12. You have two possible actions.
…
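The exchange above maps directly onto code. A minimal runnable sketch with a hypothetical two-state toy environment (the class and its methods are illustrative, not from the slides):

import random

class ToyEnv:
    # Hypothetical environment: reward +1 for matching the state, -1 otherwise.
    def __init__(self):
        self.state = 0
    def actions(self):
        return [0, 1]
    def step(self, action):
        reward = 1.0 if action == self.state else -1.0
        self.state = random.randint(0, 1)      # move to a new state
        return reward, self.state

env = ToyEnv()
for _ in range(5):
    action = random.choice(env.actions())      # the agent picks an action
    reward, state = env.step(action)           # the environment responds
    print(f"reward {reward:+.0f}, now in state {state}")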

Reinforcement Learning
- The environment is non-deterministic: the same action in the same state may result in different states and different reinforcements
- The environment is stationary: the probabilities of making state transitions or receiving specific reinforcement signals do not change over time

Reinforcement Learning
Two types of learning:
- Model-free learning
- Model-based learning
Typical application areas: robots, mazes, games, …

Reinforcement Learning Paper: A short introduction to Reinforcement Learning (Stephan ten Hagen and Ben Krose)

Reinforcement Learning
The environment is a Markov Decision Process
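The defining Markov property, in its standard form (the slide itself shows no formula): the next state and reinforcement depend only on the current state and action, not on the earlier history:

P(s_{t+1} = s' \mid s_t, a_t, s_{t-1}, a_{t-1}, \dots, s_0, a_0) = P(s_{t+1} = s' \mid s_t, a_t)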

Reinforcement Learning
- Optimize the interaction with the environment
- Optimize the action selection mechanism
- Temporal credit assignment problem: reinforcement may arrive long after the actions that earned it
- Policy: the action selection mechanism
- Value function: the expected discounted future reinforcement (formula below)
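The value function's formula was an image on the original slide; the standard discounted definition for a policy \pi is:

V^{\pi}(s) = E_{\pi}\!\left[ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s \right], \qquad 0 \le \gamma < 1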

Reinforcement Learning Optimal Value function based on optimal policy:
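The slide's equation is again missing; the standard Bellman optimality equation defines it:

V^{*}(s) = \max_{\pi} V^{\pi}(s) = \max_{a} \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma V^{*}(s') \right]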

Reinforcement Learning
- Policy evaluation: approximate the value function for a given policy (a sketch follows below)
- Policy iteration: start with an arbitrary policy and repeatedly improve it
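A minimal sketch of iterative policy evaluation for a known tabular MDP. The data layout is an assumption for illustration: P[s] is a dict mapping each action to a list of (probability, next_state, reward) triples, and policy maps each state to an action.

def evaluate_policy(P, policy, n_states, gamma=0.9, tol=1e-6):
    # Sweep over all states until the value estimates stop changing.
    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            v = sum(p * (r + gamma * V[s2])
                    for p, s2, r in P[s][policy[s]])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V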

Reinforcement Learning Improve Policy:
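The improvement step itself (standard form; the slide's formula did not survive): act greedily with respect to the current value function:

\pi'(s) = \arg\max_{a} \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma V^{\pi}(s') \right]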

Reinforcement Learning Value Iteration: combine policy evaluation and policy improvement steps:
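A sketch of value iteration under the same assumed P[s] layout as above; each sweep applies the Bellman optimality backup, folding evaluation and improvement into one step:

def value_iteration(P, n_states, gamma=0.9, tol=1e-6):
    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            # Back up with the best action instead of a fixed policy
            v = max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                    for a in P[s])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V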

Reinforcement Learning
Monte Carlo methods: used when the transition probabilities P(s' | s, a) and rewards R(s, a, s') are not known
- Given a policy, several complete episodes of interaction are performed and the observed returns are averaged
- Exploration/exploitation dilemma: extract information about the environment vs. optimize the interaction (one common compromise is sketched below)
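A common answer to the exploration/exploitation dilemma is an ε-greedy action choice; a minimal sketch (the Q-table layout and ε value are illustrative assumptions):

import random

def epsilon_greedy(Q, state, actions, eps=0.1):
    # With probability eps explore at random; otherwise exploit the
    # current action-value estimates.
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))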

Reinforcement Learning
Temporal Difference (TD) learning:
- Part of the update can already be calculated during the interaction
- Information from previous interactions is used
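The TD(0) update of the value estimate (standard form; the slide shows no formula):

V(s_t) \leftarrow V(s_t) + \alpha \left[ r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \right]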

Reinforcement Learning
TD(λ) learning: the decay factor λ controls how far updates reach back in time: the longer ago a state was visited, the less it is affected by the present update
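In the usual formulation each state carries an eligibility trace that decays by γλ per step, and every state is updated in proportion to its trace (δ_t is the TD error from the previous slide):

e_t(s) = \begin{cases} \gamma \lambda \, e_{t-1}(s) + 1 & \text{if } s = s_t \\ \gamma \lambda \, e_{t-1}(s) & \text{otherwise} \end{cases}
\qquad
V(s) \leftarrow V(s) + \alpha \, \delta_t \, e_t(s)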

Reinforcement Learning
Q-learning: combines the actor and the critic into a single action-value function Q(s, a):
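The standard Q-learning update (the slide's own formula is missing):

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]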

Reinforcement Learning Use temporal difference learning
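A tabular Q-learning sketch tying the pieces together, reusing the epsilon_greedy helper from above. The environment interface (reset, actions, and step returning reward, next state, and a done flag) is a hypothetical assumption:

def q_learning(env, episodes, alpha=0.1, gamma=0.9, eps=0.1):
    Q = {}
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            action = epsilon_greedy(Q, state, env.actions(state), eps)
            reward, next_state, done = env.step(action)
            # Off-policy TD target: value of the best next action
            best_next = max((Q.get((next_state, a), 0.0)
                             for a in env.actions(next_state)), default=0.0)
            td_error = reward + gamma * best_next - Q.get((state, action), 0.0)
            Q[(state, action)] = Q.get((state, action), 0.0) + alpha * td_error
            state = next_state
    return Q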

Reinforcement Learning
Q(λ) learning: Q-learning combined with eligibility traces, analogous to TD(λ)
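One common variant (Watkins' Q(λ)) keeps a trace per state-action pair and resets all traces whenever an exploratory, non-greedy action is taken:

e_t(s, a) = \begin{cases} \gamma \lambda \, e_{t-1}(s, a) + 1 & \text{if } (s, a) = (s_t, a_t) \\ \gamma \lambda \, e_{t-1}(s, a) & \text{otherwise} \end{cases}
\qquad
Q(s, a) \leftarrow Q(s, a) + \alpha \, \delta_t \, e_t(s, a)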

Reinforcement Learning
Feedforward neural networks are used to estimate V(s) and Q(s, a) when the state/action spaces are too large for table-based estimates.
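A minimal sketch of the idea, with a linear approximator standing in for the feedforward network (semi-gradient TD(0); the feature map phi is an assumed placeholder):

import numpy as np

def td0_approx_update(w, phi, s, r, s_next, alpha=0.01, gamma=0.9):
    # V(s) is approximated as w . phi(s); the TD error drives a
    # gradient step on the weights instead of a table update.
    delta = r + gamma * w @ phi(s_next) - w @ phi(s)
    return w + alpha * delta * phi(s)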