1 Tópicos Especiais em Aprendizagem Prof. Reinaldo Bianchi Centro Universitário da FEI 2012

2 Objective of this Lecture
- Reinforcement Learning:
  - Eligibility traces.
  - Generalization and function approximation.
- Today's lecture: chapters 7 and 8 of Sutton & Barto.

3 Generalization and Function Approximation. Chapter 8 of Sutton & Barto.

4 Objectives
- Look at how experience with a limited part of the state set can be used to produce good behavior over a much larger part.
- Overview of function approximation (FA) methods and how they can be adapted to RL.
- Not covered as deeply as in the book (Bianchi's comment).

5 Value Prediction with Function Approximation
As usual: Policy Evaluation (the prediction problem): for a given policy π, compute the state-value function V^π. In earlier chapters, value functions were stored in lookup tables.

6 Adapt Supervised Learning Algorithms
Supervised learning system: inputs → outputs.
- Training info = desired (target) outputs.
- Error = (target output − actual output).
- Training example = {input, target output}.

7 Backups as Training Examples
As a training example: input → target output.
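The backup equation itself is not reproduced here; as a sketch of the idea (using the TD(0) backup as the example), the backed-up value serves as the supervised target:

$$
\Bigl\{\underbrace{\text{description of } s_t}_{\text{input}},\;\; \underbrace{r_{t+1} + \gamma\, V_t(s_{t+1})}_{\text{target output}}\Bigr\}
$$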

8 Any FA Method?
- In principle, yes:
  - artificial neural networks
  - decision trees
  - multivariate regression methods
  - etc.
- But RL has some special requirements:
  - usually want to learn while interacting
  - ability to handle nonstationarity
  - other?

9 Gradient Descent Methods
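A plausible reconstruction of this slide's setup, following Sutton & Barto's notation: the approximate value function V_t is a (sufficiently smooth) differentiable function of a parameter (column) vector

$$
\vec\theta_t = \bigl(\theta_t(1), \theta_t(2), \ldots, \theta_t(n)\bigr)^{\top}.
$$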

10 Performance Measures
- A common and simple one is the mean-squared error (MSE) over a distribution P.
- Let us assume that P is always the distribution of states at which backups are done.
- The on-policy distribution: the distribution created while following the policy being evaluated. Stronger results are available for this distribution.
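In Sutton & Barto's notation, the MSE over the distribution P is:

$$
\mathrm{MSE}(\vec\theta_t) \;=\; \sum_{s \in S} P(s)\,\bigl[ V^\pi(s) - V_t(s) \bigr]^2
$$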

11 Gradient Descent Iteratively move down the gradient:
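The standard form of the iteration, in Sutton & Barto's notation:

$$
\vec\theta_{t+1} \;=\; \vec\theta_t \;-\; \tfrac{1}{2}\,\alpha\, \nabla_{\vec\theta_t} \mathrm{MSE}(\vec\theta_t)
$$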

12 Gradient Descent Cont. For the MSE given above and using the chain rule:
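Expanding the gradient of the MSE above with the chain rule gives:

$$
\vec\theta_{t+1} \;=\; \vec\theta_t \;+\; \alpha \sum_{s \in S} P(s)\,\bigl[ V^\pi(s) - V_t(s) \bigr]\, \nabla_{\vec\theta_t} V_t(s)
$$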

13 Gradient Descent Cont.
Use just the sample gradient instead (see the reconstruction below). Since each sample gradient is an unbiased estimate of the true gradient, this converges to a local minimum of the MSE if the step-size parameter α decreases appropriately with t.
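A reconstruction of the per-sample update, again following Sutton & Barto:

$$
\vec\theta_{t+1} \;=\; \vec\theta_t \;+\; \alpha\,\bigl[ V^\pi(s_t) - V_t(s_t) \bigr]\, \nabla_{\vec\theta_t} V_t(s_t)
$$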

14 But We Don’t have these Targets

15 What about TD(λ) Targets?
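As in Sutton & Barto, the unavailable true value V^π(s_t) is replaced by an estimated target v_t; for TD(λ) the target is the λ-return:

$$
\vec\theta_{t+1} \;=\; \vec\theta_t \;+\; \alpha\,\bigl[ R^{\lambda}_t - V_t(s_t) \bigr]\, \nabla_{\vec\theta_t} V_t(s_t)
$$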

16 On-Line Gradient-Descent TD(λ)
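A minimal Python sketch of on-line gradient-descent TD(λ) with accumulating eligibility traces. The helpers value(theta, s), grad_value(theta, s), env.sample_policy_action(s), and the gym-like env.reset()/env.step(a) interface are assumptions for illustration, not part of the original slides.

```python
import numpy as np

def gradient_td_lambda(env, value, grad_value, theta, alpha, gamma, lam, n_episodes):
    """On-line gradient-descent TD(lambda) for policy evaluation (sketch).

    value(theta, s)      -> scalar estimate V_t(s)            (assumed helper)
    grad_value(theta, s) -> gradient of V_t(s) w.r.t. theta   (assumed helper)
    env.reset() / env.step(a) follow a gym-like interface     (assumed)
    """
    for _ in range(n_episodes):
        s = env.reset()
        e = np.zeros_like(theta)                 # eligibility trace vector, reset each episode
        done = False
        while not done:
            a = env.sample_policy_action(s)      # action from the policy being evaluated
            s_next, r, done = env.step(a)
            # TD error: target is r + gamma * V(s'); terminal states have value zero
            v_next = 0.0 if done else value(theta, s_next)
            delta = r + gamma * v_next - value(theta, s)
            # accumulate traces, then move theta along the traced gradient
            e = gamma * lam * e + grad_value(theta, s)
            theta = theta + alpha * delta * e
            s = s_next
    return theta
```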

17 Linear Methods
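For a linear approximator with feature vector φ_s, the defining equations are:

$$
V_t(s) \;=\; \vec\theta_t^{\,\top} \vec\phi_s \;=\; \sum_{i=1}^{n} \theta_t(i)\,\phi_s(i)
$$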

18 Nice Properties of Linear FA Methods
- The gradient is very simple: ∇_θ V_t(s) = φ_s.
- For MSE, the error surface is simple: a quadratic surface with a single minimum.
- Linear gradient-descent TD(λ) converges if:
  - the step size decreases appropriately, and
  - sampling is on-line (states sampled from the on-policy distribution).
- It converges to a parameter vector with the property shown below.
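The convergence bound referred to here (due to Tsitsiklis and Van Roy, and quoted in Sutton & Barto) is:

$$
\mathrm{MSE}(\vec\theta_\infty) \;\le\; \frac{1-\gamma\lambda}{1-\gamma}\; \mathrm{MSE}(\vec\theta^{\,*})
$$

where θ* is the minimum-MSE parameter vector.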

19 Most commonly used linear methods
- Coarse Coding
- Tile Coding (CMAC)
- Radial Basis Functions
- Kanerva Coding

20 Coarse Coding
Generalization from state X to state Y depends on the number of their features whose receptive fields overlap (i.e., contain both states).

21 Coarse Coding Generalization in linear function approximation methods is determined by the sizes and shapes of the features' receptive fields. All three of these cases have roughly the same number and density of features.

22 Coarse Coding

23 Learning and Coarse Coding
Example of feature width's strong effect on initial generalization (first row) and weak effect on asymptotic accuracy (last row).

24 Tile Coding
- Binary feature for each tile.
- Number of features present at any one time is constant.
- Binary features mean the weighted sum is easy to compute.
- Easy to compute the indices of the features present.
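A minimal sketch of computing the indices of the active tiles, assuming a 2-D continuous state scaled to [0, 1) per dimension. The offset scheme is illustrative, not the CMAC implementation used in the book.

```python
def active_tiles(x, y, n_tilings=8, tiles_per_dim=9):
    """Return one active tile index per tiling for a point (x, y) in [0, 1) x [0, 1).

    Each tiling is offset by a fraction of a tile width, so the binary feature
    vector has exactly n_tilings ones -- a constant number of active features,
    as noted on the slide.
    """
    indices = []
    tile_width = 1.0 / tiles_per_dim
    for t in range(n_tilings):
        offset = t * tile_width / n_tilings              # simple uniform offsets (illustrative)
        col = int((x + offset) / tile_width) % tiles_per_dim
        row = int((y + offset) / tile_width) % tiles_per_dim
        indices.append(t * tiles_per_dim * tiles_per_dim + row * tiles_per_dim + col)
    return indices
```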

25 Tile Coding

26 Example: Simulated Soccer
- How does the agent decide what to do with the ball?
- Complexities:
  - Continuous inputs
  - High dimensionality
- From the paper: Reinforcement Learning in Simulated Soccer with Kohonen Networks, by Chris White and David Brogan (University of Virginia). The slides that follow are based on this paper.

27 Problems
- State space explodes exponentially in terms of dimensionality.
- Current methods of managing state-space explosion lack automation.
- RL does not scale well to problems with the complexity of simulated soccer…

28 Quantization
- Divide the state space into regions of interest:
  - Tile Coding (Sutton & Barto, 1998).
- No automated method for choosing region granularity, heterogeneity, or location.
- Prefer a learned abstraction of the state space.

29 Kohonen Networks
- Clustering algorithm.
- Data driven.
- Example cluster prototype (from the slide's figure): agent near opponent goal, teammate near opponent goal, no nearby opponents.

30 State Space Reduction
- 90 continuous-valued inputs describe the state of a soccer game.
- Naïve discretization → 2^90 states.
- Filter out unnecessary inputs → still 2^18 states.
- Clustering algorithm → only 5000 states. Big Win!!!

31 Two-Pass Algorithm
- Pass 1: use a Kohonen network and a large training set to learn the state space.
- Pass 2: use Reinforcement Learning to learn utilities for states (SARSA).

32 Fragility of Learned Actions
What happens to attacker's utility if goalie crosses dotted line?

33 Results
- Evaluate three systems:
  - Control (random action selection)
  - SARSA
  - Forcing Function
- Evaluation criteria:
  - Goals scored
  - Time of possession

34 Cumulative Score

35 Team with Forcing Functions

36 Can you beat the “curse of dimensionality”?
- Can you keep the number of features from going up exponentially with the dimension?
- “Lazy learning” schemes (see the sketch after this list):
  - Remember all the data.
  - To get a new value, find the nearest neighbors and interpolate.
  - e.g., locally-weighted regression.
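A sketch of the nearest-neighbor idea mentioned above (not from the slides): store all observed (state, value) pairs and estimate a new state's value by distance-weighted averaging over its k nearest stored neighbors.

```python
import numpy as np

def knn_value(query, states, values, k=5, eps=1e-8):
    """Distance-weighted k-nearest-neighbor value estimate (lazy-learning sketch).

    states : array of shape (N, d) -- remembered states
    values : array of shape (N,)   -- their stored value estimates
    """
    dists = np.linalg.norm(states - query, axis=1)
    nearest = np.argsort(dists)[:k]                  # indices of the k closest states
    weights = 1.0 / (dists[nearest] + eps)           # closer neighbors weigh more
    return float(np.dot(weights, values[nearest]) / weights.sum())
```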

37 Can you beat the “curse of dimensionality”?
- Function complexity, not dimensionality, is the problem.
- Kanerva coding (a sketch follows below):
  - Select a bunch of binary prototypes.
  - Use Hamming distance as the distance measure.
  - Dimensionality is no longer a problem, only complexity.
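A minimal, illustrative sketch of the Kanerva-coding idea (not from the slides): a state described by binary attributes activates the prototypes within a Hamming-distance threshold, and those activations form the binary feature vector for a linear approximator.

```python
import numpy as np

def kanerva_features(state_bits, prototypes, threshold=2):
    """Binary feature vector: 1 for each prototype within `threshold` Hamming distance.

    state_bits : array of shape (d,) with 0/1 entries
    prototypes : array of shape (n_prototypes, d) with 0/1 entries
    """
    hamming = np.sum(prototypes != state_bits, axis=1)   # Hamming distance to each prototype
    return (hamming <= threshold).astype(float)          # feature count = number of prototypes
```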

38 Algorithms using Function Approximators
- We now extend value prediction methods using function approximation to control methods, following the pattern of GPI.
- First we extend the state-value prediction methods to action-value prediction methods, then we combine them with policy improvement and action selection techniques.
- As usual, the problem of ensuring exploration is solved by pursuing either an on-policy or an off-policy approach.

39 Control with FA
- Learning state-action values, from training examples of the form shown below.
- The general gradient-descent rule for action values (reconstructed below).
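In Sutton & Barto's notation, a training example pairs a state-action description with a target, and the general gradient-descent rule for action values is:

$$
\bigl\{ (s_t, a_t),\; v_t \bigr\}, \qquad
\vec\theta_{t+1} \;=\; \vec\theta_t \;+\; \alpha\,\bigl[ v_t - Q_t(s_t, a_t) \bigr]\, \nabla_{\vec\theta_t} Q_t(s_t, a_t)
$$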

40 Control with FA
Gradient-descent Sarsa(λ) (backward view):
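The backward-view updates, in Sutton & Barto's notation:

$$
\delta_t = r_{t+1} + \gamma\, Q_t(s_{t+1}, a_{t+1}) - Q_t(s_t, a_t), \qquad
\vec e_t = \gamma\lambda\, \vec e_{t-1} + \nabla_{\vec\theta_t} Q_t(s_t, a_t), \qquad
\vec\theta_{t+1} = \vec\theta_t + \alpha\, \delta_t\, \vec e_t
$$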

41 Linear Gradient Descent Sarsa(λ)
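A minimal Python sketch of one episode of linear gradient-descent Sarsa(λ) with binary features and accumulating traces. The feature function features(s, a), returning the indices of the active binary features, the epsilon_greedy(theta, features, s) action-selection helper, and the gym-like env interface are assumptions for illustration.

```python
import numpy as np

def sarsa_lambda_episode(env, features, epsilon_greedy, theta,
                         alpha=0.1, gamma=1.0, lam=0.9):
    """One episode of linear gradient-descent Sarsa(lambda) with binary features (sketch)."""
    e = np.zeros_like(theta)                  # eligibility traces, one per parameter
    s = env.reset()
    a = epsilon_greedy(theta, features, s)
    done = False
    while not done:
        s_next, r, done = env.step(a)
        active = features(s, a)               # indices of active binary features for (s, a)
        delta = r - theta[active].sum()       # start the TD error with r - Q(s, a)
        e[active] += 1.0                      # accumulating traces on the active features
        if not done:
            a_next = epsilon_greedy(theta, features, s_next)
            delta += gamma * theta[features(s_next, a_next)].sum()   # add gamma * Q(s', a')
            s, a = s_next, a_next
        theta += alpha * delta * e            # gradient step along the traces
        e *= gamma * lam                      # decay all traces
    return theta
```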

42 Linear Gradient Descent Q(λ)

43 Mountain-Car Task

44 Mountain-Car Results
The effect of alpha, lambda, and the kind of traces on early performance on the mountain-car task. This study used five 9 x 9 tilings.

45 Summary
- Generalization
- Adapting supervised-learning function approximation methods
- Gradient-descent methods
- Linear gradient-descent methods:
  - Radial basis functions
  - Tile coding
  - Kanerva coding

46 Summary
- Nonlinear gradient-descent methods? Backpropagation?
- Subtleties involving function approximation, bootstrapping, and the on-policy/off-policy distinction.

47 Conclusion

48 Conclusion
- We saw two important methods in today's lecture:
  - Eligibility traces, which perform a temporal generalization of learning.
  - Function approximators, which generalize the learned value function.
- Both generalize learning.

49 The End.