A Reinforcement Learning Method Based on Adaptive Simulated Annealing

Presentation transcript:

Slide 1: A Reinforcement Learning Method Based on Adaptive Simulated Annealing
Authors: Amir F. Atiya, Department of Computer Engineering, Cairo University, Giza, Egypt; Alexander G. Parlos, Department of Mechanical Engineering, Texas A&M University, College Station, Texas; Lester Ingber, Lester Ingber Research, ingber.com. Paper dated September 13, 2003.
Presented by Doug Moody, May 18, 2004.

Slide 2: Glass-Blowing and Its Impact on Reinforcement Learning
– Considering the whole piece while focusing on a particular section
– Slow cooling to relieve stress and gain consistency
– Use of "annealing"

Slide 3: Paper Approach
– Review the reinforcement learning problem and introduce the use of function approximation to determine state values
– Briefly review the use of an adaptation of "annealing" algorithms to find functions that determine a state's value
– Apply this approach to a straightforward decision-making problem

Slide 4: Function Approximation Introduction
– Much of our emphasis in reinforcement learning has treated the value function as a table with one entry for each state or state-action pair
– Finite Markov decision processes have a fixed number of states and actions
– This approach can introduce limitations when there are many states, insufficient samples across all states, or a continuous state space
– These limitations can be addressed by "generalization", also referred to as "function approximation" (a contrast is sketched below)
– Function approximation has been widely studied in many fields (think regression analysis!)
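To make the contrast concrete (the notation here is assumed, following the Sutton and Barto conventions the later slides draw on): a tabular method stores one value per state, while function approximation represents values through a parameter vector and features,

$$
V(s) \;=\; \text{table entry for } s
\qquad\text{vs.}\qquad
V_\theta(s) \;=\; \theta^\top \phi(s),
$$

so experience with some states generalizes to other states that share features.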

Slide 5: Function Approximation Characteristics
– A "batch" or "supervised learning" approach, versus the on-line approach we have encountered so far
– Requires a "static" training set from which to learn
– Cannot handle dynamically changing target functions, such as targets produced by bootstrapping
– Hence, function approximation is not suitable for all types of reinforcement learning

Slide 6: Function Approximation Goals
– The value function depends on a parameter vector $\theta$, which could be, for example, the vector of connection weights in a network
– Typically function approximation seeks to minimize the mean squared error $\mathrm{MSE}(\theta) = \sum_{s} P(s)\,[V^{\pi}(s) - V_{\theta}(s)]^2$, where $P(s)$ weights the error at each state and $\theta$ is the vector of function parameters

Slide 7: Function Approximation Methods
– Step-by-step approach: gradient descent, which moves slowly toward the optimal "fit" (update rule sketched below)
– Linear approach: a special case of gradient descent where the value is a linear combination of the parameter (column) vector and a feature vector
– Coding methods:
  – Coarse coding
  – Tile coding
  – Radial basis functions
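A sketch of the gradient-descent update referred to above, in the standard Sutton and Barto form (the symbols $\alpha$, $v_t$, and $\theta$ are assumed notation, not taken from the paper): after observing a training target $v_t$ for state $s_t$,

$$
\theta_{t+1} \;=\; \theta_t \;+\; \alpha \,\bigl[v_t - V_{\theta_t}(s_t)\bigr]\,\nabla_{\theta}\, V_{\theta_t}(s_t),
$$

which in the linear case reduces to $\theta_{t+1} = \theta_t + \alpha\,[v_t - \theta_t^\top \phi(s_t)]\,\phi(s_t)$.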

Slide 8: Coarse Coding
– Features should relate to the characteristics of the state
– For instance, for a robot, the location and remaining power might be used
– For chess, the number of pieces, the available moves for a pawn or the queen, etc.
– (Slide from the Sutton and Barto textbook)

Slide 9: Learning and Coarse Coding
– (Slide from the Sutton and Barto textbook)

Slide 10: Tile Coding
– One binary feature for each tile
– The number of features present at any one time is constant
– Binary features make the weighted sum easy to compute
– The indices of the features present are easy to compute (see the sketch below)
– (Slide from the Sutton and Barto textbook)
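A minimal sketch of computing the active tile indices, assuming a 2-D state in the unit square and a few offset tilings; the grid sizes, offsets, and function names are illustrative and not taken from the paper or the textbook:

```python
import numpy as np

def tile_indices(x, y, n_tilings=8, tiles_per_dim=10, lo=0.0, hi=1.0):
    """Return the index of the single active tile in each tiling for a 2-D point."""
    width = (hi - lo) / tiles_per_dim            # side length of one tile
    active = []
    for t in range(n_tilings):
        offset = t * width / n_tilings           # each tiling is shifted by a fraction of a tile
        col = int((x - lo + offset) / width) % tiles_per_dim
        row = int((y - lo + offset) / width) % tiles_per_dim
        active.append(t * tiles_per_dim ** 2 + row * tiles_per_dim + col)
    return active

# With binary features, the approximate value is just the sum of one weight per active tile.
weights = np.zeros(8 * 10 * 10)

def value(x, y):
    return sum(weights[i] for i in tile_indices(x, y))
```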

Slide 11: Radial Basis Functions (Gaussian)
– Each feature value reflects the degree to which the feature is present (formula below)
– The variance indicates how broadly a feature extends over the state space
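The usual Gaussian form behind these statements (center $c_i$ and width $\sigma_i$ are assumed notation): the $i$-th feature of state $s$ is

$$
\phi_i(s) \;=\; \exp\!\Bigl(-\,\frac{\lVert s - c_i \rVert^2}{2\sigma_i^2}\Bigr),
$$

so the feature value falls off smoothly with distance from the center, and $\sigma_i$ controls how far its influence extends across the state space.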

Slide 12: Paper's Description of the Reinforcement Learning Model
– Basic system
– Value definition
– Policy definition
– Optimal policy / maximal value (the paper's Eq. 4 and Eq. 5; the equations themselves are not reproduced in this transcript, but standard forms are given below)
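For reference, the standard discounted-value and Bellman-optimality definitions that such a slide typically presents are (notation assumed, not taken from the paper):

$$
V^{\pi}(s) \;=\; \mathbb{E}_{\pi}\!\Bigl[\sum_{t=0}^{\infty} \gamma^{t}\, r_{t} \,\Bigm|\, s_{0}=s\Bigr],
\qquad
V^{*}(s) \;=\; \max_{a}\; \mathbb{E}\bigl[\, r(s,a) + \gamma\, V^{*}(s')\,\bigr].
$$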

Slide 13: Value Function to Optimize
– The state value is expressed as a combination of basis functions, each scaled by a weight parameter (the equation on the slide is not reproduced in this transcript)
– GOAL: find the optimal set of weights W_k that will lead to the most accurate evaluation

Slide 14: Use Simulated Annealing to Find the Best Set of W_k
– Annealing algorithms search the entire parameter space and slowly "cool" toward an appropriate minimum
– Algorithms trade off fast convergence against continued sampling of the entire space
– Typically used to solve combinatorial optimization problems
– Requirements (a generic loop is sketched below):
  – Concise definition of the system
  – Random generator of moves
  – Objective function to be optimized
  – Temperature schedule
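A minimal, generic simulated-annealing loop illustrating the four requirements above; this is a textbook sketch with an illustrative geometric cooling schedule, not the paper's ASA variant:

```python
import math
import random

def simulated_annealing(cost, neighbor, x0, T0=1.0, alpha=0.95, steps=10000):
    """Generic simulated annealing.

    cost      -- objective function to minimize
    neighbor  -- random move generator: neighbor(x, T) returns a candidate near x
    x0        -- initial point
    T0, alpha -- geometric temperature schedule T <- alpha * T (illustrative)
    """
    x, fx = x0, cost(x0)
    best, fbest = x, fx
    T = T0
    for _ in range(steps):
        cand = neighbor(x, T)
        fc = cost(cand)
        # Metropolis acceptance: always accept improvements, and accept
        # worse moves with probability exp(-(fc - fx) / T).
        if fc < fx or random.random() < math.exp(-(fc - fx) / T):
            x, fx = cand, fc
            if fx < fbest:
                best, fbest = x, fx
        T *= alpha
    return best, fbest
```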

Slide 15: Example of Simulated Annealing
– Problem: find the lowest valley in a mountainous region
– View the problem as having two directions: north-south and east-west
– Use a bouncing ball to explore the terrain at high temperature; the ball can make high bounces, exploring many regions
– Each point in the terrain has a "cost function" value to optimize
– As the temperature cools, the ball's range decreases and its exploration focuses on a smaller region of the terrain
– Two distributions are used: a generating distribution (for each parameter) and an acceptance distribution
– The acceptance distribution determines whether to stay in the current valley or bounce out (a common form is given below)
– Both distributions are affected by temperature
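A common concrete choice for the acceptance distribution mentioned above is the Metropolis criterion (the standard form; the paper's ASA uses its own temperature-dependent acceptance function): a candidate with cost change $\Delta E = E_{\text{new}} - E_{\text{old}}$ is accepted with probability

$$
P(\text{accept}) \;=\; \min\bigl(1,\; e^{-\Delta E / T}\bigr),
$$

so at high temperature the ball readily bounces out of a valley, while at low temperature it rarely accepts uphill moves.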

Slide 16: Glass-Blowing Example
– Larger changes are made to the glass piece at higher temperatures
– As the glass is cooled, the piece is still scanned (albeit more quickly) for stress points
– The piece cannot be "heated up" again while keeping the previous results

Slide 17: Adaptive Simulated Annealing (ASA)
– Takes the same basic approach as simulated annealing
– Uses a specific generating distribution with a wider tail (schedule sketched below)
– Does not rely on "quenching" to achieve quick convergence
– Has long been available as a C software package
– Relies heavily on a large set of tuning options:
  – Scaling of temperatures and probabilities
  – Limits on searching in regions of certain parameters
  – Linear vs. non-linear vectors
– Supports re-annealing: time (and hence temperature) is wound back after some results are achieved, to take advantage of the sensitivities found
– Good for non-linear functions
– More information and software are available at ingber.com
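For context, Ingber's ASA anneals each parameter dimension on its own schedule; the commonly cited form (from the ASA documentation, not from this transcript) is

$$
T_i(k) \;=\; T_{0i}\, \exp\!\bigl(-c_i\, k^{1/D}\bigr),
$$

where $k$ is the annealing step, $D$ is the number of parameters, and $c_i$ is a per-dimension tuning constant. This is paired with a fat-tailed generating distribution that still permits occasional long jumps even at low temperature, which is what allows ASA to avoid quenching.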

Slide 18: Reinforcement Learning with ASA Search

Slide 19: Sample Implementation
– Problem: choose the highest number from a sequence of numbers
  – Numbers are generated from an unknown source: a normal distribution with a mean between 0 and 1 and a standard deviation between 0 and 0.5
  – As time passes, the reward is discounted
  – Hence the tradeoff: waiting longer provides more information, but incurs a penalty
– The paper used 100 sources, each generating 1000 numbers per sequence, as the training set (a sketch of this environment follows)
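A rough sketch of the training environment described above; the stopping rule, discount factor, and function names are assumptions made for illustration, and the paper's exact formulation may differ:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_source():
    """One 'unknown source': mean drawn from [0, 1], std from [0, 0.5]."""
    return rng.uniform(0.0, 1.0), rng.uniform(0.0, 0.5)

def run_episode(policy, gamma=0.95, horizon=1000):
    """Present numbers one at a time; the policy decides when to stop and accept.

    The accepted number is discounted by gamma**t, so waiting yields more
    information about the source but incurs a penalty.
    """
    mu, sigma = make_source()
    best_seen = -np.inf
    for t in range(horizon):
        x = rng.normal(mu, sigma)
        best_seen = max(best_seen, x)
        if policy(t, x, best_seen):          # True = accept the current number
            return (gamma ** t) * x
    return (gamma ** (horizon - 1)) * x      # forced to take the last number
```

Training would then repeat such episodes over 100 sources with 1000 numbers each, matching the set-up described on the slide.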

Slide 20: Solution Approach
– Define the state as a combination of the following:
  – time t
  – the current mean of the numbers observed up to time t
  – the current standard deviation of the observed numbers
  – the highest number observed thus far
– Place 10 Gaussian basis functions throughout the state space
– Use the ASA algorithm to optimize the vector of weight parameters applied to the basis functions (sketched below)
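A sketch of the value representation this slide describes, with 10 Gaussian basis functions over the 4-dimensional state; the center placement, the common width, and the function names are assumptions (as the comments slide notes, the paper does not fully specify the placement):

```python
import numpy as np

def rbf_features(state, centers, sigma=0.25):
    """Gaussian radial basis features of a state (t, mean, std, best-so-far).

    Assumes all state components, including time, are normalized to [0, 1].
    """
    s = np.asarray(state, dtype=float)
    d2 = np.sum((centers - s) ** 2, axis=1)   # squared distance to each center
    return np.exp(-d2 / (2.0 * sigma ** 2))

def value(state, w, centers):
    """Linear-in-the-weights value estimate V(s; w) = w . phi(s); w is what ASA tunes."""
    return float(np.dot(w, rbf_features(state, centers)))

# Example: 10 centers in the 4-D state space, 10 weights to be optimized by ASA.
centers = np.random.default_rng(1).uniform(0.0, 1.0, size=(10, 4))
w = np.zeros(10)
print(value((0.1, 0.5, 0.2, 0.6), w, centers))
```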

Slide 21: Results
– ASA achieved an overall reward value, reported alongside the Q-learning result and its standard deviation (the numerical values shown on the slide are not reproduced in this transcript)
– The improvement is substantial, given that simply picking the first number in each sequence would yield about 0.5

Slide 22: Paper Comments
– Pros:
  – Used existing reinforcement learning taxonomies to discuss the problem
  – Selected a straightforward problem
– Cons:
  – Did not fully describe the basis function placement
  – Gave insufficient detail on the Q-learning parameters used
  – Did not show a non-linear example
  – Could have provided more information on the ASA options used, so the results could be reproduced