Evolutionary Reinforcement Learning Systems Presented by Alp Sardağ

Goal Two main branches of reinforcement learning: –Search the space of functions that assign values to the utility of states. –Search the space of functions that assign values to behaviors (Q-learning). Describe the evolutionary algorithm approach to reinforcement learning and compare and contrast it with the more common TD methods.

Sequential Decision Task This is an example of a sequential decision task: the optimal sequence for an agent that starts from a1 is R, D, R, D, D, R, R, D.

Learning from reinforcements The EA and TD methods have similarities: –Both solve difficult sequential decision tasks. –Both are normally model-free, whereas dynamic programming approaches learn from a complete mathematical model of the underlying system.

Supervised vs. Unsupervised Learning In supervised learning the agent receives the correct decisions paired with specific sensory input. In reinforcement learning, the agent does not learn from examples of correct behavior but uses system payoffs as a guide to form effective decision policies. –A decision maker may not receive feedback after every decision. –These methods are applicable in problems where significant domain knowledge is either unavailable or costly to obtain.

TD Reinforcement Learning Q-learning update scheme: Q(x,a) ← Q(x,a) + β(r + γ max_b Q(y,b) − Q(x,a)), where x is the current state, a the chosen action, r the reward received, and y the resulting state. After a long run, the Q-values converge to the optimal values. A reinforcement learning system can thus use the Q-values to evaluate each action that is possible from a given state.
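As a concrete illustration, a minimal tabular version of this update might look like the sketch below; the learning rate, discount factor, and action set are illustrative assumptions, not taken from the slides.

```python
from collections import defaultdict

# Q-table: maps (state, action) pairs to estimated values, defaulting to 0.
Q = defaultdict(float)

ALPHA = 0.1                      # learning rate (beta in the slide's notation)
GAMMA = 0.9                      # discount factor
ACTIONS = ["R", "D", "L", "U"]   # hypothetical action set for a grid task

def q_update(x, a, reward, y):
    """Apply one Q-learning backup for the transition (x, a) -> y with the given reward."""
    best_next = max(Q[(y, b)] for b in ACTIONS)
    Q[(x, a)] += ALPHA * (reward + GAMMA * best_next - Q[(x, a)])
```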

Evolutionary Algorithm Evolutionary algorithms are global search techniques built on Darwin’s theory of evolution by natural selection. Numerous potential solutions are encoded in structures called chromosomes. During each iteration, the EA evaluates solutions and generates offspring based on the fitness of each solution in the task. Substructures, or genes, of the solutions are then modified through genetic operators such as mutation and recombination. The idea: structures that led to good solutions in previous evaluations can be mutated or combined to form even better solutions.

Basic Steps in Evolutionary Algorithm
Procedure EA
begin
  t = 0;
  initialize P(t);
  evaluate structures in P(t);
  while termination condition not satisfied do
  begin
    t = t + 1;
    select P(t) from P(t-1);
    alter structures in P(t);
    evaluate structures in P(t);
  end
end.
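A direct, minimal translation of this procedure into runnable form is sketched below; the fitness function, mutation and recombination operators, and the use of tournament selection are placeholder assumptions.

```python
import random

def evolve(init_population, fitness, mutate, recombine,
           generations=100, tournament_size=3):
    """Generic EA loop mirroring the pseudocode above."""
    population = list(init_population)
    scores = [fitness(ind) for ind in population]            # evaluate structures in P(t=0)
    for t in range(1, generations + 1):
        new_population = []
        for _ in range(len(population)):
            # select parents from P(t-1) by tournament selection
            parents = [max(random.sample(list(zip(population, scores)), tournament_size),
                           key=lambda ps: ps[1])[0]
                       for _ in range(2)]
            child = mutate(recombine(parents[0], parents[1]))  # alter structures in P(t)
            new_population.append(child)
        population = new_population
        scores = [fitness(ind) for ind in population]          # evaluate structures in P(t)
    return max(zip(population, scores), key=lambda ps: ps[1])[0]
```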

Similarities between EA and TD Both methods can learn “online” through direct interaction with the underlying system, and both are adaptive to changes. Online learning is often time-consuming and dangerous; therefore, researchers in both EA and TD learning train in simulation and then apply the learned control policy to the real system.

Differences between the TD and EA approaches These can be summarized along three dimensions: –Policy representation –Credit assignment –Memory

Policy Representation What they represent: –TD methods form a value function Q(x,a) → v. –EA methods use a direct mapping from state to recommended action (e.g. Q(x) → a). Conclusion: TD methods try to solve a more difficult problem than the reinforcement problem poses; in addition to choosing the best action, they tell us why it is best.

The Size of the Hypothesis Space The hypothesis space for a direct policy representation (e.g. Q(x) → a): –The total number of functions is c^n, where n is the number of states and c is the number of actions. The hypothesis space for a value-function representation (e.g. Q(x,a) → v): –The total number of functions is w^(cn), where w is the number of possible values. NOTE: the size of the hypothesis space reflects neither the difficulty of the problem nor the efficiency of the methods that search the space.
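To get a feel for these counts, here is a small worked example; the figures n = 16, c = 4, and w = 100 are made up purely for illustration.

```python
n, c, w = 16, 4, 100            # states, actions, distinct value levels (illustrative)
direct_policies = c ** n         # functions Q(x) -> a
value_functions = w ** (c * n)   # functions Q(x, a) -> v
print(direct_policies)           # 4^16  ~ 4.3e9
print(value_functions)           # 100^64 = 1e128
```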

Credit Assignment TD reinforcement learning: credit is chained backward; in this manner, payoffs are distributed across a sequence of actions, and a single reward value becomes associated with each individual state-decision pair. EA reinforcement learning: rewards are associated only with entire sequences of decisions. Thus, which individual decisions are most responsible for a good or poor decision policy is irrelevant to the EA.

Issue in Credit Assignment In TD, each update is focused on a single state and action. In EA, after a recombination, the evaluation covers the entire sequence of actions.

Memory TD reinforcement learning: maintains statistics concerning every state-action pair, thus sustaining information about both good and bad state-action pairs. EA reinforcement learning: maintains information only about good state-action pairs; thus, memory losses occur.

Example State loss can also occur in EA reinforcement learning. Unlike TD, the information content of the EA population actually decreases during learning.

Issues in RL Exploration vs. Exploitation Perceptual Aliasing Generalization Unstable Environments

Exploration vs. Exploitation for TD Too much exploration could result in lower average rewards, and too little could prevent the learning system from discovering new optimal states. –Solution 1: –Solution 2: U(i) ← (1−α)U(i) + α(R(i) + max_a F(Σ_j U(j), N(a,i))), where F(u,n) = R⁺ if n < N_e, and u otherwise (R⁺ is an optimistic estimate of the best possible reward and N_e is a visit-count threshold).
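A minimal sketch of that exploration function is shown below; the values chosen for the optimistic reward R⁺ and the visit threshold N_e are arbitrary assumptions.

```python
R_PLUS = 10.0   # optimistic estimate of the best possible reward (assumption)
N_E = 5         # try each action at least this many times before trusting its estimate

def exploration_value(u, n):
    """F(u, n): return an optimistic value for rarely tried actions, otherwise the estimate u."""
    return R_PLUS if n < N_E else u
```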

Exploration vs. Exploitation for EA Too much exploration could result in lower average rewards, and too little could prevent the learning system from discovering new optimal states. –The solution comes from the nature of the EA: early in an evolutionary search, selection pressure is low because most policies within the population have low fitness. As evolution proceeds, highly fit policies evolve, increasing the selective pressure within the population.

Perceptual Aliasing or Hidden State In real-world situations, the agent often will not have access to complete information about the state of its world. –TD methods are vulnerable to hidden state: ambiguous state information misleads the TD method. –Since EA methods associate values with entire decision sequences, credit is based on net results.

Example In TD reinforcement learning:

Generalization Generalization means transferring policy decisions from one area of the state space to another; it is needed because the number of possible states of the world grows exponentially with the size of the task. The solution is to apply action decisions from observed states to unobserved states (e.g. via ANNs or rule bases).

Comparison of EA and TD Generalization in TD: in some large-scale problems, approximating the value function works well, while in some simple toy problems it fails. In a discrete table one update affects one state, whereas in the generalized case an update affects more than one state and introduces noise. (Who cares about toy problems?) Generalization in EA: since the EA makes less frequent updates and bases them on more global information, it is more difficult for a single observation to disturb the global decision policy.

Unstable Environments The agent must adapt its current decision policy in response to changes that occur in its world (e.g. faulty sensors, new obstructions, etc.). –Because TD makes constant updates to the decision policy, it should respond to changes as soon as they occur. –Since the EA does not update any policy until an individual, or a population of individuals, has been completely evaluated over several actions, its response to changes is delayed.

Evolutionary Reinforcement Learning Implementations Learning Classifier Systems SAMUEL GENITOR SANE

Learning Classifier Systems Messages trigger classifiers, which are symbolic if-then rules that map sensory input to actions. Classifiers compete, and the competition is resolved by a bidding algorithm. Classifier messages may trigger new classifiers. Learning classifier systems predate TD learning. The EA selects, mutates, and recombines the classifiers that received the most credit from the bidding algorithm. The results were disappointing.
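A heavily simplified sketch of the bidding step is given below; the rule format, the bid formula, and the strength values are illustrative assumptions, not the original classifier-system specification.

```python
def select_classifier(classifiers, message):
    """Pick the matching classifier with the highest bid.

    Each classifier is a dict with a 'condition' predicate, an 'action',
    and a 'strength' accumulated from past credit assignment.
    """
    matching = [c for c in classifiers if c["condition"](message)]
    if not matching:
        return None
    # bid proportional to strength; stronger rules win the right to act
    return max(matching, key=lambda c: 0.1 * c["strength"])
```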

SAMUEL A system that learns to solve sequential decision problems. SAMUEL searches the space of decision policies for sets of condition-action rules. Each individual is a rule set that specifies a behavior; each gene is a rule that maps states to actions. Three major components: –Problem-specific module: task environment simulation. –Performance module: interacts with the world and obtains payoffs. –Learning module: uses a GA to evolve behaviors.

GENITOR GENITOR uses neural networks to represent the decision policy, which provides generalization over the state space. GENITOR relies solely on its EA to adjust the weights in the NN. Each NN (individual) is represented in the population as a sequence of connection weights. Offspring are inserted into the population on the basis of their performance, and genetic operators are applied asynchronously.
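Since each individual is just a vector of connection weights, the genetic operators reduce to simple array manipulations; a minimal sketch follows, in which the one-point crossover, mutation rate, and Gaussian noise scale are assumptions for illustration.

```python
import random

def crossover(parent_a, parent_b):
    """One-point crossover on two weight vectors of equal length."""
    point = random.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]

def mutate(weights, rate=0.05, scale=0.5):
    """Perturb each connection weight with small Gaussian noise at the given rate."""
    return [w + random.gauss(0.0, scale) if random.random() < rate else w
            for w in weights]
```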

SANE SANE (Symbiotic, Adaptive Neuro-Evolution) was designed as a fast, efficient method for building NNs in domains where it is not possible to generate training data. The NN forms a direct mapping from sensors to actions and provides effective generalization over the state space. Complete NNs are assembled from two separate populations: –A population of neurons: the building blocks of the NNs. –A population of network blueprints: combinations of those building blocks.
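A sketch of how a blueprint might assemble and score a network from the neuron population is shown below; representing a neuron as a weight vector and a blueprint as a list of indices, and sharing fitness back to member neurons, are assumptions made for illustration.

```python
def build_network(blueprint, neuron_population):
    """A blueprint is a list of indices into the neuron population; each neuron
    carries its own input and output connection weights."""
    return [neuron_population[i] for i in blueprint]

def evaluate_blueprints(blueprints, neuron_population, fitness):
    """Fitness of a network is credited to both the blueprint and its member neurons."""
    neuron_scores = [0.0] * len(neuron_population)
    blueprint_scores = []
    for bp in blueprints:
        score = fitness(build_network(bp, neuron_population))
        blueprint_scores.append(score)
        for i in bp:
            neuron_scores[i] += score   # symbiotic credit: neurons gain from every network they join
    return blueprint_scores, neuron_scores
```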

SANE

Conclusion Both EA and TD learn through interactions with the actual system, and neither requires a precise mathematical model of the domain. The main differences: –Policy representation –Credit assignment –Memory