Learning and Evolution in Hierarchical Behavior-based Systems

Presentation transcript:

Learning and Evolution in Hierarchical Behavior-based Systems Amir massoud Farahmand Advisor: Majid Nili Ahmadabadi Co-advisors: Caro Lucas – Babak N. Araabi

Motivation Machines (e.g., robots) are moving from labs to homes, factories, and beyond. Machines face: an unknown environment/body (no exact model of the environment or body is available); a non-stationary environment/body (changing environments such as offices, houses, streets, and almost everywhere, as well as aging). The designer may not know how to exploit every aspect of her agent or its environment. University of Tehran - Dept. of ECE

Motivation The design process is difficult: machines see different things and interact with the world differently, and the designer is not a machine! "I know what I want!" Our goal: automatic design of intelligent machines. University of Tehran - Dept. of ECE

Research Specification Goal: automatic design of intelligent robots. Architecture: hierarchical behavior-based architectures. An objective performance measure is available (the reinforcement signal). [Agent] Did I perform it correctly?! [Tutor] Yes/No! (or 0.3) University of Tehran - Dept. of ECE

Behavior-based Approach to AI The behavior-based approach is a successful alternative to the classical AI approach: no {abstraction, planning, deduction, ...}. Behavioral (activity) decomposition instead of functional decomposition. Behavior: Sensor -> Action (a direct link between perception and action). University of Tehran - Dept. of ECE

Behavioral Decomposition [Diagram: sensors feed a set of parallel behaviors - avoid obstacles, locomote, explore, build maps, manipulate the world - which drive the actuators.] University of Tehran - Dept. of ECE

Behavior-based Design Robust: not sensitive to the failure of a particular part of the system; no need for precise perception, since there is no modelling involved. Reactive: fast response, since there is no long route from perception to action. No explicit representation. University of Tehran - Dept. of ECE

DESIGN? How should we design a behavior-based system?! University of Tehran - Dept. of ECE

Behavior-based System Design Methodologies Hand design: common almost everywhere; complicated, and may even be infeasible for complex problems; even if a working system can be found, it is probably not optimal. Evolution: good solutions can be found, and it is biologically plausible, but it is time consuming and slow at producing new solutions. Learning: essential for the life-time survival of the agent. University of Tehran - Dept. of ECE

Taxonomy of Design Methods University of Tehran - Dept. of ECE

Problem Formulation Behaviors University of Tehran - Dept. of ECE

Problem Formulation Purely Parallel Subsumption Architecture (PPSSA) Different behaviors are excited; higher behaviors can suppress lower ones. The behavior whose action reaches the actuators is the controlling behavior. University of Tehran - Dept. of ECE
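
To make the arbitration concrete, here is a minimal Python sketch of how a purely parallel subsumption-style arbiter could pick the controlling behavior; the behavior functions and sensor format are illustrative, not taken from the thesis.

```python
# Minimal sketch of arbitration in a purely parallel subsumption-style
# architecture: every behavior maps the sensor reading to a proposed action
# (or None when it is not excited), and the highest excited layer suppresses
# all layers below it.

def arbitrate(layers, sensors):
    """layers: behaviors ordered from lowest (index 0) to highest.
    Returns (controlling_index, action), or (None, None) if nothing is excited."""
    for index in reversed(range(len(layers))):      # highest layer first
        action = layers[index](sensors)
        if action is not None:                      # excited -> suppress lower layers
            return index, action
    return None, None

# Illustrative behaviors for a mobile robot.
avoid_obstacles = lambda s: "turn_away" if s["obstacle_close"] else None
explore         = lambda s: "wander"

if __name__ == "__main__":
    stack = [explore, avoid_obstacles]                  # avoid_obstacles placed on top
    print(arbitrate(stack, {"obstacle_close": True}))   # (1, 'turn_away')
    print(arbitrate(stack, {"obstacle_close": False}))  # (0, 'wander')
```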

Problem Formulation Reinforcement Signal and the Agent's Value Function This function states the value of using a set of behaviors in a specific structure. We want to maximize the agent's value function. University of Tehran - Dept. of ECE
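
One way to write the objective implied by this slide; the notation is assumed rather than taken from the thesis (B is the behavior set, S their arrangement, r_t the reinforcement signal, and pi_{B,S} the induced policy).

```latex
% Assumed notation, not the thesis's exact formulation: the agent's value is
% the expected return under the policy induced by behaviors B in structure S,
% and design searches for the maximizing arrangement.
\[
  V(B, S) = \mathbb{E}\!\left[ \sum_{t=0}^{T} \gamma^{t} r_t \,\middle|\, \pi_{B,S} \right],
  \qquad
  (B^{*}, S^{*}) = \arg\max_{B,\, S} V(B, S).
\]
```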

Problem Formulation Design as Optimization Structure Learning: finding the best structure, given a set of behaviors, using learning. Behavior Learning: finding the best behaviors, given the structure, using learning. Concurrent Behavior and Structure Learning. Behavior Evolution: finding the best behaviors, given the structure, using evolution. Behavior Evolution and Structure Learning. University of Tehran - Dept. of ECE

Where?! University of Tehran - Dept. of ECE

Learning in Behavior-based Systems There is some research on behavior-based learning (Mataric, Mahadevan, Maes, and others), but no deep investigation of it (especially no mathematical formulation)! Moreover, most of this work uses flat architectures. University of Tehran - Dept. of ECE

Learning in Behavior-based Systems We design: the structure (hierarchy) and the behaviors. We learn: Structure Learning, i.e., organizing behaviors in the architecture using a behavior toolbox; Behavior Learning, i.e., the correct mapping of each behavior. University of Tehran - Dept. of ECE

Where?! University of Tehran - Dept. of ECE

Structure Learning [Diagram: a Behavior Toolbox containing avoid obstacles, locomote, explore, build maps, and manipulate the world.] The agent wants to learn how to arrange these behaviors in order to get maximum reward from its environment (or tutor). University of Tehran - Dept. of ECE

Structure Learning [Diagram: the Behavior Toolbox - avoid obstacles, locomote, explore, build maps, manipulate the world.] University of Tehran - Dept. of ECE

Structure Learning [Diagram: explore and avoid obstacles placed in the hierarchy; the remaining behaviors stay in the Behavior Toolbox.] 1. explore becomes the controlling behavior and suppresses avoid obstacles. 2. The agent hits a wall! University of Tehran - Dept. of ECE

Structure Learning [Diagram: same arrangement as the previous slide.] The tutor (environment) punishes explore for occupying that position in the structure. University of Tehran - Dept. of ECE

University of Tehran - Dept. of ECE Structure Learning build maps manipulate the world explore locomote avoid obstacles “explore” is not a very good behavior for the highest position of the structure. So it is replaced by “avoid obstacles”. Behavior Toolbox University of Tehran - Dept. of ECE

Structure Learning Challenging Issues Representation: how should the agent represent the knowledge gathered during learning? It should be sufficient (the concept space should be covered by the hypothesis space), have generalization capability, be tractable (a small hypothesis space), and allow well-defined credit assignment. Hierarchical Credit Assignment: how should the agent assign credit to the different behaviors and layers in its architecture? If the agent receives a reward/punishment, how should we reward/punish the structure of the agent? Learning: how should the agent update its knowledge when it receives the reinforcement signal? University of Tehran - Dept. of ECE

Structure Learning Overcoming the Challenging Issues Our approach is to define a representation that allows the agent's value function to be decomposed into simpler components. Decomposing the behavior of a multi-agent system into simpler components can sharpen our view of the problem under investigation; the structure itself provides many clues. University of Tehran - Dept. of ECE

Structure Learning University of Tehran - Dept. of ECE

Structure Learning Zero Order Representation ZO value table in the agent's mind: Higher layer - avoid obstacles (0.8), explore (0.7), locomote (0.4); Lower layer - avoid obstacles (0.6), explore (0.9), locomote (0.4). University of Tehran - Dept. of ECE
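
A minimal Python sketch of the ZO value table above and a greedy arrangement read off from it; only the numbers come from the slide, and exploration and tie-breaking are ignored.

```python
# Zero-order (ZO) representation: one value per (layer, behavior) pair,
# independent of what sits in the other layers. The numbers are the
# illustrative ones from the slide; the greedy arrangement is a sketch.

zo_values = {
    "higher": {"avoid obstacles": 0.8, "explore": 0.7, "locomote": 0.4},
    "lower":  {"avoid obstacles": 0.6, "explore": 0.9, "locomote": 0.4},
}

def greedy_structure(values):
    """Pick, layer by layer, the not-yet-used behavior with the highest ZO value."""
    used, structure = set(), {}
    for layer, table in values.items():
        best = max((b for b in table if b not in used), key=lambda b: table[b])
        structure[layer] = best
        used.add(best)
    return structure

print(greedy_structure(zo_values))
# {'higher': 'avoid obstacles', 'lower': 'explore'}
```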

Structure Learning Zero Order Representation - Value Function Decomposition University of Tehran - Dept. of ECE

Structure Learning Zero Order Representation - Value Function Decomposition [Equation slide: the agent's value function is decomposed into layer values, which are in turn built from the ZO components.] University of Tehran - Dept. of ECE

Structure Learning Zero Order Representation - Value Function Decomposition University of Tehran - Dept. of ECE

Structure Learning Zero Order Representation - Credit Assignment and Value Updating The controlling behavior is the only behavior responsible for the current reinforcement signal. University of Tehran - Dept. of ECE
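
A hedged sketch of this credit-assignment rule: the slide states only which entry is responsible, so the exponential-average update and its step size are assumptions.

```python
# ZO credit assignment sketch: the reinforcement signal is credited only to
# the behavior that was controlling when the signal arrived, in the layer it
# occupied. The running-average update with step size `alpha` is an assumption.

def zo_update(zo_values, layer, controlling_behavior, reward, alpha=0.1):
    old = zo_values[layer][controlling_behavior]
    zo_values[layer][controlling_behavior] = old + alpha * (reward - old)

# Example: "explore" was controlling in the higher layer and the agent hit a wall.
table = {"higher": {"explore": 0.7, "avoid obstacles": 0.8}}
zo_update(table, "higher", "explore", reward=-1.0)
print(table["higher"]["explore"])   # 0.53
```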

Structure Learning First Order Representation University of Tehran - Dept. of ECE

Structure Learning First Order Representation University of Tehran - Dept. of ECE

Structure Learning First Order Representation University of Tehran - Dept. of ECE

Structure Learning First Order Representation – Credit Assignment If only one behavior becomes active, we update V0(i). If two or more behaviors become active, we update V(i>j), where 'i' is the index of the controlling behavior and 'j' is the index of the next active behavior. University of Tehran - Dept. of ECE
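
A Python sketch of this first-order rule under the same caveat as before: the update form and learning rate are assumed, and only the choice of which entry to update follows the slide.

```python
# First-order (FO) credit assignment sketch.
# V0[i]   : value of behavior i controlling with no other behavior active.
# V1[i, j]: value of behavior i controlling while behavior j is the next active
#           behavior beneath it ("i > j").

from collections import defaultdict

V0 = defaultdict(float)
V1 = defaultdict(float)

def fo_update(active, controlling, reward, alpha=0.1):
    """`active` is the list of currently excited behaviors, ordered top-down;
    `controlling` is the one that actually took control."""
    if len(active) == 1:
        V0[controlling] += alpha * (reward - V0[controlling])
    else:
        next_active = active[active.index(controlling) + 1]
        key = (controlling, next_active)
        V1[key] += alpha * (reward - V1[key])

fo_update(["explore"], "explore", reward=1.0)
fo_update(["explore", "avoid obstacles"], "explore", reward=-1.0)
print(dict(V0), dict(V1))
```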

A Break! University of Tehran - Dept. of ECE

Introduction to Experiments Problems: an abstract problem and the multi-robot object-lifting problem; only the latter is discussed here. In the object-lifting problem, a group of robots lifts a bulky object. University of Tehran - Dept. of ECE

Experiments Structure Learning Comparison of the average reward gained by two structure learning methods (Zero Order (ZO) and First Order (FO)), a hand-designed structure, and a random structure on the object-lifting problem. University of Tehran - Dept. of ECE

Where?! University of Tehran - Dept. of ECE

Behavior Learning The behavior-repertoire assumption is dropped. All we know: the sensor/actuator dimensions and the reinforcement signal. University of Tehran - Dept. of ECE

Behavior Learning Challenging Issues How should behaviors cooperate with each other to maximize the performance of the agent? How should we assign credit to the behaviors of the architecture? How should each behavior update its knowledge? University of Tehran - Dept. of ECE

Behavior Learning B2, B3, and B4 are excited; B4 takes control. Punishment!!! Which behavior is to blame?! University of Tehran - Dept. of ECE

Behavior Learning Augmenting the action space with a pseudo-action named NoAction (NA). NA does nothing and lets lower behaviors take control. B2, B3, and B4 are excited; B4 proposes NA; B3 proposes an action and takes control. Reward! University of Tehran - Dept. of ECE
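
A minimal sketch of the NA mechanism; the behavior policies are stand-in functions and the action names are illustrative, loosely echoing the object-lifting behaviors listed later.

```python
# Sketch of the NoAction (NA) pseudo-action: the highest excited behavior is
# consulted first; if it proposes NA, control falls through to the next excited
# behavior below it.

NA = "NoAction"

def arbitrate_with_na(layers, sensors):
    """layers ordered lowest to highest; returns (controlling_index, action)."""
    for index in reversed(range(len(layers))):
        action = layers[index](sensors)
        if action is None:          # not excited at all
            continue
        if action != NA:            # excited and proposing a real action
            return index, action
        # excited but proposing NA: let a lower behavior take control
    return None, None

b2 = lambda s: "push_more"
b3 = lambda s: "slow_down"
b4 = lambda s: NA                   # B4 has learned to step aside here

print(arbitrate_with_na([b2, b3, b4], sensors={}))   # (1, 'slow_down')
```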

Behavior Learning NA lets behaviors cooperate. How should we force them to cooperate correctly?! The hierarchical credit assignment problem. A Boolean-like algebra for logically expressible multi-agent systems. University of Tehran - Dept. of ECE

Behavior Learning University of Tehran - Dept. of ECE

Behavior Learning Optimality The internal states of different behaviors are excited in different regions of the state space. University of Tehran - Dept. of ECE

Behavior Learning Optimality University of Tehran - Dept. of ECE

Behavior Learning Value Updating For the case of immediate reward University of Tehran - Dept. of ECE

Behavior Learning Value Updating For the general return case, we should use Monte Carlo estimation; bootstrapping methods are not applicable. University of Tehran - Dept. of ECE
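
A sketch of the two updating regimes from the last two slides: a standard running average for the immediate-reward case and a plain Monte Carlo return computation for the general case. The thesis's exact estimators are not reproduced here.

```python
# Immediate-reward update and first-visit-style Monte Carlo returns
# (no bootstrapping), as generic stand-ins for the updates named on the slides.

def immediate_update(value, reward, alpha=0.1):
    """Immediate-reward case: move the stored value toward the latest reward."""
    return value + alpha * (reward - value)

def monte_carlo_returns(rewards, gamma=0.95):
    """General case: compute the return G_t for every step of a finished
    episode, to be averaged into the responsible values afterwards."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

print(immediate_update(0.5, reward=1.0))           # 0.55
print(monte_carlo_returns([0.0, 0.0, 1.0]))        # [0.9025, 0.95, 1.0]
```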

Concurrent Behavior and Structure Learning Behavior learning is applied to the state-action mappings, and structure learning to the hierarchy. University of Tehran - Dept. of ECE

Experiments Behavior Learning Reward comparison between structure learning, behavior learning, and concurrent behavior/structure learning methods for the object lifting task. University of Tehran - Dept. of ECE

Experiments Behavior Learning [Figure: learning phase and testing phase results.] University of Tehran - Dept. of ECE

Experiments Behavior Learning [Figure: testing phase and learning phase results.] University of Tehran - Dept. of ECE

Experiments Behavior Learning A sample trajectory showing the positions of the robot-object contact points, the tilt angle of the object during lifting, and the controlling behavior of each robot at each time step, after sufficient structure/behavior learning. The behavior numbers in the lowest diagram correspond to: 0 (No Behavior), 1 (Push More), 2 (Don't Go Fast), 3 (Stop), 4 (Hurry up), 5 (Slow down). University of Tehran - Dept. of ECE

Where?! University of Tehran - Dept. of ECE

Behavior Co-evolution Motivations (+) Learning can get trapped in local maxima of the objective function; learning is sensitive (POMDPs, non-Markovian settings, ...); evolutionary methods have a better chance of finding the global maximum of the objective function; the objective function may not be well defined in robotics. (-) Evolutionary robotics methods are usually slow; they cope poorly with fast changes of the environment; their controllers are non-modular and monolithic, with no reusability. University of Tehran - Dept. of ECE

Behavior Co-evolution Motivations Use evolution to search the large, difficult part of the parameter space (the behaviors' parameter space is usually the larger one). Use learning for fast responses (the structure's parameter space is usually the smaller one, and a change in the structure results in a different agent behavior). Evolve behaviors separately (modularity and reusability). University of Tehran - Dept. of ECE

Behavior Co-evolution [Diagram: the agent draws from Behavior Pool 1, Behavior Pool 2, ..., Behavior Pool n.] Evolve each kind of behavior in its own genetic pool. University of Tehran - Dept. of ECE
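
A skeleton of this co-evolution loop, assuming random pairing of individuals across pools and a stand-in evaluation function; selection and variation are omitted here.

```python
# Sketch of behavior co-evolution: each kind of behavior evolves in its own
# pool; an agent is assembled from one individual per pool, evaluated, and the
# agent-level fitness is later shared back to the individuals that took part.

import random

def coevolve_generation(pools, evaluate_agent, agents_per_generation=20):
    """pools: dict {behavior_name: [individual, ...]}. Returns a list of
    (chosen individuals, agent fitness) pairs for the fitness-sharing step."""
    evaluations = []
    for _ in range(agents_per_generation):
        team = {name: random.choice(pool) for name, pool in pools.items()}
        fitness = evaluate_agent(team)           # stand-in for running the agent
        evaluations.append((team, fitness))
    return evaluations

# Toy example: individuals are numbers, agent fitness is their sum.
pools = {"push_more": [0.1, 0.7], "slow_down": [0.4, 0.9]}
print(coevolve_generation(pools, evaluate_agent=lambda t: sum(t.values()),
                          agents_per_generation=3))
```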

Behavior Co-evolution Fitness Sharing How does the fitness of the agent translate into the fitness of each behavior?! Fitness sharing: uniform or value-based (a sketch of both follows the value-based slide below). University of Tehran - Dept. of ECE

Behavior Co-evolution Uniform Fitness Sharing University of Tehran - Dept. of ECE

Behavior Co-evolution Value-based Fitness Sharing University of Tehran - Dept. of ECE
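
A hedged sketch of the two fitness-sharing schemes named on the previous slides; the value-proportional rule for the value-based variant is an assumption, since the slides' formulas are not reproduced in this transcript.

```python
# Fitness-sharing sketches. Uniform: every participating behavior receives the
# same share of the agent's fitness. Value-based: shares are proportional to
# each behavior's (learned) value; the proportional rule is an assumption.

def uniform_share(agent_fitness, behaviors):
    share = agent_fitness / len(behaviors)
    return {b: share for b in behaviors}

def value_based_share(agent_fitness, behavior_values):
    total = sum(behavior_values.values())
    if total == 0:                       # fall back to uniform when undefined
        return uniform_share(agent_fitness, list(behavior_values))
    return {b: agent_fitness * v / total for b, v in behavior_values.items()}

print(uniform_share(1.0, ["push_more", "slow_down"]))
# {'push_more': 0.5, 'slow_down': 0.5}
print(value_based_share(1.0, {"push_more": 0.8, "slow_down": 0.2}))
# {'push_more': 0.8, 'slow_down': 0.2}
```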

Behavior Co-evolution Within each behavior's genetic pool: selection, followed by the genetic operators, crossover and mutation (hard replacement or soft perturbation). University of Tehran - Dept. of ECE
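
A sketch of the two mutation modes listed above, assuming real-valued genomes; the rates, ranges, and noise scale are illustrative.

```python
# Mutation sketch matching the two modes on the slide: "hard replacement"
# redraws a gene uniformly at random, "soft perturbation" nudges it with
# Gaussian noise and clamps it back into range.

import random

def mutate(genome, p_hard=0.05, p_soft=0.2, sigma=0.1, low=-1.0, high=1.0):
    child = []
    for gene in genome:
        r = random.random()
        if r < p_hard:                       # hard replacement
            child.append(random.uniform(low, high))
        elif r < p_hard + p_soft:            # soft perturbation
            child.append(min(high, max(low, gene + random.gauss(0.0, sigma))))
        else:                                # gene copied unchanged
            child.append(gene)
    return child

print(mutate([0.0, 0.5, -0.5]))
```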

Where?! University of Tehran - Dept. of ECE

Memetic Algorithm We waste the learned knowledge after each agent's lifetime. A meme is a unit of information that reproduces itself as people exchange ideas. Traditional memetic algorithms combine an evolutionary method (meme exchange) with local search (meme refinement); they are also called hybrid evolutionary algorithms. University of Tehran - Dept. of ECE

Memetic Algorithm Two different interpretations of the meme: (1) the current hybridization of behavior co-evolution and structure learning, similar to a traditional MA except that different parameter spaces are searched; (2) the meme as a cultural bias. University of Tehran - Dept. of ECE

Memetic Algorithm Experienced individuals store their experiences in the culture in the form of memes; newborn individuals get a meme from the culture. The structure acts as a meme. University of Tehran - Dept. of ECE

Memetic Algorithm [Diagram: the agent draws behaviors from Behavior Pools 1..n and a structure from the Meme Pool (Culture).] University of Tehran - Dept. of ECE

Memetic Algorithm Each meme has its own value; the value of a meme is updated using the fitness of the agent that used it, and more valuable memes have a higher chance of being selected for newborn individuals. University of Tehran - Dept. of ECE
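
A minimal sketch of such a meme pool; the value-update and proportional-selection rules are assumptions consistent with, but not copied from, the slides.

```python
# Meme-pool sketch: memes (stored structures) carry values; a meme's value is
# moved toward the fitness of agents that used it, and newborn agents pick
# memes with probability proportional to value.

import random

class MemePool:
    def __init__(self, memes, initial_value=0.5):
        self.values = {meme: initial_value for meme in memes}

    def select(self):
        memes = list(self.values)
        weights = [self.values[m] for m in memes]
        return random.choices(memes, weights=weights, k=1)[0]

    def update(self, meme, agent_fitness, alpha=0.2):
        self.values[meme] += alpha * (agent_fitness - self.values[meme])

pool = MemePool(["avoid>explore>locomote", "explore>avoid>locomote"])
meme = pool.select()                  # structure handed to a newborn agent
pool.update(meme, agent_fitness=0.9)  # reinforce it after the agent's lifetime
print(pool.values)
```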

Experiments Behavior Co-evolution – Structure Learning – Memetic Algorithm (Object Lifting) Fitness averaged over the last five episodes, compared across design methods: 1) evolution of behaviors (uniform fitness sharing) with structure learning (blue), 2) evolution of behaviors (value-based fitness sharing) with structure learning (black), 3) hand-designed behaviors with structure learning (green), and 4) hand-designed behaviors and structure (red). Dotted lines around the hand-designed cases (3 and 4) show the one-standard-deviation region around the mean performance. University of Tehran - Dept. of ECE

Experiments Behavior Co-evolution – Structure Learning – Memetic Algorithm (Object Lifting) Last-five-episode and lifetime fitness comparison for the uniform fitness sharing co-evolutionary mechanism: 1) evolution of behaviors with structure learning (blue), 2) evolution of behaviors with structure learning benefiting from the meme-pool bias (black), 3) evolution of behaviors with a hand-designed structure (magenta), 4) hand-designed behaviors with structure learning (green), and 5) hand-designed behaviors and structure (red). Solid lines indicate fitness over the last five episodes of the agent's lifetime and dotted lines indicate lifetime fitness. Although the final performance of all cases is roughly the same, the lifetime fitness of the memetic-based design is much higher. University of Tehran - Dept. of ECE

Experiments Behavior Co-evolution – Structure Learning – Memetic Algorithm (Object Lifting) Probability distribution comparison for uniform fitness sharing. The comparison is made between agents using the meme pool as the initial bias for their structure learning (black), agents that learn the structure from a random initial setting (blue), and agents with a hand-designed structure (magenta). Dotted lines show the distributions of lifetime fitness. A distribution shifted further to the right indicates a higher chance of generating very good agents. University of Tehran - Dept. of ECE

Experiments Behavior Co-evolution – Structure Learning – Memetic Algorithm (Object Lifting) Last-five-episode and lifetime fitness comparison for the value-based fitness sharing co-evolutionary mechanism: 1) evolution of behaviors with structure learning (blue), 2) evolution of behaviors with structure learning benefiting from the meme-pool bias (black), 3) evolution of behaviors with a hand-designed structure (magenta), 4) hand-designed behaviors with structure learning (green), and 5) hand-designed behaviors and structure (red). Solid lines indicate fitness over the last five episodes of the agent's lifetime and dotted lines indicate lifetime fitness. Although the final performance of all cases is roughly the same, the lifetime fitness of the memetic-based design is higher. University of Tehran - Dept. of ECE

Experiments Behavior Co-evolution – Structure Learning – Memetic Algorithm Figure 13. (Object Lifting) Probability distribution comparison for value-based fitness sharing. The comparison is made between agents using the meme pool as the initial bias for their structure learning (black), agents that learn the structure from a random initial setting (blue), and agents with a hand-designed structure (magenta). Dotted lines show the distributions of lifetime fitness. A distribution shifted further to the right indicates a higher chance of generating very good agents. University of Tehran - Dept. of ECE

Other Topics Probabilistic analysis of PPSSA: how a change in the excitation probability changes the controlling probability of each layer, and an estimate of the learning time. The effect of reinforcement-signal uncertainty on the value function and on the policy of the agent. University of Tehran - Dept. of ECE

Conclusions University of Tehran - Dept. of ECE

Contributions A deep, mathematical investigation of behavior-based systems. Tackling the design process from different directions: learning, evolution, and culture-based methods. Structure learning is quite new in hierarchical reinforcement learning. University of Tehran - Dept. of ECE

Suggestions for Future Work Extending the proposed methods to more complex architectures. Automatic extraction of behaviors' state spaces (traditional clustering methods are not suitable). Convergence proofs for the learning methods. Automatic abstraction of knowledge (simultaneous low-level and high-level decision making). Investigations of reinforcement-signal design. University of Tehran - Dept. of ECE

Thanks! University of Tehran - Dept. of ECE

The Effect of Reinforcement Signal Uncertainty on the Value Function Uncertainty Model University of Tehran - Dept. of ECE

The Effect of Reinforcement Signal Uncertainty on the Agent's Policy Boltzmann action selection University of Tehran - Dept. of ECE
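
For reference, the standard Boltzmann (softmax) action-selection rule named on this slide, in a small Python sketch; the action values and temperature are illustrative.

```python
# Boltzmann (softmax) action selection: actions with higher values are chosen
# more often, with the temperature `tau` controlling how greedy the choice is.

import math
import random

def boltzmann_select(action_values, tau=0.5):
    exps = {a: math.exp(v / tau) for a, v in action_values.items()}
    total = sum(exps.values())
    probabilities = {a: e / total for a, e in exps.items()}
    actions, weights = zip(*probabilities.items())
    return random.choices(actions, weights=weights, k=1)[0], probabilities

action, probs = boltzmann_select({"push_more": 0.9, "slow_down": 0.4})
print(action, probs)
```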

The Effect of Reinforcement Signal Uncertainty on the Agent’s Policy University of Tehran - Dept. of ECE

The Effect of Reinforcement Signal Uncertainty on the Agent's Policy Results of the section on the effect of the error on the value function. University of Tehran - Dept. of ECE

Reinforcement Uncertainty Simulations Figure 1. The error for different values of γ. Figure 2. Comparison between the observed error and the derived bound for γ=0.1. University of Tehran - Dept. of ECE

Reinforcement Uncertainty Simulations Figure 3. Comparison between the observed error and the derived bound for γ=0.5. Figure 4. Comparison between the observed error and the derived bound for γ=0.9. University of Tehran - Dept. of ECE

Reinforcement Uncertainty Simulations Figure 5. Upper and lower bounds on the ratio of the action probabilities of the agent with the imprecise reinforcement signal to those of the agent with the original reinforcement signal, for different values of γ (blue: γ=0.1, black: γ=0.5, red: γ=0.9). Figure 6. Comparison between the observed probability ratios and the derived bounds for γ=0.1. University of Tehran - Dept. of ECE

Reinforcement Uncertainty Simulations Figure 7. Comparison between the observed probability ratios and the derived bounds for γ=0.5. Figure 8. Comparison between the observed probability ratios and the derived bounds for γ=0.9. University of Tehran - Dept. of ECE