Hybrid Behavior Co-evolution and Structure Learning in Behavior-based Systems Amir massoud Farahmand (a,b,c), Majid Nili Ahmadabadi (b,c), Caro Lucas (b,c), Babak N. Araabi (b,c) a) Department of Computing Science, University of Alberta b) Control and Intelligent Processing Center of Excellence, Department of Electrical and Computer Engineering, University of Tehran c) School of Cognitive Sciences, IPM
Motivation Situated real-world agents (e.g.) face different uncertainties: –Unknown environment/body: the [exact] model of the environment/body is not known –Non-stationary environment/body: changing environment (offices, houses, streets, and almost everywhere), aging, … Designing a robust controller for such an agent is not easy.
Research Specification Goal: Automatic design of intelligent agents Architecture: Hierarchical behavior-based architectures (a version of the Subsumption architecture) –Behavior-based systems: a robust, successful approach for designing situated agents –Behavioral decomposition –Behaviors: Sensors ---> Actions Evaluation: An objective performance measure is available (reinforcement signal) –[Agent] Did I perform it correctly?! –[Tutor] Yes/No! (or 0.3) [Figure: layered behaviors (build maps, explore, avoid obstacles, locomote, manipulate the world) between sensors and actuators]
How should we DESIGN a behavior-based system?!
Behavior-based System Design Methodologies Hand Design –Common almost everywhere. –Complicated: may even be infeasible in complex problems. –Even if a working system can be found, it is probably not the best solution. Evolution –Good solutions can be found (+) –Biologically plausible (+) –Time consuming (-) –Slow to produce new solutions (-) Learning –Biologically plausible (+) –Essential for the life-time survival of the agent (+) –May get stuck in a local minimum (-)
Taxonomy of Design Methods Behavior-based System Design: Learning (structure (hierarchy) learning, behavior learning); Evolution (co-evolution of behaviors, evolution of structure); Hybridization of Evolution and Learning
Problem Formulation Behaviors
Problem Formulation Purely Parallel Subsumption Architecture (PPSSA) Different behaviors can be excited. Higher behaviors can suppress lower ones. Controlling behavior
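The arbitration rule on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes layers are listed from highest to lowest, and that each behavior is a pair of hypothetical callables (`is_excited`, `act`) over an observation dictionary. The controlling behavior is taken to be the highest excited one, which suppresses everything below it.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical model of a behavior: (is_excited, act). Names are illustrative only.
Behavior = Tuple[Callable[[dict], bool], Callable[[dict], str]]


def controlling_behavior(layers: Sequence[Behavior], obs: dict) -> Optional[int]:
    """Return the index of the highest excited layer (index 0 = highest), or None."""
    for index, (is_excited, _act) in enumerate(layers):
        if is_excited(obs):
            return index  # this layer suppresses every layer below it
    return None


def select_action(layers: Sequence[Behavior], obs: dict, default: str = "idle") -> str:
    """Let the controlling behavior choose the action; fall back to `default`."""
    index = controlling_behavior(layers, obs)
    return default if index is None else layers[index][1](obs)


# Toy structure: "avoid obstacles" sits above "explore".
avoid = (lambda o: o["obstacle_near"], lambda o: "turn_away")
explore = (lambda o: True, lambda o: "move_forward")
layers = [avoid, explore]

print(select_action(layers, {"obstacle_near": True}))
print(select_action(layers, {"obstacle_near": False}))
```

Swapping the order of `layers` reproduces the failure case discussed later in the deck: with "explore" on top, "avoid obstacles" never gets control.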
Problem Formulation Reinforcement Signal and the Agent’s Value Function This function states the value of using a set of behaviors in a specific structure. We want to maximize the agent’s value function.
Problem Formulation Design as an Optimization Structure Learning: Finding the best structure given a set of behaviors using learning Behavior Learning: Finding the best behaviors given the structure using learning Concurrent Behavior and Structure Learning Behavior Evolution: Finding the best behaviors given structure using evolution Behavior Evolution and Structure Learning
Structure Learning [Figure: Behavior Toolbox containing manipulate the world, build maps, explore, locomote, avoid obstacles] The agent wants to learn how to arrange these behaviors in order to get maximum reward from its environment (or tutor).
Structure Learning 1. “explore” becomes the controlling behavior and suppresses “avoid obstacles”. 2. The agent hits a wall!
Structure Learning The tutor (environment) punishes “explore” for occupying that position in the structure.
Structure Learning “explore” is not a very good behavior for the highest position of the structure, so it is replaced by “avoid obstacles”.
Structure Learning Challenging Issues Representation: How should the agent represent knowledge gathered during learning? –Sufficient (the concept space should be covered by the hypothesis space) –Generalization capability –Tractable (small hypothesis space) –Well-defined credit assignment Hierarchical Credit Assignment: How should the agent assign credit to different behaviors and layers in its architecture? –If the agent receives a reward/punishment, how should we reward/punish the structure of the agent? Learning: How should the agent update its knowledge when it receives a reinforcement signal?
Structure Learning Overcoming Challenging Issues Our approach is to define a representation that allows the agent’s value function to be decomposed into simpler components. The structure itself provides many clues for this decomposition.
Structure Learning Zero Order Representation [Figure: ZO Value Table in the agent’s mind, holding one value per behavior for the higher layer and the lower layer; values shown: avoid obstacles (0.8), avoid obstacles (0.6), explore (0.7), explore (0.9), locomote (0.4), locomote (0.4)]
Structure Learning Zero Order Representation - Value Function Decomposition [Equations not recovered; annotated terms: agent’s value function, ZO components, layer’s value]
Structure Learning Zero Order Representation - Credit Assignment and Value Updating The controlling behavior is the only behavior held responsible for the current reinforcement signal.
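One way to picture this credit-assignment rule is a table indexed by (layer, behavior) in which only the entry of the controlling behavior is updated. The sketch below is an assumption-laden illustration: the slides do not give the update rule, so an exponential moving average with a hypothetical step size `alpha` is used, and the class and method names are invented for this example.

```python
from collections import defaultdict


class ZOValueTable:
    """Zero-order value table: one scalar per (layer, behavior) pair.

    The moving-average update below is an assumed rule, not taken from the slides.
    """

    def __init__(self, alpha: float = 0.1, initial: float = 0.0):
        self.alpha = alpha
        self.values = defaultdict(lambda: initial)  # key: (layer, behavior)

    def update(self, layer: int, behavior: str, reward: float) -> float:
        """Credit only the controlling behavior, in the layer where it sits."""
        key = (layer, behavior)
        self.values[key] += self.alpha * (reward - self.values[key])
        return self.values[key]

    def best_behavior(self, layer: int, candidates):
        """Greedy choice of the behavior with the highest value for this layer."""
        return max(candidates, key=lambda b: self.values[(layer, b)])


table = ZOValueTable(alpha=0.5)
# "explore" controlled the agent from the top layer and the agent hit a wall.
table.update(0, "explore", -1.0)
# "avoid obstacles" controlled the agent from the top layer and was rewarded.
table.update(0, "avoid obstacles", 1.0)
print(table.best_behavior(0, ["explore", "avoid obstacles"]))
```

A purely greedy `best_behavior` is used here for brevity; a learning agent would normally mix in some exploration when rearranging its structure.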
Behavior Co-evolution Motivations (+) Learning can get trapped in local maxima of the objective function; evolutionary methods have a better chance of finding the global maximum. Learning is sensitive (POMDP, non-Markov, …). The objective function may not be well-defined in robotics. (-) Evolutionary robotics methods are usually slow –A problem under fast changes of the environment. Non-modular controllers –Monolithic –No reusability
Behavior Co-evolution Ideas Use evolution to search the large, difficult part of the parameter space –The behaviors’ parameter space is usually the bigger one Use learning for fast responses –The structure’s parameter space is usually the smaller one –A change in the structure results in different agent behavior Evolve behaviors separately –Modularity –Reusability
Behavior Co-evolution [Figure: an agent and Behavior Pool 1, Behavior Pool 2, …, Behavior Pool n] We have different behavior (genetic) pools.
Behavior Co-evolution One behavior is selected randomly from each pool. We want to assess its fitness.
Behavior Co-evolution The agent interacts with the environment using an architecture built from the selected behaviors
Behavior Co-evolution … and tries to maximize its reward.
Behavior Co-evolution Based on the performance of the agent, a fitness is assigned to it.
Behavior Co-evolution Fitness Sharing We can evaluate the fitness of the agent after its interaction with the environment. How can we assess the fitness of each behavior based on the fitness of the agent? (Remember that we have separate behavior pools.) We approximate it!
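A minimal sketch of one possible approximation, assuming uniform sharing: every behavior that took part in an agent receives that agent's full fitness as one sample, and a behavior's fitness is estimated as the mean of its samples. The function name `share_fitness` and the data layout are hypothetical; the slides do not give the exact formula.

```python
from collections import defaultdict
from statistics import mean


def share_fitness(evaluations):
    """Estimate per-behavior fitness from agent-level fitness.

    evaluations: iterable of (team, agent_fitness), where team is a tuple holding
    one behavior id per pool (position in the tuple = pool index).
    Returns {(pool_index, behavior_id): mean fitness of the agents it was part of}.
    """
    samples = defaultdict(list)
    for team, agent_fitness in evaluations:
        for pool_index, behavior_id in enumerate(team):
            # Uniform sharing (assumed): every member gets the same credit.
            samples[(pool_index, behavior_id)].append(agent_fitness)
    return {key: mean(values) for key, values in samples.items()}


# Three evaluated agents, each built from one behavior of pool 0 and one of pool 1.
evaluations = [
    (("a1", "b1"), 1.0),
    (("a1", "b2"), 3.0),
    (("a2", "b1"), 5.0),
]
shared = share_fitness(evaluations)
for key in sorted(shared):
    print(key, shared[key])
```

Averaging over several randomly composed agents reduces the variance of each estimate, but a good behavior can still be penalized for its teammates, which is the kind of bias a non-uniform sharing scheme would try to address.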
Behavior Co-evolution Each behavior’s genetic pool has conventional evolutionary operators/phenomena –Selection –Genetic operators: crossover; mutation, either hard (replacement) or soft (perturbation)
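The operators listed above can be sketched for a single pool as follows. Everything specific here is an assumption made for illustration: a behavior is encoded as a list of floats, crossover is one-point, hard mutation redraws a gene uniformly from a hypothetical range, and soft mutation adds Gaussian noise. The slides name the operators but not their exact form.

```python
import random


def crossover(parent_a, parent_b, rng):
    """One-point crossover (assumed variant); genomes need equal length >= 2."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]


def hard_mutation(genome, rng, rate=0.1, low=-1.0, high=1.0):
    """Replacement: each gene is redrawn uniformly from [low, high] with probability `rate`."""
    return [rng.uniform(low, high) if rng.random() < rate else g for g in genome]


def soft_mutation(genome, rng, rate=0.1, sigma=0.05):
    """Perturbation: each gene gets Gaussian noise (std `sigma`) with probability `rate`."""
    return [g + rng.gauss(0.0, sigma) if rng.random() < rate else g for g in genome]


rng = random.Random(0)  # seeded only to make the example repeatable
parent_a = [0.0, 0.0, 0.0, 0.0]
parent_b = [1.0, 1.0, 1.0, 1.0]
child = crossover(parent_a, parent_b, rng)
child = soft_mutation(hard_mutation(child, rng), rng)
print(child)
```

Hard mutation makes large jumps that help a pool escape poor regions, while soft mutation fine-tunes a behavior that already works.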
Multi-Robot Object Lifting Problem Three robots want to lift an object using their own local sensors –No central control –No communication –Local sensors Objectives –Reaching the prescribed height –Keeping the tilt angle small [Figure: a group of robots lifts a bulky object.]
Conclusion Hybridization of evolution and learning Evolution and learning search different subspaces of the solution space Results competitive with human designs
Important Questions Is it possible to benefit from information gathered during learning? –Each agent learns an approximately good arrangement of the structure. However, we do not use it at all! Is there any other way of sharing the fitness of the agent between behaviors? –Currently, we share the agent’s fitness uniformly among all behaviors. It seems that the answer to both questions is yes!
Future Research Can we decompose other problems (not just hierarchical behavior-based systems) similarly?! –Learning and evolution –Fast and Deep –Different subspaces of the solution space Other ways of fitness sharing –Low bias –Low variance
Questions?!