Hybrid Behavior Co-evolution and Structure Learning in Behavior-based Systems. Amir massoud Farahmand (a,b,c) (www.cs.ualberta.ca/~amir)


Hybrid Behavior Co-evolution and Structure Learning in Behavior-based Systems
Amir massoud Farahmand (a,b,c), Majid Nili Ahmadabadi (b,c), Caro Lucas (b,c), Babak N. Araabi (b,c)
a) Department of Computing Science, University of Alberta
b) Control and Intelligent Processing Center of Excellence, Department of Electrical and Computer Engineering, University of Tehran
c) School of Cognitive Sciences, IPM

Motivation
Situated real-world agents face different uncertainties:
–Unknown environment/body: an exact model of the environment/body is not known
–Non-stationary environment/body: changing environments (offices, houses, streets, and almost everywhere else), aging, …
Designing a robust controller for such an agent is not easy.

Research Specification
Goal: Automatic design of an intelligent agent
Architecture: Hierarchical behavior-based architectures (a version of the Subsumption architecture)
–Behavior-based systems: a robust, successful approach for designing situated agents
–Behavioral decomposition
–Behaviors: mappings from sensors to actions
Evaluation: an objective performance measure is available (reinforcement signal)
–[Agent] Did I perform it correctly?!
–[Tutor] Yes/No! (or 0.3)
[Figure: layered behaviors between sensors and actuators: avoid obstacles, locomote, explore, build maps, manipulate the world]

How should we DESIGN a behavior-based system?!

Behavior-based System Design Methodologies
Hand design
–Common almost everywhere
–Complicated: may even be infeasible in complex problems
–Even if a working system can be found, it is probably not the best solution
Evolution
–Good solutions can be found (+)
–Biologically plausible (+)
–Time consuming (-)
–Not fast at producing new solutions (-)
Learning
–Biologically plausible (+)
–Learning is essential for the life-time survival of the agent (+)
–May get stuck in a local minimum (-)

Taxonomy of Design Methods
Behavior-based system design splits into two branches:
–Learning: structure (hierarchy) learning; behavior learning
–Evolution: co-evolution of behaviors; evolution of structure

Taxonomy of Design Methods
Behavior-based system design splits into two branches:
–Learning: structure (hierarchy) learning; behavior learning
–Evolution: co-evolution of behaviors; evolution of structure
Hybridization of evolution and learning bridges the two branches.

Problem Formulation: Behaviors

Problem Formulation: Purely Parallel Subsumption Architecture (PPSSA)
Different behaviors are excited by the sensory input. Higher behaviors can suppress lower ones; the behavior whose output reaches the actuators is the controlling behavior.
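The arbitration rule above can be sketched in a few lines of Python. This is a minimal illustration with hypothetical names (`controlling_action`, the toy behaviors); the actual PPSSA details are in the paper. Each behavior maps a percept to an excitation flag and an action; the highest excited layer becomes the controlling behavior and suppresses everything below it.

```python
# Minimal sketch of Purely Parallel Subsumption arbitration (hypothetical
# names; the paper's PPSSA may differ in detail). Each behavior maps a
# percept to (excited, action); the highest excited layer wins.
from typing import Callable, List, Optional, Tuple

# A behavior returns (excited, action) for the current percept.
Behavior = Callable[[dict], Tuple[bool, Optional[str]]]

def controlling_action(layers: List[Behavior], percept: dict) -> Optional[str]:
    """Scan layers from highest to lowest; the first excited behavior wins."""
    for behavior in reversed(layers):  # layers[0] is the lowest layer
        excited, action = behavior(percept)
        if excited:
            return action  # this behavior suppresses all lower layers
    return None  # no behavior excited: no actuator command

# Two toy behaviors for illustration
def avoid_obstacles(p):  # lower layer
    return (p.get("obstacle", False), "turn")

def explore(p):          # higher layer
    return (True, "forward")  # always excited

print(controlling_action([avoid_obstacles, explore], {"obstacle": True}))  # forward
```

Note that with "explore" placed at the top, it suppresses "avoid obstacles" even when an obstacle is present, which is exactly why the arrangement of behaviors matters and motivates structure learning in the following slides.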

Problem Formulation: Reinforcement Signal and the Agent's Value Function
This function states the value of using a set of behaviors in a specific structure. We want to maximize the agent's value function.
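One way to write this down (a sketch; the symbols below are assumptions, not necessarily the paper's notation) is as a discounted return that depends on both the behavior set and the structure:

```latex
% Hedged sketch: B is the set of behaviors, S the structure (hierarchy),
% r_t the reinforcement signal, and gamma a discount factor.
V(B, S) = \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_t \;\middle|\; \text{agent uses behaviors } B \text{ arranged in structure } S\right],
\qquad
(B^{*}, S^{*}) = \arg\max_{B,\,S} V(B, S)
```

Design is then the joint optimization over behaviors and structure, which the next slide decomposes into sub-problems.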

Problem Formulation: Design as an Optimization
–Structure learning: finding the best structure given a set of behaviors, using learning
–Behavior learning: finding the best behaviors given the structure, using learning
–Concurrent behavior and structure learning
–Behavior evolution: finding the best behaviors given the structure, using evolution
–Behavior evolution and structure learning

Taxonomy of Design Methods (recap): within the Learning branch, the next slides focus on structure (hierarchy) learning.

Structure Learning
Behavior toolbox: manipulate the world, build maps, explore, locomote, avoid obstacles.
The agent wants to learn how to arrange these behaviors in order to get maximum reward from its environment (or tutor).


Structure Learning
1. "explore" becomes the controlling behavior and suppresses "avoid obstacles".
2. The agent hits a wall!

Structure Learning
The tutor (environment) gives "explore" a punishment for being in that place of the structure.

Structure Learning
"explore" is not a very good behavior for the highest position of the structure, so it is replaced by "avoid obstacles".

Structure Learning: Challenging Issues
Representation: How should the agent represent the knowledge gathered during learning?
–Sufficient (the concept space should be covered by the hypothesis space)
–Generalization capability
–Tractable (small hypothesis space)
–Well-defined credit assignment
Hierarchical credit assignment: How should the agent assign credit to the different behaviors and layers in its architecture?
–If the agent receives a reward/punishment, how should we reward/punish the structure of the agent?
Learning: How should the agent update its knowledge when it receives the reinforcement signal?

Structure Learning: Overcoming the Challenging Issues
Our approach is to define a representation that allows decomposing the agent's value function into simpler components. The structure itself provides a lot of clues.

Structure Learning: Zero Order (ZO) Representation
The ZO value table in the agent's mind stores a value for each behavior at each layer (higher and lower), e.g. avoid obstacles (0.8/0.6), explore (0.9/0.7), locomote (0.4/0.4).

Structure Learning Zero Order Representation - Value Function Decomposition

[Equation: the agent's value function is decomposed into ZO components, one value per layer.]

Structure Learning Zero Order Representation - Value Function Decomposition
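The decomposition on these slides can be sketched as follows. This is a hedged reconstruction, not the paper's exact formula; the symbols below are assumptions:

```latex
% b_l is the behavior placed at layer l, V_ZO(b, l) is the learned ZO
% component (value of behavior b at layer l), and the agent's value is
% approximated from the per-layer values.
V_{\text{layer}}(l) = V_{\text{ZO}}(b_l,\, l),
\qquad
V(\text{agent}) \approx f\big(V_{\text{layer}}(1), \dots, V_{\text{layer}}(L)\big)
```

The point of the representation is that each ZO component can be learned from local credit assignment, which the next slide describes.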

Structure Learning: Zero Order Representation - Credit Assignment and Value Updating
The controlling behavior is the only behavior responsible for the current reinforcement signal.
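A minimal sketch of this credit-assignment rule, with hypothetical names and a standard exponential-average update (the paper's exact update rule may differ): only the controlling behavior's entry in the ZO table, at the layer it occupies, is updated when a reinforcement signal arrives.

```python
# Hedged sketch of ZO credit assignment (hypothetical names and update
# rule). Only the controlling behavior at its layer receives credit.

# ZO value table: (behavior, layer) -> estimated value (toy numbers)
zo_table = {
    ("explore", 1): 0.9,
    ("avoid_obstacles", 1): 0.8,
    ("explore", 0): 0.7,
    ("avoid_obstacles", 0): 0.6,
}

def update_zo(table, controlling_behavior, layer, reward, alpha=0.1):
    """Credit only the controlling behavior: move its value toward reward."""
    key = (controlling_behavior, layer)
    table[key] += alpha * (reward - table[key])
    return table[key]

# The agent hits a wall while "explore" controls the top layer: punish it.
new_value = update_zo(zo_table, "explore", 1, reward=-1.0)
print(round(new_value, 3))  # 0.9 + 0.1 * (-1.0 - 0.9) = 0.71
```

After enough such updates, "explore" loses value at the top layer and "avoid obstacles" can replace it, exactly as in the walkthrough slides above.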

Taxonomy of Design Methods (recap): within the Evolution branch, the next slides focus on co-evolution of behaviors, hybridized with learning.

Behavior Co-evolution: Motivations
(+) Learning can get trapped in local maxima of the objective function; evolutionary methods have a better chance of finding the global maximum
(+) Learning is sensitive (POMDPs, non-Markov settings, …)
(+) The objective function may not be well-defined in robotics
(-) Evolutionary robotics methods are usually slow, and cope poorly with fast changes of the environment
(-) Evolved controllers are often non-modular: monolithic, with no reusability

Behavior Co-evolution: Ideas
Use evolution to search the difficult, big part of the parameter space
–The behaviors' parameter space is usually the bigger one
Use learning for fast responses
–The structure's parameter space is usually the smaller one
–A change in the structure results in a different agent behavior
Evolve behaviors separately
–Modularity
–Reusability

Behavior Co-evolution
We have a separate behavior (genetic) pool for each slot: Behavior Pool 1, Behavior Pool 2, …, Behavior Pool n.

Behavior Co-evolution
One behavior is selected randomly from each pool; we want to assess its fitness.

Behavior Co-evolution
The agent interacts with the environment using an architecture built from the selected behaviors …

Behavior Co-evolution
… and tries to maximize its reward.

Behavior Co-evolution
Based on the agent's performance, a fitness is assigned to it.

Behavior Co-evolution: Fitness Sharing
We can evaluate the fitness of the agent after its interaction with the environment. But how can we assess the fitness of each behavior based on the fitness of the agent (remember that we have separate behavior pools)? We approximate it!

Behavior Co-evolution
Each behavior's genetic pool uses conventional evolutionary operators/phenomena:
–Selection
–Genetic operators: crossover; mutation (hard: replacement; soft: perturbation)
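The loop described in the last few slides can be sketched as follows. All names and the toy fitness function are hypothetical, and for brevity only selection and soft mutation (perturbation) are shown; the paper's operators and fitness-sharing rule may differ. Behaviors evolve in separate pools; an agent is assembled from one random behavior per pool, evaluated, and its fitness is shared uniformly among the behaviors that built it.

```python
# Hedged sketch of the behavior co-evolution loop (hypothetical names;
# behaviors are reduced to single real parameters for illustration).
import random

random.seed(0)

def evaluate_agent(behaviors):
    """Stand-in for running the assembled agent in the environment
    (toy fitness: just the sum of the behavior parameters)."""
    return sum(behaviors)

def coevolve(pools, generations=10, mutation_sigma=0.1):
    """Evolve each behavior pool separately with uniform fitness sharing."""
    for _ in range(generations):
        fitness = [[0.0] * len(pool) for pool in pools]
        for _ in range(20):  # evaluate several randomly assembled agents
            picks = [random.randrange(len(pool)) for pool in pools]
            agent_fitness = evaluate_agent(
                pool[i] for pool, i in zip(pools, picks))
            for p, i in enumerate(picks):
                fitness[p][i] += agent_fitness  # uniform fitness sharing
        # selection + soft mutation (perturbation) within each pool
        for p, pool in enumerate(pools):
            best = max(range(len(pool)), key=lambda i: fitness[p][i])
            pools[p] = [pool[best] + random.gauss(0, mutation_sigma)
                        for _ in pool]
    return pools

pools = coevolve([[random.uniform(-1, 1) for _ in range(5)]
                  for _ in range(3)])
print(len(pools), [len(p) for p in pools])  # 3 [5, 5, 5]
```

The uniform sharing step is exactly the approximation of the previous slide: every behavior in the assembled agent receives the whole agent's fitness, which is also what the "Important Questions" slide later revisits.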

Multi-Robot Object Lifting Problem
Three robots want to lift a bulky object using only their own local sensors:
–No central control
–No communication
–Local sensors only
Objectives:
–Reaching a prescribed height
–Keeping the tilt angle small

Multi-Robot Object Lifting Problem

Conclusion
Hybridization of evolution and learning: evolution and learning search different subspaces of the solution space. The results are competitive with human designs.

Important Questions
Is it possible to benefit from the information gathered during learning?
–Each agent learns an approximately good arrangement of the structure; however, we do not use it at all!
Is there any other way of sharing the agent's fitness among behaviors?
–Currently, we share it uniformly among all behaviors.
It seems that the answer to both questions is positive!

Future Research
Can we decompose other problems (not just hierarchical behavior-based systems) similarly?
–Learning and evolution
–Fast and deep
–Different subspaces of the solution space
Other ways of fitness sharing
–Low bias
–Low variance

Questions?!