1 Behavior Hierarchy Learning in a Behavior-based System using Reinforcement Learning
Amir-massoud Farahmand, Majid Nili Ahmadabadi, Babak Najar Araabi
farahmand@ipm.ir, {mnili, araabi}@ut.ac.ir
Department of Electrical and Computer Engineering, University of Tehran, Iran
IROS04 (Japan, Sendai)

2 Paper Outline
– Challenges and Requirements of Robotic Systems
– Behavior-based Approach to AI
– How should we design a Behavior-based System (BBS)?
– Learning in BBS
– Structure Learning in BBS
– Value Function Decomposition
– Experiments: Multi-Robot Object Lifting
– Conclusions, Ongoing Research, and Future Work

3 Challenges and Requirements of Robotic Systems
Challenges:
– Sensor and effector uncertainty
– Partial observability
– Non-stationarity
Requirements (among many others):
– Multiple goals
– Robustness
– Multiple sensors
– Scalability
– Automatic design [learning]

4 Behavior-based Approach to AI
– The behavior-based approach is a good candidate for low-level intelligence.
– Behavioral (activity) decomposition, as opposed to functional decomposition.
– Behavior: Sensor -> Action (a direct link between perception and action).
– Situatedness; the situatedness motto: "The world is its own best model!"
– Embodiment.
– Intelligence as emergence from the interaction of the agent with its environment.
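As a rough illustration of "Sensor -> Action" (not taken from the paper; the interface, class, and sensor names are hypothetical), a behavior can be sketched as a small module with an activation test and a direct mapping from sensor readings to a motor command:

```python
# Minimal sketch of a behavior as a direct sensor-to-action mapping.
# Illustrative only: the interface and sensor names are assumptions.

class Behavior:
    def active(self, sensors: dict) -> bool:
        """Is this behavior applicable to the current sensor readings?"""
        raise NotImplementedError

    def act(self, sensors: dict) -> dict:
        """Map sensor readings directly to a motor command."""
        raise NotImplementedError

class AvoidObstacles(Behavior):
    def active(self, sensors):
        return sensors["front_distance"] < 0.3   # an obstacle is close

    def act(self, sensors):
        return {"linear": 0.0, "angular": 1.0}   # stop and turn away
```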

5 Behavioral decomposition
[Figure: the behaviors "avoid obstacles", "locomote", "explore", "build maps", and "manipulate the world" arranged as parallel layers between sensors and actuators]

6 Behavior-based System Design
Hand design
– Common almost everywhere (just ask people at IROS04)
– Complicated: may be infeasible for complex problems
– Even if a working system can be found, it is probably not optimal
Evolution
– Time consuming
– Good solutions can be found
– Biologically plausible
Learning
– Biologically plausible
– Learning is essential for the life-time survival of the agent
We focus on learning in this presentation.

7 The Importance of Learning
– Unknown environment/body: an [exact] model of the environment/body is not available.
– Non-stationary environment/body: changing environments (offices, houses, streets, and almost everywhere else); aging.
– The designer may not know how to benefit from every aspect of her agent/environment: let the agent learn it by itself (learning as optimization).
– etc.

8 Learning in Behavior-based Systems
There are a few works on behavior-based learning (Mataric, Mahadevan, Maes, and others), but there is no deep investigation of it, especially no mathematical formulation.

9 Learning in Behavior-based Systems
There are different methods of learning with different viewpoints, but we have concentrated on Reinforcement Learning:
– [Agent] Did I perform it correctly?
– [Tutor] Yes/No!

10 Learning in Behavior-based Systems
We have divided learning in a BBS into two parts:
– Structure learning: how should we organize behaviors in the architecture, assuming we already have a repertoire of working behaviors?
– Behavior learning: how should each behavior behave, when we do not yet have the necessary toolbox?

11 Structure Learning Assumptions
– Structure learning in the Subsumption Architecture, taken as a good sample of a BBS.
– Purely parallel case.
– We know B1, B2, ... but we do not know how to arrange them in the architecture: we know how to {avoid obstacles, pick up an object, stop, move forward, turn, ...} but we do not know which behavior should be superior to the others.
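Under these assumptions a structure can be viewed simply as a priority ordering over the toolbox: at each moment the highest behavior whose activation condition holds becomes the controlling behavior and suppresses the ones below it. A minimal sketch of that read-out (my illustration, reusing the hypothetical Behavior interface above, not the paper's implementation):

```python
# Sketch: a purely parallel subsumption structure as a priority ordering.
# The highest activated behavior in the ordering controls the robot.

def controlling_behavior(ordering, sensors):
    """ordering: list of behaviors from highest to lowest priority."""
    for behavior in ordering:
        if behavior.active(sensors):
            return behavior      # suppresses every behavior below it
    return None                  # no behavior is applicable right now

# Example structure over a hypothetical toolbox:
#   structure = [explore, avoid_obstacles, locomote]
```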

12 Structure Learning
[Figure: the behavior toolbox with "avoid obstacles", "locomote", "explore", "build maps", and "manipulate the world"]
The agent wants to learn how to arrange these behaviors in order to get maximum reward from its environment (or tutor).

13 Structure Learning
[Figure: the behavior toolbox ("avoid obstacles", "locomote", "explore", "build maps", "manipulate the world")]

14 Structure Learning
[Figure: the behavior toolbox, with "explore" placed at the top of the structure]
1. "explore" becomes the controlling behavior and suppresses "avoid obstacles".
2. The agent hits a wall!

15 Structure Learning
[Figure: the behavior toolbox]
The tutor (environment) gives "explore" a punishment for being at that position in the structure.

16 Structure Learning
[Figure: the behavior toolbox]
"explore" is not a very good behavior for the highest position in the structure, so it is replaced by "avoid obstacles".
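Slides 12-16 amount to a simple interaction loop: act under the current ordering, observe the reinforcement, and rearrange behaviors that are punished for the position they occupy. The sketch below is a crude caricature of that loop (env.observe and env.step are assumed names, and the actual method uses the learned value tables described on the next slides rather than a blind swap):

```python
# Caricature of the structure-learning loop from slides 12-16.
# controlling_behavior() is the read-out sketched earlier; env is hypothetical.

def run_trial(ordering, env):
    """Act once under the given ordering and report who controlled."""
    sensors = env.observe()
    controller = controlling_behavior(ordering, sensors)
    if controller is None:
        return None, 0.0
    reward = env.step(controller.act(sensors))   # tutor/environment feedback
    return controller, reward

def demote(ordering, behavior):
    """Swap a punished controlling behavior with the one just below it,
    as in the explore / avoid-obstacles example above."""
    i = ordering.index(behavior)
    if i + 1 < len(ordering):
        ordering[i], ordering[i + 1] = ordering[i + 1], ordering[i]
    return ordering
```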

17 Structure Learning Issues
How should we represent the structure?
– Sufficient (the concept space should be covered by the hypothesis space)
– Tractable (a small hypothesis space)
– Well-defined credit assignment
How should we assign credit to the architecture?
– If the agent receives a reward/punishment, how should we reward/punish the structure of the architecture?
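To make the tractability concern concrete (my own counting, based on the ZO and FO descriptions that follow): with n behaviors in the purely parallel case there are n! distinct orderings, while the decomposed value tables grow only polynomially.

```latex
% Search space vs. decomposed value tables for n behaviors
% (e.g. n = 5: 120 orderings, 25 ZO entries, 20 FO entries).
\underbrace{n!}_{\text{distinct orderings}}
\quad \text{vs.} \quad
\underbrace{n \times n}_{\text{Zero-Order entries}}
\quad \text{and} \quad
\underbrace{n(n-1)}_{\text{First-Order entries}}
```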

18 Value Function Decomposition and Structure Learning
– Each structure has a value determined by the reinforcement signal it receives.
– The objective is to find a structure T with a high value.
– We decompose the value function into simpler components, which lets us benefit from previous experience.

19 Value Function Decomposition
It is possible to decompose the total system value into the value of each behavior in each layer. We call this the Zero-Order (ZO) method.

20 Value Function Decomposition: Zero-Order Method
The ZO method stores the value of each behavior being in a specific layer. The ZO value table in the agent's mind:

                   Higher layer   Lower layer
  avoid obstacles      0.8            0.6
  explore              0.7            0.9
  locomote             0.4            0.4
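A small sketch of that table as a data structure, together with one way to read a structure out of it greedily (the layer assignment of the numbers is my reading of the flattened slide, and the greedy read-out is illustrative, not necessarily the paper's policy):

```python
# Zero-Order value table: one value per (behavior, layer) pair,
# using the example numbers from the slide.
zo_values = {
    ("avoid obstacles", "higher"): 0.8, ("avoid obstacles", "lower"): 0.6,
    ("explore",         "higher"): 0.7, ("explore",         "lower"): 0.9,
    ("locomote",        "higher"): 0.4, ("locomote",        "lower"): 0.4,
}

def best_behavior_for(layer, values, available):
    """Greedy read-out: the available behavior with the highest value here."""
    return max(available, key=lambda b: values[(b, layer)])

behaviors = {"avoid obstacles", "explore", "locomote"}
top = best_behavior_for("higher", zo_values, behaviors)           # 'avoid obstacles'
low = best_behavior_for("lower",  zo_values, behaviors - {top})   # 'explore'
```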

21 Credit Assignment for the Zero-Order Method
– The controlling behavior is the only behavior responsible for the current reinforcement signal.
– An appropriate method for updating the ZO value table is available.
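Continuing the table sketch above: since only the controlling behavior is held responsible, only its (behavior, layer) entry is touched. The exponential running average below is a stand-in; the paper's exact update rule is in the proceedings.

```python
def zo_update(values, controlling, layer, reward, alpha=0.1):
    """Update only the entry of the controlling behavior at its layer.
    A running average stands in for the paper's actual rule."""
    key = (controlling, layer)
    old = values.get(key, 0.0)
    values[key] = old + alpha * (reward - old)

# Example: the agent is punished while 'explore' controls from the higher layer.
zo_update(zo_values, "explore", "higher", reward=-1.0)
```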

22 Value Function Decomposition: First-Order Method
The First-Order (FO) method stores the value of the relative order of behaviors:
– How good or bad is it for B1 to be placed higher than B2?
– V(avoid obstacles > explore) = 0.8
– V(explore > avoid obstacles) = -0.3
Sorry, not that easy (or informative) to show graphically!
Credit is assigned to all (controlling, activated) pairs of behaviors:
– The agent receives a reward while B1 is controlling and B3 and B5 are activated: (B1 > B3): +, (B1 > B5): +
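A sketch of that bookkeeping: values attach to ordered pairs, every (controlling, activated) pair is credited, and an ordering can be read out by preferring A above B whenever V(A > B) beats V(B > A). The running-average update and the sorting read-out are my illustration; the exact probabilistic formulation is in the proceedings.

```python
from collections import defaultdict
from functools import cmp_to_key

fo_values = defaultdict(float)   # fo_values[(A, B)] ~ value of "A above B"

def fo_credit(controlling, activated, reward, alpha=0.1):
    """Credit every (controlling > activated) pair for this reinforcement."""
    for other in activated:
        if other != controlling:
            key = (controlling, other)
            fo_values[key] += alpha * (reward - fo_values[key])

def fo_ordering(behaviors):
    """Read out an ordering: prefer A above B when V(A>B) >= V(B>A)."""
    def prefer(a, b):
        return -1 if fo_values[(a, b)] >= fo_values[(b, a)] else 1
    return sorted(behaviors, key=cmp_to_key(prefer))

# Slide example: reward arrives while B1 controls and B3, B5 are activated.
fo_credit("B1", {"B1", "B3", "B5"}, reward=+1.0)
```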

23 Structure Representation
Both methods come with probabilistic reasoning that shows how to
– decompose the total system value into simple components,
– assign credit, and
– update the value tables.
See the proceedings for the mathematical formulation.

24 Example: Multi-Robot Object Lifting
A group of three robots wants to lift an object using only their own local sensors:
– No central control
– No communication
– Local sensors
Objectives:
– Reaching a prescribed height
– Keeping the tilt angle small
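The two objectives suggest a shaped reinforcement signal along the lines below. This is purely an illustration; the thresholds, weights, and functional form are my assumptions, not the signal used in the experiments.

```python
def lifting_reward(height, target_height, tilt_deg,
                   tilt_tolerance_deg=5.0, w_height=1.0, w_tilt=0.5):
    """Hypothetical reinforcement: reward progress toward the prescribed
    height and punish tilt beyond a small tolerance (made-up weights)."""
    r = -w_height * abs(target_height - height)
    if abs(tilt_deg) > tilt_tolerance_deg:
        r -= w_tilt * (abs(tilt_deg) - tilt_tolerance_deg)
    return r
```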

25 Example: Multi-Robot Object Lifting
Behavior Toolbox: Stop, Push More, Hurry Up, Slow Down, Don't Go Fast, ... ?!

26 Example: Multi-Robot Object Lifting
[Figure: sample plot of the tilt angle of the object after sufficient learning]

27 Example: Multi-Robot Object Lifting
[Figure: sample plot of the height of each robot after sufficient learning]

28 Example: Multi-Robot Object Lifting
[Figure: another sample plot of the tilt angle of the object after sufficient learning]

29 Conclusions, Ongoing Research, and Future Work
– We have devised two different methods for structure learning in behavior-based systems.
– Good results in two different tasks: multi-robot object lifting, and an abstract problem (not yet reported).

30 Conclusions, Ongoing Research, and Future Work
... but where should the necessary behaviors come from?
– Behavior learning: we have devised some methods for behavior learning, which will be reported soon.

31 Conclusions, Ongoing Research, and Future Work
However, many steps remain before fully automated agent design:
– How should we generate new behaviors without even knowing which sensory information is necessary for the task (feature selection)?
– Reinforcement signal design: designing a good reinforcement signal is not easy at all.


