1
Learning for Physically Diverse Robot Teams
Robot Teams - Chapter 7
CS8803 Autonomous Multi-Robot Systems
10/3/02
2
Motivations Robots are cool. Robot teams are cooler. Robots are hard to program/control. Robot teams are even harder.
3
Motivations Robotic soccer - hard!
4
Motivations Diagnose and rebuild the transmission of this 1969 Jaguar E-Type - Really Hard!
5
Motivations Answer: Robot Learning!
6
Motivations
Challenges:
– Very large state spaces
– Uncertain credit assignment
– Limited training time
– Uncertainty in sensing and shared information
– Non-deterministic actions
– Difficulty in defining appropriate abstractions for learned information
– Difficulty of merging information from different robots' experiences
7
Motivations
Benefits:
– Increased robustness
– Reduced complexity
– Increased ease of adding new assets to the team
8
Motivations
4 types of learning in robotic systems:
– Learning numerical functions for calibration or parameter adjustment
– Learning about the world
– Learning to coordinate behaviors
– Learning new behaviors
9
Learning New Cooperative Behaviors
Inherently cooperative tasks are difficult to learn!
– The utility of a robot's action depends on the actions of the other robots
– Soccer is a good example
– Cooperative Multi-robot Observation of Multiple Moving Targets (CMOMMT)
  – Scalable
10
Learning New Cooperative Behaviors
CMOMMT application:
– S: a 2-D bounded, enclosed spatial region
– V: a team of m robot vehicles, each with a 360° field of view of limited range
– O(t): a set of n targets in region S at time t
– B(t): an m × n matrix such that B_ij = 1 if robot i is observing target j at time t
– Sensor coverage is much less than the region area
11
Learning New Cooperative Behaviors
Goal: develop the algorithm A-CMOMMT
– Maximize the average number of targets under observation at any given time (see the sketch below)
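As a minimal sketch of the objective (Python; the function names and the exact averaging are my own reading of the definitions above, not the paper's notation), the quantity being maximized can be computed from the B(t) matrices:

```python
import numpy as np

def observed_fraction(B):
    """Fraction of the n targets observed by at least one robot,
    where B is the m x n matrix with B[i, j] = 1 iff robot i is
    observing target j."""
    return float(np.any(B == 1, axis=0).mean())

def average_observation(B_history):
    """Mission-long average of the per-step observed fraction --
    the quantity the team tries to maximize."""
    return float(np.mean([observed_fraction(B) for B in B_history]))
```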
12
Learning New Cooperative Behaviors
Human-generated solution (sketched below):
– Local force vectors: targets attract, teammates repel
– Magnitude depends on the distance from the robot
– A target's weight is reduced if it is already being observed
– The heading is given by summing the vectors
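A rough sketch of one force-vector step (Python; `attract`, `repel`, and the 0.5 discount are illustrative placeholders, since the original work defines its own magnitude profiles):

```python
import numpy as np

def resultant_force(robot_pos, targets, teammates, observed, attract, repel):
    """Sum local force vectors: targets attract, teammates repel.
    attract(d) and repel(d) map distance to magnitude; observed[j]
    is True if target j is already watched by another robot."""
    force = np.zeros(2)
    for j, tgt in enumerate(targets):
        diff = tgt - robot_pos
        d = np.linalg.norm(diff)
        w = attract(d)
        if observed[j]:
            w *= 0.5                  # reduce the pull of covered targets
        force += w * diff / (d + 1e-9)
    for mate in teammates:
        diff = robot_pos - mate       # points away from the teammate
        d = np.linalg.norm(diff)
        force += repel(d) * diff / (d + 1e-9)
    return force                      # the robot heads along this vector
```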
13
Learning New Cooperative Behaviors Results:
14
Learning New Cooperative Behaviors
Distributed, pessimistic, lazy Q-learning:
– No a priori model
– Reinforcement learning
– Instance-based (lazy) learning
– Assumes a lower bound on utility
15
Learning New Cooperative Behaviors
Q-learning (see the sketch below):
– Initialize Q(s, a) = 0 for every state/action pair
– Observe state s
– Repeat:
  – Select an action a and execute it
  – Receive reward r
  – Observe the new state s'
  – Update the table entry: Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)]
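A minimal tabular implementation of one loop iteration (Python; the epsilon-greedy exploration, the `env_step` interface, and the values of alpha and gamma are assumptions, not from the slides):

```python
import random
from collections import defaultdict

def q_learning_step(Q, s, actions, env_step, alpha=0.1, gamma=0.9, eps=0.1):
    """One pass through the loop above. Q is a defaultdict(float),
    so every entry starts at 0 as required."""
    if random.random() < eps:                      # select an action...
        a = random.choice(actions)
    else:
        a = max(actions, key=lambda act: Q[(s, act)])
    r, s_next = env_step(s, a)                     # ...execute, get r and s'
    best_next = max(Q[(s_next, act)] for act in actions)
    # Standard one-step update of the table entry for Q(s, a)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return s_next
```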
16
Learning New Cooperative Behaviors
Lazy learning (instance-based learning):
– Delays the use of gathered information until necessary
– Look-up table of (state, action) pairs, built by acting randomly
[Architecture diagram relating the World, State, Situation Matcher, Evaluation Function, Reinforcement Function, and Action]
17
Learning New Cooperative Behaviors
Pessimistic algorithm (sketched below):
– Rates the utility of an action by its lower bound:
  – Predict the state following each possible action in the current state
  – Compute a lower bound on the utility of each predicted state
  – Choose the action corresponding to the highest lower bound
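One way the selection step might look (Python sketch; the k-nearest-neighbor situation matcher and taking the minimum neighbor utility as the lower bound are my simplifications, not the paper's exact method):

```python
import numpy as np

def pessimistic_action(state, actions, predict, memory, k=5):
    """Pick the action whose predicted next state has the best lower
    bound on utility. memory is the lazy-learning table: a list of
    (state_vector, utility) pairs; predict(s, a) is a one-step model."""
    def lower_bound(s):
        dists = sorted(((np.linalg.norm(s - ms), u) for ms, u in memory))
        return min(u for _, u in dists[:k])   # pessimistic: worst nearby case
    return max(actions, key=lambda a: lower_bound(predict(state, a)))
```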
18
Learning New Cooperative Behaviors
Results:
– Much better than random
– Not as good as the human-generated solution
– Significant results
19
Learning New Cooperative Behaviors
Q-learning with VQQL and GLA:
– The state space is huge
– Want a generalized algorithm
– 2 phases:
  – Learn the quantizer
  – Learn the Q function
20
Learning New Cooperative Behaviors
Generalized Lloyd Algorithm (GLA), sketched below:
– A clustering technique that converts the continuous state space into a discrete one
– Takes a set T of M states
– Returns a set C of N states
– Stopping criterion: (D_m − D_{m+1}) / D_m < ε, where D_m is the average distortion after iteration m
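A compact GLA sketch (Python; squared-error distortion and random codebook initialization are conventional choices, not specified on the slide):

```python
import numpy as np

def generalized_lloyd(T, N, eps=1e-3, seed=0):
    """Cluster the M training states in T (an M x d array) into a
    codebook C of N states, stopping when the relative drop in
    average distortion (D_m - D_{m+1}) / D_m falls below eps."""
    rng = np.random.default_rng(seed)
    C = T[rng.choice(len(T), size=N, replace=False)].astype(float)
    D_prev = np.inf
    while True:
        # Assign every training state to its nearest codeword
        d2 = ((T[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        D = d2[np.arange(len(T)), labels].mean()   # average distortion D_m
        # Stopping criterion, rearranged to avoid dividing by zero
        if np.isfinite(D_prev) and D_prev - D <= eps * D_prev:
            return C
        D_prev = D
        for i in range(N):                          # recenter each cluster
            if np.any(labels == i):
                C[i] = T[labels == i].mean(axis=0)
```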
21
Learning New Cooperative Behaviors
Vector Quantization for Q-Learning (VQQL), sketched below:
– Obtain a set T of example states
– Design a vector quantizer C from T using GLA
– Learn the Q function:
  – Choose an action following an exploration strategy
  – Receive the experience tuple (s, a, s', r)
  – Quantize the tuple, obtaining (q(s), a, q(s'), r)
  – Update the Q table
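Putting the two phases together (Python sketch, reusing the hypothetical `generalized_lloyd` and `env_step` helpers from the earlier sketches; exploration details are assumed):

```python
import random
from collections import defaultdict

def vqql_learn(C, Q, s, actions, env_step, steps,
               alpha=0.1, gamma=0.9, eps=0.1):
    """Tabular Q-learning over the discrete states produced by the
    quantizer C (the codebook returned by GLA)."""
    def quantize(x):                    # index of the nearest codeword
        return int(((C - x) ** 2).sum(axis=1).argmin())
    for _ in range(steps):
        qs = quantize(s)
        if random.random() < eps:       # exploration strategy
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda act: Q[(qs, act)])
        r, s_next = env_step(s, a)      # experience tuple (s, a, s', r)
        qs_next = quantize(s_next)      # quantized tuple (q(s), a, q(s'), r)
        best = max(Q[(qs_next, act)] for act in actions)
        Q[(qs, a)] += alpha * (r + gamma * best - Q[(qs, a)])
        s = s_next
    return Q
```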
22
Learning New Cooperative Behaviors
2 experiments:
– Local reward function
– Collaborative reward function
23
Learning New Cooperative Behaviors
Results:
– Competitive with the human-generated solution
– Can handle higher-dimensional state spaces
24
Learning for Parameter Adjustment
Robots are needed for life-long tasks, which bring:
– Environmental changes
– Variations in robot capabilities
– Heterogeneity:
  – Overlap in capabilities
  – Changes in heterogeneity over time
25
Learning for Parameter Adjustment
Problem definition:
– R: set of n robots
– T: set of m tasks
– A_i: set of actions robot i can perform
– H: set of functions h_i : A_i → T returning the task completed by each action of robot i
– q(a_ij): quality metric for robot i performing action j
– U_i: set of actions robot i performs in the current mission
26
Learning for Parameter Adjustment
– Given R, T, A_i, and H, determine the sets of actions U_i that optimize the performance metric
27
Learning for Parameter Adjustment
ALLIANCE overview:
– Completely distributed
– Behaviors are grouped into behavior sets:
  – Activated as a set
  – Controlled by high-level motivational behaviors (sketched below)
– Impatience and acquiescence thresholds
– Broadcast communication
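A loose sketch of how a motivational behavior might gate a behavior set (Python; the gate names and the zeroing rule are paraphrased from how ALLIANCE is usually described, so treat the details as assumptions):

```python
def motivation_tick(m, dt, impatience_rate, task_needed, suppressed,
                    acquiesced, theta):
    """Motivation grows at the impatience rate and resets to zero when
    any gating condition fails; the behavior set activates once the
    impatience threshold theta is crossed."""
    m += impatience_rate * dt
    if not task_needed or suppressed or acquiesced:
        m = 0.0                         # a failed gate resets the motivation
    return m, m >= theta                # (new motivation, set active?)
```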
28
Learning for Parameter Adjustment
L-ALLIANCE overview:
– An extension of ALLIANCE that automatically updates the motivational behaviors
– 2 problems to solve:
  – How to give robots the ability to obtain knowledge about the quality of team members' performance
  – How to use that knowledge to select a task to pursue
29
Learning for Parameter Adjustment
– Performance monitors:
  – One for every behavior set
  – Monitor how the robot itself and the others are performing
30
Learning for Parameter Adjustment
– Control phases:
  – Active learning phase:
    – Random task choices
    – Maximally patient
    – Catalog monitor readings and update control parameters
  – Adaptive learning phase:
    – Must make an effort to accomplish the mission
    – Acquiesces and becomes impatient quickly
    – Still catalogs monitor readings and updates control parameters
31
Learning for Parameter Adjustment
– Action selection strategy (sketched below):
  – At each iteration, robot r_i divides the remaining tasks into two categories:
    – Tasks that r_i expects to perform better than all other robots and that are not currently being done
    – All other tasks r_i can do
  – Robot r_i repeats the following until no tasks are left to do:
    – Select tasks from the first category, longest first, until none are left
    – Select tasks from the second category, shortest first
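The ordering can be sketched directly (Python; `expected_best` and `duration` stand in for estimates the performance monitors would supply, and this static ordering ignores the re-evaluation that happens between tasks):

```python
def order_tasks(remaining, expected_best, duration):
    """Two-category ordering for robot r_i: tasks it expects to do
    best (and that nobody is doing) longest-first, then everything
    else it can do shortest-first."""
    cat1 = sorted((t for t in remaining if expected_best(t)),
                  key=duration, reverse=True)      # longest first
    cat2 = sorted((t for t in remaining if not expected_best(t)),
                  key=duration)                    # shortest first
    return cat1 + cat2
```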
32
Learning for Parameter Adjustment
Results - box pushing:
– Experiment 1:
  – 2 identical robots
  – 1 fails
33
Learning for Parameter Adjustment
– Experiment 2:
  – 2 different robots with different capabilities
– L-ALLIANCE is capable of keeping the team working toward the goal despite:
  – Changes to team composition
  – Changes to robot abilities
34
Conclusions
– Lots of challenges left
– The rewards are tantalizing
– Learning approaches are not yet superior to human-generated solutions
35
Questions?