1
Learning for Physically Diverse Robot Teams
Robot Teams - Chapter 7
CS8803 Autonomous Multi-Robot Systems
10/3/02
2
Motivations Robots are cool. Robot teams are cooler. Robots are hard to program/control. Robot teams are even harder.
3
Motivations Robotic soccer - hard!
4
Motivations Diagnose and rebuild the transmission of this 1969 Jaguar E-Type - Really Hard!
5
Motivations Answer: Robot Learning!
6
Motivations
Challenges:
– Very large state spaces
– Uncertain credit assignment
– Limited training time
– Uncertainty in sensing and shared information
– Non-deterministic actions
– Difficulty in defining appropriate abstractions for learned information
– Difficulty of merging information from different robots' experiences
7
Motivations
Benefits:
– Increased robustness
– Reduced complexity
– Increased ease of adding new assets to the team
8
Motivations
4 types of learning in robotic systems:
– Learning numerical functions for calibration or parameter adjustment
– Learning about the world
– Learning to coordinate behaviors
– Learning new behaviors
9
Learning New Cooperative Behaviors
Inherently cooperative tasks are difficult to learn!
– The utility of a robot's action depends on the actions of the other robots
– Soccer is a good example
– Cooperative Multi-robot Observation of Multiple Moving Targets (CMOMMT)
  – Scalable
10
Learning New Cooperative Behaviors
CMOMMT application:
– S: a 2-D bounded, enclosed spatial region
– V: a team of m robot vehicles, each with a 360° field of view of limited range
– O(t): a set of n targets in region S at time t
– B(t): an m × n matrix such that B_ij = 1 if robot i is observing target j at time t
– Sensor coverage is much less than the region area
11
Learning New Cooperative Behaviors
Goal: develop the algorithm A-CMOMMT
– Maximize the average number of targets under observation at any given time (see the sketch below)
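As a minimal sketch of the objective (Python; the function names and the exact averaging are my own reading of the definitions above, not the paper's notation), the quantity being maximized can be computed from the B(t) matrices:

```python
import numpy as np

def observed_fraction(B):
    """Fraction of the n targets observed by at least one robot,
    where B is the m x n matrix with B[i, j] = 1 iff robot i is
    observing target j."""
    return float(np.any(B == 1, axis=0).mean())

def average_observation(B_history):
    """Mission-long average of the per-step observed fraction --
    the quantity the team tries to maximize."""
    return float(np.mean([observed_fraction(B) for B in B_history]))
```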
12
Learning New Cooperative Behaviors
Human-generated solution (sketched below):
– Local force vectors: targets attract, teammates repel
– Magnitude depends on the distance from the robot
– A target's weight is reduced if it is already being observed
– The heading is given by summing the vectors
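A rough sketch of one force-vector step (Python; `attract`, `repel`, and the 0.5 discount are illustrative placeholders, since the original work defines its own magnitude profiles):

```python
import numpy as np

def resultant_force(robot_pos, targets, teammates, observed, attract, repel):
    """Sum local force vectors: targets attract, teammates repel.
    attract(d) and repel(d) map distance to magnitude; observed[j]
    is True if target j is already watched by another robot."""
    force = np.zeros(2)
    for j, tgt in enumerate(targets):
        diff = tgt - robot_pos
        d = np.linalg.norm(diff)
        w = attract(d)
        if observed[j]:
            w *= 0.5                  # reduce the pull of covered targets
        force += w * diff / (d + 1e-9)
    for mate in teammates:
        diff = robot_pos - mate       # points away from the teammate
        d = np.linalg.norm(diff)
        force += repel(d) * diff / (d + 1e-9)
    return force                      # the robot heads along this vector
```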
13
Learning New Cooperative Behaviors Results:
14
Learning New Cooperative Behaviors
Distributed, pessimistic, lazy Q-learning:
– No a priori model
– Reinforcement learning
– Instance-based (lazy) learning
– Assumes a lower bound on utility
15
Learning New Cooperative Behaviors
Q-learning (see the sketch below):
– Initialize Q(s, a) = 0 for every state/action pair
– Observe state s
– Repeat:
  – Select an action a and execute it
  – Receive reward r
  – Observe the new state s'
  – Update the table entry: Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)]
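A minimal tabular implementation of one loop iteration (Python; the epsilon-greedy exploration, the `env_step` interface, and the values of alpha and gamma are assumptions, not from the slides):

```python
import random
from collections import defaultdict

def q_learning_step(Q, s, actions, env_step, alpha=0.1, gamma=0.9, eps=0.1):
    """One pass through the loop above. Q is a defaultdict(float),
    so every entry starts at 0 as required."""
    if random.random() < eps:                      # select an action...
        a = random.choice(actions)
    else:
        a = max(actions, key=lambda act: Q[(s, act)])
    r, s_next = env_step(s, a)                     # ...execute, get r and s'
    best_next = max(Q[(s_next, act)] for act in actions)
    # Standard one-step update of the table entry for Q(s, a)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return s_next
```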
16
Learning New Cooperative Behaviors
Lazy learning (instance-based learning):
– Delays the use of gathered information until necessary
– Look-up table of (state, action) pairs, built by acting randomly
[Architecture diagram relating the World, State, Situation Matcher, Evaluation Function, Reinforcement Function, and Action]
17
Learning New Cooperative Behaviors
Pessimistic algorithm (sketched below):
– Rates the utility of an action by its lower bound:
  – Predict the state following each possible action in the current state
  – Compute a lower bound on the utility of each predicted state
  – Choose the action corresponding to the highest lower bound
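One way the selection step might look (Python sketch; the k-nearest-neighbor situation matcher and taking the minimum neighbor utility as the lower bound are my simplifications, not the paper's exact method):

```python
import numpy as np

def pessimistic_action(state, actions, predict, memory, k=5):
    """Pick the action whose predicted next state has the best lower
    bound on utility. memory is the lazy-learning table: a list of
    (state_vector, utility) pairs; predict(s, a) is a one-step model."""
    def lower_bound(s):
        dists = sorted(((np.linalg.norm(s - ms), u) for ms, u in memory))
        return min(u for _, u in dists[:k])   # pessimistic: worst nearby case
    return max(actions, key=lambda a: lower_bound(predict(state, a)))
```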
18
Learning New Cooperative Behaviors
Results:
– Much better than random
– Not as good as the human-generated solution
– Significant results
19
Learning New Cooperative Behaviors
Q-learning with VQQL and GLA:
– The state space is huge
– Want a generalized algorithm
– 2 phases:
  – Learn the quantizer
  – Learn the Q function
20
Learning New Cooperative Behaviors
Generalized Lloyd Algorithm (GLA), sketched below:
– A clustering technique that converts the continuous state space into a discrete one
– Takes a set T of M states
– Returns a set C of N states
– Stopping criterion: (D_m − D_{m+1}) / D_m < ε, where D_m is the average distortion after iteration m
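A compact GLA sketch (Python; squared-error distortion and random codebook initialization are conventional choices, not specified on the slide):

```python
import numpy as np

def generalized_lloyd(T, N, eps=1e-3, seed=0):
    """Cluster the M training states in T (an M x d array) into a
    codebook C of N states, stopping when the relative drop in
    average distortion (D_m - D_{m+1}) / D_m falls below eps."""
    rng = np.random.default_rng(seed)
    C = T[rng.choice(len(T), size=N, replace=False)].astype(float)
    D_prev = np.inf
    while True:
        # Assign every training state to its nearest codeword
        d2 = ((T[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        D = d2[np.arange(len(T)), labels].mean()   # average distortion D_m
        # Stopping criterion, rearranged to avoid dividing by zero
        if np.isfinite(D_prev) and D_prev - D <= eps * D_prev:
            return C
        D_prev = D
        for i in range(N):                          # recenter each cluster
            if np.any(labels == i):
                C[i] = T[labels == i].mean(axis=0)
```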
21
Learning New Cooperative Behaviors
Vector Quantization for Q-Learning (VQQL), sketched below:
– Obtain a set T of example states
– Design a vector quantizer C from T using GLA
– Learn the Q function:
  – Choose an action following an exploration strategy
  – Receive the experience tuple (s, a, s', r)
  – Quantize the tuple, obtaining (q(s), a, q(s'), r)
  – Update the Q table
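Putting the two phases together (Python sketch, reusing the hypothetical `generalized_lloyd` and `env_step` helpers from the earlier sketches; exploration details are assumed):

```python
import random
from collections import defaultdict

def vqql_learn(C, Q, s, actions, env_step, steps,
               alpha=0.1, gamma=0.9, eps=0.1):
    """Tabular Q-learning over the discrete states produced by the
    quantizer C (the codebook returned by GLA)."""
    def quantize(x):                    # index of the nearest codeword
        return int(((C - x) ** 2).sum(axis=1).argmin())
    for _ in range(steps):
        qs = quantize(s)
        if random.random() < eps:       # exploration strategy
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda act: Q[(qs, act)])
        r, s_next = env_step(s, a)      # experience tuple (s, a, s', r)
        qs_next = quantize(s_next)      # quantized tuple (q(s), a, q(s'), r)
        best = max(Q[(qs_next, act)] for act in actions)
        Q[(qs, a)] += alpha * (r + gamma * best - Q[(qs, a)])
        s = s_next
    return Q
```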
22
Learning New Cooperative Behaviors
2 experiments:
– Local reward function
– Collaborative reward function
23
Learning New Cooperative Behaviors
Results:
– Competitive with the human-generated solution
– Can handle higher-dimensional state spaces
24
Learning for Parameter Adjustment
Robots are needed for life-long tasks, which bring:
– Environmental changes
– Variations in robot capabilities
– Heterogeneity:
  – Overlap in capabilities
  – Changes in heterogeneity over time
25
Learning for Parameter Adjustment
Problem definition:
– R: set of n robots
– T: set of m tasks
– A_i: set of actions robot i can perform
– H: set of functions h_i : A_i → T returning the task completed by each action of robot i
– q(a_ij): quality metric for robot i performing action j
– U_i: set of actions robot i performs in the current mission
26
Learning for Parameter Adjustment
– Given R, T, A_i, and H, determine the sets of actions U_i that optimize the performance metric
27
Learning for Parameter Adjustment
ALLIANCE overview:
– Completely distributed
– Behaviors are grouped into behavior sets:
  – Activated as a set
  – Controlled by high-level motivational behaviors (sketched below)
– Impatience and acquiescence thresholds
– Broadcast communication
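A loose sketch of how a motivational behavior might gate a behavior set (Python; the gate names and the zeroing rule are paraphrased from how ALLIANCE is usually described, so treat the details as assumptions):

```python
def motivation_tick(m, dt, impatience_rate, task_needed, suppressed,
                    acquiesced, theta):
    """Motivation grows at the impatience rate and resets to zero when
    any gating condition fails; the behavior set activates once the
    impatience threshold theta is crossed."""
    m += impatience_rate * dt
    if not task_needed or suppressed or acquiesced:
        m = 0.0                         # a failed gate resets the motivation
    return m, m >= theta                # (new motivation, set active?)
```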
28
Learning for Parameter Adjustment
L-ALLIANCE overview:
– An extension of ALLIANCE that automatically updates the motivational behaviors
– 2 problems to solve:
  – How to give robots the ability to obtain knowledge about the quality of team members' performance
  – How to use that knowledge to select a task to pursue
29
Learning for Parameter Adjustment
– Performance monitors:
  – One for every behavior set
  – Monitor how the robot itself and the others are performing
30
Learning for Parameter Adjustment
– Control phases:
  – Active learning phase:
    – Random task choices
    – Maximally patient
    – Catalog monitor readings and update control parameters
  – Adaptive learning phase:
    – Must make an effort to accomplish the mission
    – Acquiesces and becomes impatient quickly
    – Still catalogs monitor readings and updates control parameters
31
Learning for Parameter Adjustment
– Action selection strategy (sketched below):
  – At each iteration, robot r_i divides the remaining tasks into two categories:
    – Tasks that r_i expects to perform better than all other robots and that are not currently being done
    – All other tasks r_i can do
  – Robot r_i repeats the following until no tasks are left to do:
    – Select tasks from the first category, longest first, until none are left
    – Select tasks from the second category, shortest first
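The ordering can be sketched directly (Python; `expected_best` and `duration` stand in for estimates the performance monitors would supply, and this static ordering ignores the re-evaluation that happens between tasks):

```python
def order_tasks(remaining, expected_best, duration):
    """Two-category ordering for robot r_i: tasks it expects to do
    best (and that nobody is doing) longest-first, then everything
    else it can do shortest-first."""
    cat1 = sorted((t for t in remaining if expected_best(t)),
                  key=duration, reverse=True)      # longest first
    cat2 = sorted((t for t in remaining if not expected_best(t)),
                  key=duration)                    # shortest first
    return cat1 + cat2
```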
32
Learning for Parameter Adjustment
Results - box pushing:
– Experiment 1:
  – 2 identical robots
  – 1 fails
33
Learning for Parameter Adjustment
– Experiment 2:
  – 2 different robots with different capabilities
– L-ALLIANCE is capable of keeping the team working toward the goal despite:
  – Changes to team composition
  – Changes to robot abilities
34
Conclusions
– Lots of challenges left
– The rewards are tantalizing
– Learning approaches are not yet superior to human-generated solutions
35
Questions?