Learning to Communicate
Curt Bererton
Main paper: An Adaptive Communication Protocol for Cooperating Mobile Robots, Yanco and Stein, 1993
Supplementary papers: Holly Yanco's Master's Thesis, 1994; Grounded Symbolic Communication between Heterogeneous Cooperating Robots, Jung and Zelinsky, 1999
09/28/1999
Main Idea and Contributions
- One of the seminal papers in this area.
- Mobile robots are engaged in a cooperative task.
- They start with a fixed but uninterpreted vocabulary for communication.
- The objective is to learn a private communication language between the robots.
Main Idea and Contributions
Task: coordinated movement (go straight or spin).
Main Idea and Contributions
- Want to give autonomous agents the ability to develop their own language.
- Learning tasks:
  - The leader must interpret the environmentally supplied signal, execute the appropriate action, and transmit an appropriate signal to the followers so that they all do the right thing.
  - The followers need only execute the right action based on the signal from the leader.
- A very simple learning algorithm is used…
Main Idea and Contributions
A very simple learning algorithm:
- A table with input signals along one side and actions along the other.
- The expected best action is taken; if positive reinforcement is received, a counter for that action is incremented.
- If there is no current "best" action, one is chosen at random.
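The counter table described above can be sketched as follows; the class name and the two-signal vocabulary in the usage note are illustrative, not from the paper:

```python
import random

class SignalActionLearner:
    """Counter table as on the slide: input signals along one side,
    actions along the other; cells count positive reinforcements."""

    def __init__(self, signals, actions, seed=0):
        self.counts = {s: {a: 0 for a in actions} for s in signals}
        self.rng = random.Random(seed)

    def choose(self, signal):
        row = self.counts[signal]
        if max(row.values()) == 0:
            # No expected best action yet: choose one at random.
            return self.rng.choice(list(row))
        # Otherwise take the action with the highest success counter.
        return max(row, key=row.get)

    def reinforce(self, signal, action):
        # Called only when positive (group) reinforcement is received.
        self.counts[signal][action] += 1
```

For example, after `reinforce("A", "straight")` has fired a couple of times, `choose("A")` returns `"straight"`, while an unseen signal still yields a random action.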
Main Idea and Contributions
One of the main ideas is that of "task-based reinforcement": there is no individual reinforcement; the group is reinforced as a whole.
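A minimal sketch of how task-based reinforcement might couple a leader's and a follower's tables, assuming two cues, two signals, and two actions; the target mapping and all names are hypothetical, and reward goes to the group as a whole, never to an individual robot:

```python
import random

rng = random.Random(0)
CUES = SIGNALS = ACTIONS = [0, 1]   # two-element language, two actions
TARGET = {0: 0, 1: 1}               # hypothetical: correct action mirrors the cue

# The leader maps each environmental cue to an (action, signal) pair;
# the follower maps each received signal to an action.
leader = {c: {(a, s): 0 for a in ACTIONS for s in SIGNALS} for c in CUES}
follower = {s: {a: 0 for a in ACTIONS} for s in SIGNALS}

def pick(row):
    """Greedy over success counters; random while nothing is reinforced yet."""
    if max(row.values()) == 0:
        return rng.choice(list(row))
    return max(row, key=row.get)

for _ in range(2000):
    cue = rng.choice(CUES)
    l_action, signal = pick(leader[cue])
    f_action = pick(follower[signal])
    # Task-based reinforcement: the group is rewarded as a whole,
    # and only when every robot did the right thing.
    if l_action == TARGET[cue] and f_action == TARGET[cue]:
        leader[cue][(l_action, signal)] += 1
        follower[signal][f_action] += 1
```

After the loop, the leader has settled on a signal for each cue and the follower interprets that signal with the correct action; which signal means what is arbitrary, exactly as the slides later note.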
Assumptions
- There is only a single leader and multiple followers.
- Behaviors are pre-defined on all robots.
- Followers don't have access to the task specification or environmental cues.
- Coordinated movement is indicative of learning communication for other tasks.

We don't know how to do this for a group of homogeneous robots; will it scale well? Because the behaviors are pre-defined, you can only learn to tell a robot which behaviors to use from its existing vocabulary, which means you already need to know what those behaviors are: a big assumption. Denying the followers cues isn't all that realistic; typically the followers will also have sensors giving them some external knowledge of the right thing to do, which will make the learning easier (how much easier?). Every researcher makes these types of assumptions; one wishes to look at a task that will allow the results to be applied to other tasks.
Assumptions
- The interval estimation algorithm is representative of the number of training examples required to learn a language.
- Communication noise is dwarfed by human error.
- Reinforcement immediately follows the action taken.
- A human is available to provide reinforcement.

Obviously this isn't always the case. The noise assumption may or may not hold in real life, especially if another autonomous agent is the one doing the telling; the system should be able to handle noise in the communication channel, because in many real situations noise is an important factor. Immediate reinforcement is also unrealistic, though the authors do mention that in more complex tasks reinforcement might only come after a sequence of actions.
Results
- Takes on average many iterations to converge for two robots with a two-element language.
- Use of a biased random function improved these results.

Ouch!!! That's pitiful. So they cheated and used a biased random function to ensure that the training data didn't look so pathetic; not to mention that simulations taking that many iterations would take forever on the computing available at the time.
Results
- Learning times for a two-member team, with a set number of experiments per size of language.

This is actually really bad.
Results
- For a 3-member team, 100 experiments per language size.
- A language of size 20 took 12 million iterations.
Results
- Convergence time increases exponentially in the number of possible actions and signals.
- The poor convergence time is due to the learning algorithm.
- Learning a new language after one has already been taught takes up to twice as long.
- The system doesn't prefer one meaning over another for a language.

That is, sometimes one signal means go straight, and the next time it might mean spin. The implication here is that the language can indeed adapt to new situations.
Highlights
- The ability to learn a language to accomplish a task.
- Can adapt to changing situations, changing the language to suit a new task.
- Previous work had largely assumed a fixed communication language.
- The idea of task-based reinforcement.

Task-based reinforcement just means that the reward is not given on an individual basis; this is a good, new idea for learning a communication language.
Limitations
- Only looks at the one-leader, many-follower case.
- The choice of coordinated movement as a task domain.
- Followers don't have any access to environmental cues and must rely solely on the signals emitted by the leader.

Perhaps the results will not be applicable to a group of homogeneous robots trying to learn to communicate in order to cooperate. Will they apply to far more complex tasks such as soccer or foraging? Probably not; the language might need to be far more complex. The lack of follower sensing is clearly a big limitation: in any real situation the followers almost always have some kind of sensors and might be able to guess what the leader wants from what they know of the environment. This is also an issue with choosing coordinated movement as the task.
Limitations
- The number of signals is equal to the size of the language (i.e., only two signals, only two actions).
- Relies on a human trainer for a real task; this should be automated.
- Ignores noise in communication; in any real task there is always noise.
- Only looks at very small numbers of robots.
- Uses a biased random function to obtain results.

With only as many signals as actions, there is no capability for one signal to represent a series of actions; e.g., "attack" might mean go to A, pick up the gun, move to B, fire. The biased random function clearly makes the results very poor. It all comes back to the fact that their learning algorithm is pretty pitiful, so they need these biased random functions.
Limitations
- Huge convergence times.
- Task-based reinforcement.
- Doesn't use the one-to-one correspondence between signals and actions.
- Doesn't scale to more complex tasks where a series of actions precedes a reward.

The task-based reinforcement idea is a bit limiting: one could imagine a case where, instead of both robots starting from nothing, one already has some notion of what a given signal means and thus bootstraps the others. The learning process could obviously be sped up by exploiting the fact that each signal relates to only one action. And if reinforcement is received only after a sequence of actions, the current scheme won't work.
Limitations
- The interval estimation learning algorithm is primitive.
- Does the exponential growth in training time occur with a better algorithm?

Clearly most of the results in this paper stem from the learning algorithm being so pathetic; any sort of Q-learning algorithm would perform much better, learn faster, and be applicable to many more cases.
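To make the critique concrete, here is a hedged sketch of the kind of one-step Q-style update the slide has in mind for an immediate-reward signal-to-action task; the learning rate, exploration rate, and target mapping are invented for illustration, and with immediate reward the update has no successor-state term:

```python
import random

rng = random.Random(0)
ALPHA, EPSILON = 0.5, 0.1           # illustrative learning and exploration rates
SIGNALS = ACTIONS = [0, 1]
TARGET = {0: 0, 1: 1}               # hypothetical ground-truth meaning of each signal

Q = {s: {a: 0.0 for a in ACTIONS} for s in SIGNALS}

for _ in range(500):
    s = rng.choice(SIGNALS)
    # Epsilon-greedy selection instead of the paper's count-based greed.
    if rng.random() < EPSILON:
        a = rng.choice(ACTIONS)
    else:
        a = max(Q[s], key=Q[s].get)
    reward = 1.0 if a == TARGET[s] else 0.0
    # One-step value update toward the observed reward.
    Q[s][a] += ALPHA * (reward - Q[s][a])
```

Because failed actions pull their values back toward zero instead of merely not incrementing a counter, this kind of learner discriminates between actions after far fewer trials than a pure success counter.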
Comparison: Present Work
- Artificial-life approach to developing communication.
- Bootstrapping: human language vs. robot-developed language.
- Context-dependent learning.
- Grammar makes the task simpler.

Some people are trying to simulate evolution and see how communication evolves, using a fitness function that prefers genes which communicate. Human-developed language is complex, but the robot can be given the ability to adapt; alternatively, one can start with a language developed by other robots, so that (in theory) not much adaptation should be necessary. As one would expect, if the robot has information about the world state that assists in determining the meaning of a signal, learning is improved. And if a simple grammar is introduced, more complex sequences of behavior can be learned from simple patterns.
Comparison
- The use of reinforcement learning.
- A memory window for faster re-learning.
- Problems with convergence in three-robot examples.

Re-learning means that a symbol which used to mean one thing is now supposed to be interpreted differently than before. Alternatively, the weights in the reinforcement learning can be changed so that more recent feedback has more importance. In the three-robot examples, only 100 of 545 runs converged to the correct behavior; clearly no good.
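One way to realize the memory-window idea is to score each (signal, action) pair over only its most recent outcomes, so stale evidence ages out when a symbol's meaning changes; the class and method names here are illustrative, not from any of the papers:

```python
from collections import deque

class WindowedEstimator:
    """Scores each (signal, action) pair over only its last `window`
    outcomes, so old evidence ages out when a symbol's meaning changes."""

    def __init__(self, window=5):
        self.window = window
        self.history = {}

    def record(self, signal, action, success):
        # A bounded deque silently discards outcomes older than the window.
        key = (signal, action)
        self.history.setdefault(key, deque(maxlen=self.window)).append(success)

    def score(self, signal, action):
        h = self.history.get((signal, action))
        return sum(h) / len(h) if h else 0.0
```

A pair that succeeded ten times under the old meaning but fails under the new one drops to a score of zero after just `window` failures, instead of fighting a large accumulated success counter; the recency-weighting alternative mentioned above achieves the same effect with exponentially decaying weights.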
Comparison: Related Work
- Jung and Zelinsky's paper.
- Shared grounding of symbols.
- Relation to physically based AI.
- Physically grounded symbols, e.g., encoder signals and known locations.

Shared grounding of symbols means that, for a communication language to be as broad as possible, the symbols passed between transmitter and receiver should have a shared grounding. A great example: to get across the meaning "greater", "heavier" or "louder" will work for a blind person, whereas "further away" is a good physically based symbol for a sighted person. In their task, two robots share a room: one goes around the edge collecting foam pieces and pushing them away from the edge, and the other picks them up. The edge-following robot only knows locations around the edge, but the other robot can tell it when it is at a common location; this is the location-labeling procedure.
Comparison: Related Work
- Four types of communication layers: implicit, awareness of other robots, explicit signalling, grounded symbol signalling.
- Results: significant performance increase with the addition of each layer.

Implicit: one robot simply goes around making piles and the other picks them up if it sees them; the communication in this case is implicit. Awareness: one robot can see the other and uses its position as a likely place where a pile might be; this is still implicit communication. Explicit: Flo tells Joh when a pile is dropped, and its relation to the last piles dropped; only useful when Joh can see Flo. Grounded: uses the labelled locations and encoder signals to transmit the information.