Intrinsically Motivated Collective Motion

Henry Charlesworth, Supervisor: Professor Matthew Turner

Introduction
It has been suggested that in many situations a sensible principle to follow is to make decisions so as to maximize the number of choices available to you in the future, i.e. to keep your options open as much as possible. This is an example of an intrinsic motivation for behaviour: it offers an incentive to act even without a specific task to complete or an immediate external reward to gain. The idea is that this kind of behaviour does not help solve any one particular problem, but can be beneficial across a wide range of possible scenarios. One attempt to formalize this idea is the "empowerment" framework. Defined in the language of information theory, empowerment is essentially a measure of how much information an agent could potentially inject into its environment and then detect with its own sensors at a later time. This provides a way of quantifying how much influence or control an agent has over its future states. For discrete sets of actions/sensor states and a deterministic environment (where each action leads to a perfectly predictable outcome), it reduces to the logarithm of the number of unique sensor states that can be reached at some fixed time in the future. Our idea was to apply this to a group of agents equipped with simple visual sensors and study the resulting motion of the group.

Model
Consider a group of agents of finite size. Each agent has a number of visual sensors that detect the angular projection of the other agents in the "flock"; a sensor registers 1 if it is more than half full and 0 otherwise, so a visual state is a vector of 0s and 1s. At each time step an agent can choose one of five possible actions, including reorientations and speed changes. Each agent chooses the currently available action that leads to the largest number of unique visual states some number of time steps τ into the future; future branches in which collisions occur do not contribute to this count. In other words, agents move so as to maximize the control they have over the visual states accessible τ steps ahead (a minimal sketch of this decision rule is given after the Results section below). When modelling future trajectories, each agent assumes that the others will continue to move in straight lines at speed v0. This is obviously not true in general, but it is the simplest assumption that can be made; it turns out that the resulting flock is highly ordered and travels with speed v ≈ v0, so there is some level of self-consistency here anyway.

Results
The order parameter φ (a measure of how aligned the flock is) is defined as φ = (1/N) |Σᵢ v̂ᵢ|, where v̂ᵢ is the unit vector along agent i's velocity. Average opacity is the average fraction of an agent's visual field which is filled. Velocity fluctuations about the flock mean are uᵢ = vᵢ − ⟨v⟩, and the correlation function is C(r) = ⟨uᵢ · uⱼ⟩, averaged over pairs of agents separated by a distance r. The primary result is that this algorithm leads to a robust, highly ordered flock with sensibly regulated density and marginal opacity, i.e. the average opacity of the individuals is close to ½. All of these are features associated with real flocks of starlings, and the behaviour is remarkably robust to variations in the model parameters. Looking at the correlations in the fluctuations of the velocity around the flock mean, we find that the "correlation length" (the distance at which the correlation function decays to zero) scales linearly with the size of the flock. This scale-free behaviour has also been observed in real flocks of starlings.
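As a concrete illustration of the decision rule described in the Model section, the following is a minimal Python sketch, not the authors' code: the helpers step, visual_state and collides are hypothetical stand-ins for the model's dynamics, sensor projection and collision test. It enumerates τ-step action sequences for the focal agent (with the other agents assumed to move in straight lines at v0), discards branches containing collisions, and picks the action whose subtree reaches the largest number of distinct visual states. In the deterministic, discrete setting empowerment reduces to the logarithm of this count, so maximizing the count maximizes empowerment.

```python
from typing import Callable, Sequence

def best_action(state,
                actions: Sequence,
                step: Callable,          # step(state, focal_action) -> next state; other agents
                                         # are assumed to keep moving in straight lines at v0
                visual_state: Callable,  # visual_state(state) -> tuple of 0s and 1s
                collides: Callable,      # collides(state) -> True if any agents overlap
                tau: int):
    """Pick the action whose tau-step subtree reaches the most distinct visual states.

    Hypothetical sketch of the empowerment-maximizing decision rule.
    """
    def reachable(s, depth):
        # Set of visual states reachable `depth` steps ahead of state `s`,
        # excluding any branch in which a collision occurs.
        if collides(s):
            return set()
        if depth == 0:
            return {visual_state(s)}
        states = set()
        for a in actions:
            states |= reachable(step(s, a), depth - 1)
        return states

    # Empowerment = log(number of unique future visual states) in the
    # deterministic, discrete case, so maximizing the count is equivalent.
    return max(actions, key=lambda a: len(reachable(step(state, a), tau - 1)))
```

The cost of this search grows as (number of actions)^τ, which is exactly the computational burden that motivates learning the heuristic described further below.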
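The order parameter and correlation function defined in the Results section can be estimated from a snapshot of positions and velocities roughly as follows. This is a sketch assuming NumPy arrays of shape (N, 2); the binning of pair separations is an illustrative choice, not necessarily the analysis used by the authors.

```python
import numpy as np

def order_parameter(vel: np.ndarray) -> float:
    """phi = (1/N) |sum_i v_hat_i|, where v_hat_i is agent i's velocity direction."""
    v_hat = vel / np.linalg.norm(vel, axis=1, keepdims=True)
    return float(np.linalg.norm(v_hat.mean(axis=0)))

def velocity_correlation(pos: np.ndarray, vel: np.ndarray, bins: int = 20):
    """C(r) = <u_i . u_j> over pairs at separation r, with u_i = v_i - <v>."""
    u = vel - vel.mean(axis=0)                     # fluctuations about the flock mean
    i, j = np.triu_indices(len(pos), k=1)          # all distinct pairs (i, j)
    r = np.linalg.norm(pos[i] - pos[j], axis=1)    # pair separations
    dots = np.sum(u[i] * u[j], axis=1)             # u_i . u_j for each pair
    edges = np.linspace(0.0, r.max(), bins + 1)
    which = np.digitize(r, edges[1:-1])            # bin index for each pair
    c = np.array([dots[which == b].mean() if np.any(which == b) else np.nan
                  for b in range(bins)])
    return 0.5 * (edges[:-1] + edges[1:]), c       # bin centres, C(r)
```

The correlation length can then be read off as the distance at which C(r) first decays to zero.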
Learning a Heuristic
Although this algorithm produces interesting collective motion with many features associated with real flocks of birds, it is quite complicated: each decision requires modelling a large number of future trajectories, so it is certainly not similar to any calculation a real organism would be making. We wanted to see whether it would be possible to use this model to train a heuristic that mimics the behaviour using only the information currently available to the agent. We did this by training a neural network to classify the "correct" moves made in a particular situation by the full empowerment-maximizing algorithm. That is, we provide the current (and previous) visual state vectors as input to the network, and the output is an integer between 1 and 5 representing the move made by the full algorithm. Doing this, we are able to train a neural network that produces behaviour which is qualitatively and quantitatively very similar (a minimal sketch of such a classifier is given after the references below).

Conclusions
A model based on agents moving so as to maximize the control they have over their environment produces robust and highly ordered collective motion. Many of the features that emerge are associated with real flocks of starlings, including marginal opacity and scale-free correlations. We have been able to train a neural network that reproduces similar behaviour without having to carry out complicated calculations or model many possible future trajectories, demonstrating that heuristics mimicking this "empowerment-maximizing" behaviour can be learned. As such, this could be a useful principle for understanding a range of real animal behaviours.

References
Klyubin A, Polani D, Nehaniv C (2005), "Empowerment: A Universal Agent-Centric Measure of Control".
Prokopenko M (ed.) (2014), "Guided Self-Organization: Inception".
Cavagna A et al. (2010), "Scale-free Correlations in Starling Flocks".
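To make the classification step in the Learning a Heuristic section concrete, here is a minimal sketch of that kind of supervised classifier. The library (PyTorch), network size, sensor count and training details are illustrative assumptions, not the authors' setup; the input is the concatenation of the current and previous visual state vectors, and the target is the move (1-5) chosen by the full empowerment-maximizing algorithm.

```python
import torch
import torch.nn as nn

N_SENSORS = 32          # illustrative number of visual sensors per time step
N_ACTIONS = 5           # the five available moves

# Small multilayer perceptron mapping (current + previous visual state) -> move.
model = nn.Sequential(
    nn.Linear(2 * N_SENSORS, 64),
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, N_ACTIONS),
)

def train(visual_states: torch.Tensor, moves: torch.Tensor, epochs: int = 50):
    """Supervised training on moves recorded from the full algorithm.

    visual_states: float tensor, shape (n_samples, 2 * N_SENSORS), the current
                   and previous 0/1 visual state vectors concatenated.
    moves:         long tensor, shape (n_samples,), values 0..4 encoding the
                   move (1-5) chosen by the empowerment-maximizing algorithm.
    """
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(visual_states), moves)
        loss.backward()
        opt.step()

def act(visual_state: torch.Tensor) -> int:
    """Return the heuristic's move (0..4) for one concatenated visual state vector."""
    with torch.no_grad():
        return int(model(visual_state.unsqueeze(0)).argmax(dim=1).item())
```

At flocking time each agent would simply call act on its own sensor readings, replacing the expensive search over many future trajectories with a single forward pass.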

