
Higher Coordination with Less Control – A Result of Information Maximization in the Sensorimotor Loop
Keyan Zahedi, Nihat Ay, Ralf Der (Published on: May 19, 2012)
Artificial Neural Network, Biointelligence Lab, School of Computer Science and Engineering, Seoul National University
Presenter: Sangam Uprety
Student ID:
October 09, 2012

Contents
1. Abstract
2. Introduction
3. Learning Rule
4. Experiment
5. Results
6. Questions
7. Discussion and Conclusion

1. Abstract
- A novel learning method in the context of embodied artificial intelligence and self-organization
- Fewer assumptions and restrictions about the world and the underlying model
- Uses the principle of maximizing the predictive information in the sensorimotor loop
- Evaluated on robot chains of varying length with individually controlled, non-communicating segments
- Maximizing the predictive information per wheel leads to more highly coordinated behavior
- Longer chains with less capable controllers outperform shorter chains with more complex controllers

2. Introduction
- Embodied artificial intelligence and cognitive systems use learning and adaptation rules
- Most are based on an underlying model, so they are limited to that model
- They use intrinsically generated reinforcement signals (prediction errors) as input to a learning algorithm
- Needed: a learning rule that is independent of the model structure and requires fewer assumptions about the environment
- Self-organized learning

Our way out:
- Directly calculate the gradient of the policy from the current, locally available approximation of the predictive information
- A learning rule based on Shannon's information theory
- A neural network in which earlier layers maximize the information passed to the next layer

3. Learning Rule
3.1 Basic sensorimotor loop
- W_0, W_1, …, W_t: world states
- S_0, S_1, …, S_t: sensor states
- M_0, M_1, …, M_t: memory states
- A_0, A_1, …, A_t: actions

3. Learning Rule (contd.)
- The sensor state S_t depends only on the current world state W_t.
- The memory state M_{t+1} depends on the last memory state M_t, the previous action A_t, and the current sensor state S_{t+1}.
- The world state W_{t+1} depends on the previous world state W_t and on the action A_t.
- There is no connection between the action A_t and the memory state M_{t+1}, because we clearly distinguish between inputs and outputs of the memory M_t (which is equivalent to the controller): any input is given by a sensor state S_t, and any output is given in the form of an action state A_t.
- The system may not monitor its outputs A_t directly, but only through a sensor; hence the sensor state S_{t+1}.

3.2 Reduced sensorimotor loop
- Progression from step t to t+1
- A, W, S: present states, given by the distribution µ
- α(a|s): the policy
- β(w'|w,a): the evolution of the world, given the present world state w and action a
- γ(s'|w'): the effect of the world on the sensor state
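As a concrete picture of this structure, here is a minimal sketch of one pass through the reduced loop with discrete states; the state-space sizes and the randomly drawn conditional distributions are placeholders, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder state-space sizes (not from the paper).
N_W, N_S, N_A = 4, 3, 3

# Random stochastic matrices standing in for the three kernels:
# alpha[s, a]    ~ policy          alpha(a|s)
# beta[w, a, w2] ~ world dynamics  beta(w'|w, a)
# gamma[w2, s2]  ~ sensor map      gamma(s'|w')
alpha = rng.dirichlet(np.ones(N_A), size=N_S)
beta = rng.dirichlet(np.ones(N_W), size=(N_W, N_A))
gamma = rng.dirichlet(np.ones(N_S), size=N_W)

def step(w, s):
    """One progression from step t to t+1: s -> a -> w' -> s'."""
    a = rng.choice(N_A, p=alpha[s])      # act according to the policy
    w2 = rng.choice(N_W, p=beta[w, a])   # the world evolves
    s2 = rng.choice(N_S, p=gamma[w2])    # the world is observed through the sensor
    return a, w2, s2
```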

3.3 Derivation of the Learning Rule
The entropy H(X) of a random variable X, measuring its uncertainty, is:

H(X) = - Σ_x p(x) log2 p(x)

The mutual information of two random variables X and Y is:

I(X;Y) = Σ_{x,y} p(x,y) log2 [ p(x,y) / (p(x) p(y)) ] = H(X) - H(X|Y)

This gives how much knowledge of Y reduces the uncertainty of X. The maximal entropy is the entropy of a uniform distribution: H(X) <= log2 |X|.
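A minimal numerical sketch of the two quantities, in bits, with sanity checks; this is generic information theory, not code from the paper.

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(X) in bits; p is a 1-D probability vector."""
    p = p[p > 0]                      # 0 log 0 is taken as 0
    return -np.sum(p * np.log2(p))

def mutual_information(pxy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint distribution matrix pxy."""
    return (entropy(pxy.sum(axis=1)) + entropy(pxy.sum(axis=0))
            - entropy(pxy.ravel()))

# Independent variables share no information; a copied fair coin shares 1 bit.
assert abs(mutual_information(np.outer([0.5, 0.5], [0.3, 0.7]))) < 1e-12
assert abs(mutual_information(np.array([[0.5, 0.0], [0.0, 0.5]])) - 1.0) < 1e-12
```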

β(w'|w,a), γ(s'|w') → δ(s'|a,s)
The world dynamics β and the sensor map γ are collapsed into a single sensor-level world model δ(s'|a,s), giving the probability of the next sensor state from the current sensor state and action.
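One way to read this arrow, sketched below: given a belief p(w|s) over the hidden world state, the two kernels can be marginalized into the sensor-level world model. The belief term is an assumption of this sketch; the slide only names the resulting δ(s'|a,s).

```python
import numpy as np

def collapse(beta, gamma, p_w_given_s):
    """delta[s, a, s'] = sum over w, w' of p(w|s) * beta(w'|w,a) * gamma(s'|w').

    beta:        shape (N_W, N_A, N_W)  world dynamics beta(w'|w,a)
    gamma:       shape (N_W, N_S)       sensor map gamma(s'|w')
    p_w_given_s: shape (N_S, N_W)       belief over the world state given s
    """
    return np.einsum('sw,wav,vt->sat', p_w_given_s, beta, gamma)
```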

p(s), α(a|s), and δ(s'|a,s) are represented as matrices.
Update rule for the sensor distribution p(s)

Update rule for the world model δ(s'|a,s)

Update rule for the policy α(a|s)
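The closed-form update equations appeared as images on the original slides and are not reproduced in this transcript. As a stand-in, the sketch below estimates p(s) and the world model δ(s'|a,s) by smoothed counts and improves the policy α(a|s) by numerically ascending the predictive information I(S';S). The objective is the paper's; the count-based estimator and the finite-difference gradient are illustrative assumptions, not the authors' derived rules.

```python
import numpy as np

def estimate(triples, n_s, n_a):
    """Count-based estimates of p(s) and delta(s'|a,s) from observed
    (s, a, s') transitions; Laplace smoothing keeps all entries positive."""
    counts = np.ones((n_s, n_a, n_s))
    for s, a, s_next in triples:
        counts[s, a, s_next] += 1
    delta = counts / counts.sum(axis=2, keepdims=True)    # delta[s, a, s']
    p_s = counts.sum(axis=(1, 2)) / counts.sum()          # p(s)
    return p_s, delta

def predictive_info(p_s, alpha, delta):
    """I(S'; S) in bits, given p(s), policy alpha[s, a], world model delta[s, a, s']."""
    p_ss = p_s[:, None] * np.einsum('sa,sat->st', alpha, delta)   # p(s, s')
    p_s2 = p_ss.sum(axis=0)                                       # p(s')
    mask = p_ss > 0
    denom = (p_s[:, None] * p_s2[None, :])[mask]
    return float(np.sum(p_ss[mask] * np.log2(p_ss[mask] / denom)))

def policy_ascent_step(p_s, alpha, delta, lr=0.5, eps=1e-5):
    """One finite-difference ascent step on alpha; a numerical stand-in
    for the closed-form gradient derived in the paper."""
    grad = np.zeros_like(alpha)
    base = predictive_info(p_s, alpha, delta)
    for s in range(alpha.shape[0]):
        for a in range(alpha.shape[1]):
            pert = alpha.copy()
            pert[s, a] += eps
            pert[s] /= pert[s].sum()          # stay on the probability simplex
            grad[s, a] = (predictive_info(p_s, pert, delta) - base) / eps
    alpha = np.clip(alpha + lr * grad, 1e-8, None)
    return alpha / alpha.sum(axis=1, keepdims=True)       # renormalize rows
```

With these pieces, one optimization cycle would be: collect transitions, re-estimate p(s) and δ, then take an ascent step on α.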

4. Experiment
4.1 Simulator
- YARS (Zahedi et al., 2008) was used as the simulator
4.2 Robots
- Two-wheeled differential-drive robots with a circular body: the Khepera I robot (Mondada et al., 1993)

Input-output  desired wheel velocity (A t ) and current actual velocity (S t ) A t and S t mapped linearly to the interval [-1,1] -1  maximum negative speed (backwards motion) +1  maximal positive speed (forward motion) Robots are connected by a limited hinge joint with a maximal deviation of ±0.9 rad (approx. 100 degree) avoiding intersection of neighboring robots Experiments with single robot, three-, and five-segment chaings

4.3 Controller
- Each robot is controlled locally
- Two control paradigms: combined and split
- No communication between the controllers
- Interaction occurs through the world state W_t, via the sensor S_t (the current actual wheel velocity)
- r-c notation: r ∈ {1, 3, 5} robots, c ∈ {r, 2r} controllers (c = r: one combined controller per robot; c = 2r: one split controller per wheel)

4.4 Environment
- 8 x 8 meters, bounded, featureless
- Large enough for the chains to learn a coordinated behavior

5. Results
Two questions:
1. Did the predictive information (PI) increase over time for all six configurations?
2. Does maximizing the PI lead to qualitative changes in the behavior?
Videos

5.1 Maximizing the predictive information
Fig. Average PI plots for each of the six experiments: 1-1, 3-3, 5-5, 1-2, 3-6, 5-10

Fig. Comparison of the intrinsically calculated PI (left) and the PI calculated from recorded data, per robot

5.2 Comparing Behaviors
Fig. Trajectories of the six systems for the first 10 minutes (gray) and the last 100 minutes (black)

1. All configurations explore the entire area.
2. Longer consecutive trails relate to a higher average sliding-window coverage entropy.
3. The configurations that show longer consecutive trails are those that reach a higher coverage entropy sooner.
→ For chains longer than one segment, movement only occurs if the majority of the segments move in the same direction, i.e., the segments cooperate.
→ Cooperation among the segments is higher for the split configuration.
→ Higher PI relates to higher coverage entropy and higher sliding-window coverage entropy for the split controller paradigm (a sketch of the coverage-entropy measure follows below).
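The slides do not define coverage entropy; a common reading, assumed here, is the entropy of the occupancy histogram of the trajectory over a discretized arena, with the sliding-window variant evaluating the same quantity over a moving time window of positions.

```python
import numpy as np

def coverage_entropy(xy, arena=8.0, bins=16):
    """Entropy (bits) of the occupancy histogram of a trajectory.

    xy: positions of shape (T, 2) inside a bounded arena, assumed here
    to be the 8 x 8 m environment of the experiments; bin count is a guess."""
    hist, _, _ = np.histogram2d(xy[:, 0], xy[:, 1], bins=bins,
                                range=[[0.0, arena], [0.0, arena]])
    p = hist.ravel() / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# A trajectory covering the arena uniformly attains the maximum,
# log2(bins**2) = 8 bits for a 16 x 16 grid; a robot stuck in one cell scores 0.
```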

5.3 Behavior Analysis
- Chosen bins: -3/4, -1/2, 1/2, 3/4
- With Configuration 1-2

Transient plot  wheel velocities oscillates between -1/2 and - 3/4 S=-1/2  A  {-1/2, 1/2, 3/4}  S=-1/2 A=-3/4 chosen with probability 0.95 With probability 0.05, change of direction of velocity occurs, leading to either rotation of the system, or inversion of the translational behavior  Sensor entropy H(S) is high, conditional entropy H(S’|S) is low, hence high PI

With Configuration 3-6

- The velocity of a wheel is no longer influenced only by its own controller, but also by the actions of the other controllers
- The current direction of the wheel rotation is maintained with probability 0.6
- For the entire system to progress, at least two robots (i.e., four associated controllers) must move in the same direction → probability 0.4^4

5.4 Incremental Optimization
- The derived learning rule is able to maximize the predictive information for systems in the sensorimotor loop
- Increases of the PI relate to changes in the behavior, and here to a higher coverage entropy, an indirect measure of coordination among the coupled robots

6. Questions
Q.1 Explain the concept of the perception-action cycle in Fig. 1. What are its essential characteristics? How is this concept distinguished from the traditional symbolic AI approach?
Q.2 Explain the simplified version of the perception-action cycle in Fig. 2. How does it differ from the full version of Fig. 1? How reasonable is this simplification? When will it work and when will it not?
Q.3 Define mutual information. Define predictive information. Give a learning rule that maximizes the predictive information. Derive the learning rules.
Q.4 Explain the experimental tasks that the authors designed to evaluate the learning rule for predictive information maximization. What is the setup? What is the task? What was measured in the simulation experiments? Summarize the results. What is the conclusion of the experiments?

7. Discussion & Conclusion
- A novel approach to self-organized learning in the sensorimotor loop, free of assumptions about the world and restrictions on the model
- The learning algorithm is derived from the principle of maximizing the predictive information
- The average approximated predictive information increased over time in each of the experimental settings [Goal #1 achieved]
- The coverage entropy, a measure of coordinated behavior, is higher for chain configurations with more robots (and likewise with split controllers) [counterintuitive!]

Thank you!