Multi-Robot Systems Course, Bar-Ilan University, Mor Vered.

Presentation transcript:

Multi-Robot Systems Course, Bar-Ilan University, Mor Vered

Motivation In a multi-robot environment, path planning and collision avoidance are central problems. Multi-robot systems researchers have been investigating distributed coordination methods for improving spatial coordination in teams, i.e., collision avoidance. Such methods adapt the coordination behavior to dynamic changes in robot density. The goal is to avoid collisions with static obstacles as well as with dynamic objects, such as other moving robots.

Motivation Spatial conflicts can cause the team's productivity to drop as more robots are added. We will see that this phenomenon depends on the coordination methods used by the team members, as different coordination methods yield radically different productivity results.

Heuristic Coordination Approaches Noise method - If I am on a trajectory that is in danger of colliding with another agent's, add random noise to my direction vector. "Behavior-Based Formation Control for Multi-Robot Teams", Balch, Tucker and Arkin, Ronald C.
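
A minimal sketch of the noise idea in Python; the vector representation and the noise magnitude are assumptions for illustration, not values from the paper:

```python
import random
import math

def noisy_heading(heading, noise_scale=0.3):
    """Perturb a 2D direction vector with random noise when a
    collision is anticipated (noise_scale is an assumed parameter)."""
    dx, dy = heading
    dx += random.uniform(-noise_scale, noise_scale)
    dy += random.uniform(-noise_scale, noise_scale)
    norm = math.hypot(dx, dy) or 1.0
    return (dx / norm, dy / norm)  # re-normalize to unit length
```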

Heuristic Coordination Approaches Aggression method - Describes a controller that breaks deadlocks in favour of the most 'aggressive' robot. The robots compete and only one gains access to the resource. When robots come too close to each other, each robot chooses an aggression level (randomly); the robot with the lower level concedes its position, preventing a collision. It was later shown that it may be best to choose the aggression level in proportion to the robot's investment in its task. For every cycle a robot found itself within 2 radii of a teammate, it selected either an aggressive or a timid behavior, each with probability 0.5. If the robot selected to become timid, it backed away for 100 cycles (10 simulated seconds); otherwise it proceeded forward, executing the aggressive behavior. Since robots chose anew every cycle whether to remain "aggressive" or become "timid", the probability that two robots would collide in this implementation was near zero. "Go Ahead, Make My Day: Robot Conflict Resolution by Aggressive Competition", Vaughan, Richard and Støy, Kasper and Sukhatme, Gaurav and Matarić, Maja.
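
A sketch of the per-cycle aggressive/timid choice described above. The cycle count and probability come from the slide; the robot interface (distance_to_nearest_teammate, radius, back_away, proceed_forward) is a hypothetical placeholder:

```python
import random

BACK_AWAY_CYCLES = 100  # 10 simulated seconds, as in the slide

def aggression_step(robot):
    """One control cycle of the aggression method (sketch)."""
    if robot.distance_to_nearest_teammate() < 2 * robot.radius:
        if random.random() < 0.5:                # timid with probability 0.5
            robot.back_away(cycles=BACK_AWAY_CYCLES)
        else:                                    # otherwise stay aggressive
            robot.proceed_forward()
    else:
        robot.proceed_forward()
```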

Heuristic Coordination Approaches Repel method - When sensing a possible collision on its course, a Repel robot backtracked for 500 cycles (50 seconds), moving in a direction 180 degrees away from the closest robot, so that nearby robots mutually repelled each other.

Heuristic Coordination Approaches TimeRand method - This method contained no repulsion mechanism for directly avoiding collisions; instead it reacted after the fact, freeing a robot that had become immobile. When a robot sensed that it had not moved significantly for 100 cycles (10 seconds), it performed a random walk for 150 cycles (15 seconds).

Heuristic Coordination Approaches TimeRepel method - This method also contained no repulsion mechanism for directly avoiding collisions and only reacted to them after the fact. Once a robot had not moved for 150 cycles (15 seconds), it moved backwards for 50 cycles (5 seconds). All methods taken from "A Study of Mechanisms for Improving Robotic Group Performance", Rosenfeld, Avi and Kaminka, Gal A. and Kraus, Sarit and Shehory, Onn.
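
Both stall-triggered methods reduce to a simple per-cycle check. The cycle thresholds below come from the slides; the robot interface (cycles_without_progress, random_walk, move_backwards) is an assumed placeholder:

```python
def timerand_step(robot):
    """TimeRand: after 100 stalled cycles, random-walk for 150 cycles (sketch)."""
    if robot.cycles_without_progress() >= 100:
        robot.random_walk(cycles=150)

def timerepel_step(robot):
    """TimeRepel: after 150 stalled cycles, reverse for 50 cycles (sketch)."""
    if robot.cycles_without_progress() >= 150:
        robot.move_backwards(cycles=50)
```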

Heuristic Coordination Approaches Homework Come up with your own heuristic coordination approach by next week's lesson and send it to me.

Selecting the Best Approach Assuming we have come up with several solutions, the next problem is how to select the best coordination method. As stated before, spatial conflicts can cause the team's productivity to drop as robots are added, and this phenomenon is impacted by the coordination methods used by the team members, as different coordination methods yield radically different productivity results. No single collision-avoidance method is best across all domains and group sizes, and the effectiveness of a coordination method in a given context is not known in advance.

Selecting the Best Approach

CCC - Combined Coordination Cost Method How do we select the best coordination method? The CCC quantifies the productive resources spent on coordination conflicts, i.e., the cost of group interactions. It is a multi-attribute cost measure capturing resources such as the time and fuel each group member spends in coordination behaviors during task execution, and it facilitates comparison between different coordination methods. "A Study of Mechanisms for Improving Robotic Group Performance", Rosenfeld, Avi and Kaminka, Gal A. and Kraus, Sarit and Shehory, Onn.

CCC - Combined Coordination Cost Method The authors contended that if robots dynamically reduce their CCC, group productivity will improve. To demonstrate this, they created robotic groups that dynamically adapt their coordination techniques based on each robot's CCC estimate. The problem with this method is that it ignores the gains accumulated during long periods with no coordination needs; the next method tries to fix that. "A Study of Mechanisms for Improving Robotic Group Performance", Rosenfeld, Avi and Kaminka, Gal A. and Kraus, Sarit and Shehory, Onn.
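
A hedged sketch of the idea: each robot keeps a CCC estimate per coordination method and switches to the cheapest one. The weights and the way estimates are maintained are illustrative assumptions, not the paper's exact procedure:

```python
def combined_coordination_cost(time_coordinating, fuel_spent,
                               w_time=1.0, w_fuel=1.0):
    """Multi-attribute coordination cost (weights are assumed)."""
    return w_time * time_coordinating + w_fuel * fuel_spent

def select_method(methods, ccc_estimates):
    """Pick the coordination method with the lowest estimated CCC so far."""
    return min(methods, key=lambda m: ccc_estimates.get(m, float("inf")))
```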

Adaptive Multi-Robot Coordination: A Game-Theoretic Perspective Used a reinforcement-learning approach to coordination algorithm selection. Validated it both empirically in experiments (foraging) and analytically through a game-theoretic formulation. "Adaptive Multi-Robot Coordination: A Game-Theoretic Perspective", Gal A. Kaminka, Dan Erusalimchik and Sarit Kraus.

Problem Definition The normal routine of a robot's operation is to carry out its primary task until it is interrupted by a conflict with another robot, which must be resolved by a coordination algorithm. This is called a conflict event. The event triggers a coordination algorithm to handle the conflict; once it successfully finishes, the robots involved go back to their primary task.

Problem Definition Several kinds of tasks are defined: 1) Loosely-coordinated tasks - the robots only occasionally need spatial or temporal coordination; for example, multi-robot foraging. 2) Cooperative tasks - the robots seek to maximize group utility; for example, exploration. 3) Timed tasks - the task is bounded in time; for example, exploration (completely explore a new area as quickly as possible) or patrolling.

Problem Definition Time is divided into: 1) Active intervals - in which the robot actively invests resources in coordination. 2) Passive intervals - in which the robot no longer needs to invest in coordination. In addition, the robot has a nonempty set of coordination algorithms to select from, and the choice of coordination algorithm affects the duration of the active and passive intervals.

Reinforcement Learning Computes how an agent ought to take actions in an environment so as to maximize some notion of cumulative reward. The basic reinforcement-learning model consists of: 1) A set of environment states, S. 2) A set of actions, A. 3) Rules for transitioning between states. 4) Rules that determine the immediate reward of a transition. 5) Rules that describe what the agent observes.
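
These five ingredients fit the standard agent-environment interaction loop. A generic sketch follows; the env and agent interfaces (reset, step, select_action, update) are assumed, not part of any specific library:

```python
def run_episode(env, agent, max_steps=1000):
    """Generic reinforcement-learning interaction loop (sketch)."""
    state = env.reset()                              # initial state from S
    for _ in range(max_steps):
        action = agent.select_action(state)          # choose an action from A
        next_state, reward, done = env.step(action)  # transition rule + immediate reward
        agent.update(state, action, reward, next_state)
        state = next_state
        if done:
            break
```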

Reinforcement Learning The trick is how to define the reward. They introduced a reward function, the effectiveness index (EI), which rewards reducing the time and resources spent on coordination and maximizing the time between conflicts that require coordination. It takes into account both the time spent on coordination and the time not spent on coordination.
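
One way to read the effectiveness index: the cost of coordination is divided by the total length of the active plus passive interval, so the reward improves when coordination is cheap and conflict-free periods are long. This is a plausible reading for illustration, not necessarily the paper's exact formula:

```python
def effectiveness_index(coord_cost, active_time, passive_time):
    """Coordination cost per unit of conflict-to-conflict time.
    Lower is better; the learning reward can be taken as its negation.
    (Assumed reading of EI, not the paper's verbatim definition.)"""
    return coord_cost / max(active_time + passive_time, 1e-9)
```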

Reinforcement Learning Rather than acting on the immediate reward alone, they used Q-learning (a variant of reinforcement learning): a technique that learns an action-value function giving the expected utility of taking a given action in a given state and following a fixed policy thereafter. It thus accumulates value from all the results observed up until now. "Q-learning", Watkins, Christopher J. C. H. and Dayan, Peter.
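
For reference, the standard tabular Q-learning update of Watkins and Dayan; in this setting the "state" could be the task context and the "actions" the available coordination methods (that mapping is an assumption for illustration):

```python
from collections import defaultdict

class QLearner:
    """Tabular Q-learning over a finite action set (sketch)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9):
        self.q = defaultdict(float)  # (state, action) -> value estimate
        self.actions = actions
        self.alpha = alpha           # learning rate
        self.gamma = gamma           # discount factor

    def update(self, state, action, reward, next_state):
        # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```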

General Reward The general reward is composed of: 1) The total cost of coordination - the cost of internal resources such as battery life and fuel. 2) The time spent coordinating. 3) The frequency of coordination.
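
A small sketch of how these three components might be folded into a single scalar reward; the weights and the sign convention (penalizing all three terms) are assumptions for illustration:

```python
def general_reward(resource_cost, coord_time, conflict_frequency,
                   w_cost=1.0, w_time=1.0, w_freq=1.0):
    """Higher reward for cheaper, shorter, and rarer coordination (assumed weighting)."""
    return -(w_cost * resource_cost
             + w_time * coord_time
             + w_freq * conflict_frequency)
```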