
Reinforcement Learning in Strategy Selection for a Coordinated Multirobot System
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 37, NO. 6, NOVEMBER 2007
Kao-Shing Hwang, Member, IEEE, Yu-Jen Chen, and Ching-Huang Lee
Advisor: Ming-Yuan Shieh
Student: Ching-Chih Wen
S/N: M
PPT preparation: 100%

OUTLINE
• Abstract
• Introduction
• SYSTEM FORMATION
  - Basic Behavior
  - Role Assignment
  - Strategies
  - Learning System
  - Dispatching System
• EXPERIMENTS
• CONCLUSION

ABSTRACT
• This correspondence presents a multi-strategy decision-making system for robot soccer games. Through reinforcement processes, the coordination between robots is learned in the course of a game.
• The responsibility of each player varies along with the change of its role in state transitions. Therefore, the system uses several strategies, such as an offensive strategy, a defensive strategy, and so on, for a variety of scenarios.
• The major task assigned to the robots under each strategy is simply to occupy good positions.
• Using the Hungarian method, each robot can be dispatched to its target position with minimal total cost.

INTRODUCTION (1/3)
• Reinforcement learning has recently attracted increasing interest in the fields of machine learning and artificial intelligence, since it promises a way to achieve a specific task using only reward and punishment [1].
[Fig. 1]

INTRODUCTION (2/3)
• Traditional reinforcement-learning algorithms are often concerned with single-agent problems; however, no agent can act alone, since it must interact with other agents in the environment to achieve a specific task [3].
• Therefore, we focus here on high-level learning rather than basic-behavior learning.
• The main objective of this correspondence is to develop a reinforcement-learning architecture for multiple coordinated strategies in a robot soccer system.

INTRODUCTION (3/3)
• In this correspondence, we use the robot soccer system as our test platform, since this system can fully realize a multi-agent system.
[Fig. 2]

SYSTEM FORMATION
[Fig. 3]

SYSTEM FORMATION-Basic Behavior
1) Go to a Position
2) Go to a Position With Avoidance
3) Kick a Ball to a Position
[Fig. 4] [Fig. 5]
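To make these primitives concrete, here is a minimal Python sketch of how the three basic behaviors might be composed from simple potential-field-style vector commands. All function names, gains, and geometric choices below are illustrative assumptions, not the controllers used in the paper.

    import math

    def go_to_position(pose, target):
        # Attractive vector pointing from the robot (x, y) straight at the target.
        return (target[0] - pose[0], target[1] - pose[1])

    def go_to_position_with_avoidance(pose, target, obstacles, margin=0.5):
        # Attraction to the target plus a repulsive term for each nearby obstacle.
        vx, vy = go_to_position(pose, target)
        for ox, oy in obstacles:
            dx, dy = pose[0] - ox, pose[1] - oy
            d = math.hypot(dx, dy)
            if 0 < d < margin:
                vx += (margin - d) * dx / d
                vy += (margin - d) * dy / d
        return (vx, vy)

    def kick_ball_to_position(pose, ball, target, offset=0.1):
        # Drive to a point slightly behind the ball on the ball-to-target line,
        # so that moving through the ball pushes it toward the target.
        d = math.hypot(target[0] - ball[0], target[1] - ball[1]) or 1.0
        behind = (ball[0] - offset * (target[0] - ball[0]) / d,
                  ball[1] - offset * (target[1] - ball[1]) / d)
        return go_to_position(pose, behind)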

SYSTEM FORMATION-Role Assignment (1/3)
1) Attacker position
[Fig. 6] [Fig. 7] [Fig. 8]

SYSTEM FORMATION-Role Assignment (2/3)
2) Sidekick position
[Fig. 9] [Fig. 10]
3) Backup position
4) Defender position

SYSTEM FORMATION-Role Assignment (3/3)
5) Goalkeeper position
[Fig. 11]
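The slides define the five role positions only through figures. The Python sketch below reconstructs one plausible geometry for them relative to the ball and the two goals; every offset and formula is a hypothetical stand-in for illustration, not the paper's actual definition.

    import math

    def role_positions(ball, own_goal, opp_goal, d=0.15):
        # Hypothetical target points for the five roles, parameterized by a
        # spacing d (in field units). All offsets below are assumptions.
        bx, by = ball
        gx, gy = opp_goal
        ox, oy = own_goal
        L = math.hypot(gx - bx, gy - by) or 1.0
        ux, uy = (gx - bx) / L, (gy - by) / L   # unit vector: ball -> opponent goal
        return {
            "attacker":   (bx - d * ux, by - d * uy),          # just behind the ball
            "sidekick":   (bx - d * uy, by + d * ux),          # beside the ball
            "backup":     (bx - 3 * d * ux, by - 3 * d * uy),  # behind the attacker
            "defender":   ((bx + ox) / 2, (by + oy) / 2),      # between ball and own goal
            "goalkeeper": (ox, oy),                            # on the own goal line
        }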

SYSTEM FORMATION-STRATEGIES (1/2)
• 1) Primary part: the weighting assigned to the attacker.
• 2) Offensive part: the weightings assigned to the sidekick and backup, respectively.
• 3) Defensive part: the weightings assigned to the defender and goalkeeper, respectively.

SYSTEM FORMATION-STRATEGIES (2/2)
• According to the different weightings, different strategies can be developed. We develop three strategies, each defined by a particular set of role weightings used in our simulations:
• 1) Normal strategy.
• 2) Offensive strategy.
• 3) Defensive strategy.
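Since a strategy is just a set of role weightings, it can be written down as a small table. In the Python sketch below, the numeric values are placeholders chosen for illustration; the actual weightings used in the paper's simulations are not recoverable from this transcript.

    # Each strategy maps the five roles to weightings. The numbers here are
    # placeholders, not the paper's values.
    STRATEGIES = {
        "normal":    {"attacker": 1.0, "sidekick": 0.5, "backup": 0.5,
                      "defender": 0.5, "goalkeeper": 0.5},
        "offensive": {"attacker": 1.0, "sidekick": 0.8, "backup": 0.8,
                      "defender": 0.2, "goalkeeper": 0.2},
        "defensive": {"attacker": 1.0, "sidekick": 0.2, "backup": 0.2,
                      "defender": 0.8, "goalkeeper": 0.8},
    }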

SYSTEM FORMATION-LEARNING SYSTEM (1/3)
[Fig. 12]

SYSTEM FORMATION-LEARNING SYSTEM (2/3)
• 1) States:
[Fig. 13]
• 2) Actions: the actions of Q-learning are spontaneous decisions on the strategies taken in each learning cycle. Each action is represented by a set of weights.

SYSTEM FORMATION-LEARNING SYSTEM (3/3)
• 3) Reward function:
  - Gain a point: r = 1.
  - Lose a point: r = −1.
  - Otherwise: r = 0.
• 4) Q-learning: based on the states, actions, and reward function, we can fully implement the Q-learning method.
• Here, the ε-greedy method is chosen as the action-selection policy, with exploration probability ε = 0.1. The learning rate α is 0.8, and the discount factor γ is 0.9.
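With the states, actions, reward function, and parameters above, the strategy selector reduces to standard tabular Q-learning. The sketch below uses the values stated on the slide (ε = 0.1, α = 0.8, γ = 0.9) and assumes one discrete action per strategy; the state encoding is left abstract, since the slide gives it only as a figure.

    import random
    from collections import defaultdict

    ALPHA, GAMMA, EPSILON = 0.8, 0.9, 0.1            # values from the slide
    ACTIONS = ["normal", "offensive", "defensive"]   # one action per strategy

    Q = defaultdict(float)   # Q[(state, action)] -> estimated value

    def select_strategy(state):
        # epsilon-greedy: explore with probability EPSILON, otherwise exploit.
        if random.random() < EPSILON:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(state, a)])

    def update(state, action, reward, next_state):
        # One-step Q-learning backup with reward +1 / -1 / 0 as defined above.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])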

SYSTEM FORMATION-DISPATCHING SYSTEM
• First, we introduce the method used to compute the cost.
• Since the cost of each robot reaching each target is known, we can compute the total cost of dispatching all the robots to their assigned positions.
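Minimizing this total cost over all robot-to-position pairings is the classical assignment problem, which the abstract says is solved with the Hungarian method. Below is a minimal sketch using SciPy's linear_sum_assignment, an optimal assignment solver; treating the cost of each robot-target pair as Euclidean distance is our assumption, since the slides only state that the costs are known.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def dispatch(robot_xy, target_xy):
        # robot_xy, target_xy: (n, 2) arrays of field coordinates.
        robots = np.asarray(robot_xy, dtype=float)
        targets = np.asarray(target_xy, dtype=float)
        # Assumed cost: Euclidean distance from each robot to each target.
        cost = np.linalg.norm(robots[:, None, :] - targets[None, :, :], axis=2)
        rows, cols = linear_sum_assignment(cost)   # minimal-total-cost pairing
        return dict(zip(rows.tolist(), cols.tolist())), cost[rows, cols].sum()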

EXPERIMENTS (1/4)
• Multiple Strategy Versus the Benchmark
[Fig. 14] [Fig. 15]

EXPERIMENTS (2/4)
• Multiple Strategy Versus Each Fixed Strategy
[Fig. 16] [Fig. 17]

EXPERIMENTS (3/4)
• Multiple Strategy Versus Defensive Strategy
[Fig. 18] [Fig. 19]

EXPERIMENTS (4/4)
• Multiple Strategy Versus Normal Strategy
[Fig. 20] [Fig. 21]

CONCLUSION
• 1) Hierarchical architecture: the system is designed hierarchically, from basic behaviors up to strategies. The basic behaviors can also be reused in other vehicle systems.
• 2) A general learning-system platform: if another strategy is designed, it can easily be added to our learning system without much alteration. Through the learning process, we can map each state to the best strategy.
• 3) Dynamic and quick role assignment: in this system, the role of each robot is changeable. We use the linear-programming method to speed up the computation and to find the best dispatch under a given strategy.

REFERENCES
[1] F. Ivancic, “Reinforcement learning in multiagent systems using game theory concepts,” Univ. Pennsylvania, Philadelphia, Tech. Rep., Mar. [Online].
[2] V. R. Konda and J. N. Tsitsiklis, “On actor-critic algorithms,” SIAM J. Control Optim., vol. 42, no. 4, pp. 1143–1166.
[3] Y. Shoham, R. Powers, and T. Grenager, “On the agenda(s) of research on multi-agent learning,” in Artificial Multiagent Learning: Papers From the 2004 Fall Symposium, S. Luke, Ed. Menlo Park, CA: AAAI Press, Tech. Rep. FS-04-02, 2004, pp. 89–95.
[4] M. Kaya and R. Alhajj, “Modular fuzzy-reinforcement learning approach with internal model capabilities for multiagent systems,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 34, no. 2, pp. 1210–1223, Apr.
[5] M. C. Choy, D. Srinivasan, and R. L. Cheu, “Cooperative, hybrid agent architecture for real-time traffic signal control,” IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 33, no. 5, pp. 597–607, Sep.
[6] K. S. Hwang, S. W. Tan, and C. C. Chen, “Cooperative strategy based on adaptive-learning for robot soccer systems,” IEEE Trans. Fuzzy Syst., vol. 12, no. 4, pp. 569–576, Aug.
[7] K. H. Park, Y. J. Kim, and J. H. Kim, “Modular Q-learning based multiagent cooperation for robot soccer,” Robot. Auton. Syst., vol. 35, no. 2, pp. 109–122, May.
[8] H. P. Huang and C. C. Liang, “Strategy-based decision making of a soccer robot system using a real-time self-organizing fuzzy decision tree,” Fuzzy Sets Syst., vol. 127, no. 1, pp. 49–64, Apr.
[9] M. Asada and H. Kitano, “The RoboCup challenge,” Robot. Auton. Syst., vol. 29, no. 1, pp. 3–12.
[10] F. S. Hillier and G. J. Lieberman, Introduction to Operations Research. Boston, MA: McGraw-Hill.
[11] V. Chvátal, Linear Programming. San Francisco, CA: Freeman.
[12] SimuroSot overview. Accessed on 22nd of March. [Online]. Available: net/soccer/simurosot/overview.html
[13] C. H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity. Englewood Cliffs, NJ: Prentice-Hall.
[14] K. S. Hwang, Y. J. Chen, and T. F. Lin, “Q-learning with FCMAC in multi-agent cooperation,” in Proc. Int. Symp. Neural Netw., 2006, vol. 3971, pp. 599–602.