Reinforcement Learning in Strategy Selection for a Coordinated Multirobot System IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 37, NO. 6, NOVEMBER 2007 Kao-Shing Hwang, Member, IEEE, Yu-Jen Chen, and Ching-Huang Lee Advisor: Ming-Yuan Shieh Student: Ching-Chih Wen S/N: M PPT completion rate: 100% 1
Abstract
Introduction
SYSTEM FORMATION
  Basic Behavior
  Role Assignment
  Strategies
  Learning System
  Dispatching System
EXPERIMENTS
CONCLUSION
OUTLINE 2
This correspondence presents a multi-strategy decision-making system for robot soccer games. Through reinforcement processes, the coordination between robots is learned in the course of a game. The responsibility of each player varies with the change of roles during state transitions. Therefore, the system uses several strategies, such as an offensive strategy, a defensive strategy, and so on, for a variety of scenarios. The major task assigned to the robots under each strategy is simply to occupy good positions. Utilizing the Hungarian method, each robot can be dispatched to its designated position with minimal total cost. ABSTRACT 3
Reinforcement learning has recently attracted increasing interest in the fields of machine learning and artificial intelligence, since it promises a way to achieve a specific task using only reward and punishment [1]. Fig.1 INTRODUCTION(1/3) 4
Traditional reinforcement-learning algorithms are often concerned with single-agent problems; in a multi-agent setting, however, no agent can act alone, since it must interact with the other agents in the environment to achieve a specific task [3]. Therefore, we focus here on high-level learning rather than basic-behavior learning. The main objective of this correspondence is to develop a reinforcement-learning architecture for multiple coordinated strategies in a robot soccer system. INTRODUCTION(2/3) 5
In this correspondence, we utilize the robot soccer system as our test platform, since it fully embodies a multi-agent system. Fig.2 INTRODUCTION(3/3) 6
Fig.3 SYSTEM FORMATION 7
1) Go to a Position 2) Go to a Position With Avoidance 3) Kick a Ball to a Position Fig.4 Fig.5 SYSTEM FORMATION-Basic Behavior 8
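As a rough illustration of the first basic behavior, the sketch below steers a differential-drive robot toward a target point. The gain values and function name are assumptions for illustration only, not parameters from the paper.

```python
import math

def go_to_position(robot_x, robot_y, robot_theta, target_x, target_y,
                   k_dist=1.0, k_angle=4.0):
    """Drive a differential-drive robot toward a target point.

    Minimal sketch of the "go to a position" behavior; gains k_dist and
    k_angle are illustrative placeholders. Returns (left, right) wheel speeds.
    """
    dx, dy = target_x - robot_x, target_y - robot_y
    distance = math.hypot(dx, dy)
    # Heading error toward the target, wrapped to [-pi, pi].
    angle_error = math.atan2(dy, dx) - robot_theta
    angle_error = math.atan2(math.sin(angle_error), math.cos(angle_error))
    v = k_dist * distance          # forward speed grows with distance
    w = k_angle * angle_error      # turn rate grows with heading error
    return v - w, v + w            # left wheel, right wheel
```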
1) Attacker position Fig.6 Fig.7 Fig.8 SYSTEM FORMATION-Role Assignment(1/3) 9
2) Sidekick position Fig.9 Fig.10 3) Backup position 4) Defender position SYSTEM FORMATION-Role Assignment(2/3) 10
5) Goalkeeper position Fig.11 SYSTEM FORMATION-Role Assignment(3/3) 11
1) Primary part: the weighting of the attacker. 2) Offensive part: the weightings of the sidekick and the backup. 3) Defensive part: the weightings of the defender and the goalkeeper. SYSTEM FORMATION- STRATEGIES(1/2) 12
According to the different weightings, different strategies can be developed. We develop three strategies, each with a representative set of weightings used in our simulations: 1) a normal strategy, 2) an offensive strategy, and 3) a defensive strategy. SYSTEM FORMATION- STRATEGIES(2/2) 13
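A minimal sketch of how the three strategies could be represented as role weightings. The numerical values below are placeholders for illustration only; the actual weight vectors used in the paper are not reproduced on the slide.

```python
# Illustrative only: each strategy maps the five roles to a weighting.
# These numbers are placeholders, not the weights used in the paper.
STRATEGIES = {
    "normal":    {"attacker": 1.0, "sidekick": 0.5, "backup": 0.5,
                  "defender": 0.5, "goalkeeper": 0.5},
    "offensive": {"attacker": 1.0, "sidekick": 0.8, "backup": 0.8,
                  "defender": 0.2, "goalkeeper": 0.2},
    "defensive": {"attacker": 1.0, "sidekick": 0.2, "backup": 0.2,
                  "defender": 0.8, "goalkeeper": 0.8},
}
```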
Fig.12 SYSTEM FORMATION- LEARNING SYSTEM(1/3) 14
1) States: Fig.13 2) Actions: The actions of Q-learning are the decisions on which strategy to take in each learning cycle. Each action is represented by a set of weights. SYSTEM FORMATION- LEARNING SYSTEM(2/3) 15
3) Reward Function: gain a point, r = 1; lose a point, r = -1; otherwise, r = 0. 4) Q-Learning: Based on the states, actions, and reward function, we can fully implement the Q-learning method. Here, the ε-greedy method is chosen as the action-selection policy, with exploration probability ε = 0.1. The learning rate α is 0.8, and the discount factor γ is 0.9. SYSTEM FORMATION- LEARNING SYSTEM(3/3) 16
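The parameters on this slide (ε = 0.1, α = 0.8, γ = 0.9, reward ±1/0) are enough to sketch the tabular Q-learning update over strategies. The class, state encoding, and strategy names below are illustrative assumptions, not the paper's implementation.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.8, 0.9, 0.1   # values reported on this slide

def reward(event):
    """Reward function from the slide: +1 for scoring, -1 for conceding."""
    return {"gain_point": 1.0, "lose_point": -1.0}.get(event, 0.0)

class StrategyQLearner:
    """Minimal tabular Q-learning over (state, strategy) pairs (a sketch)."""

    def __init__(self, strategies):
        self.strategies = list(strategies)
        self.q = defaultdict(float)          # Q[(state, strategy)] -> value

    def select(self, state):
        # epsilon-greedy action selection with epsilon = 0.1
        if random.random() < EPSILON:
            return random.choice(self.strategies)
        return max(self.strategies, key=lambda a: self.q[(state, a)])

    def update(self, state, action, r, next_state):
        # One-step Q-learning backup toward r + gamma * max_a' Q(s', a').
        best_next = max(self.q[(next_state, a)] for a in self.strategies)
        td_target = r + GAMMA * best_next
        self.q[(state, action)] += ALPHA * (td_target - self.q[(state, action)])
```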
First, we introduce the method used to compute the cost. Since the cost for each robot to reach each target is known, we can compute the total cost of dispatching all robots to their target positions. SYSTEM FORMATION- DISPATCHING SYSTEM 17
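A minimal sketch of the dispatching step, assuming Euclidean distance as the per-robot cost and using SciPy's linear_sum_assignment as a stand-in for the Hungarian method described in the paper; the function name and data layout are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment  # Hungarian-style solver

def dispatch(robots, targets):
    """Assign each robot to one target position with minimal total cost.

    robots, targets: sequences of (x, y) coordinates of equal length.
    Returns a {robot_index: target_index} mapping and the total cost.
    """
    robots = np.asarray(robots, dtype=float)     # shape (n, 2)
    targets = np.asarray(targets, dtype=float)   # shape (n, 2)
    # Cost matrix: Euclidean distance from every robot to every target.
    cost = np.linalg.norm(robots[:, None, :] - targets[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)     # minimal-cost matching
    return dict(zip(rows.tolist(), cols.tolist())), cost[rows, cols].sum()
```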
Multiple Strategy Versus the Benchmark Fig.14 Fig.15 EXPERIMENTS(1/4) 18
Multiple Strategy Versus Each Fixed Strategy Fig.16 Fig.17 EXPERIMENTS(2/4) 19
Multiple Strategy Versus Defensive Strategy Fig.18 Fig.19 EXPERIMENTS(3/4) 20
Multiple Strategy Versus Normal Strategy Fig.20 Fig.21 EXPERIMENTS(4/4) 21
1) Hierarchical architecture: The system is designed hierarchically, from basic behaviors up to strategies. The basic behaviors can also be utilized in other vehicle systems. 2) A general learning-system platform: If another strategy is designed, it can easily be added to our learning system without much alteration. Through the learning process, we can map each state to the best strategy. 3) Dynamic and quick role assignment: In this system, the role of each robot is changeable. We use the linear-programming method to speed up the computation and to find the best dispatch under a given strategy. CONCLUSION 22
[1] F. Ivancic, "Reinforcement learning in multiagent systems using game theory concepts," Univ. Pennsylvania, Philadelphia, PA, Tech. Rep., Mar. [Online].
[2] V. R. Konda and J. N. Tsitsiklis, "On actor-critic algorithms," SIAM J. Control Optim., vol. 42, no. 4, pp. 1143–1166, 2003.
[3] Y. Shoham, R. Powers, and T. Grenager, "On the agenda(s) of research on multi-agent learning," in Artificial Multiagent Learning: Papers From the 2004 Fall Symposium, S. Luke, Ed. Menlo Park, CA: AAAI Press, Tech. Rep. FS-04-02, 2004, pp. 89–95.
[4] M. Kaya and R. Alhajj, "Modular fuzzy-reinforcement learning approach with internal model capabilities for multiagent systems," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 34, no. 2, pp. 1210–1223, Apr. 2004.
[5] M. C. Choy, D. Srinivasan, and R. L. Cheu, "Cooperative, hybrid agent architecture for real-time traffic signal control," IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 33, no. 5, pp. 597–607, Sep. 2003.
[6] K. S. Hwang, S. W. Tan, and C. C. Chen, "Cooperative strategy based on adaptive Q-learning for robot soccer systems," IEEE Trans. Fuzzy Syst., vol. 12, no. 4, pp. 569–576, Aug. 2004.
[7] K. H. Park, Y. J. Kim, and J. H. Kim, "Modular Q-learning based multi-agent cooperation for robot soccer," Robot. Auton. Syst., vol. 35, no. 2, pp. 109–122, May 2001.
[8] H. P. Huang and C. C. Liang, "Strategy-based decision making of a soccer robot system using a real-time self-organizing fuzzy decision tree," Fuzzy Sets Syst., vol. 127, no. 1, pp. 49–64, Apr. 2002.
[9] M. Asada and H. Kitano, "The RoboCup challenge," Robot. Auton. Syst., vol. 29, no. 1, pp. 3–12, 1999.
[10] F. S. Hillier and G. J. Lieberman, Introduction to Operations Research. Boston, MA: McGraw-Hill.
[11] V. Chvátal, Linear Programming. San Francisco, CA: Freeman, 1983.
[12] SimuroSot overview. Accessed on 22nd of March. [Online]. Available: http://www.fira.net/soccer/simurosot/overview.html
[13] C. H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity. Englewood Cliffs, NJ: Prentice-Hall, 1982.
[14] K. S. Hwang, Y. J. Chen, and T. F. Lin, "Q-learning with FCMAC in multi-agent cooperation," in Proc. Int. Symp. Neural Netw., 2006, vol. 3971, pp. 599–602.
REFERENCES 23