 By Ashwinkumar Ganesan CMSC 601.  Reinforcement Learning  Problem Statement  Proposed Method  Conclusions.

 By Ashwinkumar Ganesan CMSC 601

 Reinforcement Learning  Problem Statement  Proposed Method  Conclusions

 Method for learning by experience  There agent or bot learns by interacting with the environment.  There is reward attached for action taken in a particular state.  GOAL : MAXIMIZE THE REWARD

 Co-ordinated Group Action is the situation a set of agents perform a single task.  GOAL: To maximize the output or the reward globally.

 Communication i.e. what should the agent communicate and how much should it communicate with other agents.  Optimal Policy i.e. defining an optimal policy of the entire group. Is an optimal policy a set of optimal individual policies for each agent?  How much of the individual policy information of a certain agent is available to the entire group.

 Create a method for implementing reinforcement learning on co-ordinated group activity efficiently  Modify Reinforcement Learning algorithm to implement group action  Implement the proposed method and measure its efficiency in World of Warcraft.

 The proposal is to research co-operative group learning under the following conditions: 1. The environment is assumed to partially observable. 2. Each agent in the system is knows the final action taken by the other agent. 3. Agents do not have access the to state information generated by other agents while selecting an action. 4. Agents do not have access to the policies of other agents, to make the decision. 5. The rewards are not linearly separable.

 World of Warcraft is a large multi-player online game by Blizzard Entertainment  It is game where every player has his own and roams the virtual world, fighting demons, observing the landscape, buy and selling items and interacting with other players  In short, it is a large game with lots of options for tools, skills and levels, making it challenging for Bots.

http://world-of-warcraft.en.softonic.com/

 Today games especially like World Of Warcraft, have a large number of human players who play in groups.  Single as well as multiplayer games have AI engines, but attacks and actions in these games by opponents is still one at a time.  Reinforcement Learning can be implemented real time in these games, to improve the AI over a period of time and customize the games for users  Other applications – Robotics,defense

 QUICR Method: This method calculates the counterfactual action, which is the action an agent did not take at a time[2].  Least Square Policy Iteration (LSPI) can implemented. The method performs policy iterations using samples instead of an actual policy[3].  FMQ Algorithm. The algorithm is helpful for environments where agents have partial or no observability[4].

 Inverse reinforcement learning is the exact opposite of reinforcement learning.  Input is the optimal policy or behavior that is expected from the agent  The agent learns to find the reward function based on observed values in the environment

 Implement Reinforcement Learning in 2 parts: 1. Implement Inverse Reinforcement Learning 2. Implement a modified Reinforcement Learning on the rewards learned in step1. Observe Expert Calculate Reward Calculate Policy based on reward Calculate the new Q(a,s) value Observe other agents

 Calculating the iteration when the agent is known to have optimal policy  Finding the point when to switch from Inverse Learning to Reinforcement Learning  Working with the reward function got from observing the expert and rewards obtained from the environment  Finding a method to observe other agents and “experts”.

 The metrics will be evaluated against the known methods  Metrics are: 1. Number of states generated 2. Number of iterations required to reach optimal policy 3. Rewards in terms of points earned (in the game) 4. Rate of Convergence to optimal Policy

 We can use Inverse reinforcement learning with reinforcement learning methods to speed up the learning time required for bots.  Improve bot reward functions over time.

1. R Bellman, On the Theory of Dynamic Programming, Proceedings of the National Academy of Sciences, 1952 2. Adrian K. Agogino and Kagan Turner, QUICKER Q-Learning in Multi-agent systems. 3. Lihong Li, Micheal L. Littman, Christopher R. Mansley, Online Exploration in Least – Squares policy Iteration. 4. La¨etitia Matignon, Guillaume J. Laurent and Nadine Le Fort- Piat, A study of FMQ heuristic in cooperative multi-agent games 5. Acknowledgement to Prof. Tim Oates for helping with the literature survey.

 By Ashwinkumar Ganesan CMSC 601.  Reinforcement Learning  Problem Statement  Proposed Method  Conclusions.

Similar presentations

Presentation on theme: " By Ashwinkumar Ganesan CMSC 601.  Reinforcement Learning  Problem Statement  Proposed Method  Conclusions."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

 By Ashwinkumar Ganesan CMSC 601.  Reinforcement Learning  Problem Statement  Proposed Method  Conclusions.

Similar presentations

Presentation on theme: " By Ashwinkumar Ganesan CMSC 601.  Reinforcement Learning  Problem Statement  Proposed Method  Conclusions."— Presentation transcript:

Similar presentations

About project

Feedback