Presentation by SANJOG BHATTA, Student ID: 20091143, July 28, 2009.


Background
Challenges in Reinforcement Learning
- An issue of primary importance and much researched
- Crucial in dynamic environments
- RL agents tend to learn slowly
Tradeoff between Exploration and Exploitation
- Analogous to the tradeoff between system control and system identification in Optimal Control

Question
- Do we try new actions to find out whether they yield a good reward, or do we just keep to the actions we have already learned give good rewards? In other words, do we do what looks best, or check whether something else is really better?
- Which action(s) are responsible for a reward? Answering this requires solving the credit assignment problem.

To maximize expected total reward, the agent must prefer actions that it has tried in the past and found to be effective. To discover such actions, however, it has to take actions it has not taken before. A good balance between exhaustive exploration of the environment and exploitation of the learned policy is fundamental to reaching nearly optimal solutions in a few learning episodes, thus improving learning performance.
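As an illustration (not part of the original slides), an epsilon-greedy rule is one common way to strike this balance: with a small probability the agent explores a random action, and otherwise it exploits the best action learned so far. The Q-table and epsilon parameter below are illustrative assumptions.

import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """With probability epsilon pick a random action (explore);
    otherwise pick the action with the best learned value (exploit)."""
    if random.random() < epsilon:
        return random.choice(actions)                           # explore
    return max(actions, key=lambda a: Q.get((state, a), 0.0))   # exploit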

In a Dynamic Environment
Two key issues:
- Dealing with moving obstacles
- Dealing with terrain that changes over time
The problem becomes more challenging because:
- The currently exploited solution may no longer be valid
- Solutions previously explored may have changed in value

Approaches
- Incorporation of a Forgetting Mechanism into Q-Learning
- Feature-Based Reinforcement Learning
- Hierarchical Reinforcement Learning

Forgetting Mechanism
- Decaying forgetting term
- Removes over-dependence on a specific set of solutions (CGA)
- Exploration emphasized more than exploitation (RL)
- Avoids making use of outdated knowledge
Three concepts integrated:
- Penalty-based value function
- Action selection policy
- Forgetting mechanism

Penalty-Based Value Function
- The value function is maintained over the set of states rather than the set of state-action pairs.
- An adaptation of Q-Learning to an environment where the resultant state of a state-action pair is deterministic rather than probabilistic.
- It is necessary to store the values of individual states; this is maintained as a penalty function, which tracks the expected total cost associated with being in a given state.

As the agent explores, it learns the penalty associated with each state, approximated by the value function for that state. The value function for the visited state is updated after every transition.
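A minimal sketch of one plausible form of this update, assuming a Q-learning-style rule adapted to state values with a deterministic successor; the names P, cost, alpha, and gamma are illustrative, and the exact equation in the original slides may differ.

def update_penalty(P, state, cost, next_state, alpha=0.1, gamma=0.9):
    """Move P[state] toward the immediate cost plus the discounted
    penalty of the deterministic successor state (lower P is better)."""
    target = cost + gamma * P.get(next_state, 0.0)
    P[state] = (1 - alpha) * P.get(state, 0.0) + alpha * target
    return P[state]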

Action Selection Policy
- Select an action a that minimizes the penalty.
- Greedy policy: k is chosen such that the value of P(S, k) is minimized.
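A hedged sketch of this greedy rule, assuming P(S, k) denotes the penalty of the deterministic state reached by taking action k in state S; the successor function is an assumption introduced only for illustration.

def greedy_action(P, state, actions, successor):
    """Greedy policy: pick the action whose resulting state has the
    lowest penalty. successor(state, action) returns the deterministic
    next state; ties are broken by the order of actions."""
    return min(actions, key=lambda a: P.get(successor(state, a), 0.0))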

Forgetting Mechanism
- Slow decay of the state value function
- Enhances exploration
- Maintains a diversity of possible solutions
- Forgetting the penalty associated with a state previously determined to be suboptimal allows the agent to explore states that would otherwise be ignored.
- Applied to the value function after each episode.
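One way such a forgetting step could look, as an illustrative sketch (the decay factor and baseline below are assumptions, not values from the slides): after each episode every stored penalty is relaxed toward a neutral value, so penalties attached to previously suboptimal states slowly fade and those states become candidates for exploration again.

def apply_forgetting(P, decay=0.99, baseline=0.0):
    """After each episode, decay every stored penalty toward baseline.
    States that once looked bad gradually regain exploration value
    as their penalties fade."""
    for state in P:
        P[state] = baseline + decay * (P[state] - baseline)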