Qjk Qjk.

Presentation transcript:

Qjk Qjk

Learning: uses real experience. Planning: uses simulated experience.
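A minimal sketch of the distinction, in the spirit of Dyna-Q (named on a later slide): the same sample backup is applied once to a real transition during learning, and repeatedly to transitions replayed from a learned model during planning. The names q, model, alpha, and gamma are illustrative, not from the slides.

```python
import random
from collections import defaultdict

alpha, gamma = 0.1, 0.95        # illustrative step size and discount
q = defaultdict(float)          # tabular Q(s, a)
model = {}                      # learned one-step model: (s, a) -> (r, s_next)

def backup(s, a, r, s_next, actions):
    """One sample backup; the same rule serves both learning and planning."""
    best_next = max(q[(s_next, a2)] for a2 in actions)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])

def learning_step(s, a, r, s_next, actions):
    """Learning: back up from a real transition and record it in the model."""
    backup(s, a, r, s_next, actions)
    model[(s, a)] = (r, s_next)

def planning_steps(n, actions):
    """Planning: back up from simulated transitions replayed out of the model."""
    for _ in range(n):
        (s, a), (r, s_next) = random.choice(list(model.items()))
        backup(s, a, r, s_next, actions)
```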

Prioritized Sweeping
– Priority queue over (s, a) pairs whose estimated values would change non-trivially
– When a pair is processed, the effect on its predecessor pairs is computed, and those pairs are added to the queue
"All animals are equal, but some are more equal than others." – George Orwell, Animal Farm
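A rough sketch of that queueing logic, assuming a tabular Q-table and a deterministic learned model; theta (the priority threshold) and the helper names are illustrative, not part of the slides.

```python
import heapq
import itertools
from collections import defaultdict

alpha, gamma, theta = 0.1, 0.95, 1e-4    # theta: minimum priority worth queueing
q = defaultdict(float)                    # tabular Q(s, a)
model = {}                                # (s, a) -> (r, s_next), learned from experience
predecessors = defaultdict(set)           # s_next -> set of (s, a) pairs leading to it
pqueue, tie = [], itertools.count()       # max-priority queue via negated priorities

def td_error(s, a, actions):
    r, s_next = model[(s, a)]
    best_next = max(q[(s_next, a2)] for a2 in actions)
    return r + gamma * best_next - q[(s, a)]

def push_if_significant(s, a, actions):
    """Queue a pair only if its estimated value would change non-trivially."""
    p = abs(td_error(s, a, actions))
    if p > theta:
        heapq.heappush(pqueue, (-p, next(tie), (s, a)))

def sweep(n_updates, actions):
    """Process the highest-priority pairs first, then enqueue their predecessors."""
    for _ in range(n_updates):
        if not pqueue:
            break
        _, _, (s, a) = heapq.heappop(pqueue)
        q[(s, a)] += alpha * td_error(s, a, actions)
        # a change at s may make backups at the predecessors of s worthwhile
        for (s_prev, a_prev) in predecessors[s]:
            push_if_significant(s_prev, a_prev, actions)
```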

Prioritized Sweeping vs. Dyna-Q

Trajectory sampling: perform backups along simulated trajectories
– Samples from the on-policy distribution
– May be able to (usefully) ignore large parts of the state space
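An illustrative sketch, assuming an epsilon-greedy policy over tabular Q-values and a sample model passed in as a callable; all names are made up for the example. Backups land only on states the simulated policy actually visits, which is how large parts of the state space get ignored.

```python
import random
from collections import defaultdict

alpha, gamma, epsilon = 0.1, 0.95, 0.1
q = defaultdict(float)                    # tabular Q(s, a)

def trajectory_sampling(start_state, actions, sample_model, n_trajectories, horizon):
    """Back up only along states the current policy visits in simulation.

    sample_model(s, a) -> (r, s_next) can be any learned or given sample model.
    """
    for _ in range(n_trajectories):
        s = start_state
        for _ in range(horizon):
            # epsilon-greedy action from current estimates: the on-policy distribution
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda a2: q[(s, a2)])
            r, s_next = sample_model(s, a)
            best_next = max(q[(s_next, a2)] for a2 in actions)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s_next
```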

How to back up
– Full vs. sample backups
– Deep vs. shallow backups
– Heuristic search
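For the full-vs-sample axis, the two extremes in standard notation (not taken from the slides): a full backup takes an expectation over the model's successor distribution, while a sample backup moves toward a single sampled successor.

```latex
% Full (expected) backup, as in dynamic programming / value iteration:
Q(s,a) \leftarrow \sum_{s', r} p(s', r \mid s, a)\,\bigl[\, r + \gamma \max_{a'} Q(s', a') \,\bigr]
% Sample backup, as in Q-learning-style updates:
Q(s,a) \leftarrow Q(s,a) + \alpha \bigl[\, r + \gamma \max_{a'} Q(s', a') - Q(s,a) \,\bigr]
```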

R-Max Questions? What was interesting?

R-Max (1/2)
Stochastic games
– Matrix: strategic form
– Sequence of games from some set of games
– More general than MDPs (how do they map to an MDP?)
Implicit exploration vs. explicit exploration?
– Epsilon-greedy, UCB1, optimistic value-function initialization
– What will a pessimistic initialization do?
Assumptions: the agent recognizes the state it is in, and knows the actions taken and payoffs received
Maximin
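A small illustration of the exploration options named above, assuming tabular Q-values; R_MAX, epsilon, and the function names are placeholders. Optimistic initialization makes even a purely greedy agent explore, while epsilon-greedy explores by injecting random actions.

```python
import random
from collections import defaultdict

R_MAX = 1.0                 # assumed upper bound on per-step reward
gamma, epsilon = 0.95, 0.1

# Optimistic initialization: every unseen (s, a) starts at the best value it
# could possibly have, so a greedy agent is drawn to try it at least once.
q_optimistic = defaultdict(lambda: R_MAX / (1.0 - gamma))

# Epsilon-greedy on a zero-initialized table explores via random actions instead.
q_plain = defaultdict(float)

def act_optimistic(s, actions):
    return max(actions, key=lambda a: q_optimistic[(s, a)])

def act_epsilon_greedy(s, actions):
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q_plain[(s, a)])

# A pessimistic initialization (e.g. -R_MAX / (1 - gamma)) would make the greedy
# agent settle for the first action that beats the pessimistic prior, so
# systematic exploration largely stops -- the question the slide raises.
```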

R-Max (2/2)
Either optimal, or efficient learning
Mixing time
– Smallest value of T after which π guarantees an expected payoff of U(π) - ε
Approaches the optimal policy in time polynomial in T, 1/ε, and ln(1/δ) – PAC
Why are K₁ updates needed?
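A sketch of the bookkeeping behind that guarantee, under the usual R-Max assumptions (a known reward bound R_MAX and a visit threshold K1 for declaring a state-action pair known); the constants and names are illustrative, not from the original paper's pseudocode.

```python
from collections import defaultdict

R_MAX, K1 = 1.0, 50                     # reward bound; visits before (s, a) counts as "known"
counts = defaultdict(int)
reward_sum = defaultdict(float)
next_counts = defaultdict(lambda: defaultdict(int))
known = set()

def record(s, a, r, s_next):
    """Accumulate statistics; after K1 visits the empirical model is trusted."""
    counts[(s, a)] += 1
    reward_sum[(s, a)] += r
    next_counts[(s, a)][s_next] += 1
    if counts[(s, a)] >= K1:
        known.add((s, a))

def optimistic_model(s, a, fictitious_state="s_max"):
    """Known pairs use empirical estimates; unknown pairs pretend to reach a
    fictitious state that pays R_MAX forever, which is what drives exploration."""
    if (s, a) in known:
        n = counts[(s, a)]
        r_hat = reward_sum[(s, a)] / n
        p_hat = {s2: c / n for s2, c in next_counts[(s, a)].items()}
        return r_hat, p_hat
    return R_MAX, {fictitious_state: 1.0}
```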

Efficient Reinforcement Learning with Relocatable Action Models (RAM)
Grid-world experiment
Algorithm