
Efficiency in ML / AI
1. Data efficiency (rate of learning)
2. Computational efficiency (memory, computation, communication)
3. Researcher efficiency (autonomy, ease of setup, parameter tuning, priors, labels, expertise)

Sutton, R. S., Maei, H. R., Precup, D., Bhatnagar, S., Silver, D., Szepesvari, Cs., Wiewiora, E. Fast gradient-descent methods for temporal-difference learning with linear function approximation. ICML-09.

Sutton, Szepesvari and Maei (2009) recently introduced the first temporal-difference learning algorithm compatible with both linear function approximation and off-policy training, and whose complexity scales only linearly in the size of the function approximator. We introduce two new related algorithms with better convergence rates. The first algorithm, GTD2, is derived and proved convergent just as GTD was, but uses a different objective function and converges significantly faster (but still not as fast as conventional TD). The second new algorithm, linear TD with gradient correction, or TDC, uses the same update rule as conventional TD except for an additional term which is initially zero. In our experiments on small test problems and in a Computer Go application with a million features, the learning rate of this algorithm was comparable to that of conventional TD.
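For reference, a sketch of the TDC update the abstract alludes to, written out for linear value-function approximation (the equations are reproduced from the cited paper, not from these slides; θ_t is the value-weight vector, w_t an auxiliary weight vector initialized to zero, φ_t the feature vector at time t, and α, β are step sizes):

```latex
\begin{align*}
\delta_t     &= r_{t+1} + \gamma\,\theta_t^\top \phi_{t+1} - \theta_t^\top \phi_t \\
\theta_{t+1} &= \theta_t + \alpha\,\delta_t\,\phi_t
                - \alpha\gamma\,\phi_{t+1}\,(\phi_t^\top w_t) \\
w_{t+1}      &= w_t + \beta\,(\delta_t - \phi_t^\top w_t)\,\phi_t
\end{align*}
```

The correction term −αγ φ_{t+1}(φ_t^⊤ w_t) is the "additional term which is initially zero": since w_0 = 0, the earliest updates coincide with conventional linear TD(0).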

van Seijen, H., Sutton, R. S. True online TD(λ). ICML-14.

TD(λ)'s appeal comes from its equivalence to a clear and conceptually simple forward view, and the fact that it can be implemented online in an inexpensive manner. The equivalence between TD(λ) and the forward view is exact only for the off-line version of the algorithm (in which updates are made only at the end of each episode). In the online version of TD(λ) (in which updates are made at each step, which generally performs better and is always used in applications) the match to the forward view is only approximate. In this paper we introduce a new forward view that takes into account the possibility of changing estimates and a new variant of TD(λ) that exactly achieves it. In our empirical comparisons, our algorithm outperformed TD(λ) in all of its variations. It seems, by adhering more truly to the original goal of TD(λ)—matching an intuitively clear forward view even in the online case—that we have found a new algorithm that simply improves on classical TD(λ).
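A minimal Python sketch of the resulting algorithm, true online TD(λ) with dutch traces for linear function approximation (the environment/feature interface, step sizes, and the policy argument are illustrative assumptions, not part of the slides):

```python
import numpy as np

def true_online_td_lambda(env, policy, features, n_features,
                          alpha=0.01, gamma=0.99, lam=0.9, n_episodes=100):
    """Sketch of true online TD(lambda) with linear function approximation.

    Assumed (illustrative) interface: env.reset() -> state,
    env.step(action) -> (next_state, reward, done), policy(state) -> action,
    features(state) -> numpy array of length n_features.
    """
    w = np.zeros(n_features)            # value-function weights
    for _ in range(n_episodes):
        s = env.reset()
        x = features(s)
        z = np.zeros(n_features)        # dutch eligibility trace
        v_old = 0.0
        done = False
        while not done:
            s_next, r, done = env.step(policy(s))
            x_next = features(s_next) if not done else np.zeros(n_features)
            v, v_next = w @ x, w @ x_next
            delta = r + gamma * v_next - v
            # dutch-trace update: the part that distinguishes true online TD(lambda)
            z = gamma * lam * z + (1.0 - alpha * gamma * lam * (z @ x)) * x
            # weight update with the extra (v - v_old) correction terms
            w += alpha * (delta + v - v_old) * z - alpha * (v - v_old) * x
            v_old, s, x = v_next, s_next, x_next
    return w
```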

1. What’s been most useful to you (and why)?
2. What’s been least useful (and why)?
3. What could students do to improve the class?
4. What could Matt do to improve the class?

Exercises

Questions
– Why are we sampling the environment randomly, though? Wouldn't a depth-first search from the current state be better? It doesn't make any sense to worry about an action from a state we can't get to.
– I'm curious about the statement in 9.7 Heuristic Search that this process computes backed-up values of the possible actions, but does not attempt to save them.
– When they talk about full backups, that seems like it would take up a prohibitively large data structure. Does that require a finite number of possible state/action pairs then? Would it possibly run slowly as the backups get large?

Learning: uses real experience
Planning: uses simulated experience

Models
A model lets the agent predict how the environment responds to its actions.
Distribution model: description of all possibilities and their probabilities
– p(s’, r | s, a) for all s, a, s’, r
Sample model, aka simulation model
– Produces sample experiences for given s, a
– Allows reset / exploring starts
– Often easier to obtain
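A minimal sketch contrasting the two kinds of model on a toy two-state MDP, and then planning from the sample model with random-sample one-step tabular Q-planning (the MDP, class names, and constants are illustrative assumptions, not from the slides):

```python
import random

class DistributionModel:
    """Distribution model: gives p(s', r | s, a) for every possible outcome."""
    def __init__(self, table):
        # table[(s, a)] = list of (next_state, reward, probability)
        self.table = table

    def outcomes(self, s, a):
        return self.table[(s, a)]

class SampleModel:
    """Sample model (simulation model): returns one sampled (s', r) per call."""
    def __init__(self, dist_model):
        self.dist_model = dist_model

    def sample(self, s, a):
        outcomes = self.dist_model.outcomes(s, a)
        probs = [p for _, _, p in outcomes]
        s_next, r, _ = random.choices(outcomes, weights=probs, k=1)[0]
        return s_next, r

# Toy two-state MDP (illustrative): action 0 stays put, action 1 may move on with reward.
table = {
    ('A', 0): [('A', 0.0, 1.0)],
    ('A', 1): [('B', 1.0, 0.8), ('A', 0.0, 0.2)],
    ('B', 0): [('B', 0.0, 1.0)],
    ('B', 1): [('A', 0.0, 1.0)],
}
sample_model = SampleModel(DistributionModel(table))

# Planning from simulated experience: random-sample one-step tabular Q-planning.
gamma, alpha = 0.9, 0.1
states, actions = ['A', 'B'], [0, 1]
Q = {(s, a): 0.0 for s in states for a in actions}
for _ in range(10_000):
    s, a = random.choice(states), random.choice(actions)   # pick a state-action at random
    s_next, r = sample_model.sample(s, a)                   # simulated experience from the model
    Q[(s, a)] += alpha * (r + gamma * max(Q[(s_next, b)] for b in actions) - Q[(s, a)])
```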