UAIG: Second Fall 2013 Meeting

Agenda
- Introductory Icebreaker
- How to get Involved with UAIG?
- Discussion: Reinforcement Learning
- Free Discussion

Introductory Icebreaker
Say your name and answer at least one of these questions:
- If you were to change your name, what would you change it to? Why?
- Are you spring, summer, fall, or winter? Please share why.
- What's your favorite material object that you already own?
- What item that you don't already own would you most like to own?
- If you were to create a slogan for your life, what would it be?

How to get Involved with UAIG?
1. Come to our biweekly meetings.
2. Take charge of one of our meetings by presenting your own research, an interesting paper you've read, or anything else you think is relevant (talk to us if you have ideas!).
3. Organize an AI coding challenge or event.
If you do item 2 or 3, we will appoint you "Project Manager" and you will join the ranks of UAIG execs! ^_^

Discussion: RL
Reading for today's meeting: "Reinforcement Learning: A Tutorial" by Harmon and Harmon (ltutorial.pdf).
"The purpose of this tutorial is to provide an introduction to reinforcement learning (RL) at a level easily understood by students and researchers in a wide range of disciplines."

Definitions in the reading
Reinforcement learning is not a type of neural network, nor is it an alternative to neural networks. Rather, it is an orthogonal approach that addresses a different, more difficult question. Reinforcement learning combines the fields of dynamic programming and supervised learning to yield powerful machine-learning systems.

Definitions in the reading
Dynamic programming is a field of mathematics that has traditionally been used to solve problems of optimization and control. Supervised learning is a general method for training a parameterized function approximator, such as a neural network, to represent functions.
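To make that definition concrete, here is a minimal sketch (not from the tutorial; the linear form, learning rate, and data are made up) of supervised learning adjusting the parameters of a function approximator to match given target outputs:

```python
# Hedged illustration: supervised learning adjusts the parameters w of an
# approximator f(x; w) so its outputs match targets supplied by a "teacher".
def f(x, w):
    return w[0] + w[1] * x          # a simple linear function approximator

def train(pairs, w=(0.0, 0.0), lr=0.05, epochs=200):
    w0, w1 = w
    for _ in range(epochs):
        for x, target in pairs:     # (input, target output) pairs
            err = target - f(x, (w0, w1))
            w0 += lr * err          # gradient step on the squared error
            w1 += lr * err * x
    return w0, w1

print(train([(0, 1.0), (1, 3.0), (2, 5.0)]))   # approaches (1.0, 2.0)
```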

Definitions in the reading
- V*(x_t) is the optimal value function.
- x_t is the state vector.
- V(x_t) is the approximation of the value function.
- γ is a discount factor in the range [0,1] that causes immediate reinforcement to be weighted more heavily than future reinforcement.
- e(x_t) is the error in the approximation of the value of the state occupied at time t.
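As a hedged illustration (hypothetical states and numbers, assuming the one-step error form used in temporal-difference methods), the error compares V(x_t) against the immediate reinforcement plus the discounted value of the next state:

```python
# Sketch: e(x_t) compares the current estimate V(x_t) with the immediate
# reinforcement plus the discounted estimate of the next state's value.
def error(V, x_t, r, x_next, gamma=0.9):
    return (r + gamma * V[x_next]) - V[x_t]

V = {"s0": 0.0, "s1": 2.0}                  # current approximations of the value function
print(error(V, "s0", r=1.0, x_next="s1"))   # 1.0 + 0.9*2.0 - 0.0 = 2.8
```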

Definitions in the reading
- T is the terminal state. Its true value is known a priori; in other words, the error in the approximation of the state labeled T, e(T), is 0 by definition.
- u is the action performed in state x_t; it causes a transition to state x_{t+1}, and r(x_t, u) is the reinforcement received when performing action u in state x_t.
- Δw_t is the change in the weights at time t.
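A hedged tabular reading of Δw_t (the tutorial's exact update equation is not reproduced here): the estimate of the occupied state is nudged toward r(x_t, u) plus the discounted value of x_{t+1}, while the terminal state T stays fixed because e(T) = 0.

```python
# Tabular sketch: the "weights" are just the table entries themselves.
V = {"s0": 0.0, "T": 0.0}        # V["T"] is known a priori and never updated
gamma, step_size = 0.9, 0.1      # step_size plays the role of the learning rate

def apply_update(x_t, r, x_next):
    e = r + gamma * V[x_next] - V[x_t]   # error in the estimate of V(x_t)
    delta_w = step_size * e              # the change in the weights at time t
    V[x_t] += delta_w
    return delta_w

print(apply_update("s0", r=1.0, x_next="T"))   # 0.1
print(V)                                       # {'s0': 0.1, 'T': 0.0}
```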

Definitions in the reading
- One might use a neural network for the approximation V(x_t, w_t) of V*(x_t), where w_t is the parameter vector.
- A deterministic Markov decision process is one in which the state transitions are deterministic: an action performed in state x_t always transitions to the same successor state x_{t+1}. Alternatively, in a nondeterministic Markov decision process, a probability distribution function defines a set of potential successor states for a given action in a given state.
- α is the learning rate.
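The deterministic versus nondeterministic distinction can be sketched as two kinds of transition function (the states, actions, and probabilities below are hypothetical); α would be the step size in an update like the one sketched above.

```python
import random

# Deterministic MDP: action u performed in state x_t always yields the same successor.
det_next = {("s0", "go"): "s1", ("s1", "go"): "T"}

# Nondeterministic MDP: a probability distribution over potential successor states.
stoch_next = {("s0", "go"): [("s1", 0.7), ("T", 0.3)]}

def sample_successor(x_t, u):
    states, probs = zip(*stoch_next[(x_t, u)])
    return random.choices(states, weights=probs)[0]

print(det_next[("s0", "go")])        # always "s1"
print(sample_successor("s0", "go"))  # "s1" about 70% of the time, "T" otherwise
```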

Definitions in the reading
For the state/action pair (x, u), an advantage A(x_t, u_t) is defined as the sum of the value of the state and the utility (advantage) of performing action u rather than the action currently considered best. For optimal actions this utility is zero, meaning the value of the action is also the value of the state; for sub-optimal actions the utility is negative, representing the degree of sub-optimality relative to the optimal action.
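A small numerical illustration of that definition, with hypothetical values: the best action's utility term is zero, so its advantage equals the value of the state, while every other action comes out lower.

```python
# Hypothetical numbers for one state x: A(x, u) = V(x) + utility(u), where the
# utility is 0 for the action currently considered best and negative otherwise.
V_x = 5.0
utility = {"left": -1.0, "right": 0.0, "stay": -1.5}

A = {u: V_x + utility[u] for u in utility}
print(A)   # {'left': 4.0, 'right': 5.0, 'stay': 3.5}; the best action's advantage equals V(x)
```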

Definitions in the reading
- K is a time unit scaling factor, and ⟨·⟩ represents the expected value over all possible results of performing action u in state x_t, receiving immediate reinforcement r and transitioning to a new state x_{t+1}.
- g is the sum of past gradients in equation (20).
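The ⟨·⟩ notation can be read as an average over the possible outcomes of taking action u; a hedged sketch with made-up probabilities, rewards, and values:

```python
# Expected value of r + gamma * V(x_{t+1}) over the possible results of performing
# action u in state x_t (hypothetical outcomes, probabilities, and values).
gamma = 0.9
V = {"s1": 2.0, "T": 0.0}
outcomes = [(0.7, 1.0, "s1"),   # (probability, reinforcement r, successor state)
            (0.3, 5.0, "T")]

expected = sum(p * (r + gamma * V[x_next]) for p, r, x_next in outcomes)
print(expected)   # ≈ 3.46 = 0.7*(1 + 0.9*2) + 0.3*(5 + 0)
```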

More terms in the reading
Google these if you don't understand them:
- Markov chain
- Markov decision process
- Mean squared error
- Monte Carlo rollout
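Of these, the Monte Carlo rollout is the easiest to show in code: estimate a state's value by simulating many episodes from it and averaging the discounted returns. A minimal sketch over a made-up Markov chain:

```python
import random

# Made-up chain: from "s0", each step earns reward 1 and ends the episode
# (moves to terminal state "T") with probability 0.25, otherwise stays in "s0".
def rollout(gamma=0.9):
    state, ret, discount = "s0", 0.0, 1.0
    while state != "T":
        ret += discount * 1.0
        discount *= gamma
        state = "T" if random.random() < 0.25 else "s0"
    return ret

estimate = sum(rollout() for _ in range(10_000)) / 10_000
print(estimate)   # Monte Carlo estimate of V("s0")
```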

Free Discussion ^_^