1
UAIG: Second Fall 2013 Meeting
2
Agenda
Introductory Icebreaker
How to get Involved with UAIG?
Discussion: Reinforcement Learning
Free Discussion
3
Introductory Icebreaker
Say your name and answer at least one of these questions:
If you were to change your name, what would you change your name to? Why?
Are you spring, summer, fall, or winter? Please share why.
What's your favorite material object that you already own?
What item, that you don't have already, would you most like to own?
If you were to create a slogan for your life, what would it be?
4
How to get Involved with UAIG
1. Come to our biweekly meetings.
2. Take charge of one of our meetings by presenting your own research, an interesting paper that you’ve read, or something else you think is relevant (talk to us if you have ideas!).
3. Organize an AI coding challenge or event.
If you do item 2 or 3, then we will appoint you as “Project Manager” and you will join the ranks of UAIG execs! ^_^
5
Discussion: RL
Reading for today’s meeting: “Reinforcement Learning: A Tutorial” by Harmon and Harmon. http://www.cs.toronto.edu/~zemel/documents/411/rltutorial.pdf
"The purpose of this tutorial is to provide an introduction to reinforcement learning (RL) at a level easily understood by students and researchers in a wide range of disciplines."
6
Definitions in the reading
Reinforcement learning is not a type of neural network, nor is it an alternative to neural networks. Rather, it is an orthogonal approach that addresses a different, more difficult question. Reinforcement learning combines the fields of dynamic programming and supervised learning to yield powerful machine-learning systems.
7
Definitions in the reading
Dynamic Programming is a field of mathematics that has traditionally been used to solve problems of optimization and control.
Supervised learning is a general method for training a parameterized function approximator, such as a neural network, to represent functions.
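To make the dynamic-programming half of that pairing concrete, here is a minimal value-iteration sketch on a made-up two-state problem; the states, actions, rewards, and discount below are invented for illustration and are not from the tutorial:

```python
# Minimal value iteration on a made-up 2-state, 2-action deterministic problem.
# Dynamic programming computes the optimal value function by repeatedly applying
# the Bellman optimality backup until the values stop changing.

gamma = 0.9  # discount factor

# transition[state][action] = (next_state, reward)  -- illustrative numbers only
transition = {
    "A": {"stay": ("A", 0.0), "go": ("B", 1.0)},
    "B": {"stay": ("B", 2.0), "go": ("A", 0.0)},
}

V = {s: 0.0 for s in transition}          # initial value estimates
for _ in range(1000):                     # sweep until (approximately) converged
    new_V = {}
    for s, actions in transition.items():
        new_V[s] = max(r + gamma * V[s2] for s2, r in actions.values())
    done = max(abs(new_V[s] - V[s]) for s in V) < 1e-8
    V = new_V
    if done:
        break

print(V)  # optimal values V*(s) for the toy problem
```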
8
Definitions in the reading
V*(x_t) is the optimal value function.
x_t is the state vector.
V(x_t) is the approximation of the value function.
γ is a discount factor in the range [0,1] that causes immediate reinforcement to be weighted more heavily than future reinforcement.
e(x_t) is the error in the approximation of the value of the state occupied at time t.
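One way to read e(x_t) in the temporal-difference setting the tutorial develops is as the gap between the current estimate V(x_t) and the one-step target r + γV(x_t+1). A minimal sketch, with made-up states and values:

```python
def td_error(V, x_t, x_next, r, gamma):
    """e(x_t) = [r + gamma * V(x_next)] - V(x_t): the gap between the current
    estimate of the value of x_t and the one-step bootstrapped target.
    V can be any mapping from state to estimated value."""
    return (r + gamma * V[x_next]) - V[x_t]

# Made-up states and value estimates:
V = {"s0": 0.5, "s1": 1.0}
print(td_error(V, "s0", "s1", r=0.0, gamma=0.9))  # 0.4
```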
9
Definitions in the reading
T is the terminal state. The true value of this state is known a priori; in other words, the error in the approximation of the state labeled T, e(T), is 0 by definition.
u is the action performed in state x_t and causes a transition to state x_t+1, and r(x_t, u) is the reinforcement received when performing action u in state x_t.
Δw_t is the change made to the weight (parameter) vector at time t by the learning update.
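Assuming the standard TD-style weight update for a differentiable approximator, Δw_t scales the TD error by the learning rate and the gradient of V with respect to the weights. The sketch below uses a linear approximator purely for illustration; the feature vectors and numbers are invented:

```python
import numpy as np

def td_weight_update(w, phi_t, phi_next, r, alpha, gamma, terminal=False):
    """One TD(0)-style update for a linear approximator V(x, w) = w . phi(x).

    Returns delta_w, the change applied to the weights at time t:
        delta_w = alpha * e(x_t) * grad_w V(x_t, w),  where grad_w V = phi(x_t).
    The successor value is taken to be 0 when x_t+1 is the terminal state T.
    """
    v_t = w @ phi_t
    v_next = 0.0 if terminal else w @ phi_next
    e_t = (r + gamma * v_next) - v_t      # TD error e(x_t)
    delta_w = alpha * e_t * phi_t
    return delta_w

# Example usage with made-up feature vectors:
w = np.zeros(3)
w += td_weight_update(w, np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 0.0]),
                      r=1.0, alpha=0.1, gamma=0.9)
```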
10
Definitions in the reading
One might use a neural network for the approximation V(x_t, w_t) of V*(x), where w_t is the parameter vector.
A deterministic Markov decision process is one in which the state transitions are deterministic (an action performed in state x_t always transitions to the same successor state x_t+1). Alternatively, in a nondeterministic Markov decision process, a probability distribution function defines a set of potential successor states for a given action in a given state.
α is the learning rate.
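A minimal sketch of the deterministic/nondeterministic distinction, with hypothetical states and transition probabilities that are not from the reading:

```python
import random

# Deterministic MDP: an action in a state always yields the same successor.
det_transition = {("A", "go"): "B", ("B", "go"): "A"}

def det_step(state, action):
    return det_transition[(state, action)]

# Nondeterministic MDP: a probability distribution over successor states
# for each (state, action) pair.
stoch_transition = {("A", "go"): [("B", 0.8), ("A", 0.2)]}

def stoch_step(state, action):
    successors, probs = zip(*stoch_transition[(state, action)])
    return random.choices(successors, weights=probs, k=1)[0]

print(det_step("A", "go"))    # always "B"
print(stoch_step("A", "go"))  # "B" about 80% of the time
```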
11
Definitions in the reading
For the state/action pair (x, u), an advantage A(x_t, u_t) is defined as the sum of the value of the state and the utility (advantage) of performing action u rather than the action currently considered best. For optimal actions this utility is zero, meaning the value of the action is also the value of the state; for sub-optimal actions the utility is negative, representing the degree of sub-optimality relative to the optimal action.
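The "zero for the optimal action, negative otherwise" property is easy to see if you phrase the utility in terms of action values; this Q-value reformulation is ours, not the paper's notation, and the numbers are made up:

```python
def advantages(q_values):
    """Utility of each action relative to the best one: zero for the optimal
    action, negative for sub-optimal actions.  q_values maps action -> Q(x, u)."""
    best = max(q_values.values())
    return {u: q - best for u, q in q_values.items()}

# Made-up action values for a single state:
print(advantages({"left": 1.0, "right": 3.0, "stay": 2.5}))
# {'left': -2.0, 'right': 0.0, 'stay': -0.5}
```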
12
Definitions in the reading
K is a time unit scaling factor, and < > represents the expected value over all possible results of performing action u in state x_t to receive immediate reinforcement r and to transition to a new state x_t+1.
g is the sum of past gradients in equation (20).
13
More terms in the reading
Google these if you don’t understand them:
Markov chain
Markov decision process
Mean squared error
Monte Carlo rollout (a small rollout sketch follows below)
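Of these, the Monte Carlo rollout is the easiest to show in a few lines: estimate a state's value by simulating whole episodes under a policy and averaging the discounted returns. The env_step and policy callables here are hypothetical placeholders, not part of the reading:

```python
def mc_rollout_value(env_step, policy, start_state, gamma,
                     n_rollouts=100, horizon=50):
    """Monte Carlo estimate of V(start_state): the average discounted return
    over simulated episodes.  env_step(state, action) -> (next_state, reward,
    done) and policy(state) -> action are callables supplied by the caller."""
    total = 0.0
    for _ in range(n_rollouts):
        state, ret, discount = start_state, 0.0, 1.0
        for _ in range(horizon):
            state, reward, done = env_step(state, policy(state))
            ret += discount * reward
            discount *= gamma
            if done:
                break
        total += ret
    return total / n_rollouts
```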
14
Free Discussion ^_^