Example: Simplified PD World

Slides:

Advertisements

Similar presentations

Homework Answers.

Advertisements

Eick: Reinforcement Learning. Topic 18: Reinforcement Learning 1. Introduction 2. Bellman Update 3. Temporal Difference Learning 4. Discussion of Project1.

Reinforcement Learning

Eick: Q-Learning for the PD-World COSC 6342 Project 1 Spring 2014 Q-Learning for a Pickup Dropoff World P P PD D D.

Markov Decision Processes

Planning under Uncertainty

Università di Milano-Bicocca Laurea Magistrale in Informatica Corso di APPRENDIMENTO E APPROSSIMAZIONE Lezione 6 - Reinforcement Learning Prof. Giancarlo.

Nov 14 th  Homework 4 due  Project 4 due 11/26.

Planning in MDPs S&B: Sec 3.6; Ch. 4. Administrivia Reminder: Final project proposal due this Friday If you haven’t talked to me yet, you still have the.

Decision Theory: Sequential Decisions Computer Science cpsc322, Lecture 34 (Textbook Chpt 9.3) April, 12, 2010.

Technical Question Technical Question

INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, Lecture Slides for.

Solving Systems of Linear Equations By Elimination.

Bell Ringer Solve c – 2 > 20

EXAMPLE 2 Write a rule for the nth term a. 4, 9, 14, 19,... b. 60, 52, 44, 36,... SOLUTION The sequence is arithmetic with first term a 1 = 4 and common.

February 8, 2012 At the end of the day, you will be able to solve quadratic equations with imaginary solutions. Warm-up: Correct HW x = -8, 326.

Standard 22 Identify arithmetic sequences Tell whether the sequence is arithmetic. a. –4, 1, 6, 11, 16,... b. 3, 5, 9, 15, 23,... SOLUTION Find the differences.

Read a problem and make a plan EXAMPLE 1 Running You run in a city where the short blocks on north-south streets are 0.1 mile long. The long blocks on.

Q-learning, SARSA, and Radioactive Breadcrumbs S&B: Ch.6 and 7.

MARKET ADJUSTMENTS Changes To Demand And Supply. MARKET ADJUSTMENTS l The price of a product remains at the equilibrium point until something changes.

Differential Equations and Slope Fields 6.1. Differential Equations  An equation involving a derivative is called a differential equation.  The order.

1 Introduction to Reinforcement Learning Freek Stulp.

Table of Contents Solving Quadratic Equations – Quadratic Formula The following shows how to solve quadratic equations using the Quadratic Formula. A quadratic.

Algebra II 4.6: Imaginary Numbers Quiz : Thursday, Oct. 16 th.

4.8 “The Quadratic Formula” Steps: 1.Get the equation in the correct form. 2.Identify a, b, & c. 3.Plug numbers into the formula. 4.Solve, then simplify.

1 Exercise 1. (a) Find all optimal sequences for the scheduling problem 1 ||  w j C j with the following jobs. (b) Determine the effect of a change in.

4.2 Critical Points Mon Oct 19 Do Now Find the derivative of each 1) 2)

: Solve using mental math. 1.3x = –18 ANSWER –6 ANSWER –10 2. b + 21 = Simplify the expression 3 ( x + 2 ) – 4x + 1. ANSWER –x + 7.

General Analysis Procedure and Calculator Policy Calculator Policy.

Algebra 2/TrigonometryName: __________________________ 13.5, 13.6 PracticeDate: ________________ Block: _____ 1) Method: _________________________2) Method:

Balanced Billing Cycles and Vehicle Routing of Meter Readers by Chris Groër, Bruce Golden, Edward Wasil University of Maryland, College Park American University,

Example 1 Read a problem and make a plan

Opener Simplify or Solve: 1. x + 3(x – 6) 2. 4x + 5 = (3 + x)y + x² + 2xy 4. 6(x + 2) + 4x x + x – 6 = 3 6. –(x + 7) = -3.

1 Passive Reinforcement Learning Ruti Glick Bar-Ilan university.

What we will do What we will learn 11Sci, Rātū 23 Haratua 2016

Homework Assistance Guide

Verifying Trigonometric Identities

Warm-Up 5/14/08.

Relating Reinforcement Learning Performance to Classification performance Presenter: Hui Li Sept.11, 2006.

Teaching Style COSC 6368 Teaching Style COSC 6368

Notes Over 4.2 Is a Solution Verifying Solutions of an Equation

PD-World Pickup: Cells: (1,1), (4,1),(3,3),(5,5)

Algebra 12.2/12.3 Review.

Skills Check ALL Factoring

Unit 2 Day 1 Factoring Review.

Sequence: A list of numbers in a particular order

Agenda Warm-up Lesson 2 hw corrections Lesson 3 hw questions? Lesson 4

CS 188: Artificial Intelligence Fall 2007

Section 2.3 – Systems of Inequalities Graphing Calculator Required

(Please have your homework out)

Skills Check Solve by Factoring and Square Roots

Algebra 2 Ch.3 Notes Page 15 P Solving Systems Algebraically.

Who Wants to be an Equationaire?. Who Wants to be an Equationaire?

Section 9.4 – Systems of Inequalities Graphing Calculator Required

Introduction to Reinforcement Learning and Q-Learning

COSC 4368 Group Project Spring 2019 Learning Paths from Feedback Using Reinforcement Learning for a Transportation World P D P D D P.

HW: Maintenance Sheet DUE

Questions over HW?. Skills Check Radical Operations and Solving by Square Roots after HW Check.

Licensed Electrical & Mechanical Engineer

How this works… Algebra 2 HSAPers

4.9 Notes – Graph and Solve Quadratic Inequalities

CS 416 Artificial Intelligence

Review: 8.5b Mini-Quiz A person plans to invest up to $10,000 in two different interest-bearing accounts, account X and account Y. Account Y is to contain.

DO NOW Copy down your homework: 2-2 Lesson check on page 91

Welcome 12/10/14 Find the equation in standard form given the following information: Zeros x = 5 ± 3i, f(2) = 27.

Warm UP Simplify .

Algebra 1 Warm Ups 1/15.

Section 4.2 Solving a System of Equations in Two Variables by the Substitution Method.

Comparing Expressions Review

Presentation transcript:

Example: Simplified PD World 1 2 P w s n w 4 3 D e Ungraded Homework: Op. Sequence corrected on Nov. 7 State Space: {1, 2, 3, 4, 1’, 2’, 3’, 4’}; “’” indicates carrying a block… Policy: agent starts is state 1 and applies operators: e-p-w-e-s-d-w-n-e-p-s-d HW1: Use Q-learning to construct the q-table assuming =0.5 and =0.5 and that pickup/dropoff reward is 12 and the move penalty is 1!

Q-Learning Solution Sketch for Simplified PD World 1 2 P w Nov. 9, 2017 s Q(a,s)  (1-)*Q(a,s) + *[ R(s) + γ*maxa’Q(a’,s’)] n w 4 3 D e Agent starts in state 1 and we obtain: Q(e,1) = -0.5 Q(p,2) = +6 Q(w,2’) = -0.5 Q(e,1’) = 0+0.5*(-1+0.5*max(-0.5,0))=-0.5 2’ has q-values for w and s; q(s,2’) is still 0 Q(s,2’) = -0.5 Q(d,3’) = 0+0.5*(12+0.5*0)=+6 3 has not been visited yet; its q-value for w is still 0 Q(w,3) = -0.5 Q(n,4) = 0+0.5*(-1+ 0.5*-0,5)=-0.625 state 1 has only one q-value q(e,1): e is the only applicable operator Q(e,1) = 0.5*-0.5+0.5*(-1+0.5*6)=+0.75 in state 2 p is the only applicable operator and q(p,2)=6 Q(p,2) = 6*0.5 + 0.5*(12+0.5*max(-0.5,-0.5))=8.875 state 2’ has 2 q-values both of which are -0.5 Q(s,2’) = 0.5*-0.5 * 0.5*(-1+0.5*6)=+0.75 in state 3’ d is the only applicable operator and q(d,3’)=6 Q(d,3’) = 6*0.5 + 0.5*(12+0.5*-0.5)=8.875 state 3 has only a q-value for w which is -0.5 Q-Learning Solution Sketch for Simplified PD World

Example: Simplified PD World 1 2 P w s n w 4 3 D e Ungraded Homework: Solve before Tu., Oct. 31 State Space: {1, 2, 3, 4, 1’, 2’, 3’, 4’}; “’” indicates carrying a block… Policy: agent starts is state 1 and applies operators: e-p-s-d-w-n-e-p-w-e-s-d HW2:Construct the q-table using SARSA assuming =0.5 and =0.5 and that pickup/dropoff reward is 12 and the move penalty is 1!

SARSA Solution Sketch for Simplified PD World Agent starts in state 1 and we obtain: Q(e,1) = -0.5 Q(p,2) = +6 Q(s,2’) = -0.5 Q(d,3’) = 0+0.5*(12+0.5*0)=+6 Q(w,3) = -0.5 Q(n,4) = 0+0.5*(-1+ 0.5*-0,5)=-0.625 Q(e,1) = 0.5*-0.5+0.5*(-1+0.5*6)=+0.75 Q(p,2) = 6*0.5 + 0.5*(12+0.5*0)=9 Q(w,2’) = -0.5 Q(e,1’) = 0+0.5*(-1+0.5*-0.5)=-0.625 Q(s,2’) = 0.5*-0.5 + 0.5*(-1+0.5*6)=+0.75 Q(d,3’) = 6*0.5 + 0.5*(12+0.5*0)=9 Q(a,s)  (1-)*Q(a,s) + *[ R(s) + γ*Q(a’,s’)] e 1 2 P w s n w 4 3 D e d last action; Action a’ not known; Q(a’,s’) is set to 0 by default in this case. Please verify!

Initial Q-Table Simplified PD World n s e w p d 1 - - 0 - - - 2 - 0 - 0 0 - 3 - - - 0 - - 4 0 - 0 - - - 1’ - - 0 - - - 2’ - 0 - 0 - - 3’ - - - 0 - 0 4’ 0 - 0 - - - ‘-’:= means operator not applicable; moreover, p and d are not always applicable in states 2 and 3’; moreover, s and w in state 2 and w in state 3’ is only applicable, if the pickup location is empty/the dropoff location is full!