Example: Simplified PD World


Example: Simplified PD World

[Figure: 2×2 grid world. Top row: state 1 (left) and state 2 (right, pickup cell P); bottom row: state 4 (left) and state 3 (right, dropoff cell D). The operators n, s, e, w move the agent between adjacent cells.]

Ungraded Homework (operator sequence corrected on Nov. 7):
State Space: {1, 2, 3, 4, 1', 2', 3', 4'}; "'" indicates that the agent is carrying a block.
Policy: the agent starts in state 1 and applies the operators: e-p-w-e-s-d-w-n-e-p-s-d
HW1: Use Q-learning to construct the q-table, assuming α=0.5 and γ=0.5, a pickup/dropoff reward of 12, and a move penalty of 1!
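To make the setup concrete, here is a minimal Python sketch of the world, assuming the 2×2 layout above and the applicable operators listed in the initial q-table slide further below; the names TRANSITIONS, reward, step, and HW1_SEQUENCE are my own and not from the slides.

```python
# A minimal sketch of the simplified PD world (assumed encoding, not from the slides).
# States with a trailing "'" mean the agent is carrying a block.

# (state, operator) -> next state, restricted to the operators the slides list as applicable
TRANSITIONS = {
    ("1", "e"): "2",
    ("2", "w"): "1", ("2", "s"): "3", ("2", "p"): "2'",
    ("3", "w"): "4",
    ("4", "n"): "1", ("4", "e"): "3",
    ("1'", "e"): "2'",
    ("2'", "w"): "1'", ("2'", "s"): "3'",
    ("3'", "w"): "4'", ("3'", "d"): "3",
    ("4'", "n"): "1'", ("4'", "e"): "3'",
}

def reward(op):
    """Pickup/dropoff earns +12, every move costs 1."""
    return 12 if op in ("p", "d") else -1

def step(state, op):
    """Apply operator op in state; return (next_state, reward)."""
    return TRANSITIONS[(state, op)], reward(op)

# Operator sequence of HW1 (the Q-learning homework above)
HW1_SEQUENCE = ["e", "p", "w", "e", "s", "d", "w", "n", "e", "p", "s", "d"]

if __name__ == "__main__":
    s = "1"
    for op in HW1_SEQUENCE:
        s2, r = step(s, op)
        print(f"{s} --{op}/{r:+d}--> {s2}")
        s = s2
```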

Q-Learning Solution Sketch for Simplified PD World (Nov. 9, 2017)

Update rule: Q(a,s) ← (1-α)*Q(a,s) + α*[R(s) + γ*max_a' Q(a',s')]

[Same 2×2 grid world as above: states 1 and 2 (P) in the top row, states 4 and 3 (D) in the bottom row.]

The agent starts in state 1 and we obtain:
Q(e,1) = -0.5
Q(p,2) = +6
Q(w,2') = -0.5
Q(e,1') = 0 + 0.5*(-1 + 0.5*max(-0.5, 0)) = -0.5   [2' has q-values for w and s; Q(s,2') is still 0]
Q(s,2') = -0.5
Q(d,3') = 0 + 0.5*(12 + 0.5*0) = +6   [3 has not been visited yet; its q-value for w is still 0]
Q(w,3) = -0.5
Q(n,4) = 0 + 0.5*(-1 + 0.5*(-0.5)) = -0.625   [state 1 has only one q-value, Q(e,1): e is the only applicable operator]
Q(e,1) = 0.5*(-0.5) + 0.5*(-1 + 0.5*6) = +0.75   [in state 2, p is the only applicable operator and Q(p,2) = 6]
Q(p,2) = 0.5*6 + 0.5*(12 + 0.5*max(-0.5, -0.5)) = 8.875   [state 2' has two q-values, both of which are -0.5]
Q(s,2') = 0.5*(-0.5) + 0.5*(-1 + 0.5*6) = +0.75   [in state 3', d is the only applicable operator and Q(d,3') = 6]
Q(d,3') = 0.5*6 + 0.5*(12 + 0.5*(-0.5)) = 8.875   [state 3 has only a q-value for w, which is -0.5]
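The following self-contained Python sketch replays the HW1 operator sequence with the Q-learning update above and reproduces the hand-computed values. It assumes α=γ=0.5, a reward of +12 for p/d and -1 per move; the helper names OPS, TRANSITIONS, and q_learning_run are my own.

```python
# Minimal Q-learning replay of the HW1 operator sequence (assumed encoding).
ALPHA, GAMMA = 0.5, 0.5

# applicable operators per state (see the initial q-table slide)
OPS = {"1": "e", "2": "swp", "3": "w", "4": "ne",
       "1'": "e", "2'": "sw", "3'": "wd", "4'": "ne"}

# deterministic transition model: (state, operator) -> next state
TRANSITIONS = {
    ("1", "e"): "2", ("2", "w"): "1", ("2", "s"): "3", ("2", "p"): "2'",
    ("3", "w"): "4", ("4", "n"): "1", ("4", "e"): "3",
    ("1'", "e"): "2'", ("2'", "w"): "1'", ("2'", "s"): "3'",
    ("3'", "w"): "4'", ("3'", "d"): "3", ("4'", "n"): "1'", ("4'", "e"): "3'",
}

def q_learning_run(sequence, start="1"):
    q = {(op, s): 0.0 for s in OPS for op in OPS[s]}   # initial q-table, all zeros
    s = start
    for op in sequence:
        s2 = TRANSITIONS[(s, op)]
        r = 12 if op in "pd" else -1
        best_next = max(q[(a2, s2)] for a2 in OPS[s2])  # max over applicable ops in s'
        q[(op, s)] = (1 - ALPHA) * q[(op, s)] + ALPHA * (r + GAMMA * best_next)
        print(f"Q({op},{s}) = {q[(op, s)]:+.4g}")
        s = s2
    return q

q_learning_run(["e", "p", "w", "e", "s", "d", "w", "n", "e", "p", "s", "d"])
```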

Example: Simplified PD World

[Same 2×2 grid world: state 1 (top-left), state 2 (top-right, pickup cell P), state 4 (bottom-left), state 3 (bottom-right, dropoff cell D); operators n, s, e, w move between adjacent cells.]

Ungraded Homework: Solve before Tu., Oct. 31
State Space: {1, 2, 3, 4, 1', 2', 3', 4'}; "'" indicates that the agent is carrying a block.
Policy: the agent starts in state 1 and applies the operators: e-p-s-d-w-n-e-p-w-e-s-d
HW2: Construct the q-table using SARSA, assuming α=0.5 and γ=0.5, a pickup/dropoff reward of 12, and a move penalty of 1!
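Before applying SARSA it may help to see where the HW2 operator sequence actually takes the agent. This small sketch (my own, assuming the same 2×2 transition model as above) only prints the visited states.

```python
# Trace the HW2 operator sequence through the state space (assumed transition model).
TRANSITIONS = {
    ("1", "e"): "2", ("2", "p"): "2'", ("2'", "s"): "3'", ("3'", "d"): "3",
    ("3", "w"): "4", ("4", "n"): "1", ("2'", "w"): "1'", ("1'", "e"): "2'",
}

state, visited = "1", ["1"]
for op in "e-p-s-d-w-n-e-p-w-e-s-d".split("-"):
    state = TRANSITIONS[(state, op)]
    visited.append(state)

# prints: 1 -> 2 -> 2' -> 3' -> 3 -> 4 -> 1 -> 2 -> 2' -> 1' -> 2' -> 3' -> 3
print(" -> ".join(visited))
```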

SARSA Solution Sketch for Simplified PD World

Update rule: Q(a,s) ← (1-α)*Q(a,s) + α*[R(s) + γ*Q(a',s')]

[Same 2×2 grid world: states 1 and 2 (P) in the top row, states 4 and 3 (D) in the bottom row.]

The agent starts in state 1 and we obtain:
Q(e,1) = -0.5
Q(p,2) = +6
Q(s,2') = -0.5
Q(d,3') = 0 + 0.5*(12 + 0.5*0) = +6
Q(w,3) = -0.5
Q(n,4) = 0 + 0.5*(-1 + 0.5*(-0.5)) = -0.625
Q(e,1) = 0.5*(-0.5) + 0.5*(-1 + 0.5*6) = +0.75
Q(p,2) = 0.5*6 + 0.5*(12 + 0.5*0) = 9
Q(w,2') = -0.5
Q(e,1') = 0 + 0.5*(-1 + 0.5*(-0.5)) = -0.625
Q(s,2') = 0.5*(-0.5) + 0.5*(-1 + 0.5*6) = +0.75
Q(d,3') = 0.5*6 + 0.5*(12 + 0.5*0) = 9   [d is the last action; the next action a' is not known, so Q(a',s') is set to 0 by default in this case]

Please verify!
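The SARSA values above can be checked with the following self-contained Python sketch, which replays the HW2 sequence and uses the next action in the sequence as a'; the world encoding and the names TRANSITIONS and sarsa_run are my own, and the final update falls back to Q(a',s') = 0 as stated on the slide.

```python
# Minimal SARSA replay of the HW2 operator sequence (assumed encoding).
ALPHA, GAMMA = 0.5, 0.5

# deterministic transition model: (state, operator) -> next state
TRANSITIONS = {
    ("1", "e"): "2", ("2", "w"): "1", ("2", "s"): "3", ("2", "p"): "2'",
    ("3", "w"): "4", ("4", "n"): "1", ("4", "e"): "3",
    ("1'", "e"): "2'", ("2'", "w"): "1'", ("2'", "s"): "3'",
    ("3'", "w"): "4'", ("3'", "d"): "3", ("4'", "n"): "1'", ("4'", "e"): "3'",
}

def sarsa_run(sequence, start="1"):
    q = {}          # q-table filled lazily; unseen entries default to 0
    s = start
    for i, op in enumerate(sequence):
        s2 = TRANSITIONS[(s, op)]
        r = 12 if op in "pd" else -1
        if i + 1 < len(sequence):
            a_next = sequence[i + 1]          # SARSA uses the action actually taken next
            q_next = q.get((a_next, s2), 0.0)
        else:
            q_next = 0.0                      # last action: a' unknown, default to 0
        q[(op, s)] = (1 - ALPHA) * q.get((op, s), 0.0) + ALPHA * (r + GAMMA * q_next)
        print(f"Q({op},{s}) = {q[(op, s)]:+.4g}")
        s = s2
    return q

sarsa_run(["e", "p", "s", "d", "w", "n", "e", "p", "w", "e", "s", "d"])
```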

Initial Q-Table, Simplified PD World

        n    s    e    w    p    d
  1     -    -    0    -    -    -
  2     -    0    -    0    0    -
  3     -    -    -    0    -    -
  4     0    -    0    -    -    -
  1'    -    -    0    -    -    -
  2'    -    0    -    0    -    -
  3'    -    -    -    0    -    0
  4'    0    -    0    -    -    -

'-' means the operator is not applicable. Moreover, p and d are not always applicable in states 2 and 3'; s and w in state 2 and w in state 3' are only applicable if the pickup location is empty / the dropoff location is full!
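For completeness, a small sketch (my own naming: APPLICABLE, OPERATORS, q_table) that builds this initial q-table as a nested dictionary, with an entry only where an operator is applicable and every entry starting at 0:

```python
# Build and print the initial q-table of the simplified PD world (assumed encoding).
APPLICABLE = {
    "1": "e", "2": "swp", "3": "w", "4": "ne",
    "1'": "e", "2'": "sw", "3'": "wd", "4'": "ne",
}
OPERATORS = "nsewpd"

# q_table[state][operator] exists only for applicable operators, initialized to 0
q_table = {s: {op: 0.0 for op in APPLICABLE[s]} for s in APPLICABLE}

# print it in roughly the same layout as the slide ('-' marks "not applicable")
print("    " + "".join(f"{op:>5}" for op in OPERATORS))
for s in APPLICABLE:
    row = "".join(f"{q_table[s].get(op, '-'):>5}" for op in OPERATORS)
    print(f"{s:>3} {row}")
```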