CS294: Solving system problems with RL

Presentation transcript:

CS294: Solving system problems with RL
Joey Gonzalez and Ion Stoica
April 1, 2019

Reinforcement Learning
*Figure: https://sergioskar.github.io/Reinforcement_learning/

Many systems problems fit this pattern:
- TCP congestion control — objective (reward): max throughput; control: window size; environment: losses, etc.
- Video bitrate adaptation — objective (reward): Quality of Experience (QoE); control: bitrate; environment: throughput, buffer size, file size.
- Job scheduling — objective (reward): max throughput / min response time; control: next job to schedule; environment: system utilization, job characteristics, etc.
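The mapping above (control = bitrate, reward = QoE, environment = throughput/buffer/file size) can be sketched as a toy environment. This is an illustrative sketch only: the class, bitrate ladder, and QoE coefficients are made-up placeholders, not the course's actual formulation.

```python
import random

class BitrateEnv:
    """Toy bitrate-adaptation environment: pick a bitrate per chunk, reward approximates QoE."""
    BITRATES = [300, 750, 1200, 2850]  # kbps; illustrative ladder

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.buffer = 5.0              # seconds of video buffered
        self.last_bitrate = self.BITRATES[0]
        self.throughput = 1500.0       # kbps; observed network throughput
        return self._obs()

    def _obs(self):
        # State: what the controller observes about the environment
        return (self.throughput, self.buffer, self.last_bitrate)

    def step(self, action):
        bitrate = self.BITRATES[action]
        chunk_seconds = 4.0
        # Time to download the next chunk at the current throughput
        download_time = bitrate * chunk_seconds / self.throughput
        rebuffer = max(0.0, download_time - self.buffer)
        self.buffer = max(0.0, self.buffer - download_time) + chunk_seconds
        # QoE-style reward: quality minus rebuffering and smoothness penalties
        reward = (bitrate / 1000.0
                  - 4.3 * rebuffer
                  - abs(bitrate - self.last_bitrate) / 1000.0)
        self.last_bitrate = bitrate
        # Network throughput drifts randomly between chunks
        self.throughput = max(200.0, self.throughput * self.rng.uniform(0.8, 1.2))
        return self._obs(), reward

# A trivial baseline policy: highest bitrate at or below observed throughput
env = BitrateEnv()
obs = env.reset()
total = 0.0
for _ in range(10):
    tput = obs[0]
    action = max([i for i, b in enumerate(BitrateEnv.BITRATES) if b <= tput],
                 default=0)
    obs, r = env.step(action)
    total += r
```

An RL agent would replace the rule-based baseline at the bottom with a learned policy mapping the observation tuple to an action.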

Classic control systems

Optimal control vs Deep RL:
- Objective: cost function (optimal control) vs reward (deep RL)
- Control: set of differential equations vs neural network
- Environment: known model (often expressed as constraints) vs possibly unknown model

RL is a form of stochastic optimal control.
*Optimal control largely founded by Lev Pontryagin and Richard Bellman

1950s-1960s Application (Assembly Language)