David Wingate: Reinforcement Learning for Complex System Management

Complex Systems
Science and engineering will increasingly turn to machine learning to cope with ever more complex data and systems. Can we design new systems that are so complex they are beyond our native abilities to control? A new class of systems that are intended, from the start, to be controlled by machine learning?

Outline
– Intro to Reinforcement Learning
– RL for Complex Systems

RL: Optimizing Sequential Decisions Under Uncertainty
[Figure: the agent-environment loop; the agent emits actions and receives observations in return.]

Classic Formalism
Given:
– A state space
– An action space
– A reward function
– Model information (ranges from full to nothing)
Find:
– A policy (a mapping from states to actions)
Such that:
– A reward-based metric is maximized
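To make the formalism concrete, here is a minimal value-iteration sketch in Python for the case where the model is fully known; the two-state MDP (the transition table P, reward table R, and discount gamma) is entirely invented for illustration.

```python
import numpy as np

# Hypothetical two-state, two-action MDP given as explicit tables.
# P[s, a, s'] = transition probability, R[s, a] = expected reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.95  # discount factor

# Value iteration: repeatedly apply the Bellman optimality backup.
V = np.zeros(2)
for _ in range(1000):
    Q = R + gamma * P @ V        # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] V[s']
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

policy = Q.argmax(axis=1)        # the policy: a mapping from states to actions
print("V* =", V, "policy =", policy)
```

With full model information the policy can be computed offline like this; the RL settings below relax exactly that assumption.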

Reinforcement Learning
RL = learning meets planning

Reinforcement Learning
Example applications:
– Logistics and scheduling
– Acrobatic helicopters
– Load balancing
– Robot soccer
– Bipedal locomotion
– Dialogue systems
– Game playing
– Power grid control
– …
Systems pictured on these slides:
– Pieter Abbeel. Apprenticeship Learning and Reinforcement Learning with Application to Robotic Control. PhD Thesis, Stanford University, 2008.
– Peter Stone, Richard Sutton, Gregory Kuhlmann. Reinforcement Learning for RoboCup Soccer Keepaway. Adaptive Behavior, Vol. 13, No. 3, 2005.
– David Silver, Richard Sutton and Martin Muller. Sample-based learning and search with permanent and transient memories. ICML 2008.

Types of RL
You can slice and dice RL many ways:
By problem setting:
– Fully vs. partially observed
– Continuous vs. discrete
– Deterministic vs. stochastic
– Episodic vs. sequential
– Stationary vs. non-stationary
– Flat vs. factored
By optimization objective:
– Average reward
– Infinite horizon (expected discounted reward)
By solution approach:
– Model-free vs. model-based (Q-learning, Bayesian RL, …)
– Online vs. batch
– Value-function-based vs. policy search
– Dynamic programming, Monte Carlo, TD
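As one point in this space (model-free, online, value-function-based, TD), here is a hedged sketch of tabular Q-learning; the env interface with reset() and step(a) returning (s_next, reward, done) is an assumed gym-style convention, not any particular library's API.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    """Model-free TD control: learn Q(s, a) from sampled transitions.

    Assumes a gym-like env with reset() -> s and step(a) -> (s', r, done).
    """
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # Epsilon-greedy: explore with probability epsilon, else exploit.
            if np.random.rand() < epsilon:
                a = np.random.randint(n_actions)
            else:
                a = int(Q[s].argmax())
            s_next, r, done = env.step(a)
            # TD update toward the one-step bootstrapped target.
            target = r + (0 if done else gamma * Q[s_next].max())
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```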

Fundamental Questions
– Exploration vs. exploitation
– On-policy vs. off-policy learning
– Generalization: selecting the right representations; features for function approximators
– Sample and computational complexity
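On the generalization question, one standard move is to replace the Q table with a linear function of features, Q(s, a) ≈ w · φ(s, a); the sketch below shows the corresponding semi-gradient TD(0) update, where the feature map phi is a hypothetical user-supplied function, and choosing it well is precisely the representation problem named above.

```python
import numpy as np

def td_update(w, phi, s, a, r, s_next, actions, alpha=0.01, gamma=0.99):
    """One semi-gradient TD(0) update for Q(s, a) = w . phi(s, a).

    phi(s, a) is an assumed feature map returning a numpy vector;
    terminal-state handling is omitted for brevity.
    """
    q_sa = w @ phi(s, a)
    q_next = max(w @ phi(s_next, b) for b in actions)
    td_error = r + gamma * q_next - q_sa
    return w + alpha * td_error * phi(s, a)   # gradient of Q w.r.t. w is phi(s, a)
```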

RL vs. Optimal Control vs. Classical Planning
You probably want to use RL if:
– You need to learn something online about your system: you don't have a model of the system, or there are things you simply cannot predict
– Classic planning is too complex / expensive: you have a model, but it's intractable to plan with
You probably want to use optimal control if:
– Things are mathematically tidy: you have a well-defined model and objective, and your model is analytically tractable (e.g., holonomic PID control, the linear-quadratic regulator)
You probably want to use classical planning if:
– You have a model (probably deterministic)
– You're dealing with a highly structured environment (symbolic; STRIPS, etc.)
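For the "mathematically tidy" case, the discrete-time infinite-horizon LQR mentioned above has a closed-form optimal controller; this numpy sketch iterates the Riccati recursion for an invented double-integrator system (the matrices A, B, Q, R are illustrative assumptions).

```python
import numpy as np

# Hypothetical linear system x' = A x + B u with quadratic cost x'Qx + u'Ru.
A = np.array([[1.0, 1.0], [0.0, 1.0]])   # a double integrator
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[0.1]])

# Iterate the discrete-time Riccati equation to a fixed point.
P = Q.copy()
for _ in range(1000):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P_new = Q + A.T @ P @ (A - B @ K)
    if np.max(np.abs(P_new - P)) < 1e-10:
        break
    P = P_new

# The optimal policy is linear state feedback: u = -K x.
print("LQR gain K =", K)
```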

RL for Complex Systems

Smartlocks
A future multicore scenario:
– It's the year 2018
– Intel is running a 15 nm process
– CPUs have hundreds of cores
There are many sources of asymmetry:
– Cores regularly overheat
– Manufacturing defects result in different frequencies
– Nonuniform access to memory controllers
How can a programmer take full advantage of this hardware? One answer: let machine learning help manage the complexity.

Smartlocks
A mutex combined with a reinforcement learning agent. It learns to resolve contention by adaptively prioritizing lock acquisition.

Details
– Model-free
– Policy search via policy gradients
– Objective function: heartbeats / second
– The ML engine runs in an additional thread
– Typical operations: simple linear algebra (compute bound, not memory bound)
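A hedged sketch of the kind of update these details imply, not the actual Smartlocks implementation: a softmax policy over which waiting thread acquires the lock next, trained by a REINFORCE-style policy-gradient step on a measured reward. The per-thread feature vectors and the heartbeats-per-second reward signal are stand-ins.

```python
import numpy as np

class SoftmaxLockPolicy:
    """Sketch: learn per-thread lock-acquisition priorities by policy gradient.

    features[i] is an assumed feature vector for waiting thread i;
    reward is an assumed measured heartbeats/second rate.
    """
    def __init__(self, n_features, lr=0.01):
        self.w = np.zeros(n_features)
        self.lr = lr

    def choose(self, features):
        logits = features @ self.w            # one score per waiting thread
        p = np.exp(logits - logits.max())
        p /= p.sum()
        i = np.random.choice(len(p), p=p)     # sample who acquires the lock
        # Gradient of log pi(i): phi_i minus the probability-weighted mean feature.
        grad_logp = features[i] - p @ features
        return i, grad_logp

    def update(self, grad_logp, reward, baseline=0.0):
        # REINFORCE: step along grad log pi, scaled by (reward - baseline).
        self.w += self.lr * (reward - baseline) * grad_logp
```

Note that the whole update is a few dot products and vector sums, consistent with the "simple linear algebra, compute bound" characterization above.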

Smart Data Structures

Results

Extensions?
Combine with model-building? Bayesian RL?
Could replace mutexes in different places to derive smart versions of:
– Scheduler
– Disk controller
– DRAM controller
– Network controller
More abstract, too:
– Data structures
– Code sequences?

More General ML/RL?
General ML for optimization of tunable knobs in any algorithm:
– Preliminary experiments with smart data structures
– Passcount tuning for flat-combining: a big win!
What might hardware support look like?
– An ML coprocessor? Tuned for policy gradients? Model building? Probabilistic modeling?
– Expose an accelerated ML/RL API as a low-level system service?
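As a toy version of the knob-tuning idea, the sketch below stochastically hill-climbs a single integer knob (such as a pass count) against a noisy measured objective; measure_throughput is a hypothetical callable standing in for whatever signal the real system exposes.

```python
import random

def tune_knob(measure_throughput, lo=1, hi=1024, steps=200):
    """Stochastic hill-climbing over one integer knob.

    measure_throughput(k) is an assumed callable returning a noisy
    performance measurement (e.g. operations/second) for knob value k.
    """
    k = (lo + hi) // 2
    best = measure_throughput(k)
    for _ in range(steps):
        candidate = min(hi, max(lo, k + random.choice([-16, -4, -1, 1, 4, 16])))
        score = measure_throughput(candidate)
        if score > best:           # keep the knob setting if it measured better
            k, best = candidate, score
    return k
```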

Thank you!

Bayesian RL
Use hierarchical Bayesian methods to learn a rich model of the world, while using planning to figure out what to do with it.
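One concrete reading of this, offered as an assumption rather than the talk's exact method, is posterior-sampling RL: keep a Dirichlet posterior over each transition row, periodically sample a full model from the posterior, and plan in the sample (reusing value iteration as sketched earlier). The known reward table is a simplification.

```python
import numpy as np

class PosteriorSamplingRL:
    """Sketch of Bayesian RL via posterior (Thompson) sampling.

    Maintains an independent Dirichlet posterior over each (s, a) transition
    row; the reward table R[s, a] is assumed known here for brevity.
    """
    def __init__(self, n_states, n_actions, R, gamma=0.95):
        self.counts = np.ones((n_states, n_actions, n_states))  # Dirichlet(1) prior
        self.R, self.gamma = R, gamma

    def observe(self, s, a, s_next):
        self.counts[s, a, s_next] += 1        # exact Bayesian posterior update

    def sample_model(self):
        # Draw one plausible transition model from the Dirichlet posterior.
        return np.apply_along_axis(np.random.dirichlet, -1, self.counts)

    def plan(self, iters=500):
        # Value iteration in a model sampled from the posterior.
        P = self.sample_model()
        V = np.zeros(self.counts.shape[0])
        for _ in range(iters):
            V = (self.R + self.gamma * P @ V).max(axis=1)
        return (self.R + self.gamma * P @ V).argmax(axis=1)
```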

Bayesian Modeling

What is Bayesian Modeling?
Find structure in data while dealing explicitly with uncertainty. The goal of a Bayesian is to reason about the distribution over structures given the data.

Example
What line generated this data? [Figure: a scatter plot overlaid with several candidate lines, some plausible and some clearly not.]
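In code, "which line generated this data?" becomes a posterior distribution over line parameters; this is the standard conjugate Gaussian treatment, with the noise level and prior scale assumed for illustration.

```python
import numpy as np

def line_posterior(x, y, noise_var=0.25, prior_var=10.0):
    """Gaussian posterior over (intercept, slope) for y = w0 + w1*x + noise.

    Conjugate Bayesian linear regression:
    Sigma = (Phi^T Phi / noise_var + I / prior_var)^-1
    mu    = Sigma Phi^T y / noise_var
    """
    Phi = np.column_stack([np.ones_like(x), x])   # design matrix [1, x]
    Sigma = np.linalg.inv(Phi.T @ Phi / noise_var + np.eye(2) / prior_var)
    mu = Sigma @ Phi.T @ y / noise_var
    return mu, Sigma   # a whole distribution over lines, not one "best" line

# Illustrative data scattered around a true line y = 1 + 2x.
x = np.linspace(0, 1, 20)
y = 1 + 2 * x + np.random.normal(0, 0.5, size=x.shape)
mu, Sigma = line_posterior(x, y)
print("posterior mean line (intercept, slope):", mu)
```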

What About the "Bayes" Part?
Bayes' Law is a mathematical fact that helps us: the posterior over structure is the likelihood times the prior, normalized:
$P(\text{structure} \mid \text{data}) = P(\text{data} \mid \text{structure})\, P(\text{structure}) \,/\, P(\text{data})$
where $P(\text{structure})$ is the prior and $P(\text{data} \mid \text{structure})$ is the likelihood.

Distributions Over Structure
– Visual perception
– Natural language
– Speech recognition
– Topic understanding
– Word learning
– Causal relationships
– Modeling relationships
– Intuitive theories
– …

Inference
So, we've defined these distributions mathematically. What can we do with them? Some questions we can ask:
– Compute an expected value
– Find the MAP value
– Compute the marginal likelihood
– Draw a sample from the distribution
All of these are computationally hard.
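Monte Carlo methods are the usual workaround for this hardness; the sketch below is a generic random-walk Metropolis sampler that needs only an unnormalized log density, which is enough to approximate expectations (and hence many of the queries above) by averaging over draws.

```python
import numpy as np

def metropolis(log_p, x0, n_samples=10000, step=0.5):
    """Random-walk Metropolis: sample given only an unnormalized log density.

    Sidesteps the (hard) marginal likelihood: the acceptance ratio only
    needs log_p up to an additive constant.
    """
    x = np.asarray(x0, dtype=float)
    lp = log_p(x)
    samples = []
    for _ in range(n_samples):
        proposal = x + step * np.random.randn(*x.shape)
        lp_prop = log_p(proposal)
        if np.log(np.random.rand()) < lp_prop - lp:   # accept w.p. min(1, ratio)
            x, lp = proposal, lp_prop
        samples.append(x.copy())
    return np.array(samples)

# Example: estimate E[x] under an (unnormalized) standard Gaussian.
draws = metropolis(lambda x: -0.5 * np.sum(x**2), x0=np.zeros(1))
print("estimated mean:", draws.mean())
```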