Bayesian Optimization with Experimental Constraints
Javad Azimi (Advisor: Dr. Xiaoli Fern)
PhD Proposal Exam, April 2012


Outline
– Introduction to Bayesian Optimization
– Completed Work
   – Constrained Bayesian Optimization
   – Batch Bayesian Optimization
   – Scheduling Methods for Bayesian Optimization
– Future Work
   – Hybrid Bayesian Optimization
– Timeline

Bayesian Optimization
– We have a black-box function and know nothing about its distribution
– We are able to sample the function, but each sample is very expensive
– We want to find the maximizer (or minimizer) of the function
– Assumption: Lipschitz continuity

Big Picture
[Diagram: the BO loop: Current Experiments → Posterior Model → Select Experiment(s) → Run Experiment(s) → back to Current Experiments]

Posterior Model (1): Regression Approaches
Simulate the unknown function's distribution based on the prior:
– Deterministic (classical linear regression, …): there is a deterministic prediction for each point x in the input space
– Stochastic (Bayesian regression, Gaussian Process, …): there is a distribution over the prediction for each point x in the input space (e.g. a normal distribution)
– Example: Deterministic: f(x1) = y1, f(x2) = y2. Stochastic: f(x1) = N(y1, 0.2), f(x2) = N(y2, 5)

Posterior Model (2): Gaussian Process
A Gaussian Process is used to build the posterior model:
– The predicted output at any point is a normal random variable
– The variance is independent of the observed outputs y
– The mean is a linear combination of the observed outputs y
[Figure: example posterior, highlighting points with high output expectation and points with high output variance]
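To make the last two bullets concrete, here is a minimal sketch of a GP posterior for 1-D inputs. The squared-exponential kernel, length scale, and noise level are illustrative choices, not those used in the talk; note that the mean is linear in the observations y while the variance never touches y.

```python
import numpy as np

def rbf(a, b, ls=1.0):
    """Squared-exponential kernel for 1-D inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(X, y, Xstar, noise=1e-6):
    """GP posterior mean and standard deviation at query points Xstar."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xstar)                          # cross-covariances
    Kss = rbf(Xstar, Xstar)
    alpha = np.linalg.solve(K, y)
    mu = Ks.T @ alpha                           # a linear combination of y
    cov = Kss - Ks.T @ np.linalg.solve(K, Ks)   # never touches y
    return mu, np.sqrt(np.clip(np.diag(cov), 0, None))

# Example: posterior from three observations.
X = np.array([0.0, 1.0, 2.0]); y = np.array([0.1, 0.9, 0.2])
mu, sd = gp_posterior(X, y, np.linspace(0, 2, 5))
```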

Selection Criterion
Goal: decide which point to sample next so that we reach the maximizer of the function faster.
– Maximum Mean (MM): selects the point with the highest posterior mean; purely exploitative
– Maximum Upper-bound Interval (MUI): selects the point with the highest 95% upper confidence bound; purely explorative
– Maximum Probability of Improvement (MPI): computes the probability that the output exceeds (1+m) times the best current observation, m > 0; both explorative and exploitative
– Maximum Expected Improvement (MEI): similar to MPI but parameter-free; it simply computes the expected amount of improvement after sampling at any point
[Figure: the four criteria (MM, MUI, MPI, MEI) on an example posterior]
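Given the posterior mean and standard deviation at a set of candidate points, all four criteria are a few lines each. This is a sketch under the usual Gaussian-posterior assumption; the 1.96 multiplier encodes the 95% bound, and m is the MPI margin parameter from the slide (which assumes the best observation is positive).

```python
import numpy as np
from scipy.stats import norm

def acquisitions(mu, sigma, y_best, m=0.2):
    """MM, MUI, MPI, MEI from the posterior mean/std at candidate points."""
    mm = mu                                            # exploit the mean
    mui = mu + 1.96 * sigma                            # 95% upper confidence bound
    mpi = norm.cdf((mu - (1 + m) * y_best) / sigma)    # P(output > (1+m) * y_best)
    z = (mu - y_best) / sigma
    mei = (mu - y_best) * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement
    return mm, mui, mpi, mei
```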

Motivating Application: Fuel Cell
[Diagram: how a microbial fuel cell (MFC) works: bacteria at the anode oxidize fuel (organic matter) into oxidation products (CO2), releasing electrons and H+; the electrons travel through the circuit to the cathode, where O2 is reduced to H2O]
[SEM image: bacteria sp. on Ni-nanoparticle-enhanced carbon fibers]
The nano-structure of the anode significantly impacts electricity production. We want to optimize the anode nano-structure to maximize power by selecting a set of experiments.

Other Applications
– Financial investment
– Reinforcement learning
– Drug testing
– Destructive testing
– And more…

Constrained Bayesian Optimization (AAAI 2010; journal version to be submitted)

Problem Definition (1)
– BO assumes that we can request any specific experiment
– This assumption is unreasonable in many applications:
   – In the fuel-cell domain, it takes many trials to create a nano-structure with specific requested properties
   – Such requests are costly to fulfill

Problem Definition (2)
– It is less costly to fulfill a request that specifies ranges for the nano-structure properties, e.g. run an experiment with Averaged Area in range r1 and Average Circularity in range r2
– We call such requests "constrained experiments"
[Figure: space of experiments with axes Averaged Area and Average Circularity. Constrained Experiment 1: large ranges, low cost, high uncertainty about which experiment will be run. Constrained Experiment 2: small ranges, high cost, low uncertainty about which experiment will be run]

Proposed Approach
We introduce two different formulations:
– Non-sequential: select all experiments at the same time
– Sequential: select only one constrained experiment at each iteration
Two challenges:
– How do we compute heuristics for a constrained experiment?
– How do we take experimental cost into account? (Cost has been ignored by most approaches in BO.)

Non-Sequential
All experiments must be chosen at the same time.
Objective function: select the subset of experiments (with total cost B) that jointly has the highest expected maximum output, i.e. maximize E[max(·)].

Submodularity
Submodularity simply means that adding an element to a smaller set yields more improvement than adding it to a larger set.
Example showing that max(·) is submodular:
– S1 = {1, 2, 4}, S2 = {1, 2, 4, 8} (S1 is a subset of S2), g = max(·), x = 6
– g(S1 ∪ {x}) − g(S1) = 2, while g(S2 ∪ {x}) − g(S2) = 0
E[max(·)] over a set of jointly normal random variables is a submodular function, so the greedy algorithm provides a "constant"-factor approximation bound.

Greedy Algorithm
[Algorithm figure: greedy selection of constrained experiments under budget B]
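As a hedged sketch of what such a greedy loop can look like: each step adds the candidate with the largest Monte-Carlo gain in E[max(·)], here normalized per unit cost so the budget B is respected (the cost normalization is our illustrative choice; the constant-factor bound on the previous slide is for the plain greedy gain). sample_joint, costs, and y_best are assumed inputs.

```python
import numpy as np

def greedy_budgeted_emax(costs, sample_joint, budget, y_best=0.0, n_mc=2000):
    """Greedily grow a set of constrained experiments, each step adding
    the candidate with the largest Monte-Carlo gain in E[max(.)] per
    unit cost, until the budget is exhausted."""
    draws = sample_joint(n_mc)               # (n_mc, n_candidates) joint samples
    chosen, spent = [], 0.0
    best = np.full(n_mc, y_best)             # running max on each sample path
    while True:
        cur = best.mean()
        rate = np.full(len(costs), -np.inf)
        for j in range(len(costs)):
            if j in chosen or spent + costs[j] > budget:
                continue
            rate[j] = (np.maximum(best, draws[:, j]).mean() - cur) / costs[j]
        k = int(np.argmax(rate))
        if rate[k] == -np.inf:
            return chosen                    # nothing affordable is left
        chosen.append(k)
        spent += costs[k]
        best = np.maximum(best, draws[:, k])
```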

Sequential Policies
Given the posterior distribution p(y|x, D) and the distribution p_x(·|D) over which experiment inside the constraint will actually be run, we can calculate the posterior over the output of each constrained experiment, which has a closed-form solution.
Therefore we can compute the standard BO heuristics for constrained experiments; there are closed-form solutions for these heuristics.
[Figure: input space and its discretization level]
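The closed forms are in the paper; as intuition, here is a Monte-Carlo stand-in for the expected improvement of a constrained experiment, assuming p_x(·|D) is uniform over the region's discretization and that gp_mu and gp_sigma expose the GP posterior (both interfaces are assumptions).

```python
import numpy as np

def constrained_ei(region_pts, gp_mu, gp_sigma, y_best, n_mc=5000, seed=0):
    """Monte-Carlo expected improvement of a constrained experiment:
    the experiment actually run is a uniform draw from the region's
    discretization, and its output follows the GP posterior there."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(len(region_pts), size=n_mc)   # x ~ p_x(.|D), uniform here
    mu = np.array([gp_mu(region_pts[i]) for i in idx])
    sd = np.array([gp_sigma(region_pts[i]) for i in idx])
    y = rng.normal(mu, sd)                           # y ~ p(y|x, D)
    return np.maximum(y - y_best, 0.0).mean()
```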

Budgeted Constrained Setting
We are limited by a budget B. Unfortunately, the heuristics will typically select the smallest and most costly constrained experiments, which is not a good use of the budget.
How can we account for the cost of each constrained experiment when making the decision?
– Cost Normalized policy (CN)
– Constraint Minimum Cost policy (CMC)
[Figure: spectrum of constrained experiments, from small ranges (low uncertainty, better heuristic value, expensive) to large ranges (high uncertainty, lower heuristic value, cheap)]

Cost Normalized Policy (CN)
Selects the constrained experiment achieving the highest expected improvement per unit cost. We report this approach for the MEI policy only.

Constraint Minimum Cost Policy (CMC)
Motivation: select a constrained experiment that
1. approximately maximizes the heuristic value, and
2. has expected improvement at least as great as spending the same amount of budget on random experiments.
[Figure: example candidates with costs equivalent to 4, 5, and 10 random experiments: the very expensive one is rejected because 10 random experiments are likely to be better; another is rejected for its poor heuristic value (condition 1); the selected constrained experiment satisfies both conditions]

Results (1)
[Plots: CMC-MEI on the Cosines, Fuel Cell, Real, and Rosenbrock benchmarks]

Results (2)
[Plots: the non-sequential (NS) approach on the Cosines, Fuel Cell, Real, and Rosenbrock benchmarks]

Batch Bayesian Optimization (NIPS 2010)
Sometimes it is better to select a batch.

Motivation
– Traditional BO approaches request a single experiment at each iteration
– This is not time-efficient when running an experiment is very time-consuming and there are enough facilities to run up to k experiments concurrently
– We would like to improve performance per unit time by selecting and running k experiments in parallel
– A good batch approach can speed up the experimental procedure without degrading performance

Main Idea
We use Monte Carlo simulation to select a batch of k experiments that closely matches what a good sequential policy would select in k steps.
[Figure: given a sequential policy and batch size k, simulate n trajectories x_{i1}, x_{i2}, …, x_{ik}; return the batch B* = {x_1, x_2, …, x_k}]

Objective Function (1)
Simulated matching:
– Given n different trajectories of length k from a given sequential policy,
– we want to select a batch of k experiments that best matches the behavior of the sequential policy.
This objective can be viewed as minimizing an upper bound on the expected performance difference between the sequential policy and the selected batch. It is similar to weighted k-medoid clustering.

Supermodularity
Example showing that min(·) is a supermodular function:
– B1 = {1, 2, 4}, B2 = {1, 2, 4, −2}, f = min(·), x = 0
– f(B1) − f(B1 ∪ {x}) = 1, while f(B2) − f(B2 ∪ {x}) = 0
Quiz: what is the difference between a submodular and a supermodular function? If the inequality is reversed, we have a submodular function.
The proposed objective function is supermodular, so the greedy algorithm provides an approximation bound.

Algorithm
[Algorithm figure: greedy simulation-matching batch selection]
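A minimal sketch of the matching step, written as an unweighted greedy k-medoid: each pick minimizes the average distance from the simulated trajectory points to their nearest batch member. The trajectory weights and the exact distance metric from the paper are omitted; this is illustrative only.

```python
import numpy as np

def simulation_matching_batch(trajectories, k):
    """Greedy k-medoid: choose k batch points minimizing the average
    distance from each simulated trajectory point to its nearest
    batch member (a supermodular min-objective)."""
    pts = np.vstack(trajectories)          # all simulated points, shape (n*k, d)
    batch, nearest = [], np.full(len(pts), np.inf)
    for _ in range(k):
        best_j, best_obj = -1, np.inf
        for j in range(len(pts)):
            if j in batch:
                continue
            d = np.linalg.norm(pts - pts[j], axis=1)
            obj = np.minimum(nearest, d).mean()   # objective after adding j
            if obj < best_obj:
                best_j, best_obj = j, obj
        batch.append(best_j)
        nearest = np.minimum(nearest, np.linalg.norm(pts - pts[best_j], axis=1))
    return pts[batch]
```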

Results (5)
[Plots: the greedy batch approach versus baselines]

Scheduling Methods for Bayesian Optimization (NIPS 2011, spotlight)

Extended BO Model
Problem: schedule when to start new experiments and which ones to start.
[Figure: l labs running experiments x_1, …, x_n with stochastic durations over a time horizon h]
We consider the following:
– Concurrent experiments (up to l experiments at any time)
– Stochastic experiment durations (known distribution p)
– An experiment budget (a total of n experiments)
– An experimental time horizon h

Challenges
– Objective 1: finish all n experiments within the horizon h (favors maximizing concurrency)
– Objective 2: maximize the information used in selecting each experiment (favors minimizing concurrency)
We present online and offline approaches that effectively trade off these two conflicting objectives.
[Figure: two example schedules across labs illustrating the trade-off]

Objective Function
The cumulative prior experiments (CPE) of a schedule is measured as CPE = Σ_i n_i · (Σ_{j<i} n_j): each of the n_i experiments in stage i counts the experiments completed in earlier stages.
Example: suppose n1 = 1, n2 = 5, n3 = 5, n4 = 2. Then CPE = (1·0) + (5·1) + (5·6) + (2·11) = 57.
We found a non-trivial correlation between CPE and regret.
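The example can be checked mechanically with a tiny helper (the function name is ours):

```python
def cpe(stage_sizes):
    """Cumulative prior experiments: each of the n_i experiments in
    stage i gets credit for the experiments finished in earlier stages."""
    total, prior = 0, 0
    for n in stage_sizes:
        total += n * prior
        prior += n
    return total

assert cpe([1, 5, 5, 2]) == 57   # the example above
```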

Offline Scheduling
– Assign start times to all n experiments before the experimental process begins
– The experiment selection itself is still done online
– Two classes of schedules are presented:
   – Staged Schedules
   – Independent Labs

Staged Schedules
– There are N stages; stage i runs n_i experiments for duration d_i
– CPE is calculated as CPE = Σ_i n_i · (Σ_{j<i} n_j)
– We call a schedule uniform if |n_i − n_j| < 2 for all i, j
[Figure: a 4-stage schedule of 14 experiments over horizon h, with n1 = 4, n2 = 3, n3 = 4, n4 = 3 and durations d1, …, d4]
Goal: find a p-safe uniform schedule with the maximum number of stages.

Staged Schedules: Schedule
[Figure: the resulting staged schedule]
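As a rough illustration of finding the maximum number of p-safe stages, here is a sketch under simplifying assumptions of our own: equal stage lengths h/N, all experiments of a stage starting together, and i.i.d. durations with CDF F, so the whole schedule finishes on time with probability F(h/N)^n.

```python
from scipy.stats import truncnorm

def max_safe_stages(n, h, dur_cdf, p):
    """Largest N such that an equal-length staged schedule is p-safe
    under the assumptions above, i.e. F(h/N)**n >= p."""
    N = 0
    while dur_cdf(h / (N + 1)) ** n >= p:
        N += 1
    return N   # 0 means even a single stage is not p-safe

# Durations ~ truncated normal on [0, 2] with mean 1, sd 0.3 (illustrative).
d = truncnorm(a=(0 - 1) / 0.3, b=(2 - 1) / 0.3, loc=1, scale=0.3)
print(max_safe_stages(n=20, h=6, dur_cdf=d.cdf, p=0.9))
```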

Independent Labs (IL)
– Assigns m_i experiments to lab i such that Σ_i m_i = n
– Experiments are distributed uniformly across the labs
– Start times of different labs are decoupled
– The experiments within each lab are given equal durations so as to maximize the probability of finishing within the horizon h
– Mainly designed for the policy-switching schedule
[Figure: four labs, each running its experiments back-to-back within horizon h]

Online Schedules
– The p-safe guarantee is fairly pessimistic, and we can decrease the degree of parallelization in practice
– Select the start times of experiments online rather than offline
– More flexible than an offline schedule

Baseline Online Algorithms
Online Fastest Completion Policy (OnFCP)
– Finishes all n experiments as quickly as possible
– Keeps all l labs busy as long as there are experiments left to run
– Achieves the lowest possible CPE
Online Minimum Eager Lab Policy (OnMEL)
– OnFCP does not attempt to use the full time horizon
– Uses only k labs, where k is the minimum number of labs required to finish n experiments with probability p

Policy Switching (PS)
– PS decides how many new experiments to start at each decision step
– Assume a set of policies (or a policy generator) is given
– The goal is to define a new policy that performs as well as or better than the best given policy at any state s
– The i-th candidate policy waits for i experiments to finish and then calls the offline IL algorithm to reschedule
– The policy achieving the maximum CPE is followed
– The CPE of the switching policy will not be much worse than the best of the policies produced by our generator
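In sketch form, one decision step of policy switching might look as follows; rollout_cpe(policy, state) is an assumed simulator interface that rolls a policy out from the current state and returns its CPE.

```python
def switch_policy(policies, rollout_cpe, state, n_sims=100):
    """Estimate each candidate policy's CPE from the current state by
    n_sims independent simulated rollouts, then act as the best one."""
    def score(pi):
        return sum(rollout_cpe(pi, state) for _ in range(n_sims)) / n_sims
    return max(policies, key=score)
```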

Experimental Results
Setting: h = 4, 5, 6; durations p_d = truncated normal distribution; n = 20; l = 10.
[Table: best CPE and best overall performance in each setting]

Future Work

Traditional Approaches
Sequential:
– Only one experiment is selected at each iteration
– Pros: performance is optimized
– Cons: can be very costly when running one experiment takes a long time
Batch:
– k > 1 experiments are selected at each iteration
– Pros: a k-fold speedup compared to sequential approaches
– Cons: cannot perform as well as sequential algorithms

Batch Performance (Azimi et al., NIPS 2010)
[Plots: batch versus sequential performance for k = 5 and k = 10]

Hybrid Batch
– Sometimes the points selected by a given sequential policy at a few consecutive steps are independent of each other
– The size of the batch can then change at each time step (hybrid batch size)

First Idea (NIPS Workshop 2011)
– Based on a given prior (blue circles) and an objective function (MEI), x1 is selected
– To select the next experiment x2, we need y1 = f(x1), which is not yet available
– The statistics of the samples inside the red circle are expected to change after observing the actual y1
– We set y1 = M (an upper bound); the EI of the next step is then upper-bounded
– If the next selected experiment falls outside the red circle, we claim it is independent of x1
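A sketch of this pessimistic test, with assumed interfaces throughout (model.condition_on, select_next) and a neighborhood radius standing in for the red circle:

```python
import numpy as np

def next_if_independent(x1, model, y_bound, select_next, radius):
    """Pretend the pending outcome is the upper bound M (y1 = y_bound),
    condition the posterior on it, and let the acquisition pick the
    next point; accept it into the batch only if it lies outside x1's
    neighborhood."""
    optimistic = model.condition_on(x1, y_bound)   # assume y1 = M
    x2 = select_next(optimistic)
    if np.linalg.norm(np.asarray(x2) - np.asarray(x1)) > radius:
        return x2        # deemed independent of x1: batch it now
    return None          # wait for the real y1
```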

Next Steps
– Setting y1 = M is very pessimistic, so the resulting speedup is small
– Can we select the next point based on some other estimate without degrading performance?
– How far are the experiments selected in a batch from those the sequential policy would actually select?

Timeline
– Spring 2012: finish the hybrid batch approach
– Summer 2012: find a job and hold the final defense (hopefully)

Publications

I would like to thank Dr. Xiaoli Fern and Dr. Alan Fern.


Results (1)
[Plots: the Random batch baseline]

Results (2)
[Plots: the Sequential baseline]

Results (3)
[Plots: the EMAX baseline]

Results (4)
[Plots: the K-means baseline]

Constrained BO: Results
[Plots: the Random baseline and CMC-MUI on the Cosines, Fuel Cell, Real, and Rosenbrock benchmarks]

Constrained BO: Results
[Plots: CN-MEI on the Cosines, Fuel Cell, Real, and Rosenbrock benchmarks]

Constrained BO: Results
[Plots: CMC-MPI(0.2) on the Cosines, Fuel Cell, Real, and Rosenbrock benchmarks]

PS Performance Bound
– Let G be our policy generator at each time step t and state s
– State s consists of the currently running experiments (with their start times) and the completed experiments
– π_ps denotes the policy-switching result, where π' is the base policy selected in the last step
– The decision of π_ps is computed from N independent simulations
– Ĉ(π, s) is the estimated CPE of policy π, with simulation error ε