
1 Using Predictions in Online Optimization: Looking Forward with an Eye on the Past
Niangjun Chen Joint work with Joshua Comden, Zhenhua Liu, Anshul Gandhi, and Adam Wierman

2 Predictions are crucial for decision making
Travel, finance, cloud management …

3 Predictions are crucial for decision making
“The human brain, it is being increasingly argued in the scientific literature, is best viewed as an advanced prediction machine.” Travel, finance, cloud management …

4 We know how to make predictions

5 We know how to make predictions
But we do not know how to design algorithms that use those predictions well. How should an algorithm use predictions when errors are independent vs. correlated?

6 This paper: Online algorithm design with predictions in mind

7 At time 1, the algorithm picks x_1 ∈ F and pays the hitting cost c_1(x_1); the future costs c_2, c_3, c_4 are known only through predictions, subject to prediction error. Cost = c_1(x_1).

8 At time 2, the algorithm picks x_2, paying the switching cost β‖x_2 − x_1‖ on top of the hitting cost c_2(x_2). Cost = c_1(x_1) + β‖x_2 − x_1‖ + c_2(x_2).

9 At time 3 and beyond, the pattern continues: Cost = c_1(x_1) + β‖x_2 − x_1‖ + c_2(x_2) + β‖x_3 − x_2‖ + c_3(x_3) + …

10 Online convex optimization using predictions
The pairs x_1, y_1, x_2, y_2, x_3, y_3, … are revealed online. Goal: minimize the competitive difference, cost(ALG) − cost(OPT), for the problem
    min_{x_t ∈ F} Σ_t c(x_t, y_t) + β‖x_t − x_{t−1}‖
where c(x_t, y_t) is a convex tracking cost (e.g. online tracking) and β‖x_t − x_{t−1}‖ is the switching cost. At time τ the algorithm is given predictions y_{t|τ} of the future y_t.

Time  Information available            Decision
1     y_{1|0}, y_{2|0}, y_{3|0}, …     x_1
2     y_1; y_{2|1}, y_{3|1}, …         x_2
3     y_1, y_2; y_{3|2}, …             x_3
4     y_1, y_2, y_3; …                 x_4
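
A minimal sketch of the cost being minimized, assuming a scalar decision and a quadratic tracking cost (both are illustrative choices, not from the paper):

```python
# Sketch of the online objective above: total cost of a decision sequence
# x_1..x_T against realizations y_1..y_T, with an illustrative tracking cost
# c(x, y) = (x - y)^2 and an l1 switching cost beta * |x_t - x_{t-1}|.

def total_cost(xs, ys, beta, x0=0.0):
    """Sum of per-step tracking cost plus switching cost."""
    cost, prev = 0.0, x0
    for x, y in zip(xs, ys):
        cost += (x - y) ** 2          # hitting/tracking cost c(x_t, y_t)
        cost += beta * abs(x - prev)  # switching cost beta * |x_t - x_{t-1}|
        prev = x
    return cost

ys = [1.0, 2.0, 2.0, 0.0]
print(total_cost(ys, ys, beta=0.5))            # perfect tracking: only switching cost
print(total_cost([0, 0, 0, 0], ys, beta=0.5))  # never move: only tracking cost
```

The tension between the two terms is visible even in this toy: tracking y exactly pays switching cost, standing still pays tracking cost, and the optimum lies in between.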

11 Prediction noise model
[Gan et al 2013] [Chen et al 2014] [Chen et al 2015]
    y_t = y_{t|τ} + Σ_{s=τ+1}^{t} f(t−s) e(s)
where y_t is the realization the algorithm is trying to track, y_{t|τ} is the prediction for time t given to the algorithm at time τ, and the sum is the prediction error.

12 Prediction noise model
[Gan et al 2013] [Chen et al 2014] [Chen et al 2015]
Per-step noise: how much uncertainty is there one step ahead?
    y_t − y_{t|t−1} = f(0) e(t)
where the e(t) are white and mean zero (unbiased predictions), f(0) = 1, and E e(t)² = σ².

13 Prediction noise model
[Gan et al 2013] [Chen et al 2014] [Chen et al 2015]
Weighting factor: in y_t = y_{t|τ} + Σ_{s=τ+1}^{t} f(t−s) e(s), how important is the noise at time t−s for the prediction of time t? The impulse response f answers this. (Figure: the standard deviation of the prediction error s steps ahead grows as σ√(f(0)² + ⋯ + f(s−1)²).)

14 Prediction noise model
[Gan et al 2013] [Chen et al 2014] [Chen et al 2015]
    y_t = y_{t|τ} + Σ_{s=τ+1}^{t} f(t−s) e(s)
Properties of this prediction error model:
- Predictions are refined as time moves forward.
- Predictions are noisier the further ahead you look.
- Prediction errors can be correlated.
- The form of the errors matches many classical models: prediction of a wide-sense stationary process using a Wiener filter, and prediction of a linear dynamical system using a Kalman filter.
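
The noise model above can be simulated directly; a toy sketch (the exponentially decaying impulse response f and all constants are illustrative assumptions, not from the paper):

```python
import random

# Sketch of the prediction noise model y_t = y_{t|tau} + sum_{s=tau+1}^{t} f(t-s) e(s):
# the prediction made at time tau misses exactly the innovations e(s) that
# arrive after tau, weighted by the impulse response f.

def predict(y, e, f, t, tau):
    """y_{t|tau}: the realization minus the not-yet-observed innovations."""
    return y[t] - sum(f(t - s) * e[s] for s in range(tau + 1, t + 1))

random.seed(0)
T = 20
f = lambda k: 0.8 ** k                           # f(0) = 1, correlated errors
e = [random.gauss(0, 1) for _ in range(T + 1)]   # white innovations, E e^2 = 1
# realization driven entirely by the innovations (zero mean trend, for simplicity)
y = [sum(f(t - s) * e[s] for s in range(1, t + 1)) for t in range(T + 1)]

# predictions of y[10] are refined as the prediction time tau moves forward
errors = {tau: abs(y[10] - predict(y, e, f, 10, tau)) for tau in (4, 7, 10)}
print(errors)  # the error at tau = 10 is exactly 0
```

Rerunning with f(k) = 1 for k = 0 and 0 otherwise reproduces the i.i.d. special case discussed later in the talk.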

15 Lots of applications …
- Dynamic capacity management in data centers [Gandhi et al. 2012][Lin et al. 2013]
- Power system generation/load scheduling [Lu et al.]
- Portfolio management [Cover 1991][Boyd et al. 2012]
- Video streaming [Sen et al. 2000][Liu et al. 2008]
- Network routing [Bansal et al. 2003][Kodialam et al. 2003]
- Geographical load balancing [Hindman et al. 2011][Lin et al. 2012]
- Visual speech generation [Kim et al. 2015]
Many online decision problems can be modeled as OCO: capacity management of data centers (number of servers or virtual resources), video streaming (adjusting the encoding bitrate to network conditions), and many Sigmetrics papers.

19 Lots of applications… lots of algorithms
Most popular choice by far: Receding Horizon Control (RHC) [Morari et al 1989][Mayne 1990][Rawling et al 2000][Camacho 2013]…
At each time t, using predictions y_{t+1|t}, y_{t+2|t}, …, y_{t+w|t}, RHC solves
    (x_{t+1}, …, x_{t+w}) = argmin Σ_{s=t+1}^{t+w} c(x_s, y_{s|t}) + β‖x_s − x_{s−1}‖₁
and implements only the first action.

20 Lots of applications… lots of algorithms
Receding Horizon Control, continued: at time t+1, fresh predictions y_{t+2|t+1}, …, y_{t+w+1|t+1} arrive, and RHC re-solves over the shifted window to plan (x_{t+2}, …, x_{t+w+1}).

21 Lots of applications… lots of algorithms
Receding Horizon Control, continued: at time t+2 the window shifts again, with predictions y_{t+3|t+2}, …, y_{t+w+2|t+2} yielding the plan (x_{t+3}, …, x_{t+w+2}). RHC replans every step, always using the freshest predictions.

22 Lots of applications… lots of algorithms
Most popular choice by far: Receding Horizon Control (RHC) [Morari et al 1989][Mayne 1990][Rawling et al 2000][Camacho 2013]… Recent suggestion: Averaging Fixed Horizon Control (AFHC) [Lin et al 2012] [Chen et al 2015] [Kim et al 2015]

23 Averaging Fixed Horizon Control
Fixed Horizon Control (FHC): using predictions y_{t+1|t}, y_{t+2|t}, …, y_{t+w|t}, solve
    (x_{t+1}, …, x_{t+w}) = argmin Σ_{s=t+1}^{t+w} c(x_s, y_{s|t}) + β‖x_s − x_{s−1}‖₁
and commit to all w actions before re-solving.

24 Averaging Fixed Horizon Control
Fixed Horizon Control, continued: FHC commits to the entire plan (x_{t+1}, …, x_{t+w}); only at time t+w does it re-solve, using predictions y_{t+w+1|t+w}, y_{t+w+2|t+w}, …, to produce the next block (x_{t+w+1}, …, x_{t+2w}).

25 Averaging Fixed Horizon Control
Average the choices of w staggered FHC algorithms:
    x_AFHC = (1/w) Σ_{k=1}^{w} x_FHC^{(k)}.
The w copies x^{(1)}, …, x^{(w)} start their fixed horizons at staggered times, so every time step is covered by w overlapping plans, and AFHC plays their average.

26 Algorithms Using Noisy Prediction
Most popular choice by far: Receding Horizon Control (RHC) [Morari et al 1989][Mayne 1990][Rawling et al 2000][Camacho 2013]… Recent suggestion: Averaging Fixed Horizon Control (AFHC) [Lin et al 2012] [Chen et al 2015] [Kim et al 2015] Which algorithm is better? Unclear…

27 AFHC and RHC have vastly different behavior
AFHC is better in the worst case under perfect predictions. Empirically, RHC is better in the stochastic case, when prediction errors are correlated.

28 This paper: Online algorithm design with predictions in mind
How do we design an algorithm that is optimal for the structure of the prediction noise?

29 Three key design choices
1. How far to look ahead when making decisions? Given predictions y_{t+1|t}, y_{t+2|t}, …, y_{t+w|t}, the lookahead is w steps.

30 Three key design choices
1. How far to look ahead when making decisions?
2. How many actions to commit to? The w-step plan is
    (x_{t+1}, …, x_{t+w}) = argmin Σ_{s=t+1}^{t+w} c(x_s, y_{s|t}) + β‖x_s − x_{s−1}‖₁.

31 Three key design choices
1. How far to look ahead when making decisions?
2. How many actions to commit to? Of the w-step plan (x_{t+1}, …, x_{t+w}), commit only the first v steps.

32 Three key design choices
1. How far to look ahead when making decisions?
2. How many actions to commit to? Our focus: what is the optimal commitment level given the structure of the prediction noise?
3. How to aggregate overlapping action plans? With plans x^{(1)}, x^{(2)}, x^{(3)} from staggered start times, aggregate as x_t = g(x_t^{(1)}, x_t^{(2)}, x_t^{(3)}).
Key: commitment balances switching cost against prediction error.

33 Committed Horizon Control
FHC with limited commitment v: the copy indexed by k replans at times t ≡ k (mod v). Using predictions y_{t+1|t}, …, y_{t+w|t}, it solves
    (x_{t+1}, …, x_{t+w}) = argmin Σ_{s=t+1}^{t+w} c(x_s, y_{s|t}) + β‖x_s − x_{s−1}‖₁
but commits only the first v actions: x^{(k)} = (…, x_{t+1}^{(k)}, …, x_{t+v}^{(k)}).

34 Committed Horizon Control
At time t+v the copy replans with fresh predictions y_{t+v+1|t+v}, …, y_{t+v+w|t+v} and commits the next block (x_{t+v+1}^{(k)}, …, x_{t+2v}^{(k)}).

35 Committed Horizon Control
At time t+2v it replans again, committing (x_{t+2v+1}^{(k)}, …, x_{t+3v}^{(k)}), and so on.

36 Committed Horizon Control
Each copy thus produces a full decision sequence x^{(k)} = (…, x_{t+1}^{(k)}, …, x_{t+v}^{(k)}, x_{t+v+1}^{(k)}, …, x_{t+2v}^{(k)}, …).

37 Committed Horizon Control
There are v such FHC(v) copies, x^{(1)}, …, x^{(v)}, with staggered replanning times.

38 Committed Horizon Control
Committed Horizon Control (CHC) averages them: x_CHC(t) = (1/v) Σ_{k=1}^{v} x_t^{(k)}.

39 Committed Horizon Control
Special cases: v = 1 recovers RHC, and v = w recovers AFHC.
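
A sketch of CHC under illustrative assumptions (one-dimensional quadratic tracking cost, discrete action grid, dynamic-programming horizon solver; none of these implementation details are from the paper):

```python
import statistics

# Illustrative sketch of Committed Horizon Control (CHC). The horizon
# subproblem min sum_s (x_s - y_pred[s])^2 + beta*|x_s - x_{s-1}| is solved
# by dynamic programming over a discrete action grid.

def plan(preds, x_prev, beta, grid):
    """DP over the horizon: return the cost-minimizing action sequence."""
    best = [(g - preds[0]) ** 2 + beta * abs(g - x_prev) for g in grid]
    back = []
    for y in preds[1:]:
        nb, bp = [], []
        for g in grid:
            costs = [best[i] + beta * abs(g - grid[i]) for i in range(len(grid))]
            i = min(range(len(grid)), key=costs.__getitem__)
            nb.append(costs[i] + (g - y) ** 2)
            bp.append(i)
        best, back = nb, back + [bp]
    j = min(range(len(best)), key=best.__getitem__)
    xs = [grid[j]]
    for bp in reversed(back):       # recover the optimal plan
        j = bp[j]
        xs.append(grid[j])
    return xs[::-1]

def chc(predict, T, w, v, beta, grid, x0=0.0):
    """v staggered FHC copies; copy k replans at t = k (mod v), commits the
    first v actions of its w-step plan, and the played action is the average
    (v = 1 behaves like RHC, v = w like AFHC)."""
    copies = [{"plan": {}, "prev": x0} for _ in range(v)]
    xs = []
    for t in range(T):
        for k, c in enumerate(copies):
            if t % v == k:
                p = plan(predict(t, w), c["prev"], beta, grid)
                n = min(v, len(p))
                for s in range(n):
                    c["plan"][t + s] = p[s]
                c["prev"] = p[n - 1]
        # copies that have not started planning yet fall back to x0
        xs.append(statistics.mean(c["plan"].get(t, x0) for c in copies))
    return xs

ys = [1.0] * 6
xs = chc(lambda t, w: ys[t:t + w], T=6, w=4, v=2, beta=0.0, grid=[0.0, 1.0])
print(xs)  # -> [0.5, 1.0, 1.0, 1.0, 1.0, 1.0]
```

With perfect predictions and zero switching cost the copies all track y exactly once started, and the only averaging artifact is the warm-up step where one copy still plays x0.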

40 Main Result
Theorem. For a cost function c that is α-Hölder continuous in its second argument and a bounded feasible set F,
    E cost(CHC) − E cost(OPT) ≤ 2TβD/v + (2GT/v) Σ_{k=1}^{v} f̄(k)^α.

41 Main Result
The left-hand side is the expected competitive difference of CHC.

42 Main Result
How should the commitment level v be chosen?

43 Main Result
The first term, 2TβD/v, is due to the switching cost.

44 Main Result
Here D is the diameter of the feasible set: ‖x_1 − x_2‖ ≤ D for all x_1, x_2 ∈ F.

45 Main Result
The second term, (2GT/v) Σ_{k=1}^{v} f̄(k)^α, is due to prediction error.

46 Main Result
G and α come from the Hölder condition: |c(x, y_1) − c(x, y_2)| ≤ G‖y_1 − y_2‖^α for all x, y_1, y_2.

47 Prediction error k steps ahead
    f̄(k)² ≜ E‖y_{t+k} − y_{t+k|t}‖² = σ² Σ_{s=0}^{k−1} ‖f(s)‖².

48 Main Result
The prediction-error term grows with the lookahead k: uncertainty accumulates the further ahead we commit.

49 Main Result
The switching-cost term decreases in v, while the prediction-error term increases in v.

50 Main Result
Key: choose the commitment level v to balance these two terms.

51 Main Result
Example: i.i.d. noise, f(s) = 1 for s = 0 and 0 for s > 0. Then f̄(k) = σ for all k, and the bound becomes 2TβD/v + 2GTσ^α, a decreasing function of v. AFHC (v = w) is best when noise is i.i.d.
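
To see how the bound trades off the two terms, here is a small numeric sketch (all constants and the noise impulse responses are illustrative choices, not values from the paper):

```python
# Sketch: evaluate the theorem's upper bound
#   B(v) = 2*T*beta*D/v + (2*G*T/v) * sum_{k=1}^{v} fbar(k)**alpha,
# with fbar(k)^2 = sigma^2 * sum_{s=0}^{k-1} f(s)^2, and pick the commitment
# level v in {1, ..., w} that minimizes it.

def bound(v, T, beta, D, G, alpha, sigma, f):
    fbar = lambda k: (sigma ** 2 * sum(f(s) ** 2 for s in range(k))) ** 0.5
    return (2 * T * beta * D / v
            + (2 * G * T / v) * sum(fbar(k) ** alpha for k in range(1, v + 1)))

def best_v(w, **kw):
    return min(range(1, w + 1), key=lambda v: bound(v, **kw))

params = dict(T=100, D=1.0, G=1.0, alpha=1.0, sigma=1.0)
iid = lambda s: 1.0 if s == 0 else 0.0       # i.i.d. noise
decaying = lambda s: 0.5 ** s                # exponentially decaying correlation
print(best_v(8, f=iid, beta=1.0, **params))        # i.i.d.: AFHC, prints 8
print(best_v(8, f=decaying, beta=0.01, **params))  # error dominant: RHC, prints 1
```

Varying beta moves the minimizer between the two extremes, mirroring the regimes in the next slides.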

52 i.i.d. prediction noise: f(s) = 1 for s = 0, and 0 for s > 0.

53 Long-range correlated noise: f(s) = 1 for s ≤ L, 0 for s > L, with L > w (correlations outlast the prediction window).

54 Short-range correlated noise: f(s) = 1 for s ≤ L, 0 for s > L, with L ≤ w.

55 Exponentially decaying noise: f(s) = a^s, a < 1.

56 Optimal commitment level depends on prediction noise structure
The optimal v differs across the four noise structures above (i.i.d., long-range correlated, short-range correlated, and exponentially decaying).

57 More detail: long-range correlated noise
Theorem. If the prediction noise is long-range correlated (f(s) = 1 for s ≤ L, 0 for s > L, with L > w):
- AFHC is optimal if βD/(Gσ^α) > (α/2)·w^{1+α/2} (switching cost dominant);
- RHC is optimal if βD/(Gσ^α) < 2/(α+2) (prediction error dominant);
- otherwise CHC with an intermediate commitment v ∈ (1, w), chosen to minimize the bound, is optimal.
The ratio βD/(Gσ^α) measures the relative importance of switching cost and prediction error.

58 More detail: short-range correlated noise
Theorem. If the prediction noise is short-range correlated (f(s) = 1 for s ≤ L, 0 for s > L, with L ≤ w):
- AFHC is optimal if βD/(Gσ^α) > H(L), where the threshold H(L) grows with the correlation length L (switching cost dominant);
- RHC is optimal if βD/(Gσ^α) < 2/(α+2) (prediction error dominant);
- otherwise CHC with an intermediate commitment v ∈ (1, w), chosen to minimize the bound, is optimal.

59 More detail: exponentially decaying noise
Theorem. If the prediction noise is exponentially decaying (f(s) = a^s, 0 < a < 1):
- AFHC is optimal if βD/(Gσ^α) > a²/(2(1 − a²)) (switching cost dominant);
- RHC is optimal if βD/(Gσ^α) < a²/(2(1 + a)) (prediction error dominant);
- otherwise CHC with an intermediate commitment v ∈ (1, w), chosen to minimize the bound, is optimal.
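
The three regimes for exponentially decaying noise can be checked mechanically; a small sketch (with illustrative constants) that classifies the regime from the theorem's thresholds:

```python
# Sketch: classify the regime for exponentially decaying noise f(s) = a**s
# using the thresholds on the ratio beta*D / (G * sigma**alpha).

def regime(beta, D, G, sigma, alpha, a):
    r = beta * D / (G * sigma ** alpha)
    if r > a ** 2 / (2 * (1 - a ** 2)):
        return "AFHC (v = w)"          # switching cost dominant
    if r < a ** 2 / (2 * (1 + a)):
        return "RHC (v = 1)"           # prediction error dominant
    return "CHC with intermediate v"   # balance the two terms

print(regime(beta=1.0, D=1.0, G=1.0, sigma=1.0, alpha=1.0, a=0.5))   # AFHC
print(regime(beta=0.01, D=1.0, G=1.0, sigma=1.0, alpha=1.0, a=0.5))  # RHC
```

With a = 0.5 the two thresholds are 1/6 and 1/12, so a large switching cost (beta = 1) selects AFHC while a small one (beta = 0.01) selects RHC, matching the theorem's regimes.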

60 Main Result
Theorem. For c that is α-Hölder continuous in the second argument and a bounded feasible set F,
    E cost(CHC) − E cost(OPT) ≤ 2TβD/v + (2GT/v) Σ_{k=1}^{v} f̄(k)^α.
Takeaway: we can use the structure of the prediction error to guide the design of online algorithms.

61 Main Result
Let V denote the bound 2TβD/v + (2GT/v) Σ_{k=1}^{v} f̄(k)^α from the theorem. The competitive difference bound also holds with high probability:
    P(cost(CHC) − cost(OPT) > V + u) ≤ exp(−u²/F(v))
for a function F(v) of the problem parameters.

62 Conclusion
The design of the optimal algorithm depends on the structure of the prediction error.
This talk: OCO with predictions; the commitment level should be tuned to the prediction noise.
Future: can we extend this framework to other online problems?

