1
Using Predictions in Online Optimization: Looking Forward with an Eye on the Past
Niangjun Chen Joint work with Joshua Comden, Zhenhua Liu, Anshul Gandhi, and Adam Wierman
2
Predictions are crucial for decision making
Travel, finance, cloud management …
3
Predictions are crucial for decision making
“The human brain, it is being increasingly argued in the scientific literature, is best viewed as an advanced prediction machine.” Travel, finance, cloud management …
4
We know how to make predictions
5
We know how to make predictions
But not how to design algorithms to use predictions. How should an algorithm use predictions if errors are independent vs. correlated?
6
This paper: Online algorithm design with predictions in mind
7
Illustration: at time 1, the algorithm sees the current cost function $c_1$ and noisy predictions of $c_2, c_3, c_4$ (prediction error grows with lookahead), picks $x_1 \in F$, and pays the hitting cost $c_1(x_1)$. Cost $= c_1(x_1)$.
8
Illustration: at time 2, the algorithm moves to $x_2 \in F$, paying the switching cost $\beta\|x_2 - x_1\|$ on top of the hitting cost $c_2(x_2)$. Cost $= c_1(x_1) + \beta\|x_2 - x_1\| + c_2(x_2)$.
9
Illustration: at time 3, it moves to $x_3$ and pays $\beta\|x_3 - x_2\| + c_3(x_3)$, and so on. Cost $= c_1(x_1) + \beta\|x_2 - x_1\| + c_2(x_2) + \beta\|x_3 - x_2\| + c_3(x_3) + \cdots$
10
Online convex optimization using predictions
$\min_{x_t \in F} \sum_t c(x_t, y_t) + \beta\|x_t - x_{t-1}\|$
where $c(x_t, y_t)$ is a convex cost (e.g. an online tracking cost, from a parameterized family) and $\beta\|x_t - x_{t-1}\|$ is a switching cost. The sequence $x_1, y_1, x_2, y_2, x_3, y_3, \dots$ unfolds online, and at time $\tau$ the algorithm is given a prediction $y_{t|\tau}$ of each future $y_t$. Goal: minimize the competitive difference cost(ALG) − cost(OPT).

Time | Information available | Decision
1 | $y_{1|0}, y_{2|0}, y_{3|0}, \dots$ | $x_1$
2 | $y_1$; $y_{2|1}, y_{3|1}, \dots$ | $x_2$
3 | $y_2$; $y_{3|2}, \dots$ | $x_3$
4 | $y_3$; $\dots$ | $x_4$
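As a concrete reading of the objective, here is a minimal sketch (ours, not from the talk) that evaluates the cost of a decision sequence, assuming scalar decisions and a hypothetical quadratic tracking cost $c(x, y) = (x - y)^2$:

```python
import numpy as np

def total_cost(xs, ys, c, beta, x0=0.0):
    """Cost of a decision sequence: sum_t c(x_t, y_t) + beta * |x_t - x_{t-1}|."""
    cost, prev = 0.0, x0
    for x, y in zip(xs, ys):
        cost += c(x, y) + beta * abs(x - prev)  # hitting cost + switching cost
        prev = x
    return cost

# Tracking y_t perfectly still pays the switching cost for every move:
ys = np.sin(np.linspace(0, 4, 20))
print(total_cost(ys, ys, lambda x, y: (x - y) ** 2, beta=0.5))
```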
11
Prediction noise model
[Gan et al. 2013][Chen et al. 2014][Chen et al. 2015]
$y_t = y_{t|\tau} + \sum_{s=\tau+1}^{t} f(t-s)\, e(s)$
Here $y_t$ is the realization that the algorithm is trying to track, $y_{t|\tau}$ is the prediction for time $t$ given to the algorithm at time $\tau$, and the sum is the prediction error.
12
Prediction noise model
[Gan et al. 2013][Chen et al. 2014][Chen et al. 2015]
Per-step noise: how much uncertainty is there one step ahead?
$y_t - y_{t|t-1} = f(0)\, e(t)$, where the $e(t)$ are white, mean zero (unbiased), with $f(0) = 1$ and $\mathbf{E}\|e(t)\|^2 = \sigma^2$.
13
Prediction noise model
[Gan et al. 2013][Chen et al. 2014][Chen et al. 2015]
Weighting factor: how important is the noise at time $t-s$ for the prediction of time $t$? It enters with weight $f(t-s)$, so the error in a prediction made $s$ steps ahead has standard deviation $\sigma\sqrt{f(0)^2 + \cdots + f(s-1)^2}$.
14
Prediction noise model
[Gan et al. 2013][Chen et al. 2014][Chen et al. 2015]
$y_t = y_{t|\tau} + \sum_{s=\tau+1}^{t} f(t-s)\, e(s)$ (prediction error)
- Predictions are "refined" as time moves forward
- Predictions are noisier the further ahead you look
- Prediction errors can be correlated
- This form of error matches many classical models: prediction of a wide-sense stationary process using the Wiener filter, and prediction of a linear dynamical system using the Kalman filter
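To make the model concrete, here is a small sketch (our illustration, with an assumed exponentially decaying $f$ and Gaussian $e(s)$, neither of which the model requires) that samples predictions consistent with the equation above:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_predictions(y, f, tau, sigma=0.1):
    """Sample y_{t|tau} consistent with y_t = y_{t|tau} + sum_{s=tau+1}^t f(t-s) e(s)."""
    T = len(y)
    e = rng.normal(0.0, sigma, size=T)      # white, mean-zero noise
    y_hat = np.array(y, dtype=float)
    for t in range(tau + 1, T):
        y_hat[t] -= sum(f(t - s) * e[s] for s in range(tau + 1, t + 1))
    return y_hat

f = lambda s, a=0.8: a ** s                 # f(0) = 1, correlation decays with lag
y = np.sin(np.linspace(0, 4, 30))
print(sample_predictions(y, f, tau=0)[:5])  # errors grow with the lookahead t - tau
```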
15
Lots of applications …
- Dynamic capacity management in data centers [Gandhi et al. 2012][Lin et al. 2013]
- Power system generation/load scheduling [Lu et al.]
- Portfolio management [Cover 1991][Boyd et al. 2012]
- Video streaming [Sen et al. 2000][Liu et al. 2008]
- Network routing [Bansal et al. 2003][Kodialam et al. 2003]
- Geographical load balancing [Hindman et al. 2011][Lin et al. 2012]
- Visual speech generation [Kim et al. 2015]
- …
Many online decision problems can be modeled as OCO: for example, capacity management of data centers (the number of servers or virtual resources) or adapting the video encoding bitrate to network conditions; many SIGMETRICS papers!
19
Lots of applications… lots of algorithms
Most popular choice by far: Receding Horizon Control (RHC) [Morari et al. 1989][Mayne 1990][Rawlings et al. 2000][Camacho 2013]…
At time $t$, use the predictions $y_{t+1|t}, y_{t+2|t}, \dots, y_{t+w|t}$ to solve
$(x_{t+1}, x_{t+2}, \dots, x_{t+w}) = \operatorname{argmin} \sum_{s=t+1}^{t+w} c(x_s, y_{s|t}) + \beta\|x_s - x_{s-1}\|$
and implement only the first action $x_{t+1}$.
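A minimal sketch of one RHC step (our illustration, assuming scalar decisions and using SciPy's derivative-free Nelder-Mead solver, since the switching cost is non-smooth):

```python
import numpy as np
from scipy.optimize import minimize

def rhc_step(x_prev, y_pred, beta, c):
    """Plan over the w predicted steps, return only the first action, then re-plan."""
    def plan_cost(xs):
        cost, prev = 0.0, x_prev
        for x, y in zip(xs, y_pred):
            cost += c(x, y) + beta * abs(x - prev)
            prev = x
        return cost
    x0 = np.full(len(y_pred), float(x_prev))            # warm start from the last action
    return minimize(plan_cost, x0, method="Nelder-Mead").x[0]

# One step with w = 3 predictions and a quadratic tracking cost:
print(rhc_step(0.0, [1.0, 1.2, 0.8], beta=0.5, c=lambda x, y: (x - y) ** 2))
```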
20
Lots of applications… lots of algorithms
Most popular choice by far: Receding Horizon Control (RHC) [Morari et al. 1989][Mayne 1990][Rawlings et al. 2000][Camacho 2013]…
At time $t+1$, re-solve with the refreshed predictions $y_{t+2|t+1}, y_{t+3|t+1}, \dots, y_{t+w+1|t+1}$, producing a new plan $(x_{t+2}, x_{t+3}, \dots, x_{t+w+1})$; again only the first step is used.
21
Lots of applications… lots of algorithms
Most popular choice by far: Receding Horizon Control (RHC) [Morari et al. 1989][Mayne 1990][Rawlings et al. 2000][Camacho 2013]…
At time $t+2$, re-solve with $y_{t+3|t+2}, \dots, y_{t+w+2|t+2}$ for the plan $(x_{t+3}, \dots, x_{t+w+2})$, and so on: the horizon recedes one step at a time.
22
Lots of applications… lots of algorithms
Most popular choice by far: Receding Horizon Control (RHC) [Morari et al. 1989][Mayne 1990][Rawlings et al. 2000][Camacho 2013]… Recent suggestion: Averaging Fixed Horizon Control (AFHC) [Lin et al. 2012][Chen et al. 2015][Kim et al. 2015]
23
Averaging Fixed Horizon Control
Fixed Horizon Control (FHC): at time $t$, use the predictions $y_{t+1|t}, y_{t+2|t}, \dots, y_{t+w|t}$ to solve
$(x_{t+1}, x_{t+2}, \dots, x_{t+w}) = \operatorname{argmin} \sum_{s=t+1}^{t+w} c(x_s, y_{s|t}) + \beta\|x_s - x_{s-1}\|$
and commit to all $w$ actions; the later predictions $y_{t+w+1|t+w}, y_{t+w+2|t+w}, \dots$ only matter at the next planning time.
24
Averaging Fixed Horizon Control
Fixed Horizon Control (FHC): only at time $t+w$ does it re-plan, using $y_{t+w+1|t+w}, y_{t+w+2|t+w}, \dots$ to choose the next block $(x_{t+w+1}, x_{t+w+2}, \dots, x_{t+2w})$.
25
Averaging Fixed Horizon Control
Average the choices of $w$ staggered FHC algorithms (version $k$ re-plans at times $t \equiv k \bmod w$, so at every $t$ the $w$ versions hold overlapping plans):
$x^{AFHC}_t = \frac{1}{w} \sum_{k=1}^{w} x^{FHC(k)}_t$
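A compact sketch of FHC and AFHC (our illustration, scalar decisions, Nelder-Mead as the inner solver; startup edge effects, where early copies see a truncated horizon, are ignored):

```python
import numpy as np
from scipy.optimize import minimize

def fhc_plan(x_prev, y_pred, beta, c):
    """Solve the w-step lookahead problem and return the full w-step plan."""
    def plan_cost(xs):
        cost, prev = 0.0, x_prev
        for x, y in zip(xs, y_pred):
            cost += c(x, y) + beta * abs(x - prev)
            prev = x
        return cost
    x0 = np.full(len(y_pred), float(x_prev))
    return minimize(plan_cost, x0, method="Nelder-Mead").x

def afhc(ys_pred, w, beta, c, x0=0.0):
    """Average w staggered FHC copies; ys_pred[t] = predictions available at time t."""
    T = len(ys_pred)
    plans = [dict() for _ in range(w)]
    for k in range(w):
        x_prev = x0
        for t in range(k, T, w):                       # copy k re-plans at t = k (mod w)
            xs = fhc_plan(x_prev, ys_pred[t], beta, c)
            for j, x in enumerate(xs):
                if t + j < T:
                    plans[k][t + j] = x                # commit the whole w-step plan
            x_prev = xs[-1]
    return np.array([np.mean([p[t] for p in plans if t in p]) for t in range(T)])

print(afhc([[1.0] * 3 for _ in range(9)], w=3, beta=0.5, c=lambda x, y: (x - y) ** 2))
```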
26
Algorithms Using Noisy Prediction
Most popular choice by far: Receding Horizon Control (RHC) [Morari et al. 1989][Mayne 1990][Rawlings et al. 2000][Camacho 2013]… Recent suggestion: Averaging Fixed Horizon Control (AFHC) [Lin et al. 2012][Chen et al. 2015][Kim et al. 2015] Which algorithm is better? Unclear…
27
AFHC and RHC have vastly different behavior
AFHC is better in the worst case, under perfect predictions. Empirically, RHC is better in the stochastic case, when prediction errors are correlated.
28
This paper: Online algorithm design with predictions in mind
How do we design an algorithm that is optimal for a given prediction noise structure?
29
Three key design choices
1. How far to look ahead in making decisions? Given the predictions $y_{t+1|t}, y_{t+2|t}, \dots, y_{t+w|t}$ (and $y_{t+w+1|t}, y_{t+w+2|t}, \dots$ beyond them), look ahead $w$ steps.
30
Three key design choices
1. How far to look ahead in making decisions? 2. How many actions to commit? Using $y_{t+1|t}, \dots, y_{t+w|t}$, solve
$(x_{t+1}, x_{t+2}, \dots, x_{t+w}) = \operatorname{argmin} \sum_{s=t+1}^{t+w} c(x_s, y_{s|t}) + \beta\|x_s - x_{s-1}\|$
31
Three key design choices
1. How far to look ahead in making decisions? 2. How many actions to commit? Using $y_{t+1|t}, \dots, y_{t+w|t}$, solve
$(x_{t+1}, x_{t+2}, \dots, x_{t+w}) = \operatorname{argmin} \sum_{s=t+1}^{t+w} c(x_s, y_{s|t}) + \beta\|x_s - x_{s-1}\|$
and commit $v$ of the $w$ planned steps.
32
Three key design choices
1. How far to look ahead in making decisions?
2. How many actions to commit? Our focus: what is the optimal commitment level given the structure of prediction noise? Key: commitment balances switching cost against prediction errors.
3. How to aggregate the overlapping action plans, $x_t = g(x^{(1)}_t, x^{(2)}_t, x^{(3)}_t)$?
33
Committed Horizon Control
FHC with limited commitment $v$: for $t \equiv k \bmod v$, use the predictions $y_{t+1|t}, \dots, y_{t+v|t}, y_{t+v+1|t}, \dots, y_{t+w|t}$ to solve
$(x_{t+1}, \dots, x_{t+v}, x_{t+v+1}, \dots, x_{t+w}) = \operatorname{argmin} \sum_{s=t+1}^{t+w} c(x_s, y_{s|t}) + \beta\|x_s - x_{s-1}\|$
but commit only the first $v$ actions: $x^{(k)} = (\dots, x^{(k)}_{t+1}, x^{(k)}_{t+2}, \dots, x^{(k)}_{t+v})$
34
Committed Horizon Control
FHC with limited commitment $v$: at time $t+v$, re-plan with the refined predictions $y_{t+v+1|t+v}, \dots, y_{t+2v|t+v}, \dots, y_{t+v+w|t+v}$ and commit the next $v$ actions: $x^{(k)} = (\dots, x^{(k)}_{t+1}, \dots, x^{(k)}_{t+v}, x^{(k)}_{t+v+1}, \dots, x^{(k)}_{t+2v})$
35
Committed Horizon Control
FHC with limited commitment $v$: at time $t+2v$, re-plan with $y_{t+2v+1|t+2v}, \dots, y_{t+2v+w|t+2v}$ and commit $x^{(k)}_{t+2v+1}, \dots, x^{(k)}_{t+3v}$, and so on, extending $x^{(k)}$ by $v$ actions per re-plan.
37
Committed Horizon Control
FHC with limited commitment $v$: running this procedure from each of the $v$ possible offsets $k = 1, \dots, v$ yields $v$ staggered FHC($v$) algorithms $x^{(1)}, \dots, x^{(v)}$, each committing $v$ steps at a time.
38
Committed Horizon Control
Committed Horizon Control (CHC) plays the average of the $v$ staggered FHC($v$) algorithms:
$x^{CHC}_t = \frac{1}{v} \sum_{k=1}^{v} x^{(k)}_t$
39
Committed Horizon Control
$x^{CHC}_t = \frac{1}{v} \sum_{k=1}^{v} x^{(k)}_t$
Special cases: $v = 1$ recovers RHC; $v = w$ recovers AFHC.
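A sketch of CHC (our illustration, reusing fhc_plan from the AFHC sketch above; scalar decisions, edge effects ignored):

```python
import numpy as np
# fhc_plan as defined in the AFHC sketch above

def chc(ys_pred, w, v, beta, c, x0=0.0):
    """v staggered copies, each planning w steps but committing only the first v;
    CHC plays their average. v = 1 recovers RHC, v = w recovers AFHC."""
    T = len(ys_pred)
    plans = [dict() for _ in range(v)]
    for k in range(v):
        x_prev = x0
        for t in range(k, T, v):                        # copy k re-plans at t = k (mod v)
            xs = fhc_plan(x_prev, ys_pred[t], beta, c)  # w-step plan
            for j in range(min(v, T - t)):
                plans[k][t + j] = xs[j]                 # commit only the first v actions
            x_prev = xs[v - 1]
    return np.array([np.mean([p[t] for p in plans if t in p]) for t in range(T)])
```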
40
Theorem. For cost functions $c$ that are $\alpha$-Hölder continuous in the second argument, i.e. $|c(x, y_1) - c(x, y_2)| \le G \|y_1 - y_2\|^\alpha$ for all $x, y_1, y_2$, and a bounded feasible set $F$, i.e. $\|x_1 - x_2\| \le D$ for all $x_1, x_2 \in F$:
$\mathbf{E}\,\mathrm{cost}(CHC) - \mathbf{E}\,\mathrm{cost}(OPT) \le \frac{2T\beta D}{v} + \frac{2GT}{v} \sum_{k=1}^{v} \|f_k\|^\alpha$
The left-hand side is the expected competitive difference. The first term is due to the switching cost; the second is due to prediction error, with $\|f_k\|^2 \triangleq \mathbf{E}\|y_{t+k} - y_{t+k|t}\|^2 = \sigma^2 \sum_{s=0}^{k-1} f(s)^2$ the prediction error $k$ steps ahead, which grows with $k$. Key: choose the commitment level $v$ to balance these two terms.
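The bound can be evaluated directly. A small sketch (our illustration; the parameter values are arbitrary) that computes it as a function of $v$ and picks the minimizing commitment level:

```python
import numpy as np

def chc_bound(v, T, beta, D, G, sigma, alpha, f):
    """Upper bound on E cost(CHC) - E cost(OPT):
    2*T*beta*D/v + (2*G*T/v) * sum_{k=1}^{v} ||f_k||^alpha,
    with ||f_k||^2 = sigma^2 * sum_{s=0}^{k-1} f(s)^2."""
    fk = lambda k: sigma * np.sqrt(sum(f(s) ** 2 for s in range(k)))
    return 2 * T * beta * D / v + (2 * G * T / v) * sum(fk(k) ** alpha
                                                        for k in range(1, v + 1))

w = 20
bounds = [chc_bound(v, T=1000, beta=1.0, D=1.0, G=1.0, sigma=0.5, alpha=1.0,
                    f=lambda s: 0.9 ** s) for v in range(1, w + 1)]
print("v* =", 1 + int(np.argmin(bounds)))   # commitment level minimizing the bound
```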
51
e.g. i.i.d. noise: $f(s) = 1$ for $s = 0$ and $0$ for $s > 0$, so $\|f_k\| = \sigma$ for every $k$ and the bound becomes
$\frac{2T\beta D}{v} + 2GT\sigma^\alpha$,
a decreasing function of $v$. AFHC ($v = w$) is best when noise is i.i.d.
52
Optimal commitment level depends on prediction noise structure:
- i.i.d. prediction noise: $f(s) = 1$ for $s = 0$; $0$ for $s > 0$
- Long-range correlated: $f(s) = 1$ for $s \le L$; $0$ for $s > L$, with $L > w$
- Short-range correlated: $f(s) = 1$ for $s \le L$; $0$ for $s > L$, with $L \le w$
- Exponentially decaying: $f(s) = a^s$, $a < 1$
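Reusing chc_bound from the sketch above, we can compare the bound-minimizing commitment level across these four noise families (our illustration; parameter values are arbitrary):

```python
import numpy as np
# chc_bound as defined in the sketch above

families = {
    "i.i.d.":      lambda s: 1.0 if s == 0 else 0.0,
    "long-range":  lambda s, L=30: 1.0 if s <= L else 0.0,   # L > w
    "short-range": lambda s, L=5: 1.0 if s <= L else 0.0,    # L <= w
    "exp-decay":   lambda s, a=0.9: a ** s,
}
w = 20
for name, f in families.items():
    bounds = [chc_bound(v, T=1000, beta=1.0, D=1.0, G=1.0, sigma=0.5, alpha=1.0, f=f)
              for v in range(1, w + 1)]
    print(name, "v* =", 1 + int(np.argmin(bounds)))  # i.i.d. should give v* = w
```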
57
More detail: long-range correlated noise
Theorem. If prediction noise is long-range correlated, $f(s) = 1$ for $s \le L$ and $0$ otherwise, with $L > w$, then the ratio $\frac{\beta D}{G \sigma^\alpha}$ (the relative importance of switching cost to prediction error) determines the optimal commitment:
- AFHC is optimal if $\frac{\beta D}{G \sigma^\alpha} > \frac{\alpha}{2} w^{1 + \alpha/2}$ (switching cost dominant)
- RHC is optimal if $\frac{\beta D}{G \sigma^\alpha} < \frac{2}{\alpha + 2}$ (prediction error dominant)
- otherwise (intermediate) CHC is optimal with $v^* \in (1, w)$, where $v^*$ minimizes the bound $\frac{2T\beta D}{v} + \frac{2GT\sigma^\alpha}{v} \sum_{k=1}^{v} k^{\alpha/2}$ (here $\|f_k\|^\alpha = \sigma^\alpha k^{\alpha/2}$)
58
More detail: short-range correlated noise
Theorem. If prediction noise is short-range correlated, $f(s) = 1$ for $s \le L$ and $0$ otherwise, with $L \le w$:
- AFHC is optimal if $\frac{\beta D}{G \sigma^\alpha} > H(L)$ (switching cost dominant)
- RHC is optimal if $\frac{\beta D}{G \sigma^\alpha} < \frac{2}{\alpha + 2}$ (prediction error dominant)
- otherwise (intermediate) CHC is optimal with $v^* \in (1, w)$, where $v^*$ minimizes $\frac{2T\beta D}{v} + 2GT\sigma^\alpha (L+1)^{\alpha/2} - \frac{2GT\sigma^\alpha}{v} H(L)$,
with $H(L) = \frac{(L+1)^{\alpha/2}(\alpha L - 2) + 1}{\alpha + 2}$
59
More detail: exponentially decaying noise
Theorem. If prediction noise is exponentially decaying, $f(s) = a^s$ with $0 < a < 1$:
- AFHC is optimal if $\frac{\beta D}{G \sigma^\alpha} > \frac{a^2}{2(1 - a^2)}$ (switching cost dominant)
- RHC is optimal if $\frac{\beta D}{G \sigma^\alpha} < \frac{a^2}{2(1 + a)}$ (prediction error dominant)
- otherwise (intermediate) CHC is optimal with $v^* \in (1, w)$, where $v^*$ minimizes the bound $\frac{2T\beta D}{v} + \frac{2GT\sigma^\alpha}{v} \sum_{k=1}^{v} \left(\frac{1 - a^{2k}}{1 - a^2}\right)^{\alpha/2}$
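To see why lookahead error saturates in this case, here is the worked variance step, following the definition of $\|f_k\|$ above:
$$\|f_k\|^2 = \sigma^2 \sum_{s=0}^{k-1} a^{2s} = \sigma^2 \, \frac{1 - a^{2k}}{1 - a^2} \;\longrightarrow\; \frac{\sigma^2}{1 - a^2} \quad (k \to \infty)$$
so looking further ahead costs little additional error once $k$ is large, which is why longer commitment (toward AFHC) wins once switching costs dominate.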
60
Theorem. For $c$ that is $\alpha$-Hölder continuous in the second argument and bounded feasible set $F$:
$\mathbf{E}\,\mathrm{cost}(CHC) - \mathbf{E}\,\mathrm{cost}(OPT) \le \frac{2T\beta D}{v} + \frac{2GT}{v} \sum_{k=1}^{v} \|f_k\|^\alpha$
We can use prediction error structure to guide the design of online algorithms.
61
Writing $V := \frac{2T\beta D}{v} + \frac{2GT}{v} \sum_{k=1}^{v} \|f_k\|^\alpha$ for the bound above, the competitive difference also holds with high probability:
$\mathbf{P}\big(\mathrm{cost}(CHC) - \mathrm{cost}(OPT) > V + u\big) \le \exp\left(-\frac{u^2}{F(v)}\right)$
62
Conclusion: the design of the optimal algorithm depends on the structure of prediction error.
This talk: OCO with predictions. The commitment level should be matched to the prediction noise. Future: can we extend this framework to other online problems?