Using Predictions in Online Optimization: Looking Forward with an Eye on the Past Niangjun Chen Joint work with Joshua Comden, Zhenhua Liu, Anshul Gandhi,

Slides:



Advertisements
Similar presentations
Request Dispatching for Cheap Energy Prices in Cloud Data Centers
Advertisements

SpringerLink Training Kit
Luminosity measurements at Hadron Colliders
From Word Embeddings To Document Distances
Choosing a Dental Plan Student Name
Virtual Environments and Computer Graphics
Chương 1: CÁC PHƯƠNG THỨC GIAO DỊCH TRÊN THỊ TRƯỜNG THẾ GIỚI
THỰC TIỄN KINH DOANH TRONG CỘNG ĐỒNG KINH TẾ ASEAN –
D. Phát triển thương hiệu
NHỮNG VẤN ĐỀ NỔI BẬT CỦA NỀN KINH TẾ VIỆT NAM GIAI ĐOẠN
Điều trị chống huyết khối trong tai biến mạch máu não
BÖnh Parkinson PGS.TS.BS NGUYỄN TRỌNG HƯNG BỆNH VIỆN LÃO KHOA TRUNG ƯƠNG TRƯỜNG ĐẠI HỌC Y HÀ NỘI Bác Ninh 2013.
Nasal Cannula X particulate mask
Evolving Architecture for Beyond the Standard Model
HF NOISE FILTERS PERFORMANCE
Electronics for Pedestrians – Passive Components –
Parameterization of Tabulated BRDFs Ian Mallett (me), Cem Yuksel
L-Systems and Affine Transformations
CMSC423: Bioinformatic Algorithms, Databases and Tools
Some aspect concerning the LMDZ dynamical core and its use
Bayesian Confidence Limits and Intervals
实习总结 (Internship Summary)
Current State of Japanese Economy under Negative Interest Rate and Proposed Remedies Naoyuki Yoshino Dean Asian Development Bank Institute Professor Emeritus,
Front End Electronics for SOI Monolithic Pixel Sensor
Face Recognition Monday, February 1, 2016.
Solving Rubik's Cube By: Etai Nativ.
CS284 Paper Presentation Arpad Kovacs
انتقال حرارت 2 خانم خسرویار.
Summer Student Program First results
Theoretical Results on Neutrinos
HERMESでのHard Exclusive生成過程による 核子内クォーク全角運動量についての研究
Wavelet Coherence & Cross-Wavelet Transform
yaSpMV: Yet Another SpMV Framework on GPUs
Creating Synthetic Microdata for Higher Educational Use in Japan: Reproduction of Distribution Type based on the Descriptive Statistics Kiyomi Shirakawa.
MOCLA02 Design of a Compact L-­band Transverse Deflecting Cavity with Arbitrary Polarizations for the SACLA Injector Sep. 14th, 2015 H. Maesaka, T. Asaka,
Hui Wang†*, Canturk Isci‡, Lavanya Subramanian*,
Fuel cell development program for electric vehicle
Overview of TST-2 Experiment
Optomechanics with atoms
داده کاوی سئوالات نمونه
Inter-system biases estimation in multi-GNSS relative positioning with GPS and Galileo Cecile Deprez and Rene Warnant University of Liege, Belgium  
ლექცია 4 - ფული და ინფლაცია
10. predavanje Novac i financijski sustav
Wissenschaftliche Aussprache zur Dissertation
FLUORECENCE MICROSCOPY SUPERRESOLUTION BLINK MICROSCOPY ON THE BASIS OF ENGINEERED DARK STATES* *Christian Steinhauer, Carsten Forthmann, Jan Vogelsang,
Particle acceleration during the gamma-ray flares of the Crab Nebular
Interpretations of the Derivative Gottfried Wilhelm Leibniz
Advisor: Chiuyuan Chen Student: Shao-Chun Lin
Widow Rockfish Assessment
SiW-ECAL Beam Test 2015 Kick-Off meeting
On Robust Neighbor Discovery in Mobile Wireless Networks
Chapter 6 并发:死锁和饥饿 Operating Systems: Internals and Design Principles
You NEED your book!!! Frequency Distribution
Y V =0 a V =V0 x b b V =0 z
Fairness-oriented Scheduling Support for Multicore Systems
Climate-Energy-Policy Interaction
Hui Wang†*, Canturk Isci‡, Lavanya Subramanian*,
Ch48 Statistics by Chtan FYHSKulai
The ABCD matrix for parabolic reflectors and its application to astigmatism free four-mirror cavities.
Measure Twice and Cut Once: Robust Dynamic Voltage Scaling for FPGAs
Online Learning: An Introduction
Factor Based Index of Systemic Stress (FISS)
What is Chemistry? Chemistry is: the study of matter & the changes it undergoes Composition Structure Properties Energy changes.
THE BERRY PHASE OF A BOGOLIUBOV QUASIPARTICLE IN AN ABRIKOSOV VORTEX*
Quantum-classical transition in optical twin beams and experimental applications to quantum metrology Ivano Ruo-Berchera Frascati.
The Toroidal Sporadic Source: Understanding Temporal Variations
FW 3.4: More Circle Practice
ارائه یک روش حل مبتنی بر استراتژی های تکاملی گروه بندی برای حل مسئله بسته بندی اقلام در ظروف
Decision Procedures Christoph M. Wintersteiger 9/11/2017 3:14 PM
Limits on Anomalous WWγ and WWZ Couplings from DØ
Presentation transcript:

Using Predictions in Online Optimization: Looking Forward with an Eye on the Past Niangjun Chen Joint work with Joshua Comden, Zhenhua Liu, Anshul Gandhi, and Adam Wierman

Predictions are crucial for decision making Travel, finance, cloud management …

Predictions are crucial for decision making “The human brain, it is being increasingly argued in the scientific literature, is best viewed as an advanced prediction machine.” Travel, finance, cloud management …

We know how to make predictions

We know how to make predictions But not how to design algorithms to use prediction independent How should an algorithm use predictions if errors are vs correlated

This paper: Online algorithm design with predictions in mind

𝑐 1 𝑐 1 𝑐 2 𝑐 3 𝑐 4 Prediction error 𝑐 1 ( 𝑥 1 ) 𝑥 1 𝐹 Cost = 𝑐 1 𝑥 1

Cost = 𝑐 1 𝑥 1 +𝛽 𝑥 2 − 𝑥 1 + 𝑐 2 𝑥 2 Cost = 𝑐 1 𝑥 1 𝑐 3 𝑐 2 𝑐 4 𝑐 2 𝑐 2 ( 𝑥 2 ) 𝛽‖ 𝑥 2 − 𝑥 1 ‖ : switching cost 𝑐 2 𝑐 3 𝑐 4 Prediction error 𝑥 1 𝑥 2 𝐹 Cost = 𝑐 1 𝑥 1 +𝛽 𝑥 2 − 𝑥 1 + 𝑐 2 𝑥 2 Cost = 𝑐 1 𝑥 1

Cost = 𝑐 1 𝑥 1 +𝛽 𝑥 2 − 𝑥 1 + 𝑐 2 𝑥 2 +𝛽 𝑥 3 − 𝑥 2 + 𝑐 3 𝑥 3 … 𝛽‖ 𝑥 3 − 𝑥 2 ‖ 𝑐 3 ( 𝑥 3 ) 𝑐 3 𝑐 4 Prediction error 𝑥 1 𝑥 3 𝑥 2 𝐹 Cost = 𝑐 1 𝑥 1 +𝛽 𝑥 2 − 𝑥 1 + 𝑐 2 𝑥 2 Cost = 𝑐 1 𝑥 1 +𝛽 𝑥 2 − 𝑥 1 + 𝑐 2 𝑥 2 +𝛽 𝑥 3 − 𝑥 2 + 𝑐 3 𝑥 3 …

Online convex optimization using predictions 𝑥 1 , 𝑦 1 , 𝑥 2 , 𝑦 2 , 𝑥 3 , 𝑦 3 … online Goal: minimize competitive difference cost(𝐴𝐿𝐺) – cost(𝑂𝑃𝑇) min 𝑥 𝑡 ∈𝐹 𝑡 𝑐 𝑥 𝑡 , 𝑦 𝑡 +𝛽‖ 𝑥 𝑡 − 𝑥 𝑡−1 ‖ e.g. online tracking cost 𝑐 𝑥 𝑡 , 𝑦 𝑡 Given prediction of 𝑦 𝑡 at time 𝜏, 𝑦 𝑡|𝜏 convex switching cost Parameterized family Time Information Available Decision 1 𝑦 1|0 𝑦 2|0 𝑦 3|0 … 𝑥 1 2 𝑦 1 𝑦 2|1 𝑦 3|1 𝑥 2 3 𝑦 2 𝑦 3|2 𝑥 3 4 𝑦 3 𝑥 4

Prediction noise model [Gan et al 2013] [Chen et al 2014] [Chen et al 2015] 𝑦 𝑡 = 𝑦 𝑡|𝜏 + 𝑠=𝜏+1 𝑡 𝑓 𝑡−𝑠 𝑒(𝑠) Realization that algorithm is trying to track Prediction for time 𝑡 given to algorithm at time 𝜏 prediction error

Prediction noise model [Gan et al 2013] [Chen et al 2014] [Chen et al 2015] Per-step noise 𝑦 𝑡 = 𝑦 𝑡|𝜏 + 𝑠=𝜏+1 𝑡 𝑓 𝑡−𝑠 𝑒(𝑠) How much uncertainty is there one step ahead? 𝑦 𝑡 − 𝑦 𝑡|𝑡−1 =𝑓(0)𝑒 𝑡 where 𝑒 𝑡 are white, mean zero (unbiased) and 𝑓 0 = 1, 𝔼𝑒 𝑡 2 = 𝜎 2

Prediction noise model [Gan et al 2013] [Chen et al 2014] [Chen et al 2015] Weighting factor 𝑦 𝑡 = 𝑦 𝑡|𝜏 + 𝑠=𝜏+1 𝑡 𝑓 𝑡−𝑠 𝑒(𝑠) How important is the noise at time 𝑡−𝑠 for the prediction of 𝑡? 𝜎 𝑓 0 2 +…+𝑓 𝑠 2 ) 𝑡=𝑠

Prediction noise model [Gan et al 2013] [Chen et al 2014] [Chen et al 2015] 𝑦 𝑡 = 𝑦 𝑡|𝜏 + 𝑠=𝜏+1 𝑡 𝑓 𝑡−𝑠 𝑒(𝑠) prediction error Predictions are “refined” as time moves forward Predictions are more noisy as you look further ahead Prediction errors can be correlated Form of errors matches many classical models Prediction of wide-sense stationary process using Wiener filter Prediction of linear dynamical system using Kalman filter

Lots of applications … Dynamic capacity management in data centers [Gandhi et al. 2012][Lin et al 2013] Power system generation/load scheduling[Lu et al. 2013] Portfolio management [Cover 1991][Boyd et al. 2012] Video streaming [Sen et al. 2000][Liu et al. 2008] Network routing [Bansal et al. 2003][Kodialam et al. 2003] Geographical load balancing [Hindman et al. 2011] [Lin et al. 2012] Visual speech generation [Kim et al. 2015] … A lot of online decision problem can be modeled as an OCO, for example, capacity management of data centers -> number of servers or virtual resource, -> adjust encoding bitrate to network condition, many Sigmetrics papers!

Lots of applications … Dynamic capacity management in data centers [Gandhi et al. 2012][Lin et al 2013] Power system generation/load scheduling[Lu et al. 2013] Portfolio management [Cover 1991][Boyd et al. 2012] Video streaming [Sen et al. 2000][Liu et al. 2008] Network routing [Bansal et al. 2003][Kodialam et al. 2003] Geographical load balancing [Hindman et al. 2011] [Lin et al. 2012] Visual speech generation [Kim et al. 2015] … A lot of online decision problem can be modeled as an OCO, for example, capacity management of data centers -> number of servers or virtual resource, -> adjust encoding bitrate to network condition, many Sigmetrics papers!

Lots of applications … Dynamic capacity management in data centers [Gandhi et al. 2012][Lin et al 2013] Power system generation/load scheduling[Lu et al. 2013] Portfolio management [Cover 1991][Boyd et al. 2012] Video streaming [Sen et al. 2000][Liu et al. 2008] Network routing [Bansal et al. 2003][Kodialam et al. 2003] Geographical load balancing [Hindman et al. 2011] [Lin et al. 2012] Visual speech generation [Kim et al. 2015] … A lot of online decision problem can be modeled as an OCO, for example, capacity management of data centers -> number of servers or virtual resource, -> adjust encoding bitrate to network condition, many Sigmetrics papers!

Lots of applications … Dynamic capacity management in data centers [Gandhi et al. 2012][Lin et al 2013] Power system generation/load scheduling[Lu et al. 2013] Portfolio management [Cover 1991][Boyd et al. 2012] Video streaming [Sen et al. 2000][Liu et al. 2008] Network routing [Bansal et al. 2003][Kodialam et al. 2003] Geographical load balancing [Hindman et al. 2011] [Lin et al. 2012] Visual speech generation [Kim et al. 2015] … A lot of online decision problem can be modeled as an OCO, for example, capacity management of data centers -> number of servers or virtual resource, -> adjust encoding bitrate to network condition, many Sigmetrics papers!

Lots of applications… lots of algorithms Most popular choice by far: Receding Horizon Control (RHC) [Morari et al 1989][Mayne 1990][Rawling et al 2000][Camacho 2013]… 𝑦 𝑡+1|𝑡 , 𝑦 𝑡+2|𝑡 ,…, 𝑦 𝑡+𝑤|𝑡 , 𝑦 𝑡+𝑤+1|𝑡 , 𝑦 𝑡+𝑤+2|𝑡 ,… 𝑥 𝑡+1 , 𝑥 𝑡+2 ,…, 𝑥 𝑡+𝑤 =argmin 𝑠=𝑡+1 𝑡+𝑤 𝑐( 𝑥 𝑡 , 𝑦 𝑡|𝑠 )+𝛽 𝑥 𝑡 − 𝑥 𝑡−1 1

Lots of applications… lots of algorithms Most popular choice by far: Receding Horizon Control (RHC) [Morari et al 1989][Mayne 1990][Rawling et al 2000][Camacho 2013]… 𝑦 𝑡+1|𝑡 , 𝑦 𝑡+2|𝑡 ,…, 𝑦 𝑡+𝑤|𝑡 , 𝑦 𝑡+𝑤+1|𝑡 , 𝑦 𝑡+𝑤+2|𝑡 ,… 𝑦 𝑡+2|𝑡+1 , 𝑦 𝑡+3|𝑡+1 ,…, 𝑦 𝑡+𝑤+1|𝑡+1 , 𝑦 𝑡+𝑤+2|𝑡+1 , 𝑦 𝑡+𝑤+3|𝑡+1 ,… 𝑥 𝑡+2 , 𝑥 𝑡+3 , … 𝑥 𝑡+𝑤+1

Lots of applications… lots of algorithms Most popular choice by far: Receding Horizon Control (RHC) [Morari et al 1989][Mayne 1990][Rawling et al 2000][Camacho 2013]… 𝑦 𝑡+1|𝑡 , 𝑦 𝑡+2|𝑡 ,…, 𝑦 𝑡+𝑤|𝑡 , 𝑦 𝑡+𝑤+1|𝑡 , 𝑦 𝑡+𝑤+2|𝑡 ,… 𝑦 𝑡+2|𝑡+1 , 𝑦 𝑡+3|𝑡+1 ,…, 𝑦 𝑡+𝑤+1|𝑡+1 , 𝑦 𝑡+𝑤+2|𝑡+1 , 𝑦 𝑡+𝑤+3|𝑡+1 ,… 𝑦 𝑡+3|𝑡+2 , 𝑦 𝑡+4|𝑡+2 ,…, 𝑦 𝑡+𝑤+2|𝑡+2 , 𝑦 𝑡+𝑤+3|𝑡+2 , 𝑦 𝑡+𝑤+4|𝑡+2 ,… 𝑥 𝑡+3 , 𝑥 𝑡+4 , … 𝑥 𝑡+𝑤+2

Lots of applications… lots of algorithms Most popular choice by far: Receding Horizon Control (RHC) [Morari et al 1989][Mayne 1990][Rawling et al 2000][Camacho 2013]… Recent suggestion: Averaging Fixed Horizon Control (AFHC) [Lin et al 2012] [Chen et al 2015] [Kim et al 2015]

Averaging Fixed Horizon Control Fixed Horizon Control (FHC) 𝑦 𝑡+1|𝑡 , 𝑦 𝑡+2|𝑡 ,…, 𝑦 𝑡+𝑤|𝑡 , 𝑦 𝑡+𝑤+1|𝑡+𝑤 , 𝑦 𝑡+𝑤+2|𝑡+𝑤 ,… 𝑥 𝑡+1 , 𝑥 𝑡+2 ,…, 𝑥 𝑡+𝑤 =argmin 𝑠=𝑡+1 𝑡+𝑤 𝑐( 𝑥 𝑡 , 𝑦 𝑡|𝑠 )+𝛽 𝑥 𝑡 − 𝑥 𝑡−1 1

Averaging Fixed Horizon Control Fixed Horizon Control (FHC) 𝑦 𝑡+1|𝑡 , 𝑦 𝑡+2|𝑡 ,…, 𝑦 𝑡+𝑤|𝑡 , 𝑦 𝑡+𝑤+1|𝑡+𝑤 , 𝑦 𝑡+𝑤+2|𝑡+𝑤 ,… 𝑥 𝑡+1 , 𝑥 𝑡+2 ,…, 𝑥 𝑡+𝑤 𝑥 𝑡+𝑤+1 , 𝑥 𝑡+𝑤+2 ,…, 𝑥 𝑡+2𝑤

Averaging Fixed Horizon Control Average choices of FHC algorithms 𝑥 𝐴𝐹𝐻𝐶 = 1 w 𝑘=1 𝑤 𝑥 𝐹𝐻𝐶 𝑘 𝑤 FHC algorithms 𝑥 𝑡−2 1 , 𝑥 𝑡−1 1 , 𝑥 𝑡 1 …, 𝑥 𝑡+𝑤−2 1 𝑥 𝑡 𝑤 𝑥 𝑡+𝑤−1 1 ,…, 𝑥 𝑡−1 2 , 𝑥 𝑡 2 , 𝑥 𝑡+4 2 …, 𝑥 𝑡+𝑤−1 2 𝑥 𝑡+𝑤 2 ,…, 𝑥 𝑡 3 , 𝑥 𝑡+4 3 , 𝑥 𝑡+5 3 …, 𝑥 𝑡+𝑤 3 𝑥 𝑡+𝑤+1 3 ,…, …

Algorithms Using Noisy Prediction Most popular choice by far: Receding Horizon Control (RHC) [Morari et al 1989][Mayne 1990][Rawling et al 2000][Camacho 2013]… Recent suggestion: Averaging Fixed Horizon Control (AFHC) [Lin et al 2012] [Chen et al 2015] [Kim et al 2015] Which algorithm is better? Unclear…

AFHC and RHC have vastly different behavior AFHC is better in worst case under perfect predictions Empirically RHC is better in stochastic case when prediction errors are correlated

This paper: Online algorithm design with predictions in mind How to design algorithm optimal for prediction noise?

Three key design choices 1. How far to look-ahead in making decisions? 𝑦 𝑡+1|𝑡 , 𝑦 𝑡+2|𝑡 ,…, 𝑦 𝑡+𝑤|𝑡 , 𝑦 𝑡+𝑤+1|𝑡 , 𝑦 𝑡+𝑤+2|𝑡 ,… Lookahead 𝑤 steps

Three key design choices 1. How far to look-ahead in making decisions? 2. How many actions to commit? 𝑦 𝑡+1|𝑡 , 𝑦 𝑡+2|𝑡 ,…, 𝑦 𝑡+𝑤|𝑡 , 𝑦 𝑡+𝑤+1|𝑡 , 𝑦 𝑡+𝑤+2|𝑡 ,… 𝑥 𝑡+1 , 𝑥 𝑡+2 ,…, 𝑥 𝑡+𝑤 =argmin 𝑠=𝑡+1 𝑡+𝑤 𝑐( 𝑥 𝑡 , 𝑦 𝑡|𝑠 )+𝛽 𝑥 𝑡 − 𝑥 𝑡−1 1

Three key design choices 1. How far to look-ahead in making decisions? 2. How many actions to commit? 𝑦 𝑡+1|𝑡 , 𝑦 𝑡+2|𝑡 ,…, 𝑦 𝑡+𝑤|𝑡 , 𝑦 𝑡+𝑤+1|𝑡 , 𝑦 𝑡+𝑤+2|𝑡 ,… 𝑥 𝑡+1 , 𝑥 𝑡+2 ,…, 𝑥 𝑡+𝑤 =argmin 𝑠=𝑡+1 𝑡+𝑤 𝑐( 𝑥 𝑡 , 𝑦 𝑡|𝑠 )+𝛽 𝑥 𝑡 − 𝑥 𝑡−1 1 commits 𝑣 steps

Three key design choices 1. How far to look-ahead in making decisions? 2. How many actions to commit? Our focus: what is the optimal commitment level given the structure of prediction noise? 3. How to aggregate action plans? 𝑥 𝑡−2 1 , 𝑥 𝑡−1 1 , 𝑥 𝑡 1 …, 𝑥 𝑡+𝑤−2 1 𝑥 𝑡−1 2 , 𝑥 𝑡 2 , 𝑥 𝑡+4 2 …, 𝑥 𝑡+𝑤−1 2 𝑥 𝑡 3 , 𝑥 𝑡+4 3 , 𝑥 𝑡+5 3 …, 𝑥 𝑡+𝑤 3 𝑥 𝑡 =𝑔( 𝑥 𝑡 1 , 𝑥 𝑡 2 , 𝑥 𝑡 3 ) Key: commitment balances switching cost and prediction errors

Committed Horizon Control FHC with limited commitment 𝑣, for 𝑡≡𝑘 mod 𝑤 𝑦 𝑡+1|𝑡 , 𝑦 𝑡+2|𝑡 ,… 𝑦 𝑡+𝑣|𝑡 , 𝑦 𝑡+𝑣+1|𝑡 , …, 𝑦 𝑡+𝑤|𝑡 , 𝑦 𝑡+𝑤+1|𝑡 , 𝑦 𝑡+𝑤+2|𝑡 ,… 𝑥 𝑡+1 , 𝑥 𝑡+2 ,…, 𝑥 𝑡+𝑣 , 𝑥 𝑡+𝑣+1 , …, 𝑥 𝑡+𝑤 =argmin 𝑠=𝑡+1 𝑡+𝑤 𝑐( 𝑥 𝑡 , 𝑦 𝑡|𝑠 )+𝛽 𝑥 𝑡 − 𝑥 𝑡−1 1 𝑥 (𝑘) =(…, 𝑥 𝑡+1 𝑘 , 𝑥 𝑡+2 𝑘 ,…, 𝑥 𝑡+𝑣 𝑘 ,)

Committed Horizon Control FHC with limited commitment 𝑣, for 𝑡≡𝑘 mod 𝑤 𝑦 𝑡+1|𝑡 , 𝑦 𝑡+2|𝑡 ,… 𝑦 𝑡+𝑣|𝑡 , 𝑦 𝑡+𝑣+1|𝑡 , …, 𝑦 𝑡+𝑤|𝑡 , 𝑦 𝑡+𝑤+1|𝑡 , 𝑦 𝑡+𝑤+2|𝑡 ,… 𝑦 𝑡+𝑣+1|𝑡+𝑣 , … 𝑦 𝑡+2𝑣|𝑡+𝑣 , 𝑦 𝑡+2𝑣+1|𝑡+𝑣 …,𝑦 𝑡+𝑣+𝑤|𝑡+𝑣 , 𝑦 𝑡+𝑣+𝑤+1|𝑡+𝑣 , … 𝑥 𝑡+𝑣+1 , 𝑥 𝑡+𝑣+2 ,…, 𝑥 𝑡+2𝑣 , 𝑥 𝑡+2𝑣+1 ,…, 𝑥 𝑡+𝑤+𝑣 𝑥 (𝑘) =(…, 𝑥 𝑡+1 𝑘 , 𝑥 𝑡+2 𝑘 ,…, 𝑥 𝑡+𝑣 𝑘 , 𝑥 𝑡+𝑣+1 𝑘 , 𝑥 𝑡+𝑣+2 𝑘 ,…, 𝑥 𝑡+2𝑣 𝑘 )

Committed Horizon Control FHC with limited commitment 𝑣, for 𝑡≡𝑘 mod 𝑣 𝑦 𝑡+1|𝑡 , 𝑦 𝑡+2|𝑡 ,… 𝑦 𝑡+𝑣|𝑡 , 𝑦 𝑡+𝑣+1|𝑡 , …, 𝑦 𝑡+𝑤|𝑡 , 𝑦 𝑡+𝑤+1|𝑡 , 𝑦 𝑡+𝑤+2|𝑡 ,… 𝑦 𝑡+𝑣+1|𝑡+𝑣 , … 𝑦 𝑡+2𝑣|𝑡+𝑣 , 𝑦 𝑡+2𝑣+1|𝑡+𝑣 …,𝑦 𝑡+𝑣+𝑤|𝑡+𝑣 , 𝑦 𝑡+𝑣+𝑤+1|𝑡+𝑣 , … 𝑦 𝑡+2𝑣+1|𝑡+2𝑣 , … 𝑦 𝑡+3𝑣|𝑡+2𝑣 , …,𝑦 𝑡+2𝑣+𝑤|𝑡+𝑣 , 𝑦 𝑡+2𝑣+𝑤+1|𝑡+2𝑣 , … 𝑥 𝑡+2𝑣+1 , 𝑥 𝑡+2𝑣+2 ,…, 𝑥 𝑡+3𝑣 , 𝑥 𝑡+3𝑣+1 ,…, 𝑥 𝑡+𝑤+2𝑣 𝑥 (𝑘) =(…, 𝑥 𝑡+1 𝑘 , 𝑥 𝑡+2 𝑘 ,…, 𝑥 𝑡+𝑣 𝑘 , 𝑥 𝑡+𝑣+1 𝑘 , 𝑥 𝑡+𝑣+2 𝑘 ,…, 𝑥 𝑡+2𝑣 𝑘 , 𝑥 𝑡+2𝑣+1 𝑘 , 𝑥 𝑡+2𝑣+2 𝑘 ,…, 𝑥 𝑡+3𝑣 𝑘 )

Committed Horizon Control FHC with limited commitment 𝑣, for 𝑡≡𝑘 mod 𝑣 𝑦 𝑡+1|𝑡 , 𝑦 𝑡+2|𝑡 ,… 𝑦 𝑡+𝑣|𝑡 , 𝑦 𝑡+𝑣+1|𝑡 , …, 𝑦 𝑡+𝑤|𝑡 , 𝑦 𝑡+𝑤+1|𝑡 , 𝑦 𝑡+𝑤+2|𝑡 ,… 𝑦 𝑡+𝑣+1|𝑡+𝑣 , … 𝑦 𝑡+2𝑣|𝑡+𝑣 , 𝑦 𝑡+2𝑣+1|𝑡+𝑣 …,𝑦 𝑡+𝑣+𝑤|𝑡+𝑣 , 𝑦 𝑡+𝑣+𝑤+1|𝑡+𝑣 , … 𝑦 𝑡+2𝑣+1|𝑡+2𝑣 , … 𝑦 𝑡+3𝑣|𝑡+2𝑣 , …,𝑦 𝑡+2𝑣+𝑤|𝑡+𝑣 , 𝑦 𝑡+2𝑣+𝑤+1|𝑡+2𝑣 , … 𝑥 𝑡+2𝑣+1 , 𝑥 𝑡+2𝑣+2 ,…, 𝑥 𝑡+3𝑣 , 𝑥 𝑡+3𝑣+1 ,…, 𝑥 𝑡+𝑤+2𝑣 𝑥 (𝑘) =(…, 𝑥 𝑡+1 𝑘 , 𝑥 𝑡+2 𝑘 ,…, 𝑥 𝑡+𝑣 𝑘 , 𝑥 𝑡+𝑣+1 𝑘 , 𝑥 𝑡+𝑣+2 𝑘 ,…, 𝑥 𝑡+2𝑣 𝑘 , 𝑥 𝑡+2𝑣+1 𝑘 , 𝑥 𝑡+2𝑣+2 𝑘 ,…, 𝑥 𝑡+3𝑣 𝑘 ,…)

Committed Horizon Control FHC with limited commitment 𝑣, for 𝑡≡𝑘 mod 𝑣 𝑦 𝑡+1|𝑡 , 𝑦 𝑡+2|𝑡 ,… 𝑦 𝑡+𝑣|𝑡 , 𝑦 𝑡+𝑣+1|𝑡 , …, 𝑦 𝑡+𝑤|𝑡 , 𝑦 𝑡+𝑤+1|𝑡 , 𝑦 𝑡+𝑤+2|𝑡 ,… 𝑦 𝑡+𝑣+1|𝑡+𝑣 , … 𝑦 𝑡+2𝑣|𝑡+𝑣 , 𝑦 𝑡+2𝑣+1|𝑡+𝑣 …,𝑦 𝑡+𝑣+𝑤|𝑡+𝑣 , 𝑦 𝑡+𝑣+𝑤+1|𝑡+𝑣 , … 𝑦 𝑡+2𝑣+1|𝑡+2𝑣 , … 𝑦 𝑡+3𝑣|𝑡+2𝑣 , …,𝑦 𝑡+2𝑣+𝑤|𝑡+𝑣 , 𝑦 𝑡+2𝑣+𝑤+1|𝑡+2𝑣 , … 𝑥 𝑡+2𝑣+1 , 𝑥 𝑡+2𝑣+2 ,…, 𝑥 𝑡+3𝑣 , 𝑥 𝑡+3𝑣+1 ,…, 𝑥 𝑡+𝑤+2𝑣 𝑥 (𝑣) =(…, 𝑥 𝑡+1 𝑣 , 𝑥 𝑡+2 𝑣 ,…, 𝑥 𝑡+𝑣 𝑣 , 𝑥 𝑡+𝑣+1 𝑣 , 𝑥 𝑡+𝑣+2 𝑣 ,…, 𝑥 𝑡+2𝑣 𝑣 , 𝑥 𝑡+2𝑣+1 𝑣 , 𝑥 𝑡+2𝑣+2 𝑣 ,…, 𝑥 𝑡+3𝑣 𝑣 ,…) 𝑥 (1) =(…, 𝑥 𝑡+1 1 , 𝑥 𝑡+2 1 ,…, 𝑥 𝑡+𝑣 1 , 𝑥 𝑡+𝑣+1 1 , 𝑥 𝑡+𝑣+2 1 ,…, 𝑥 𝑡+2𝑣 1 , 𝑥 𝑡+2𝑣+1 1 , 𝑥 𝑡+2𝑣+2 1 ,…, 𝑥 𝑡+3𝑣 1 ,…) 𝑣 FHC(v) algorithms ⋮ 𝑥 (𝑘) =(…, 𝑥 𝑡+1 𝑘 , 𝑥 𝑡+2 𝑘 ,…, 𝑥 𝑡+𝑣 𝑘 , 𝑥 𝑡+𝑣+1 𝑘 , 𝑥 𝑡+𝑣+2 𝑘 ,…, 𝑥 𝑡+2𝑣 𝑘 , 𝑥 𝑡+2𝑣+1 𝑘 , 𝑥 𝑡+2𝑣+2 𝑘 ,…, 𝑥 𝑡+3𝑣 𝑘 ,…)

Committed Horizon Control FHC with limited commitment 𝑣, for 𝑡≡𝑘 mod 𝑣 𝑦 𝑡+1|𝑡 , 𝑦 𝑡+2|𝑡 ,… 𝑦 𝑡+𝑣|𝑡 , 𝑦 𝑡+𝑣+1|𝑡 , …, 𝑦 𝑡+𝑤|𝑡 , 𝑦 𝑡+𝑤+1|𝑡 , 𝑦 𝑡+𝑤+2|𝑡 ,… 𝑦 𝑡+𝑣+1|𝑡+𝑣 , … 𝑦 𝑡+2𝑣|𝑡+𝑣 , 𝑦 𝑡+2𝑣+1|𝑡+𝑣 …,𝑦 𝑡+𝑣+𝑤|𝑡+𝑣 , 𝑦 𝑡+𝑣+𝑤+1|𝑡+𝑣 , … 𝑦 𝑡+2𝑣+1|𝑡+2𝑣 , … 𝑦 𝑡+3𝑣|𝑡+2𝑣 , …,𝑦 𝑡+2𝑣+𝑤|𝑡+𝑣 , 𝑦 𝑡+2𝑣+𝑤+1|𝑡+2𝑣 , … 𝑥 𝑡+2𝑣+1 , 𝑥 𝑡+2𝑣+2 ,…, 𝑥 𝑡+3𝑣 , 𝑥 𝑡+3𝑣+1 ,…, 𝑥 𝑡+𝑤+2𝑣 𝑥 (𝑣) =(…, 𝑥 𝑡+1 𝑣 , 𝑥 𝑡+2 𝑣 ,…, 𝑥 𝑡+𝑣 𝑣 , 𝑥 𝑡+𝑣+1 𝑣 , 𝑥 𝑡+𝑣+2 𝑣 ,…, 𝑥 𝑡+2𝑣 𝑣 , 𝑥 𝑡+2𝑣+1 𝑣 , 𝑥 𝑡+2𝑣+2 𝑣 ,…, 𝑥 𝑡+3𝑣 𝑣 ,…) 𝑥 (1) =(…, 𝑥 𝑡+1 1 , 𝑥 𝑡+2 1 ,…, 𝑥 𝑡+𝑣 1 , 𝑥 𝑡+𝑣+1 1 , 𝑥 𝑡+𝑣+2 1 ,…, 𝑥 𝑡+2𝑣 1 , 𝑥 𝑡+2𝑣+1 1 , 𝑥 𝑡+2𝑣+2 1 ,…, 𝑥 𝑡+3𝑣 1 ,…) ⋮ 𝑥 (𝑘) =(…, 𝑥 𝑡+1 𝑘 , 𝑥 𝑡+2 𝑘 ,…, 𝑥 𝑡+𝑣 𝑘 , 𝑥 𝑡+𝑣+1 𝑘 , 𝑥 𝑡+𝑣+2 𝑘 ,…, 𝑥 𝑡+2𝑣 𝑘 , 𝑥 𝑡+2𝑣+1 𝑘 , 𝑥 𝑡+2𝑣+2 𝑘 ,…, 𝑥 𝑡+3𝑣 𝑘 ,…) 𝑣 FHC(v) algorithms 𝑥 𝐶𝐻𝐶 𝑡 = 1 𝑣 𝑘=1 𝑣 𝑥 𝑡 𝑘

Committed Horizon Control FHC with limited commitment 𝑣, for 𝑡≡𝑘 mod 𝑣 𝑦 𝑡+1|𝑡 , 𝑦 𝑡+2|𝑡 ,… 𝑦 𝑡+𝑣|𝑡 , 𝑦 𝑡+𝑣+1|𝑡 , …, 𝑦 𝑡+𝑤|𝑡 , 𝑦 𝑡+𝑤+1|𝑡 , 𝑦 𝑡+𝑤+2|𝑡 ,… 𝑦 𝑡+𝑣+1|𝑡+𝑣 , … 𝑦 𝑡+2𝑣|𝑡+𝑣 , 𝑦 𝑡+2𝑣+1|𝑡+𝑣 …,𝑦 𝑡+𝑣+𝑤|𝑡+𝑣 , 𝑦 𝑡+𝑣+𝑤+1|𝑡+𝑣 , … 𝑦 𝑡+2𝑣+1|𝑡+2𝑣 , … 𝑦 𝑡+3𝑣|𝑡+2𝑣 , …,𝑦 𝑡+2𝑣+𝑤|𝑡+𝑣 , 𝑦 𝑡+2𝑣+𝑤+1|𝑡+2𝑣 , … 𝑥 𝑡+2𝑣+1 , 𝑥 𝑡+2𝑣+2 ,…, 𝑥 𝑡+3𝑣 , 𝑥 𝑡+3𝑣+1 ,…, 𝑥 𝑡+𝑤+2𝑣 𝑥 (1) =(…, 𝑥 𝑡+1 1 , 𝑥 𝑡+2 1 ,…, 𝑥 𝑡+𝑣 1 , 𝑥 𝑡+𝑣+1 1 , 𝑥 𝑡+𝑣+2 1 ,…, 𝑥 𝑡+2𝑣 1 , 𝑥 𝑡+2𝑣+1 1 , 𝑥 𝑡+2𝑣+2 1 ,…, 𝑥 𝑡+3𝑣 1 ,…) ⋮ 𝑥 (𝑘) =(…, 𝑥 𝑡+1 𝑘 , 𝑥 𝑡+2 𝑘 ,…, 𝑥 𝑡+𝑣 𝑘 , 𝑥 𝑡+𝑣+1 𝑘 , 𝑥 𝑡+𝑣+2 𝑘 ,…, 𝑥 𝑡+2𝑣 𝑘 , 𝑥 𝑡+2𝑣+1 𝑘 , 𝑥 𝑡+2𝑣+2 𝑘 ,…, 𝑥 𝑡+3𝑣 𝑘 ,…) 𝑣 FHC(v) algorithms ⋮ 𝑥 (𝑣) =(…, 𝑥 𝑡+1 𝑣 , 𝑥 𝑡+2 𝑣 ,…, 𝑥 𝑡+𝑣 𝑣 , 𝑥 𝑡+𝑣+1 𝑣 , 𝑥 𝑡+𝑣+2 𝑣 ,…, 𝑥 𝑡+2𝑣 𝑣 , 𝑥 𝑡+2𝑣+1 𝑣 , 𝑥 𝑡+2𝑣+2 𝑣 ,…, 𝑥 𝑡+3𝑣 𝑣 ,…) 𝑥 𝐶𝐻𝐶 𝑡 = 1 𝑣 𝑘=1 𝑣 𝑥 𝑡 𝑘 𝑣=1 RHC, 𝑣=𝑤 AFHC

Main Result Theorem For 𝑐 that is 𝛼-Hölder continuous in the second argument and feasible set 𝐹 is bounded, 𝐄cost 𝐶𝐻𝐶 −𝐄cost 𝑂𝑃𝑇 ≤ 2𝑇𝛽𝐷 𝑣 + 2𝐺𝑇 𝑣 𝑘=1 𝑣 𝑓 𝑘 𝛼

Main Result Theorem For 𝑐 that is 𝛼-Hölder continuous in the second argument and feasible set 𝐹 is bounded, 𝐄cost 𝐶𝐻𝐶 −𝐄cost 𝑂𝑃𝑇 ≤ 2𝑇𝛽𝐷 𝑣 + 2𝐺𝑇 𝑣 𝑘=1 𝑣 𝑓 𝑘 𝛼 Competitive difference

Main Result Theorem For 𝑐 that is 𝛼-Hölder continuous in the second argument and feasible set 𝐹 is bounded, 𝐄cost 𝐶𝐻𝐶 −𝐄cost 𝑂𝑃𝑇 ≤ 2𝑇𝛽𝐷 𝑣 + 2𝐺𝑇 𝑣 𝑘=1 𝑣 𝑓 𝑘 𝛼 ? Commitment level 𝑣

Main Result Theorem For 𝑐 that is 𝛼-Hölder continuous in the second argument and feasible set 𝐹 is bounded, 𝐄cost 𝐶𝐻𝐶 −𝐄cost 𝑂𝑃𝑇 ≤ 2𝑇𝛽𝐷 𝑣 + 2𝐺𝑇 𝑣 𝑘=1 𝑣 𝑓 𝑘 𝛼 Due to switching cost

Main Result Theorem For 𝑐 that is 𝛼-Hölder continuous in the second argument and feasible set 𝐹 is bounded, 𝐄cost 𝐶𝐻𝐶 −𝐄cost 𝑂𝑃𝑇 ≤ 2𝑇𝛽𝐷 𝑣 + 2𝐺𝑇 𝑣 𝑘=1 𝑣 𝑓 𝑘 𝛼 𝑥 1 − 𝑥 2 ≤𝐷, ∀ 𝑥 1 , 𝑥 2 ∈𝐹 Due to switching cost

Main Result Theorem For 𝑐 that is 𝛼-Hölder continuous in the second argument and feasible set 𝐹 is bounded, 𝐄cost 𝐶𝐻𝐶 −𝐄cost 𝑂𝑃𝑇 ≤ 2𝑇𝛽𝐷 𝑣 + 2𝐺𝑇 𝑣 𝑘=1 𝑣 𝑓 𝑘 𝛼 Due to switching cost Due to prediction error

Main Result 𝑐 𝑥, 𝑦 1 −𝑐 𝑥, 𝑦 2 ≤𝐺 𝑦 1 − 𝑦 2 𝛼 , ∀𝑥, 𝑦 1 , 𝑦 2 Theorem For 𝑐 that is 𝛼-Hölder continuous in the second argument and feasible set 𝐹 is bounded, 𝐄cost 𝐶𝐻𝐶 −𝐄cost 𝑂𝑃𝑇 ≤ 2𝑇𝛽𝐷 𝑣 + 2𝐺𝑇 𝑣 𝑘=1 𝑣 𝑓 𝑘 𝛼 Due to switching cost Due to prediction error

Prediction error k-steps away Main Result 𝑓 𝑘 2 ≜𝐄 𝑦 𝑡+𝑘 − 𝑦 𝑡+𝑘|𝑡 2 = 𝜎 2 𝑠=1 𝑘 𝑓 𝑠 2 Prediction error k-steps away Theorem For 𝑐 that is 𝛼-Hölder continuous in the second argument and feasible set 𝐹 is bounded, 𝐄cost 𝐶𝐻𝐶 −𝐄cost 𝑂𝑃𝑇 ≤ 2𝑇𝛽𝐷 𝑣 + 2𝐺𝑇 𝑣 𝑘=1 𝑣 𝑓 𝑘 𝛼 Due to switching cost Due to prediction error

Main Result 𝑘 Theorem For 𝑐 that is 𝛼-Hölder continuous in the second argument and feasible set 𝐹 is bounded, 𝐄cost 𝐶𝐻𝐶 −𝐄cost 𝑂𝑃𝑇 ≤ 2𝑇𝛽𝐷 𝑣 + 2𝐺𝑇 𝑣 𝑘=1 𝑣 𝑓 𝑘 𝛼 Due to switching cost Due to prediction error

Main Result Theorem For 𝑐 that is 𝛼-Hölder continuous in the second argument and feasible set 𝐹 is bounded, 𝐄cost 𝐶𝐻𝐶 −𝐄cost 𝑂𝑃𝑇 ≤ 2𝑇𝛽𝐷 𝑣 + 2𝐺𝑇 𝑣 𝑘=1 𝑣 𝑓 𝑘 𝛼 Due to switching cost Due to prediction error ? Commitment level 𝑣

Main Result Theorem For 𝑐 that is 𝛼-Hölder continuous in the second argument and feasible set 𝐹 is bounded, 𝐄cost 𝐶𝐻𝐶 −𝐄cost 𝑂𝑃𝑇 ≤ 2𝑇𝛽𝐷 𝑣 + 2𝐺𝑇 𝑣 𝑘=1 𝑣 𝑓 𝑘 𝛼 Due to switching cost Due to prediction error ? Commitment level 𝑣 Key: choose commitment level 𝑣 to balance these two terms

Main Result Theorem For 𝑐 that is 𝛼-Hölder continuous in the second argument and feasible set 𝐹 is bounded, 𝐄cost 𝐶𝐻𝐶 −𝐄cost 𝑂𝑃𝑇 ≤ 2𝑇𝛽𝐷 𝑣 + 2𝐺𝑇 𝑣 𝑘=1 𝑣 𝑓 𝑘 𝛼 e.g. i.i.d. noise 𝑓 𝑠 = 1, 𝑠=0 0, 𝑠>0 = 2𝑇𝛽𝐷 𝑣 +2𝐺𝑇 𝜎 𝛼 Decreasing function of 𝑣 AFHC is best when noise is i.i.d

i.i.d. prediction noise 𝑓 𝑠 = 1, 𝑠=0 0,𝑠>0

i.i.d. prediction noise 𝑓 𝑠 = 1, 𝑠=0 0,𝑠>0 Long range correlated 𝑓 𝑠 = 1, 𝑠≤𝐿 0,𝑠>𝐿 , 𝐿>𝑤

i.i.d. prediction noise 𝑓 𝑠 = 1, 𝑠=0 0,𝑠>0 Long range correlated 𝑓 𝑠 = 1, 𝑠≤𝐿 0,𝑠>𝐿 , 𝐿>𝑤 Short range correlated 𝑓 𝑠 = 1, 𝑠≤𝐿 0,𝑠>𝐿 , 𝐿≤𝑤

i.i.d. prediction noise 𝑓 𝑠 = 1, 𝑠=0 0,𝑠>0 Long range correlated 𝑓 𝑠 = 1, 𝑠≤𝐿 0,𝑠>𝐿 , 𝐿>𝑤 Short range correlated 𝑓 𝑠 = 1, 𝑠≤𝐿 0,𝑠>𝐿 , 𝐿≤𝑤 Exponentially decaying 𝑓 𝑠 = 𝑎 𝑠 , 𝑎<1

Optimal commitment level depends on prediction noise structure i.i.d. prediction noise 𝑓 𝑠 = 1, 𝑠=0 0,𝑠>0 Long range correlated 𝑓 𝑠 = 1, 𝑠≤𝐿 0,𝑠>𝐿 , 𝐿>𝑤 Short range correlated 𝑓 𝑠 = 1, 𝑠≤𝐿 0,𝑠>𝐿 , 𝐿≤𝑤 Exponentially decaying 𝑓 𝑠 = 𝑎 𝑠 , 𝑎<1

More detail: long-range correlated noise Theorem If prediction noise is long-range correlated, 𝑓 𝑠 = 1, 𝑠≤𝐿 0,𝑠>𝐿 , 𝐿>𝑤 AFHC is optimal if 𝛽D 𝐺 𝜎 𝛼 >𝛼 2𝑤 1+ 𝛼 2 RHC is optimal if 𝛽𝐷 𝐺 𝜎 𝛼 < 2 𝛼+2 CHC is optimal with 𝑣∈(1,𝑤) o/w Relative importance of prediction error and switching cost Prediction error dominant Switching cost dominant Intermediate 𝑣 ∗ =argmin 2𝑇𝛽𝐷 𝛼+2 −4𝐺𝑇 𝜎 𝛼 𝑣 𝛼+2 + 2 3+ 𝛼 2 𝐺𝑇 𝜎 𝛼 𝛼+2 𝑣 𝛼 2 RHC AFHC

More detail: short-range correlated noise Theorem If prediction noise is short-range correlated, 𝑓 𝑠 = 1, 𝑠≤𝐿 0,𝑠>𝐿 , 𝐿≤𝑤 AFHC is optimal if 𝛽D 𝐺 𝜎 𝛼 >𝐻(𝐿) RHC is optimal if 𝛽𝐷 𝐺 𝜎 𝛼 < 2 𝛼+2 CHC is optimal with 𝑣∈(1,𝑤) o/w 1 𝛼+2 𝐿+1 𝛼 2 𝛼𝐿−2 +1 Intermediate Prediction error dominant Switching cost dominant RHC AFHC 𝑣 ∗ =argmin 2𝑇𝛽𝐷 𝑣 +2𝐺𝑇 𝜎 𝛼 𝐿+1 𝛼/2 − 2𝐺𝑇 𝜎 𝛼 𝑣 𝐻(𝐿)

More detail: exponentially decaying noise Theorem If prediction noise is exponentially decaying, 𝑓 𝑠 = 𝑎 𝑠 , 0<𝑎<1 AFHC is optimal if 𝛽D 𝐺 𝜎 𝛼 > 𝑎 2 2(1− 𝑎 2 ) RHC is optimal if 𝛽𝐷 𝐺 𝜎 𝛼 < 𝑎 2 2(1+𝑎) CHC is optimal with 𝑣∈(1,𝑤) o/w Prediction error dominant Switching cost dominant Intermediate 𝑣 ∗ =argmin 2𝑇𝛽𝐷 𝑣 + 2𝐺𝑇 𝜎 1− 𝑎 2 − 𝑎 2 1− 𝑎 2𝑣 𝐺𝑇𝜎 𝑣 1− 𝑎 2 2 RHC AFHC

Main Result Theorem For 𝑐 that is 𝛼-Hölder continuous in the second argument and feasible set 𝐹 is bounded, 𝐄cost 𝐶𝐻𝐶 −𝐄cost 𝑂𝑃𝑇 ≤ 2𝑇𝛽𝐷 𝑣 + 2𝐺𝑇 𝑣 𝑘=1 𝑣 𝑓 𝑘 𝛼 We can use prediction error structure to guide design of online algorithm

Main Result Theorem For 𝑐 that is 𝛼-Hölder continuous in the second argument and feasible set 𝐹 is bounded, 𝐄cost 𝐶𝐻𝐶 −𝐄cost 𝑂𝑃𝑇 ≤ 2𝑇𝛽𝐷 𝑣 + 2𝐺𝑇 𝑣 𝑘=1 𝑣 𝑓 𝑘 𝛼 =: 𝑉 Competitive difference holds with high probability 𝐏(cost 𝐶𝐻𝐶 −cost 𝑂𝑃𝑇 >𝑉+𝑢)> exp − 𝑢 2 𝐹(𝑣)

Conclusion Design of optimal algorithm depends on structure of prediction error This talk: OCO with prediction “Commitment” should be optimal to prediction noise Future: can we extend this framework to other online problems?