Adaptive Ambulance Redeployment via Multi-armed Bandits

Presentation transcript:

Adaptive Ambulance Redeployment via Multi-armed Bandits
Ümitcan Şahin, Veysel Yücesoy, Aykut Koç, Cem Tekin
Aselsan Research Center, 30 June 2018

Overview
Motivation
Problem Definition
Related Work
Ambulance Redeployment Setup
Proposed Method: via Multi-armed Bandits
ARP Formulation
Risk Aversion
Results

Motivation
In Turkey, there are separate telephone numbers for the emergency services: 112 for ambulance, 155 for police, 110 for fire department, etc.
There is a need for a common emergency number system such as 911.
A project was launched by the Ministry of Health to combine all emergency services under one roof.
ASELSAN provides the setup for the emergency medical services in Turkey.

Problem Definition
An accident occurs; the closest idle ambulance is sent; the ambulance arrives at the accident scene; the patient may need transportation.
How to position ambulances in order to minimize the arrival times and increase the coverage of demand points?
Methods in the literature: static allocation, dynamic redeployment, learning based.

Related Work
Static allocation: each ambulance is sent back to its predetermined base whenever it becomes idle.
+ Computational efficiency and model simplicity
- Does not adapt to dynamically changing parameters such as call distributions and geographic conditions (e.g., road conditions, traffic level, accidents, etc.)
Static allocation models:
Deterministic: location set covering model (LSCM) [1], maximal covering location problem (MCLP) [2]
Stochastic: maximum expected covering location problem (MEXCLP) [3]

Related Work
Dynamic redeployment: redeploy ambulances periodically by considering current realizations (e.g., locations of accidents, number of idle ambulances, etc.)
+ Practical and applicable advanced models
- Hard to find the optimal solution in most cases
Dynamic redeployment models: approximate dynamic programming (e.g., Markov decision processes) and dynamic adaptations of MEXCLP (Dynamic MEXCLP)
Reported advantages: a 24% performance increase in the EMS system of Alberta, Canada [4]; 33% more calls responded to under 10 min from 2015 to 2016 in Utrecht, the Netherlands [5]

Related Work
Dynamic MEXCLP [5]:
Given: location of bases (i.e., ambulance waiting locations), number of available ambulances, expected demand at every location, driving times between locations.
Optimize: distribution of idle ambulances over the bases in order to minimize the fraction of arrivals later than a certain threshold time (patient friendly).
Each ambulance relocates only once it becomes idle (crew friendly).
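To make the coverage objective concrete, here is a minimal Python sketch of the marginal-coverage rule that heuristics in the MEXCLP family are built on: send a newly idle ambulance to the base that adds the most expected covered demand. The busy fraction q, the response-time threshold, the data layout, and the function name are illustrative assumptions, not values or code from the talk.

```python
def dmexclp_best_base(bases, demand, travel_time, idle_positions, q=0.3, threshold=720):
    """Pick the base for a newly idle ambulance that maximizes marginal expected coverage.
    bases:          candidate base locations
    demand:         {node: expected demand d_i}
    travel_time:    travel_time[base][node], e.g. in seconds
    idle_positions: bases currently holding an idle ambulance
    q:              ambulance busy fraction (assumed)
    threshold:      response-time threshold in seconds (assumed)
    """
    # k[i]: how many idle ambulances already cover demand node i within the threshold
    k = {i: sum(1 for b in idle_positions if travel_time[b][i] <= threshold) for i in demand}

    def marginal_gain(base):
        # Extra expected demand covered if one more ambulance waits at `base`.
        return sum(d * (1 - q) * q ** k[i]
                   for i, d in demand.items()
                   if travel_time[base][i] <= threshold)

    return max(bases, key=marginal_gain)

# Toy usage with 2 bases and 3 demand nodes (all numbers illustrative).
tt = {"b1": {0: 300, 1: 900, 2: 600}, "b2": {0: 900, 1: 300, 2: 600}}
print(dmexclp_best_base(["b1", "b2"], {0: 5.0, 1: 2.0, 2: 1.0}, tt, idle_positions=["b2"]))
```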


Ambulance Redeployment Setup
$d_{i,j}$: distance from node $i$ to node $j$
$x_{i,j}^t$: context from node $i$ to node $j$ at round $t$ (the context can be any side information: traffic states, weather, accidents, etc.)

Proposed Method: Multi-armed Bandits
A gambler plays one of the arms of $K$ slot machines sequentially. In round $t$ he bases his decision to play arm $a_t$ on his past observations. After playing $a_t$, he receives a reward from an unknown distribution, $r_t \sim F_{a_t}$. He can only observe the rewards of the arms he chooses to play.
OBJECTIVE: maximize the long-term expected total reward, $\max E\left[\sum_{t=1}^{T} r_t\right]$.
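A minimal sketch (not from the talk) of this gambler's loop with an epsilon-greedy player; the Gaussian reward distributions, the exploration rate, and the function name are assumptions made for illustration.

```python
import random

def epsilon_greedy_bandit(arm_means, T=10_000, eps=0.1, seed=0):
    """Minimal K-armed bandit loop: play one arm per round, observe only its reward."""
    rng = random.Random(seed)
    K = len(arm_means)
    counts = [0] * K          # n_i: times arm i was played
    estimates = [0.0] * K     # empirical mean reward of arm i
    total_reward = 0.0
    for t in range(T):
        if rng.random() < eps:                      # explore: random arm
            a = rng.randrange(K)
        else:                                       # exploit: best arm learned so far
            a = max(range(K), key=lambda i: estimates[i])
        r = rng.gauss(arm_means[a], 1.0)            # reward from unknown distribution F_a
        counts[a] += 1
        estimates[a] += (r - estimates[a]) / counts[a]
        total_reward += r
    return total_reward, estimates

# Example: 3 slot machines with unknown means; the player learns which arm is best.
print(epsilon_greedy_bandit([0.2, 0.5, 0.8]))
```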

Proposed Method: Multi-armed Bandits
Trade-off between exploration and exploitation:
Exploration: take a random action to learn more about the best action.
Exploitation: take the best action learned so far.

Proposed Method: Multi-armed Bandits
Challenges in ARP:
1. Unknown expected demand at every location: calls for a learning-based approach rather than pure optimization, with the problem of partial observations. Examples: upper confidence bounds, Thompson sampling, $\epsilon$-greedy.
2. Unknown and stochastic driving times: model traffic states on the roads as a Markov process, giving time-dependent travel times.
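As a hedged illustration of the second challenge, a toy two-state Markov chain for a road segment's traffic state and the resulting time-dependent travel time; the transition probabilities, slowdown factors, and names are assumptions, not parameters from the talk.

```python
import random

# Illustrative two-state traffic chain per road segment: "free" <-> "congested".
TRANSITIONS = {"free": {"free": 0.9, "congested": 0.1},
               "congested": {"free": 0.3, "congested": 0.7}}
SLOWDOWN = {"free": 1.0, "congested": 2.5}   # travel-time multiplier per state (assumed)

def step(state, rng):
    """Advance the traffic state of one road segment by one time interval."""
    return "free" if rng.random() < TRANSITIONS[state]["free"] else "congested"

def travel_time(base_time, state):
    """Time-dependent travel time: base driving time scaled by the current traffic state."""
    return base_time * SLOWDOWN[state]

rng = random.Random(1)
state = "free"
for interval in range(4):                 # e.g., four time intervals in a day
    state = step(state, rng)
    print(interval, state, travel_time(300.0, state))
```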

ARP Formulation
Regret for single ambulance redeployment in $T$ rounds: $R(T) = E\left[\sum_{t=1}^{T} r_{t,\pi_t}\right] - T\mu^*$
$\mu^*$: expected arrival time to calls from the best possible base location (known only by an oracle)
$\pi_t$: location selected by MAB algorithm $\pi$ at round $t$
$r_{t,\pi_t}$: arrival time to the call at round $t$ from $\pi_t$
ORACLE: computable and optimal
OBJECTIVE: $\min_{\pi\in\Pi} R(T)$

ARP Formulation
Regret for multiple ambulance redeployment in $T$ rounds: $R_m(T) = E\left[\sum_{t=1}^{T} r_{t,\pi_{t,n(t)}}\right] - E\left[\sum_{t=1}^{T} r_{t,\pi^*_{t,n(t)}}\right]$
$n(t)$: number of available ambulances at round $t$
$\pi^*_{t,n(t)}$: closest ambulance to the call dispatched by an oracle (MEXCLP) that knows the true call distributions and traffic states on the roads
$\pi_{t,n(t)}$: closest ambulance to the call dispatched by the MAB algorithm at round $t$
$r_{t,\pi_{t,n(t)}}$: arrival time of the closest ambulance to the call at round $t$ from $\pi_{t,n(t)}$
ORACLE: NP-hard (combinatorial problem), so the regret can only be defined against an oracle which might not be optimal for ARP
OBJECTIVE: $\min_{\pi\in\Pi} R_m(T)$
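A small sketch showing how $R_m(T)$ could be estimated empirically from one simulated run, given the logged per-round arrival times of the MAB policy and of the MEXCLP oracle; the input lists and the function name are assumptions (they would come from a simulator).

```python
def empirical_regret(mab_arrivals, oracle_arrivals):
    """Estimate R_m(T) for one run: cumulative arrival time under the MAB policy minus
    that under the (possibly suboptimal) MEXCLP oracle; element t of each list is the
    arrival time of the closest dispatched ambulance in round t."""
    assert len(mab_arrivals) == len(oracle_arrivals)
    return sum(mab_arrivals) - sum(oracle_arrivals)

# Example: a positive value means the learner's ambulances arrived later overall.
print(empirical_regret([420.0, 380.0, 500.0], [400.0, 350.0, 450.0]))
```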

Risk Aversion in ARP: Challenges in ARP (Continued)
3. Is minimizing only the expected value of the arrival times enough?
We would like to guarantee that each call is responded to as quickly as possible, i.e., minimize the arrival times under worst-case scenarios.
What happens to the expected arrival times under these guarantees then?

Risk Aversion in ARP
Mean-variance (MV) metric [6]: $MV_i = \sigma_i^2 - \rho\,\mu_i$
$\rho$: risk coefficient
$\sigma_i$: standard deviation of the reward of arm $i$
$\mu_i$: expected (average) reward of arm $i$
Modification made for ARP: $MV_i = \sigma_i^2 + \rho\,\mu_i$, since rewards are simply the negative of the arrival times.
Objective: minimizing both (1) the variance and (2) the expected value of the arrival times. Is there a trade-off between (1) and (2)?

Risk Aversion in ARP: Challenges in ARP (Continued)
4. Unknown expected demand at every location: unknown $\sigma$ and $\mu$ parameters in the MV metric.
5. Only the arrival time of the closest ambulance can be observed (i.e., the problem of partial feedback): distinct $\sigma_i$ and $\mu_i$ must be estimated for each bandit arm $i$.

Risk Aversion in ARP
Flow of the algorithm (a sketch is given below):
Beginning: place an ambulance at each location to initialize $n_i$ (number of times an ambulance is placed at location $i$) and $r_i$ (list of arrival times from location $i$ to calls).
While there are idle ambulances:
In each round $t$, compute the estimates $\mu_i$ and $\sigma_i$ using $n_i$ and $r_i$ up to round $t$.
Compute the $MV_i$ terms using $\mu_i$ and $\sigma_i$ for each base location $i$, then compute the LCB terms: $LCB_i = MV_i - \sqrt{\frac{2\log t}{n_i}}$ (the square-root term is the exploration term).
Select the location $i^* = \arg\min_i LCB_i$ for ambulance redeployment.
Exclude $i^*$ from the possible base locations and update $n_{i^*}$.
Dispatch the closest ambulances to calls after round $t$ and update $r_i$ for each dispatched ambulance until the next bandit update.
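A minimal Python sketch of this loop, simplified to one idle ambulance per round; the simulator hook `observe_arrival_times`, the toy usage, and all numbers are assumptions made for illustration, not part of the presented system.

```python
import math
import random
import statistics

def mv_lcb_redeployment(base_ids, rho, observe_arrival_times, T):
    """Risk-averse redeployment sketch: each round, send the idle ambulance to the base
    with the lowest mean-variance LCB index, then log the arrival times of the calls it
    serves until the next bandit update."""
    n = {i: 0 for i in base_ids}      # n_i: times base i was selected
    r = {i: [] for i in base_ids}     # r_i: arrival times observed from base i

    # Initialization: place an ambulance at every base location once.
    for i in base_ids:
        n[i] += 1
        r[i].extend(observe_arrival_times(i, 0))

    for t in range(1, T + 1):
        def lcb(i):
            mu = statistics.mean(r[i]) if r[i] else 0.0
            var = statistics.pvariance(r[i]) if len(r[i]) > 1 else 0.0
            mv = var + rho * mu                                     # modified MV metric
            return mv - math.sqrt(2 * math.log(t + 1) / n[i])       # exploration term

        best = min(base_ids, key=lcb)                               # argmin of the LCB index
        n[best] += 1
        r[best].extend(observe_arrival_times(best, t))
    return n

# Toy usage: each base has a different mean arrival time (illustrative numbers only).
rng = random.Random(0)
means = {"A": 400.0, "B": 300.0, "C": 500.0}
hook = lambda base, t: [max(0.0, rng.gauss(means[base], 60.0))]
print(mv_lcb_redeployment(list(means), rho=1.0, observe_arrival_times=hook, T=300))
```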

Results – Setup Parameters
625 nodes
20-40 ambulance base locations
4 equal time intervals in a day
5000-call capacity in a week (Poisson process with $\lambda = 2$)
Time-dependent travel times governed by Markov processes
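A sketch of how such a call stream might be generated; the time unit, the uniform choice of grid node, and the function name are assumptions for illustration.

```python
import random

def generate_calls(rng, n_nodes=625, lam=2.0, max_calls=5000, horizon=7 * 24 * 60):
    """Poisson call stream: exponential inter-arrival gaps with rate lam calls per time
    unit, each call placed on a uniformly chosen node of the 625-node grid, capped at
    the weekly call capacity."""
    calls, t = [], 0.0
    while len(calls) < max_calls:
        t += rng.expovariate(lam)          # next inter-arrival gap
        if t >= horizon:
            break
        calls.append((t, rng.randrange(n_nodes)))
    return calls

print(len(generate_calls(random.Random(0))))   # at most 5000 calls in one simulated week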

Results – Traffic Consideration
Figure: response-time confidence intervals of the algorithms when $N = 20$, under non-existing traffic on the roads (deterministic driving times) and existing traffic on the roads (stochastic driving times).

Results – Risk Aversion
Figure: true call distributions compared with the distributions learned by standard UCB1 and by mean-variance LCB (taking less risk).

Results – Risk Aversion

Results – Ambulance Redeployments
Algorithms: ARP-UCB1, ARP-TS, ARP-$\epsilon$-greedy

Future Work on ARP via Multi-armed Bandits
Deriving lower and upper bounds on the regret of multiple ambulance redeployment.
Combinatorial bandit optimization: trying to find the best arm combination to play (i.e., the best ambulance redeployment in a given round, an optimal oracle).
Submodular optimization: determining how many ambulances are needed for approximately $(1 - \tfrac{1}{3}\epsilon)$ coverage (trade-off between ambulance redeployment cost and coverage).
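For the submodular direction, a hedged sketch of the standard greedy coverage rule, the usual building block for such cost-versus-coverage trade-offs; the stopping criterion (a target coverage fraction), the data layout, and the names are assumptions, not the authors' method.

```python
def greedy_coverage(bases, covers, target_fraction, total_demand):
    """Greedy submodular maximization: repeatedly add the base whose ambulance covers the
    most yet-uncovered demand, stopping once the target coverage fraction is reached.
    covers[b] is the set of (node, demand) pairs reachable from base b within the threshold."""
    chosen, covered = [], set()

    def covered_demand(pairs):
        return sum(d for _, d in pairs)

    while covered_demand(covered) < target_fraction * total_demand and len(chosen) < len(bases):
        best = max((b for b in bases if b not in chosen),
                   key=lambda b: covered_demand(covers[b] - covered))
        chosen.append(best)
        covered |= covers[best]
    return chosen

# Toy usage: three candidate bases covering overlapping demand (illustrative numbers).
covers = {"b1": {(0, 5), (1, 3)}, "b2": {(1, 3), (2, 4)}, "b3": {(2, 4)}}
print(greedy_coverage(list(covers), covers, target_fraction=0.9, total_demand=12))
```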

References
[1] C. Toregas, R. Swain, C. ReVelle, and L. Bergman, "The location of emergency service facilities," Op. Res., vol. 19, no. 6, pp. 1363–1373, 1971.
[2] R. Church and C. ReVelle, "The maximal covering location problem," Papers Reg. Sci., vol. 32, no. 1, pp. 101–118, 1974.
[3] R. Batta, J. M. Dolan, and N. N. Krishnamurthy, "The maximal expected covering location problem: Revisited," Transportation Sci., vol. 23, no. 4, pp. 277–287, 1989.
[4] M. S. Maxwell, S. G. Henderson, and H. Topaloglu, "Ambulance redeployment: An approximate dynamic programming approach," Winter Sim. Conf., pp. 1850–1860, 2009.
[5] C. J. Jagtenberg, S. Bhulai, and R. D. van der Mei, "An efficient heuristic for real-time ambulance redeployment," Op. Res. Health Care, vol. 4, pp. 27–35, 2015.
[6] A. Sani, A. Lazaric, and R. Munos, "Risk-aversion in multi-armed bandits," NIPS, 2012.

Thank you for your attention. Questions?