Adaptive Ambulance Redeployment via Multi-armed Bandits

1 Adaptive Ambulance Redeployment via Multi-armed Bandits
Ümitcan Şahin, Veysel Yücesoy, Aykut Koç, Cem Tekin. Aselsan Research Center. 30 June 2018.

2 Overview
Motivation
Problem Definition
Related Work
Ambulance Redeployment Setup
Proposed Method: Multi-armed Bandits
ARP Formulation
Risk Aversion
Results

3 Motivation
In Turkey, there are separate telephone numbers for the emergency services: 112 -> Ambulance, 155 -> Police, 110 -> Fire department, etc. There is a need for a common emergency number system such as 911. A project was launched by the Ministry of Health to combine all emergency services under one roof. ASELSAN provides the setup for the emergency medical services in Turkey.

4 Problem Definition
Accident occurs -> the closest idle ambulance is sent -> the ambulance arrives at the accident scene -> the patient may need transportation.
How to position ambulances in order to: minimize the arrival times, increase the coverage of demand points.
Methods in the literature: static allocation, dynamic redeployment, learning based.

5 Related Work
Static allocation: each ambulance is sent back to its predetermined base whenever it becomes idle.
+ Computational efficiency and model simplicity
- Does not adapt to dynamically changing parameters such as call distributions and geographic conditions (e.g., road conditions, traffic level, accidents, etc.)
Static allocation models:
Deterministic: Location set covering model (LSCM) [1], Maximal covering location problem (MCLP) [2]
Stochastic: Maximum expected covering location problem (MEXCLP) [3]

6 Related Work
Dynamic redeployment: redeploy ambulances periodically by considering current realizations (e.g., locations of accidents, number of idle ambulances, etc.)
+ Practical and applicable advanced models
- Hard to find the optimal solution in most cases
Dynamic redeployment models:
Approximate dynamic programming (e.g., Markov decision processes)
Dynamic adaptations of MEXCLP: Dynamic MEXCLP
Advantages: 24% performance increase in the EMS system of Alberta, Canada [4]; 33% more calls responded to in under 10 min from 2015 to 2016 in Utrecht, the Netherlands [5]

7 Related Work
Dynamic MEXCLP [5]:
Given: location of the bases (i.e., ambulance waiting locations), number of available ambulances, expected demand at every location, driving times between locations.
Optimize: the distribution of idle ambulances over the bases so as to minimize the fraction of arrivals later than a certain threshold time (patient friendly).
Each ambulance is relocated only once, when it becomes idle (crew friendly).

9 Ambulance Redeployment Setup
$d_{i,j}$: distance from node $i$ to node $j$
$x_{i,j}^t$: context from node $i$ to node $j$ at round $t$ (the context can be any information: traffic states, weather, accidents, etc.)

10 Proposed Method: Multi-armed Bandits
A gambler plays one of the arms of $K$ slot machines sequentially. In round $t$ he bases his decision to play arm $a_t$ on his past observations. After playing $a_t$, he receives a reward drawn from an unknown distribution, $r_t \sim F_{a_t}$. He can only observe the rewards of the arms he chooses to play.
OBJECTIVE: maximize the long-term expected total reward, $\max \mathbb{E}\left[ \sum_{t=1}^{T} r_t \right]$.
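A minimal sketch of this interaction protocol, with hypothetical Gaussian reward distributions; the policy line is a placeholder where a bandit algorithm (such as those on the next slides) would plug in, and the oracle value is printed only to make the objective concrete:

```python
import numpy as np

rng = np.random.default_rng(0)

# K arms with unknown reward distributions (here Gaussian; the means are hidden from the player).
K, T = 5, 1000
true_means = rng.uniform(0.0, 1.0, size=K)

def pull(arm: int) -> float:
    """Environment: draw a reward from the chosen arm's (unknown) distribution."""
    return rng.normal(true_means[arm], 0.1)

total_reward = 0.0
for t in range(T):
    a_t = int(rng.integers(K))   # placeholder policy; a bandit algorithm plugs in here
    r_t = pull(a_t)              # only the reward of the played arm is observed
    total_reward += r_t

print(f"policy total reward: {total_reward:.1f}")
print(f"oracle benchmark   : {T * true_means.max():.1f}  (always playing the best arm)")
```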

11 Proposed Method: Multi-armed Bandits
Trade-off between exploration and exploitation:
Exploration: take a random action to learn more about the best action.
Exploitation: take the best action learned so far.
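The $\epsilon$-greedy rule is the simplest way to encode this trade-off: with probability $\epsilon$ explore a random arm, otherwise exploit the arm with the best estimate so far. A minimal sketch, with made-up Gaussian rewards:

```python
import numpy as np

rng = np.random.default_rng(0)
K, T, eps = 5, 1000, 0.1
true_means = rng.uniform(0.0, 1.0, size=K)   # hidden from the learner

counts = np.zeros(K)        # n_i: number of plays of each arm
estimates = np.zeros(K)     # running sample-mean reward of each arm

for t in range(T):
    if rng.random() < eps:                  # exploration: random action
        a = int(rng.integers(K))
    else:                                   # exploitation: best action learned so far
        a = int(np.argmax(estimates))
    r = rng.normal(true_means[a], 0.1)      # reward observed for the chosen arm only
    counts[a] += 1
    estimates[a] += (r - estimates[a]) / counts[a]   # incremental mean update

print("estimated means:", np.round(estimates, 2))
print("true means     :", np.round(true_means, 2))
```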

12 Proposed Method: Multi-armed Bandits
Challenges in ARP
1 - Unknown expected demand at every location: calls for a learning-based approach rather than pure optimization, and raises the problem of partial observations. Examples: upper confidence bounds, Thompson sampling, $\epsilon$-greedy.
2 - Unknown and stochastic driving times: model the traffic states on the roads as a Markov process, yielding time-dependent travel times (sketched below).
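For the second challenge, a minimal sketch of how a single road segment's traffic state can evolve as a two-state Markov chain (free-flow vs. congested), making the driving time both stochastic and time-dependent; the transition matrix, nominal time, and slowdown factors are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Traffic state of one road segment as a 2-state Markov chain: 0 = free-flow, 1 = congested.
P = np.array([[0.9, 0.1],          # transition probabilities (illustrative values)
              [0.3, 0.7]])
base_time = 4.0                    # nominal driving time of the segment, in minutes
slowdown = np.array([1.0, 2.5])    # multiplicative delay per traffic state

state = 0
for t in range(10):
    travel_time = base_time * slowdown[state]   # time-dependent, stochastic driving time
    print(f"round {t}: state={state}, travel time={travel_time:.1f} min")
    state = int(rng.choice(2, p=P[state]))      # traffic state evolves between rounds
```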

13 ARP Formulation
Regret for single ambulance redeployment in $T$ rounds: $R(T) = \mathbb{E}\left[ \sum_{t=1}^{T} r_{t,\pi_t} \right] - T \mu^*$
$\mu^*$: expected arrival time to calls from the best possible base location (known only by an oracle)
$\pi_t$: location selected by MAB algorithm $\pi$ at round $t$
$r_{t,\pi_t}$: arrival time to the call at round $t$ from $\pi_t$
ORACLE: computable and optimal
OBJECTIVE: $\min_{\pi \in \Pi} R(T)$
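This regret can be estimated empirically in a simulation. The sketch below uses an illustrative set of candidate bases with made-up mean arrival times and a placeholder (uniformly random) redeployment policy standing in for $\pi_t$; the multi-ambulance regret on the next slide is the same idea with the oracle replaced by a MEXCLP-based dispatcher:

```python
import numpy as np

rng = np.random.default_rng(0)

# Mean arrival times from each candidate base to calls (unknown to the learner; illustrative).
mean_arrival = np.array([8.0, 6.5, 7.2, 9.1])   # minutes
mu_star = mean_arrival.min()                    # best base, known only to an oracle
T = 500

def arrival_time(base: int) -> float:
    """Observed (noisy) arrival time when the ambulance waits at `base`."""
    return max(0.0, rng.normal(mean_arrival[base], 1.0))

choices = rng.integers(len(mean_arrival), size=T)   # placeholder policy pi_t
cum_learner = sum(arrival_time(int(choices[t])) for t in range(T))

regret = cum_learner - T * mu_star   # R(T): excess arrival time vs. always using the best base
print(f"empirical regret after {T} rounds: {regret:.0f} minutes")
```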

14 ARP Formulation
Regret for multiple ambulance redeployment in $T$ rounds: $R_m(T) = \mathbb{E}\left[ \sum_{t=1}^{T} r_{t,\pi_{t,n(t)}} \right] - \mathbb{E}\left[ \sum_{t=1}^{T} r_{t,\pi^*_{t,n(t)}} \right]$
$n(t)$: number of available ambulances at round $t$
$\pi^*_{t,n(t)}$: closest ambulance to the call dispatched by an oracle (MEXCLP) that knows the true call distributions and traffic states on the roads
$\pi_{t,n(t)}$: closest ambulance to the call dispatched by the MAB algorithm at round $t$
$r_{t,\pi_{t,n(t)}}$: arrival time of the closest ambulance to the call at round $t$ from $\pi_{t,n(t)}$
ORACLE: NP-hard (combinatorial problem) => regret can only be defined against an oracle which might not be optimal for the ARP
OBJECTIVE: $\min_{\pi \in \Pi} R_m(T)$

15 Risk Aversion in ARP – Challenges in ARP (Continued)
3 - Is minimizing only the expected value of the arrival times enough?
Guarantee that each call is responded to as quickly as possible.
Minimize the arrival times under worst-case scenarios.
What happens to the expected arrival times under these guarantees?

16 Risk Aversion in ARP
Mean-variance (MV) metric [6]: $MV_i = \sigma_i^2 - \rho \mu_i$
$\rho$: risk coefficient
$\sigma_i$: standard deviation of the reward of arm $i$
$\mu_i$: expected (average) value of the reward of arm $i$
Modification made for ARP: $MV_i = \sigma_i^2 + \rho \mu_i$. Rewards are simply the negative of the arrival times, so the sign of the mean term flips when the metric is written in terms of arrival times.
Objective: minimizing both (1) the variance and (2) the expected value of the arrival times. Is there a trade-off between (1) and (2)?
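As a small illustration of how this metric trades off variability against average performance, the sketch below computes $MV_i$ from observed arrival times for two hypothetical bases; the arrival-time samples and the values of $\rho$ are made up for the example:

```python
import numpy as np

def mean_variance(arrival_times, rho: float) -> float:
    """MV_i = sigma_i^2 + rho * mu_i, computed from the arrival times observed at base i.

    Smaller is better: both the variability and the expected value of the
    arrival times are penalized, with rho controlling the trade-off.
    """
    arrival_times = np.asarray(arrival_times, dtype=float)
    return arrival_times.var() + rho * arrival_times.mean()

# Two hypothetical bases: similar average arrival times, very different variability.
base_a = [7.0, 7.5, 6.8, 7.2, 7.1]     # consistent arrival times
base_b = [4.0, 11.0, 3.5, 12.0, 5.0]   # occasionally very late

for rho in (0.0, 1.0, 5.0):
    print(f"rho={rho}: MV(base_a)={mean_variance(base_a, rho):.2f}, "
          f"MV(base_b)={mean_variance(base_b, rho):.2f}")
```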

17 Risk Aversion in ARP – Challenges in ARP (Continued)
4 - Unknown expected demand at every location: unknown $\sigma$ and $\mu$ parameters in the MV metric.
5 - Only the arrival time of the closest ambulance can be observed (i.e., the problem of partial feedback): distinct $\sigma_i$ and $\mu_i$ must be estimated for each bandit arm $i$.

18 Risk Aversion in ARP
Flow of the algorithm:
Beginning: place an ambulance at each location to initialize $n_i$ (number of times an ambulance is placed at location $i$) and $r_i$ (list of arrival times from location $i$ to calls).
While there are idle ambulances:
In each round $t$, compute the estimates $\mu_i$, $\sigma_i$ using $n_i$ and $r_i$ up to round $t$.
Compute the $MV_i$ terms using $\mu_i$ and $\sigma_i$ for each base location $i$, then compute the LCB terms $LCB_i = MV_i - \sqrt{2 \log t / n_i}$, where the square-root term is the exploration term.
Select the location $i^* = \arg\min_i LCB_i$ for ambulance redeployment.
Exclude $i^*$ from the possible base locations and update $n_{i^*}$.
Dispatch the closest ambulances to calls after round $t$ and update $r_i$ for each dispatched ambulance until the next bandit update.
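A minimal sketch of this loop in Python, under simplifying assumptions: one redeployment decision per round, an illustrative arrival-time model in place of the dispatch simulation, and no exclusion step for multiple idle ambulances; the number of bases, the risk coefficient, and the `simulate_arrival` helper are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

B = 6                 # candidate base locations
rho = 1.0             # risk coefficient
n = np.ones(B)        # n_i: times an ambulance was placed at base i (one initial placement each)
r = [[rng.uniform(5, 12)] for _ in range(B)]   # r_i: arrival times observed from base i (illustrative init)

def mv(times, rho):
    t = np.asarray(times)
    return t.var() + rho * t.mean()             # modified MV metric on arrival times

def simulate_arrival(base):
    """Placeholder for the dispatch simulation: arrival time from `base` to the next call."""
    return max(0.0, rng.normal(6 + base, 1.5))  # illustrative arrival-time model

for t in range(1, 201):                         # bandit rounds
    # Estimate MV_i and its lower confidence bound for every base.
    lcb = np.array([mv(r[i], rho) - np.sqrt(2 * np.log(t) / n[i]) for i in range(B)])
    i_star = int(np.argmin(lcb))                # redeploy the idle ambulance to the most promising base
    n[i_star] += 1
    r[i_star].append(simulate_arrival(i_star))  # arrival time observed after the next dispatch

print("placements per base:", n.astype(int))
```

Selecting the argmin of the lower confidence bound keeps rarely tried locations attractive, since their exploration term shrinks the bound, while well-explored locations are judged mostly by their mean-variance estimate.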

19 Results – Setup Parameters
625 nodes
20-40 ambulance base locations
4 equal time intervals in a day
5000-call capacity in a week (Poisson process with $\lambda = 2$)
Time-dependent travel times governed by Markov processes
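A small sketch of how such a call stream can be generated, assuming (for illustration only) a 25 x 25 grid of nodes, hourly time steps, and uniformly random call locations; the actual setup uses the expected demand at each location and four time intervals per day:

```python
import numpy as np

rng = np.random.default_rng(0)

grid = 25           # 625 nodes arranged as a 25 x 25 grid (illustrative layout)
lam = 2.0           # Poisson rate per time step (lambda = 2, as in the setup)
steps = 24 * 7      # one week of hourly steps (illustrative discretization)

calls = []
for t in range(steps):
    for _ in range(rng.poisson(lam)):                    # number of calls in this step
        node = (int(rng.integers(grid)), int(rng.integers(grid)))  # call location (uniform, illustrative)
        calls.append((t, node))

print(f"{len(calls)} calls generated over {steps} steps")
```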

20 Results – Traffic Consideration
Two settings are compared: no traffic on the roads (deterministic driving times) vs. traffic on the roads (stochastic driving times).
Response-time confidence intervals of the algorithms when $N = 20$.

21 Results – Risk Aversion
Figure panels: true call distributions; standard UCB1 distributions; mean-variance LCB distributions (taking less risk).

22 Results – Risk Aversion

23 Results – Ambulance Redeployments
Algorithms: ARP-UCB1, ARP-TS, ARP-𝜖-greedy

24 Future Work on ARP via Multi-armed Bandits
Deriving lower and upper bounds on the regret of multiple ambulance redeployment.
Combinatorial bandit optimization: finding the best arm combination to play (i.e., the best ambulance redeployment in a given round -> an optimal oracle).
Submodular optimization: determining how many ambulances are needed for an approximate $(1 - \tfrac{1}{3}\epsilon)$ coverage (trade-off between ambulance redeployment cost and coverage).
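The coverage trade-off mentioned above is typically attacked with greedy selection, since coverage is a monotone submodular function. A toy sketch with made-up one-dimensional node positions and an assumed coverage radius (not the setup used in the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

# Demand nodes on a line and candidate base locations (illustrative data).
nodes = rng.uniform(0, 100, size=200)
candidates = np.linspace(0, 100, 20)
radius = 8.0                                   # a base "covers" nodes within this radius

def covered(bases):
    """Coverage function f(S): number of demand nodes within `radius` of any chosen base."""
    if not bases:
        return 0
    d = np.min(np.abs(nodes[:, None] - np.array(bases)[None, :]), axis=1)
    return int((d <= radius).sum())

chosen = []
for _ in range(6):                             # add bases (ambulances) greedily, one at a time
    gains = [covered(chosen + [c]) - covered(chosen) for c in candidates]
    chosen.append(float(candidates[int(np.argmax(gains))]))
    print(f"{len(chosen)} bases -> {covered(chosen)}/{len(nodes)} nodes covered")
```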

25 References
[1] C. Toregas, R. Swain, C. ReVelle, and L. Bergman, "The location of emergency service facilities," Op. Res., vol. 19, no. 6, pp. 1363–1373, 1971.
[2] R. Church and C. ReVelle, "The maximal covering location problem," Papers Reg. Sci., vol. 32, no. 1, pp. 101–118, 1974.
[3] R. Batta, J. M. Dolan, and N. N. Krishnamurthy, "The maximal expected covering location problem: Revisited," Transportation Sci., vol. 23, no. 4, pp. 277–287, 1989.
[4] M. S. Maxwell, S. G. Henderson, and H. Topaloglu, "Ambulance redeployment: An approximate dynamic programming approach," Winter Sim. Conf., pp. 1850–1860, 2009.
[5] C. J. Jagtenberg, S. Bhulai, and R. D. van der Mei, "An efficient heuristic for real-time ambulance redeployment," Op. Res. Health Care, vol. 4, pp. 27–35, 2015.
[6] A. Sani, A. Lazaric, and R. Munos, "Risk-aversion in multi-armed bandits," NIPS, 2012.

26 Thank you for your attention.
Questions?

