The Rich and the Poor: A Markov Decision Process Approach to Optimizing Taxi Driver Revenue Efficiency
Huigui Rong, Xun Zhou, Chang Yang, Zubair Shafiq, Alex Liu

The Rich and the Poor: A Markov Decision Process Approach to Optimizing Taxi Driver Revenue Efficiency Huigui Rong, Xun Zhou, Chang Yang, Zubair Shafiq, Alex Liu - Presented by Team 3

Improving taxi efficiency is important because it:
- Improves drivers' income
- Reduces gas emissions
- Reduces fuel costs

Drawbacks of current strategies:
- No consideration of the destinations of future potential passengers
- Some papers consider only the next single trip, ignoring the 2-3 trips that might follow
- Some papers do not consider the temporal variation of the passenger distribution

What this paper intends to do: increase driver revenue per time unit.
How does it do so? With a Markov Decision Process: find the best move for a vacant taxi at a given time slot to maximize the total revenue in that time slot.

Each state of the MDP corresponds to:
- Current grid
- Time
- Driving direction
In each state, a vacant taxi may choose to travel to any grid in a subset of its neighboring grids and cruise through that grid.

The problem addressed by the paper:
- Perform data analysis to quantify the 'success' of a driver
- Identify the most important factors that improve drivers' business
- Formulate the problem as an optimal decision problem

Question 1: How do you define 'driver success'?
- Driver business time = operation time + seek time
- Revenue efficiency = money earned / driver business time
Using these metrics, the authors identified the top 10% and bottom 10% of drivers.
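A minimal sketch of how these two metrics could be computed from per-driver trip records. The record layout (fare in yuan, operation minutes, seek minutes) and the function name are assumptions for illustration, not the paper's data schema.

```python
from typing import List, Tuple

def revenue_efficiency(trips: List[Tuple[float, float, float]]) -> float:
    """Compute revenue efficiency (yuan/min) for one driver.

    Each trip record is a hypothetical tuple
    (fare_yuan, operation_minutes, seek_minutes).
    """
    money_earned = sum(fare for fare, _, _ in trips)
    business_time = sum(op + seek for _, op, seek in trips)  # operation + seek time
    return money_earned / business_time if business_time > 0 else 0.0

# Example: three trips for one driver
trips = [(25.0, 18.0, 7.0), (14.0, 10.0, 12.0), (32.0, 26.0, 9.0)]
print(round(revenue_efficiency(trips), 2))  # 0.87 yuan/min
```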

Results of the formulas applied:

Results: 80% of drivers had a revenue efficiency between 0.72 and 0.8 yuan/min.

Question 2: What makes top drivers successful?
- How much money can a driver make per minute of driving, i.e., the driving efficiency?
- How quickly can a driver find the next passenger?

Question 3: What is the optimal strategy? Given the historical passenger trips and seeking trips for each time slot, and the current status of a vacant taxi, the authors aim to find the next movement for the driver. The objective is to maximize the total expected revenue for this taxi in the remaining time of the current time slot.

Markov Decision Process! A stochastic decision process with a set of states (S) and a set of possible actions (A) that move the process from one state to another. The catch? The next state must depend only on the current state and the action, not on any previous state or action. Each action transfers the process from the current state to a new state with some probability (P) and a corresponding reward (R).

System states: each state of the vacant taxi is determined by three parameters:
- Current location (grid cell number)
- Current time
- Direction from which the taxi arrived at the current location
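A minimal sketch of how such a state could be represented, assuming grid cells are indexed by integers and time is measured in minutes within the one-hour slot; the class and field names are illustrative, not from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaxiState:
    grid: int       # grid cell index i
    minute: int     # minutes elapsed within the current time slot (0-59)
    direction: int  # 1-9: direction the taxi came from; 0 right after a drop-off
```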

Actions: each vacant taxi has 9 possible actions (shown in the figure), numbered 1-9 by the authors to indicate directions (moving to one of the eight neighboring grids or staying in the current grid).

Actions: for each grid cell i, the authors examine all the seeking trips that originate from i and end at one of i's eight neighbors. For each of i's neighbors j, they calculate what percentage of these seeking trips go from i to j. If the percentage is lower than a minimum threshold (5%), they conclude that there is no road connecting the two grids and do not allow the taxi to move from i to j. A sketch of this filtering step is shown below.
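A sketch of that filtering step under the assumption that historical seeking trips are available as (origin grid, destination grid) pairs; the function and variable names are illustrative.

```python
from collections import Counter
from typing import Dict, List, Tuple

MIN_FRACTION = 0.05  # 5% threshold from the paper

def allowed_neighbors(seek_trips: List[Tuple[int, int]],
                      neighbors: Dict[int, List[int]]) -> Dict[int, List[int]]:
    """For every grid i, keep only neighbors j that receive at least 5%
    of the historical seeking trips leaving i.

    seek_trips: historical (origin_grid, destination_grid) seeking trips.
    neighbors:  mapping grid -> its eight adjacent grids.
    """
    counts = Counter(seek_trips)
    allowed: Dict[int, List[int]] = {}
    for i, nbrs in neighbors.items():
        total = sum(counts[(i, j)] for j in nbrs)
        allowed[i] = [j for j in nbrs
                      if total > 0 and counts[(i, j)] / total >= MIN_FRACTION]
    return allowed
```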

State transition and objective function. Current state: s = (i, t, d). An action a that moves the taxi from i to j in t_seek(j) minutes has two possible outcomes:
1. The taxi successfully finds a passenger in grid j.
2. The taxi does not find a passenger in grid j.

Scenario 1 (finds a passenger):
- The passenger asks to go to grid k with probability P_dest(j, k)
- The driving time from j to k is t_drive(j, k)
- The fare received is r(j, k)
- Finally, the new state is s' = (k, t + t_seek(j) + t_drive(j, k), 0)

Scenario 2 (does not find a passenger): the taxi must leave j for a new grid to find a passenger. Thus, the new state will be s' = (j, t + t_seek(j), 10 - a), where the direction component records the direction from which the taxi entered j (e.g., 4 when a = 6).

Summary: a vacant taxi in any state s_0 = (i, t, d), s_0 ∈ S, may take one of the possible actions a ∈ Allowed(s_0) to cruise to a nearby grid j. With probability 1 − P_find(j), the taxi transitions to state s_1 = (j, t + t_seek(j), 10 − a) with no reward. With probability P_find(j) × P_dest(j, k), (k = 1, 2, ..., |L|), the taxi ends up in the next state s_2 = (k, t + t_seek(j) + t_drive(j, k), 0) and receives a reward of r(j, k). The state transition diagram of the proposed MDP model is shown in Figure 5; each circle represents a state with its three parameters listed beside it.
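A sketch of one step of these dynamics as a sampling function, useful for simulation. The parameter names (P_find, P_dest, t_seek, t_drive, fare) mirror the quantities above, but the function signature and the caller-supplied move helper are assumptions for illustration, not the paper's code.

```python
import random

def step(state, action, move, P_find, P_dest, t_seek, t_drive, fare):
    """Sample one MDP transition following the dynamics described above.

    state  = (i, t, d); action a moves the taxi to neighbor grid j.
    move(i, a)       : caller-supplied helper returning the grid reached by action a
    P_find[j]        : probability of finding a passenger in grid j
    P_dest[j][k]     : probability the passenger goes to grid k
    t_seek[j]        : minutes spent cruising through grid j
    t_drive[(j, k)]  : driving minutes from j to k
    fare[(j, k)]     : reward r(j, k) in yuan
    """
    i, t, d = state
    j = move(i, action)
    if random.random() < P_find[j]:
        # Passenger found: destination k drawn from P_dest(j, .)
        ks, ps = zip(*P_dest[j].items())
        k = random.choices(ks, weights=ps)[0]
        return (k, t + t_seek[j] + t_drive[(j, k)], 0), fare[(j, k)]
    # No passenger: arrive in j; direction records where the taxi came from
    return (j, t + t_seek[j], 10 - action), 0.0
```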

Learning parameters for the MDP:
- Passenger pickup probability P_find
- Passenger destination probability P_dest
- Reward function r(i, j)
- Driving time t_drive
- Seeking time t_seek for each grid
A sketch of how such parameters could be estimated from historical trips follows.
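A sketch of how the pickup and destination probabilities, rewards, and driving times could be estimated by simple frequency counts and averages over historical records; the record layouts are assumptions, and the paper may use more refined estimators.

```python
from collections import Counter, defaultdict

def estimate_parameters(seek_records, trip_records):
    """Frequency-based estimates of P_find, P_dest, r, and t_drive.

    seek_records: (grid_j, found_passenger: bool) per historical seeking pass.
    trip_records: (pickup_grid_j, dropoff_grid_k, fare_yuan, drive_minutes) per trip.
    Both layouts are assumptions for illustration.
    """
    visits, pickups = Counter(), Counter()
    for j, found in seek_records:
        visits[j] += 1
        pickups[j] += int(found)
    P_find = {j: pickups[j] / visits[j] for j in visits}

    dest_counts = defaultdict(Counter)
    fares, times = defaultdict(list), defaultdict(list)
    for j, k, fare, minutes in trip_records:
        dest_counts[j][k] += 1
        fares[(j, k)].append(fare)
        times[(j, k)].append(minutes)
    P_dest = {j: {k: c / sum(cnt.values()) for k, c in cnt.items()}
              for j, cnt in dest_counts.items()}
    r = {jk: sum(v) / len(v) for jk, v in fares.items()}        # average fare r(j, k)
    t_drive = {jk: sum(v) / len(v) for jk, v in times.items()}  # average driving time
    return P_find, P_dest, r, t_drive
```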

Solving the MDP to find the optimal policy:
- For each time slot, build an MDP with the parameters learned from data.
- For each state in the MDP model, the result is the best action to take in that state to maximize the total expected reward in the remaining time of the time slot.
- There is no fixed number of steps; instead there are terminal states, so dynamic programming is used. Once the system reaches a state with t = 60, it terminates.
A backward-induction sketch is given below.
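A backward-induction sketch of this dynamic-programming solution, assuming every transition strictly increases the time component (so states can be processed from the latest minute backwards) and that states with t >= 60 are terminal. This is an illustrative implementation, not the paper's exact algorithm.

```python
def solve_mdp(states, allowed, transition, horizon=60):
    """Backward induction over remaining minutes of the time slot.

    states:           list of (i, t, d) states
    allowed(s):       actions permitted in state s
    transition(s, a): list of (probability, next_state, reward) triples
    States with t >= horizon are terminal and have value 0.
    """
    V = {s: 0.0 for s in states}
    policy = {}
    # Process states from the latest minute back to minute 0;
    # valid because t_seek and t_drive are positive, so time only increases.
    for s in sorted(states, key=lambda s: -s[1]):
        if s[1] >= horizon:
            continue  # terminal: expected future reward is 0
        best_a, best_v = None, float("-inf")
        for a in allowed(s):
            q = sum(p * (r + V.get(s2, 0.0)) for p, s2, r in transition(s, a))
            if q > best_v:
                best_a, best_v = a, q
        if best_a is not None:
            V[s], policy[s] = best_v, best_a
    return V, policy
```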


Evaluation:
- Revenue efficiency: how much more revenue can be generated if a driver uses the proposed strategy?
- Is the proposed method better than methods in related work?

Revenue improvement: the strategy was simulated and compared with the top 10% and bottom 10% of drivers.
- For the top 10% of drivers, the average revenue efficiency improves from 0.828 yuan/min to 0.89 yuan/min, a 7.6% improvement.
- For the bottom 10% of drivers, the average revenue efficiency improves from 0.75 yuan/min to 0.82 yuan/min, a 9.8% improvement.

Results: the figure shows the average revenue efficiency of the top drivers in reality compared with the simulated revenue efficiency using the top 10% drivers' driving efficiency for all the daytime slots. The improvement is between 3.6% and 12%.

Results: the figure shows the comparison of revenue efficiency between the bottom drivers and the simulated results using the bottom 10% drivers' driving efficiency for the daytime slots. The corresponding improvements are between 4.2% and 15.5%.

Results: the figure shows that the simulated revenue efficiency (red line) is higher than the top 10% drivers' revenue efficiency in almost all time slots.

Comparison with the baseline method: the figure shows the average revenue in the simulated results. The proposed method (red curve) outperforms the baseline method (blue curve) in average revenue efficiency (yuan per minute) in all time slots. The maximum improvement is about 8.4%.

Conclusion:
- Investigated how to learn the best taxi-seeking strategy from historical data to improve taxi drivers' business efficiency over an extended period of time.
- Proposed to model the passenger-seeking process as a Markov Decision Process to obtain the best move for a vacant taxi in any state.
- Evaluation results showed that the method improves the revenue efficiency of inexperienced drivers by up to 15% and outperforms the baseline by up to 8.4%.

Future work:
- Improve the MDP model to incorporate the total revenue and driving time over the taxi's entire shift, and recommend the best strategy to improve overall revenue efficiency.
- Include other attributes, such as weather and traffic conditions, to improve the results of the method.