Real-Time Trip Information Service for a Large Taxi Fleet. Rajesh Krishna Balan, Nguyen Xuan Khoa, and Lingxiao Jiang. MobiSys 2011.
Introduction: A real-time trip information system that gives passengers the expected fare and trip duration of the taxi ride they are planning to take. Built on 21 months of data from 15,000 taxis in Singapore (about 250 million trip records). Large-scale implementation and evaluation.
Motivation: Unscrupulous drivers may take longer routes; passengers should be able to estimate trip time and fare by themselves. Failed solution: Google Maps, which has problems with latency, trip fare, and accuracy (35% time error).
Taxi Network: Taxis are cheap; taxis are common and found everywhere; most pickups are street pickups; taxis are used for all activities.
Taxi locations in one day
Challenges: a large amount of data; a real-time query requirement; various time-related factors. How much data is sufficient? How should the data be filtered?
Service requirements: accuracy; real-time capability; fares; low computational requirements; easy to deploy operationally.
Method design: Partition (time, location); Prediction (hash table, KNN).
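Time partition: Hour (24 slots); Days of week (DoW); Hourly DoW (24 x 7 = 168 hourly slots); Peak period. Peak period definition: weekdays 7am~10am and 5pm~8pm are peak (+35% surcharge); weekdays 6am~7am and 10am~5pm are non-peak; weekends 6am~0am are non-peak; 0am~6am is night (+50% surcharge).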
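The time partition above can be sketched as two small functions: one maps a timestamp to its Hourly DoW slot, the other to its peak period. This is a minimal illustration, not the authors' code. The function names are hypothetical, and because the slide does not list weekdays 8pm~0am, the sketch assumes that interval is non-peak.

```python
from datetime import datetime


def hourly_dow_slot(ts: datetime) -> int:
    """Return an index in [0, 167]: one slot per (day of week, hour) pair."""
    # datetime.weekday() is 0 for Monday ... 6 for Sunday.
    return ts.weekday() * 24 + ts.hour


def peak_period(ts: datetime) -> str:
    """Classify a timestamp as 'night', 'peak', or 'non-peak'."""
    hour = ts.hour
    if hour < 6:
        return "night"  # 0am~6am on any day
    is_weekday = ts.weekday() < 5
    if is_weekday and (7 <= hour < 10 or 17 <= hour < 20):
        return "peak"  # weekdays 7am~10am and 5pm~8pm
    # Everything else, including weekdays 8pm~0am (an assumption,
    # since the slide does not list that interval).
    return "non-peak"
```

For example, `peak_period(datetime(2010, 9, 6, 8))` (a Monday at 8am) returns `"peak"`, and `hourly_dow_slot` maps the same timestamp to slot 8.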
Location partition: the 25 km x 50 km area is divided into zones. Static zones: fixed square cells, from 50 m x 50 m to 500 m x 500 m. Dynamic zones: the zone size is adjusted for each trip.
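A static zone can be thought of as a cell in a regular grid laid over the service area. The sketch below shows one possible way to map a GPS point to such a cell. The grid origin, the metres-per-degree constant, and the function name are assumptions made for illustration and do not come from the slides.

```python
import math

# Hypothetical south-west corner of the grid (not from the source).
ORIGIN_LAT, ORIGIN_LON = 1.20, 103.60
# Approximate length of one degree of latitude in metres.
METERS_PER_DEG = 111_320.0


def static_zone(lat: float, lon: float, cell_m: float = 200.0) -> tuple:
    """Return the (column, row) of the square grid cell containing a point."""
    # Equirectangular approximation, adequate for a city-sized area.
    dy = (lat - ORIGIN_LAT) * METERS_PER_DEG
    dx = (lon - ORIGIN_LON) * METERS_PER_DEG * math.cos(math.radians(ORIGIN_LAT))
    return (int(dx // cell_m), int(dy // cell_m))
```

Changing `cell_m` between 50 and 500 reproduces the range of static zone sizes listed above; smaller cells give finer location matching but many more zones.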
Prediction: Input: start time, start GPS, end GPS. Static: find similar historical trips and average their fare, duration, and distance, using an index and hash table. Dynamic: KNN over a data set of (start time, S_long, S_latt, E_long, E_latt).
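The static approach described above amounts to a hash table keyed by the trip's zones and time slot, storing enough to return averages. The following is a minimal sketch under that reading. The class name, the key layout (start zone, end zone, time slot), and the running-sum storage are assumptions, not the authors' implementation.

```python
from collections import defaultdict


class StaticPredictor:
    """Hash table from a trip key to running sums of past trips."""

    def __init__(self):
        # key -> [trip count, fare sum, duration sum, distance sum]
        self._table = defaultdict(lambda: [0, 0.0, 0.0, 0.0])

    def add_trip(self, key, fare, duration_s, distance_km):
        """Fold one historical trip into the entry for its key."""
        entry = self._table[key]
        entry[0] += 1
        entry[1] += fare
        entry[2] += duration_s
        entry[3] += distance_km

    def predict(self, key):
        """Return (mean fare, mean duration, mean distance), or None on a miss."""
        entry = self._table.get(key)  # .get() does not create a new entry
        if entry is None:
            return None
        count = entry[0]
        return (entry[1] / count, entry[2] / count, entry[3] / count)
```

Storing sums rather than raw trips keeps each lookup constant-time and the table small, at the cost of not being able to re-filter trips after aggregation.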
Evaluation: Set 1: 20 subsets for training (2010/8, 2010/7+8, ..., 2009/1~2010/8). Set 2: 1 subset for testing (queries), 2010/9.
Evaluation: LOC: start and end location; PEAK: peak hour; DoW: day of week; HR: hour of day (24 slots); DoW x HR: hour of week (168 slots).
Fare and duration in static zones: Fare error: $0.87~$2.53; Duration error: 2 min~4 min.
Hit rate in static zones: the hit rate is the percentage of test trips that have a non-empty entry in the prediction table. The hit rate in static zones is 17%~58%.
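The hit rate defined above can be computed directly from the keys in the prediction table and the keys of the test trips. A small sketch, with a hypothetical function name and the assumption that each trip is reduced to a hashable key:

```python
def hit_rate(table_keys, test_keys):
    """Fraction of test trips whose key has an entry in the prediction table."""
    test_keys = list(test_keys)
    if not test_keys:
        return 0.0  # no queries, so no hits (a convention chosen for this sketch)
    known = set(table_keys)
    hits = sum(1 for key in test_keys if key in known)
    return hits / len(test_keys)
```

Multiply the result by 100 to express it as a percentage, as on the slide.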
Fare and duration in dynamic zones: Fare error: $1.05~$1.25; Duration error: <3 min; K=25 is the optimal choice.
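The dynamic approach can be illustrated with a brute-force K-nearest-neighbour search over historical trips. This is only a sketch: it assumes the history has already been restricted to the query's time slot, uses plain Euclidean distance over the four coordinates (an assumption, as the slides do not state the metric), and scans linearly where a production system would likely use a spatial index.

```python
import heapq
import math


def knn_predict(history, query, k=25):
    """Average fare and duration of the k historical trips nearest to a query.

    history: list of (s_lon, s_lat, e_lon, e_lat, fare, duration_s) tuples.
    query:   (s_lon, s_lat, e_lon, e_lat) tuple.
    """
    if not history:
        return None

    def distance(trip):
        # Euclidean distance in (start, end) coordinate space.
        return math.dist(trip[:4], query)

    nearest = heapq.nsmallest(k, history, key=distance)
    count = len(nearest)  # may be fewer than k for a small history
    mean_fare = sum(trip[4] for trip in nearest) / count
    mean_duration = sum(trip[5] for trip in nearest) / count
    return (mean_fare, mean_duration)
```

The distance from the query to the last trip in `nearest` plays the role of the dynamic zone radius: it grows where data is sparse and shrinks where trips are dense.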
PEAK predictor with various K: tuning K reduces the fare error by at most 15 cents and the duration error by at most 15 sec.
Radius of dynamic zone: Mean: 375 m; Std. dev.: 741 m.
Speed and memory: static zones are more efficient than dynamic zones; dynamic zones cost a lot of memory.
Accuracy analysis: predictions are still not very accurate using only the three basic features. Why? Indirect routing; traffic conditions.
Accuracy analysis: PEAK predictor with 200 m zones. For trips with the same start time, start point, and end point: distance error up to 6 km; duration error up to 1000 sec.
Filter design: Filter 1: trip distance > 2 x the straight-line distance between start and end. Filter 2: average speed < 20 km/h or > 100 km/h.
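The two filters above can be expressed as a single predicate that decides whether a historical trip is kept. The sketch below assumes straight-line distance is measured with the haversine formula and that trips matching either filter are discarded. The function names and argument layout are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def keep_trip(trip_km, duration_s, s_lat, s_lon, e_lat, e_lon):
    """Return True if a trip passes both filters."""
    if duration_s <= 0:
        return False  # cannot compute a speed; treat as bad data
    # Filter 1: route more than twice the straight-line distance.
    if trip_km > 2 * haversine_km(s_lat, s_lon, e_lat, e_lon):
        return False
    # Filter 2: implausible average speed.
    speed_kmh = trip_km / (duration_s / 3600.0)
    return 20.0 <= speed_kmh <= 100.0
```

For instance, a 12 km trip between two points roughly 10 km apart that takes 20 minutes (36 km/h) is kept, while the same trip recorded as 25 km or as taking a full hour is dropped.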
Result of applying the filters: fare error reduced by 25 cents; duration error reduced by 30 sec.
Traffic conditions: severe rainfall. Fare error reduced by 10 cents; duration error reduced by 60 sec.
Future work: different zone sizes for different locations; zone size determined by the radius of the dynamic zone.
Conclusion: Reducing the data size through aggregation and smart filtering is essential. Real-world data needs to be cleaned before use. Deploying a research prototype into a real production environment requires far more work than we naively expected.
Contributions: a detailed description of the steps to build such a real-time taxi system; a method of identifying real-time patterns, applicable to other transportation networks; a principled approach to balancing the tradeoffs between accuracy and real-time performance; a KNN method that produces an accurate predictor; insight into the challenges of moving from a prototype to an operational environment.