Traffic Prediction in a Bike-Sharing System

Presentation transcript:

Traffic Prediction in a Bike-Sharing System. Yexin Li, Yu Zheng, Huichu Zhang, Lei Chen. The Hong Kong University of Science and Technology; Microsoft Research, Beijing, China.

Current Problem. Bike-sharing systems are widely available. A trip consists of checking out a bike at an origin station, riding to the destination, and checking the bike in at a destination station. Bike usage is skewed in both its spatial and its temporal distribution, so an origin station may have no bikes to check out and a destination station may have no docks to check in.

An Ideal Solution. Predict bike usage at each station and reallocate bikes by truck. However, bike usage at an individual station is chaotic. [Figure: bike usage at stations S1 and S2, by day of the month (1st to 31st) and by hour (8am to 11am).]

A Practical Solution. Our solution: cluster stations into groups, predict the bike usage of each station cluster, and reallocate bikes between station clusters. Observations: the bike usage of a cluster is more predictable, and inter-cluster transitions are more stable. Prediction for each individual station is unnecessary, because users check bikes out and in at a random station, and events affect an area rather than a single station. [Figure: check-out of cluster C1 at 7-8am by day, and transition variance by hour (8am to 10am).]

Challenges. Cluster definition: which features to consider when clustering. Bike usage is impacted by multiple factors: meteorology, correlation between clusters (for example, a larger check-out at A accompanies a larger check-in at B), and events. Data imbalance: the number of sunny hours far exceeds the number of rainy hours, and some temperature and wind-speed samples, such as (11.7, 4.6 mph), never occurred in NYC between April and September 2014.

Framework of Our Solution. Stations are first grouped by bipartite station clustering. Check-out: hierarchical prediction predicts the bike usage of the entire city, then the check-out proportion of each cluster (for example, 0.2 and 0.1). Check-in: a transition matrix and trip durations are learned, and check-in inference then yields a probability and an expectation.

Motivation of Bipartite Station Clustering. Stations in one cluster should be close to each other. Stations in one cluster should perform similarly. With such clusters, inter-cluster transition is more stable and check-out proportion is more stable. [Figure: two groupings of clusters C1 to C5, one less stable and one more stable.]

Bipartite Station Clustering Procedure. (1) Geo-clustering, producing K1 clusters. (2) T-matrix generation. (3) T-clustering, producing K2 clusters.
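
The T-matrix generation step can be illustrated with a minimal sketch. It assumes (this is not stated on the slide) that a station's T-matrix row is the share of its check-outs that end in each geo-cluster. The function name `build_t_matrix`, the station IDs, and the toy trips are hypothetical.

```python
from collections import Counter, defaultdict


def build_t_matrix(trips, station_cluster, num_clusters):
    """Return {origin station: [share of its trips ending in each cluster]}.

    Assumption: a T-matrix row is a per-station distribution over
    destination clusters; the slides do not spell out its exact form.
    """
    counts = defaultdict(Counter)
    for origin, destination in trips:
        counts[origin][station_cluster[destination]] += 1

    t_matrix = {}
    for station, per_cluster in counts.items():
        total = sum(per_cluster.values())
        t_matrix[station] = [per_cluster[c] / total for c in range(num_clusters)]
    return t_matrix


# Hypothetical toy data: four stations already split into two geo-clusters.
station_cluster = {"s1": 0, "s2": 0, "s3": 1, "s4": 1}
trips = [("s1", "s3"), ("s1", "s4"), ("s1", "s2"), ("s2", "s3"), ("s3", "s1")]
t_matrix = build_t_matrix(trips, station_cluster, num_clusters=2)
```

Rows like these could then be used as feature vectors for the T-clustering step, so that stations with similar transition patterns end up in the same cluster.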

Motivation of Hierarchical Prediction. Bike usage in the entire city is more regular and can therefore be predicted more accurately. Predicting the city total first bounds the total prediction error in the lower level. The hierarchy predicts the bike usage of the entire city, then the check-out proportion of each cluster (for example, 0.2 and 0.1). [Figure: entire traffic by day, compared with the check-out of a single cluster by day.]
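
A minimal sketch of the two-level idea, assuming the city-wide total and the per-cluster proportions have already been predicted by separate models. The function name `split_city_checkout` and the numbers are hypothetical. It shows why the lower-level total stays bounded: the cluster predictions always sum to the city-level prediction.

```python
def split_city_checkout(city_total, proportions):
    """Distribute a predicted city-wide check-out across clusters.

    Assumption: `proportions` holds one predicted share per cluster
    and the shares sum to 1.
    """
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("cluster proportions must sum to 1")
    return [city_total * share for share in proportions]


# Hypothetical values: 1000 check-outs city-wide, four clusters.
city_total = 1000.0
proportions = [0.2, 0.1, 0.3, 0.4]
cluster_checkouts = split_city_checkout(city_total, proportions)
```

Because every cluster value is a share of the same total, an error in the proportions only moves bikes between clusters and cannot inflate the city-wide sum.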

Bike Usage of the Entire City. Solution: Gradient Boosting Regression Tree (GBRT). Feature extraction: day, hour, weather, temperature, and wind speed. [Figure: example days, including 13 Aug. (rainy), a period in which temperature keeps increasing, and 25 Sep. (windy).]

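Check-out Proportion Prediction. The slide relates the proportion $P_t$ to the historical proportions $P_{t-H}, P_{t-H+1}, \dots, P_{t-1}$ through a similarity weight between the features $f_i$ of a past hour and the features $f_t$ of the target hour:

$$W(f_i, f_t) = \lambda_1(i, t) \times \lambda_2(w_i, w_t) \times K\big((p_i, v_i), (p_t, v_t)\big)$$

Time:

$$\lambda_1(t_1, t_2) = \mathbf{1}(t_1, t_2) \times \rho_1^{\Delta h(t_1, t_2)} \times \rho_2^{\Delta d(t_1, t_2)}$$

Weather: $\lambda_2(w_i, w_t)$, for example between two foggy hours.

Temperature & wind speed:

$$K\big((p_{t_1}, v_{t_1}), (p_{t_2}, v_{t_2})\big) = \frac{1}{2\pi\sigma_1\sigma_2} \exp\left(-\left(\frac{(p_{t_1} - p_{t_2})^2}{\sigma_1^2} + \frac{(v_{t_1} - v_{t_2})^2}{\sigma_2^2}\right)\right)$$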
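
The similarity weight on this slide can be sketched in code. Several parts are assumptions, not taken from the slides: the indicator is treated as "same type of day", the weather similarity is a hypothetical lookup table, the values of $\rho_1$, $\rho_2$, $\sigma_1$, $\sigma_2$ are arbitrary, and the final prediction is a weight-normalised average of past proportions.

```python
import math


def time_similarity(same_day_type, delta_hours, delta_days, rho1=0.9, rho2=0.95):
    # lambda_1: indicator times rho1^dh times rho2^dd (indicator meaning assumed).
    indicator = 1.0 if same_day_type else 0.0
    return indicator * rho1 ** delta_hours * rho2 ** delta_days


def weather_similarity(w1, w2, table):
    # lambda_2: hypothetical symmetric lookup table of weather similarities.
    return table.get((w1, w2), table.get((w2, w1), 0.0))


def gaussian_kernel(p1, v1, p2, v2, sigma1=5.0, sigma2=3.0):
    # K: kernel over temperature p and wind speed v, as written on the slide.
    exponent = (p1 - p2) ** 2 / sigma1 ** 2 + (v1 - v2) ** 2 / sigma2 ** 2
    return math.exp(-exponent) / (2 * math.pi * sigma1 * sigma2)


def predict_proportions(history, target, weather_table):
    """Weighted average of past proportion vectors (assumed aggregation)."""
    weights = [
        time_similarity(h["same_day_type"], h["delta_hours"], h["delta_days"])
        * weather_similarity(h["weather"], target["weather"], weather_table)
        * gaussian_kernel(h["temp"], h["wind"], target["temp"], target["wind"])
        for h in history
    ]
    total = sum(weights)
    if total == 0:
        raise ValueError("no similar historical hour found")
    num_clusters = len(history[0]["proportions"])
    return [
        sum(w * h["proportions"][c] for w, h in zip(weights, history)) / total
        for c in range(num_clusters)
    ]


# Hypothetical history of two past hours and one target hour.
weather_table = {("foggy", "foggy"): 1.0, ("foggy", "sunny"): 0.2, ("sunny", "sunny"): 1.0}
history = [
    {"same_day_type": True, "delta_hours": 0, "delta_days": 1, "weather": "foggy",
     "temp": 20.0, "wind": 5.0, "proportions": [0.6, 0.4]},
    {"same_day_type": True, "delta_hours": 0, "delta_days": 2, "weather": "sunny",
     "temp": 28.0, "wind": 2.0, "proportions": [0.3, 0.7]},
]
target = {"weather": "foggy", "temp": 21.0, "wind": 5.0}
predicted = predict_proportions(history, target, weather_table)
```

In this toy case the foggy hour from one day earlier dominates the weights, so the prediction stays close to its proportions. This is how rare conditions can still be handled by drawing on the few similar past hours.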

Transition Matrix & Trip Duration. Inter-cluster transition $T_{t,ij}$: the transition probability, i.e., the probability that a bike will be checked in to cluster $C_j$ given that it is checked out from $C_i$ at time $t$. [Figure: example transition graph over clusters C1 to C4 with edge probabilities 0.1, 0.39, 0.5, 0.65, 0.15, 0.6, 0.29, 0.88, 0.05, 0.01, 0.02.] Trip duration $D_{ij}$: fitted with a log-normal distribution.
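
Both quantities can be estimated from historical trips. The sketch below makes two assumptions: transition probabilities are plain relative frequencies (ignoring the time index $t$), and the log-normal fit is the maximum-likelihood fit on log durations. All function names and the toy trips are hypothetical.

```python
import math
from collections import Counter, defaultdict
from statistics import NormalDist, fmean, pstdev


def estimate_transition(trips, num_clusters):
    """Row-normalised counts: entry [i][j] estimates P(check-in at j | check-out at i)."""
    counts = defaultdict(Counter)
    for origin, destination, _duration in trips:
        counts[origin][destination] += 1
    matrix = []
    for i in range(num_clusters):
        total = sum(counts[i].values())
        matrix.append([counts[i][j] / total if total else 0.0 for j in range(num_clusters)])
    return matrix


def fit_lognormal(durations):
    """Maximum-likelihood log-normal fit: mean and std of the log durations."""
    logs = [math.log(d) for d in durations]
    return fmean(logs), pstdev(logs)


def lognormal_cdf(x, mu, sigma):
    """P(D <= x) for a log-normal duration with parameters mu and sigma."""
    return NormalDist(mu, sigma).cdf(math.log(x))


# Hypothetical trips: (origin cluster, destination cluster, duration in minutes).
trips = [(0, 1, 10.0), (0, 1, 20.0), (0, 0, 5.0), (1, 0, 15.0)]
transition = estimate_transition(trips, num_clusters=2)
mu, sigma = fit_lognormal([d for o, dest, d in trips if (o, dest) == (0, 1)])
```

The fitted CDF gives the probability that a trip between two clusters finishes within a given time, which is what a later check-in estimate needs.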

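Check-in Inference. Check-out: the check-out of cluster $C_i$ at time $t$ is $O_{C_i,t} = E_t \times P_{t,i}$, where $E_t$ is the predicted check-out of the entire city and $P_{t,i}$ is the check-out proportion of $C_i$. Check-in: the expectation of bikes arriving at each cluster is inferred from bikes already on the road and bikes that will be borrowed. [Figure: example over clusters C1 to C4 showing bikes on the road and bikes that will be borrowed, with check-ins before $t+\delta$.]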
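
One possible reading of the check-in expectation for bikes that will be borrowed is sketched below. It assumes that the expected check-in at cluster $C_j$ within a window is the sum over origin clusters of check-out, times transition probability, times the probability that the trip finishes inside the window. Bikes already on the road would need an analogous term and are omitted here. The function name and all numbers are hypothetical.

```python
def expected_checkins(checkouts, transition, arrival_prob):
    """Expected check-ins per cluster from bikes that will be borrowed.

    checkouts[i]        predicted check-out of cluster i (O_{C_i,t})
    transition[i][j]    probability a bike from i is returned at j
    arrival_prob[i][j]  assumed probability an i-to-j trip ends within the window
    """
    k = len(checkouts)
    return [
        sum(checkouts[i] * transition[i][j] * arrival_prob[i][j] for i in range(k))
        for j in range(k)
    ]


# Hypothetical two-cluster example.
checkouts = [100.0, 50.0]
transition = [[0.2, 0.8], [0.5, 0.5]]
arrival_prob = [[1.0, 0.5], [0.5, 1.0]]
checkins = expected_checkins(checkouts, transition, arrival_prob)
```

The arrival probabilities could come from a fitted trip-duration distribution evaluated at the window length.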

Experiments. Datasets: Citi-Bike data and meteorology data in New York City; Capital Bikeshare data and meteorology data in Washington D.C. Metric: error rate. Data released: http://research.microsoft.com/apps/pubs/?id=255961

Experiments: Clustering Results. Error rates by method for the GC and BC clusterings, over all hours and over anomalous hours. Rows with a single all-hours value appear that way in the source.

Check-out
Method    All hours (GC, BC)    Anomalous hours (GC, BC)
HA        0.353, 0.355          1.964, 1.968
ARMA      0.346                 2.276, 2.273
GBRT      0.311, 0.314          0.696, 0.683
HP-KNN    0.298, 0.299          0.692, 0.685
HP-MSI    0.288, 0.282          0.637, 0.503

Check-in
Method    All hours (GC, BC)    Anomalous hours (GC, BC)
HA        0.347, 0.352          1.837, 1.835
ARMA      0.340, 0.344          2.152, 2.143
GBRT      0.309                 0.681, 0.671
HP-KNN    0.302, 0.295          0.694, 0.684
HP-MSI    0.297, 0.290          0.642, 0.506
P-TD      0.335                 0.498, 0.445

Accuracy improvement: >0.03 for all hours and >0.18 for anomalous hours.

Conclusions. Bipartite station clustering: cluster stations based on locations and transitions. Hierarchical prediction improves the accuracy: it bounds the total error in the lower level, giving a >0.03 improvement for all hours. Multi-similarity-based model: it deals with data imbalance, giving a >0.18 improvement for anomalous hours.

Thanks! Contact: Dr. Yu Zheng, yuzheng@Microsoft.com. Released data: http://research.microsoft.com/apps/pubs/?id=255961