Predicting Subway Passenger Flow with Social Media under Event Occurrences Qing He joint work with Ming Ni and Jing Gao First Annual Symposium TransInfo.

Presentation transcript:

Predicting Subway Passenger Flow with Social Media under Event Occurrences
Qing He, joint work with Ming Ni and Jing Gao
First Annual Symposium, TransInfo University Transportation Center
University at Buffalo, SUNY
August 13, 2015

Slide 2: Predicting Passenger Flow at Subway Stations
Subway stations are particularly vulnerable to pedestrian congestion because they consist primarily of enclosed areas with limited entrances and exits.
Passenger flow predictions help rail transit station designers and operators decide how best to accommodate and manage their passengers.

Slide 3: Research Challenges in Predicting Passenger Flow under Event Occurrences
Characteristics of non-recurrent events (e.g., sporting games, concerts, running races):
  Irregularity of event occurrence
  Inconsistency of event impact
  Unexpected events and public gatherings

Slide 4: The Proposed Framework
[Diagram: processing pipeline]
  Inputs: social media data; transit turnstile data
  Geo filter: bounding box
  Feature generation: Twitter rates; transit passenger flow
  Event detection: a set of events with high social media activity (e.g., baseball game, music concert, US Open tennis)
  Passenger flow prediction modeling: parametric methods; nonparametric methods
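The geo filter and feature generation stages of the framework can be illustrated with a minimal Python sketch. It assumes each tweet is a dict with `time`, `lon`, `lat`, and `user` keys. The bounding-box coordinates, the function names, and the one-hour window offset (chosen so windows line up with boundaries such as 13:00 and 17:00) are illustrative assumptions, not values taken from the slides.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical bounding box around the station area (illustrative values only).
BBOX = {"min_lon": -73.850, "max_lon": -73.840, "min_lat": 40.745, "max_lat": 40.760}


def in_bbox(lon, lat, box=BBOX):
    """Geo filter: keep only points that fall inside the bounding box."""
    return (box["min_lon"] <= lon <= box["max_lon"]
            and box["min_lat"] <= lat <= box["max_lat"])


def window_start(ts, hours=4, offset=1):
    """Floor a timestamp to the start of its 4-hour window.

    With offset=1 (an assumption), windows start at 1:00, 5:00, 9:00, 13:00, ...
    """
    shifted = ts - timedelta(hours=offset)
    floored = shifted.replace(hour=shifted.hour - shifted.hour % hours,
                              minute=0, second=0, microsecond=0)
    return floored + timedelta(hours=offset)


def tweet_rates(tweets):
    """Return {window_start: (number_of_tweets, number_of_distinct_users)}."""
    counts = defaultdict(int)
    users = defaultdict(set)
    for t in tweets:
        if not in_bbox(t["lon"], t["lat"]):
            continue  # tweet is outside the area of interest
        w = window_start(t["time"])
        counts[w] += 1
        users[w].add(t["user"])
    return {w: (counts[w], len(users[w])) for w in counts}
```

The per-window pairs can then be joined with turnstile counts reported for the same windows to form the feature table used for prediction.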

Slide 5: Overview of Dataset
Study period: 7 months, April 2014 to October 2014
Location: subway station "Mets – Willets Point" on Line 7 in New York City, near two stadiums:
  Citi Field (home stadium of the New York Mets baseball team)
  USTA Billie Jean King National Tennis Center (NTC) (site of the US Open Grand Slam tennis tournament)
Social media data: geo-tagged tweets (streaming API with bounding box)
Passenger flow data: turnstile devices recording both entrance and exit flow at the station, reporting every 4 hours

Slide 6: Sample Tweets - Two Hours before the Events

Event type: Baseball game
  Start time: :10
  Details: Mets vs. Yankees
  Tweet created at: :22:22
  Tweet text: Checked in CITI field for the yankees vs mets game w yankees mets

Event type: Tennis games
  Start time: :00
  Details: US Open 1st round
  Tweet created at: :49:46
  Tweet text: I'm at 2014 usopen tennis championships in flushing ny

Event type: Baseball game + Tennis games
  Start time: :00 (T), 19:10 (B)
  Details: US Open 2nd round & Mets vs. Braves
  Tweet created at: :29:10
  Tweet text: love this place billy jean king national tennis centre us open

Slide 7: Sample Tweets - Two Hours before the Events
[Figure panels: (a) Baseball game; (b) Tennis games; (c) Baseball game + Tennis games]

Slide 8: Passenger Flow vs. Number of Tweets over Time
[Figure: passenger flow and number of tweets plotted over time]

Slide 9: The Hashtag-based Event Identification Algorithm
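The details of this algorithm are not preserved in the transcript. As a rough illustration of what hashtag-based identification can look like, the sketch below counts hashtags per time window and flags windows whose total exceeds a threshold, reporting the most frequent tags. The threshold rule, the `min_hashtags` and `top_k` parameters, and the function name are assumptions for illustration, not the authors' method.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def identify_events(tweets_by_window, min_hashtags=30, top_k=3):
    """Flag windows with unusually many hashtags.

    tweets_by_window: {window_label: [tweet text, ...]}
    Returns {window_label: (total_hashtags, [top hashtags])} for flagged windows.
    The fixed threshold is an assumed rule, chosen only for illustration.
    """
    events = {}
    for window, texts in tweets_by_window.items():
        # Count hashtags case-insensitively across all tweets in the window.
        tags = Counter(tag.lower() for text in texts for tag in HASHTAG.findall(text))
        total = sum(tags.values())
        if total >= min_hashtags:
            events[window] = (total, [tag for tag, _ in tags.most_common(top_k)])
    return events
```

A window dominated by tags such as `mets` or `usopen` would then be labeled by inspecting its top hashtags.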

Slide 10: Identified Event Days
From April 2014 to October 2014, 81 out of 200 days are identified as game days, with precision 98.27% and recall 87.69% for baseball games.

Date     Hour            Number of hashtags  Top hashtags
3/31/14  17:00 to 21:00  65                  mets, openingday, ny
4/5/14   13:00 to 17:00  306                 mets, reds, baseball
4/9/14   17:00 to 21:00  34                  amaluna, cirquedusoleil, citifield
5/14/14  17:00 to 21:00  710                 mets, yankees, subwayseries
5/31/14  9:00 to 13:00   85                  happiest5k, queens, ny
6/7/14   17:00 to 21:00  75                  digifestnyc, nyc, selfie
8/25/14  17:00 to 21:00  437                 usopen, tennis, usopen2014
8/31/14  13:00 to 17:00  609                 usopen, mets, tennis
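For reference, precision and recall of an event-day identifier can be computed from the set of days it flags and the set of days on which a game actually took place. The sketch below is a generic illustration with a hypothetical function name; it does not reproduce the figures reported on the slide.

```python
def precision_recall(predicted_days, actual_days):
    """Precision = correct flags / all flags; recall = correct flags / all true event days."""
    predicted, actual = set(predicted_days), set(actual_days)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall
```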

Slide 11: Passenger Flow vs. Tweet Rate under Event Occurrences
A strong linear relationship is observed.
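One way to quantify such a relationship is an ordinary least-squares line together with the Pearson correlation coefficient. The sketch below assumes paired per-window observations of tweet rate and passenger flow; the function name is hypothetical and no data from the study is used.

```python
import math


def fit_line(x, y):
    """Least-squares fit y ~ slope * x + intercept, plus Pearson correlation r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r
```

A value of `r` close to 1 on event windows would be consistent with the strong linear relationship noted on the slide.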

Slide 12: Time-of-day Passenger Flow vs. Tweet Rate
[Figure panels: (a) Nonevent; (b) Event]

Slide 13: A Convex Optimization based Prediction Model
Model structure: Optimization and Prediction with hybrid Loss function (OPL)
  1st component: least-squares error on the training dataset
  2nd component: least-squares error on the test dataset
  3rd component: least-squares error between the OPL and SARIMA predictions, which fuses in the results from time series modeling
Equation (1) is solved with the gradient descent method.
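Equation (1) itself is not preserved in the transcript. One plausible reading of the three components, stated here purely as an assumption, is a joint objective over the regression parameters (w, b) and the test-period predictions ŷ: the sum of squared training errors, plus the squared gap between ŷ and the regression output on test inputs, plus λ times the squared gap between ŷ and the SARIMA forecasts. The sketch below minimizes that assumed objective with plain gradient descent for a single feature; the function name, λ, learning rate, and step count are illustrative.

```python
def opl_fit(x_train, y_train, x_test, sarima, lam=1.0, lr=0.02, steps=5000):
    """Gradient descent on an ASSUMED hybrid loss (not the authors' Equation (1)):

        sum_train (w*x + b - y)^2
      + sum_test  (w*x + b - y_hat)^2
      + lam * sum_test (y_hat - sarima)^2

    Variables: w, b, and the test predictions y_hat.
    The step size must suit the feature scale; inputs here are assumed pre-scaled.
    """
    w, b = 0.0, 0.0
    y_hat = list(sarima)  # start from the time-series forecasts
    for _ in range(steps):
        r_train = [w * x + b - y for x, y in zip(x_train, y_train)]
        r_test = [w * x + b - yh for x, yh in zip(x_test, y_hat)]
        grad_w = 2 * (sum(r * x for r, x in zip(r_train, x_train))
                      + sum(r * x for r, x in zip(r_test, x_test)))
        grad_b = 2 * (sum(r_train) + sum(r_test))
        grad_y = [-2 * r + 2 * lam * (yh - s)
                  for r, yh, s in zip(r_test, y_hat, sarima)]
        w -= lr * grad_w
        b -= lr * grad_b
        y_hat = [yh - lr * g for yh, g in zip(y_hat, grad_y)]
    return w, b, y_hat
```

Under this assumed form, each fused prediction settles between the regression output and the SARIMA forecast, with λ controlling how strongly it is pulled toward the time-series result.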

Slide 14: Performance Comparisons
Metric: Mean Absolute Percentage Error (MAPE)
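MAPE is the mean of the absolute errors divided by the actual values, expressed as a percentage. A minimal sketch follows; the function name is hypothetical, and skipping windows with zero actual flow is an assumed convention to avoid division by zero.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Pairs whose actual value is zero are skipped (an assumed convention).
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)
```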

Slide 15: An Ensemble Model to Combine OPL and SVR

Slide 16: Conclusions
Social media data can signal public gathering events.
Tweet hashtags can be very useful for event identification.
Tweet rates (number of tweets, number of users) can greatly assist in predicting passenger flow under event occurrences.
The study fills the gap between day-to-day passenger flow prediction and prediction of abruptly changing volumes under non-recurrent events.

Slide 17: Thank you!