Predicting Subway Passenger Flow with Social Media under Event Occurrences
Qing He, joint work with Ming Ni and Jing Gao
First Annual Symposium, TransInfo University Transportation Center
University at Buffalo, SUNY
August 13, 2015
Predicting Passenger Flow at Subway Stations
Subway stations are particularly vulnerable to pedestrian congestion because they consist primarily of enclosed areas with limited entrances and exits.
Passenger flow prediction helps rail transit station designers and operators accommodate and manage their passengers.
Research Challenges in Predicting Passenger Flow under Event Occurrences
Characteristics of non-recurrent events (e.g., sporting games, concerts, running races):
  Irregularity of event occurrence
  Inconsistency of event impact
  Unexpected events and public gatherings
The Proposed Framework
[Framework diagram]
Inputs: social media data and transit turnstile data
Geo filter: bounding box applied to the social media data
Feature generation: tweet rates and transit passenger flow
Event detection: a set of events with high social media activity (e.g., baseball game, music concert, US Open tennis)
Passenger flow prediction modeling: parametric and nonparametric methods of the form x → f(x)
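The geo filter step can be illustrated with a minimal sketch. The bounding box coordinates below are hypothetical values roughly around the Mets – Willets Point area, not the box used in the study, and the tweet dictionaries with `lat` and `lon` keys are an assumed data layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned latitude/longitude box, in degrees."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.south <= lat <= self.north and self.west <= lon <= self.east


# Hypothetical box near the station; NOT the authors' actual bounding box.
STATION_BOX = BoundingBox(south=40.74, west=-73.86, north=40.77, east=-73.83)


def geo_filter(tweets, box):
    """Keep only tweets whose coordinates fall inside the box."""
    return [t for t in tweets if box.contains(t["lat"], t["lon"])]


# Made-up tweets for illustration only.
tweets = [
    {"text": "at the game", "lat": 40.757, "lon": -73.846},
    {"text": "midtown lunch", "lat": 40.754, "lon": -73.984},
]
kept = geo_filter(tweets, STATION_BOX)
print([t["text"] for t in kept])
```

A streaming API can apply a box like this on the server side; filtering again locally is a cheap safeguard against tweets whose coordinates fall outside the requested area.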
Overview of Dataset
Study period: 7 months, April 2014 to October 2014
Location: subway station “Mets – Willets Point” on Line 7 in New York City, near two stadiums:
  Citi Field (home stadium of the NY Mets baseball team)
  USTA Billie Jean King National Tennis Center (NTC), host of the US Open Grand Slam tennis tournament
Social media data: geo-tagged tweets (streaming API with bounding box)
Passenger flow data: turnstile devices recording both entrance and exit flow at the station, reporting every 4 hours
Sample Tweets - Two Hours before the Events
Event type | Start time | Details | Created at | Text content
Baseball game | :10 | Mets vs. Yankees | :22:22 | Checked in CITI field for the yankees vs mets game w yankees mets
Tennis games | :00 | US Open 1st round | :49:46 | I’m at 2014 usopen tennis championships in flushing ny
Baseball game + Tennis games | :00 (T) 19:10 (B) | US Open 2nd round & Mets vs. Braves | :29:10 | love this place billy jean king national tennis centre us open
Sample Tweets - Two Hours before the Events
[Figure panels: (a) Baseball game, (b) Tennis games, (c) Baseball game + Tennis games]
Passenger Flow vs. Number of Tweets over Time
[Figure: passenger flow and number of tweets plotted over time]
The Hashtag-based Event Identification Algorithm
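The algorithm itself is not reproduced in the slide text, so the following is only a rough sketch of one way hashtag-based event identification could work, not the authors' procedure. It assumes tweets arrive as `(timestamp, text)` pairs, groups them into 4-hour windows aligned to 1:00, 5:00, 9:00, and so on (matching the 9:00 to 13:00 style windows reported for identified event days), and flags windows whose hashtag count reaches a threshold. The threshold `min_hashtags` and the `top_k` parameter are hypothetical.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime, timedelta

HASHTAG = re.compile(r"#(\w+)")
WINDOW_HOURS = 4   # same length as the turnstile reporting interval
OFFSET_HOUR = 1    # assumed alignment: windows start at 1:00, 5:00, 9:00, ...


def window_start(ts: datetime) -> datetime:
    """Return the start of the 4-hour window that contains ts."""
    base = ts.replace(minute=0, second=0, microsecond=0)
    return base - timedelta(hours=(ts.hour - OFFSET_HOUR) % WINDOW_HOURS)


def detect_events(tweets, min_hashtags=30, top_k=3):
    """Flag windows with at least min_hashtags hashtags (hypothetical rule)."""
    counts = defaultdict(Counter)
    for ts, text in tweets:
        tags = [tag.lower() for tag in HASHTAG.findall(text)]
        counts[window_start(ts)].update(tags)
    events = {}
    for start, counter in counts.items():
        total = sum(counter.values())
        if total >= min_hashtags:
            top = [tag for tag, _ in counter.most_common(top_k)]
            events[start] = (total, top)
    return events


# Made-up tweets for illustration only.
sample = [
    (datetime(2014, 5, 14, 18, 5), "#Mets vs #Yankees"),
    (datetime(2014, 5, 14, 19, 30), "go #mets #subwayseries"),
    (datetime(2014, 5, 14, 10, 0), "coffee #morning"),
]
found = detect_events(sample, min_hashtags=3)
for start, (total, top) in found.items():
    print(start, total, top)
```

A fixed threshold is the simplest possible rule; a real detector would more likely compare each window against a baseline for the same time of day.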
Identified Event Days
From April 2014 to October 2014, 81 out of 200 days are identified as game days, with precision 98.27% and recall 87.69% for baseball games.
Date | Hour | Number of hashtags | Top hashtags
3/31/14 | 17:00 to 21:00 | 65 | mets, openingday, ny
4/5/14 | 13:00 to 17:00 | 306 | mets, reds, baseball
4/9/14 | 17:00 to 21:00 | 34 | amaluna, cirquedusoleil, citifield
5/14/14 | 17:00 to 21:00 | 710 | mets, yankees, subwayseries
5/31/14 | 9:00 to 13:00 | 85 | happiest5k, queens, ny
6/7/14 | 17:00 to 21:00 | 75 | digifestnyc, nyc, selfie
8/25/14 | 17:00 to 21:00 | 437 | usopen, tennis, usopen2014
8/31/14 | 13:00 to 17:00 | 609 | usopen, mets, tennis
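For reference, precision and recall for identified event days can be computed as in the short sketch below. The day labels are hypothetical placeholders, not the study's data, and the function name is illustrative.

```python
def precision_recall(predicted_days, actual_days):
    """Precision and recall of predicted event days against actual event days."""
    predicted, actual = set(predicted_days), set(actual_days)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical labels for illustration only.
predicted = {"day1", "day2", "day3"}
actual = {"day2", "day3", "day4", "day5"}
precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

High precision means few non-game days are flagged as game days; high recall means few actual game days are missed.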
Passenger Flow vs. Tweet Rate under Event Occurrences
[Figure: passenger flow plotted against tweet rate]
A strong linear relationship between passenger flow and tweet rate is observed.
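A linear relationship of this kind can be checked with an ordinary least-squares line and the Pearson correlation coefficient. The sketch below uses made-up numbers purely to show the calculation; it does not reproduce the study's data or fitted values.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept, plus Pearson correlation r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Hypothetical tweet rates and passenger flows for illustration only.
tweet_rate = [10, 20, 30, 40]
passenger_flow = [1200, 2200, 3200, 4200]
slope, intercept, r = fit_line(tweet_rate, passenger_flow)
print(slope, intercept, r)
```

A value of r close to 1 indicates a strong positive linear relationship; the slope gives the change in passenger flow per unit change in tweet rate.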
Time-of-day Passenger Flow vs. Tweet Rate
[Figure panels: (a) Nonevent, (b) Event]
A Convex Optimization based Prediction Model
Model: Optimization and Prediction with hybrid Loss function (OPL)
The loss function has three components:
  1st component: least-squares error on the training dataset
  2nd component: least-squares error on the test dataset
  3rd component: least-squares error between the OPL and SARIMA predictions, which fuses in the results from time series modeling
Equation (1) is solved with the gradient descent method.
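Equation (1) does not survive in the slide text, so the sketch below is only one possible reading of a three-component squared-error loss, not the authors' formulation. It assumes a single feature with a linear model `w * x + b`, treats the test-period predictions `p` as free variables, and minimizes: the training error, plus `alpha` times the squared gap between the model output and `p` on the test inputs, plus `beta` times the squared gap between `p` and the SARIMA forecasts. The weights `alpha` and `beta`, the step size, and the data are all hypothetical, and inputs are assumed rescaled to roughly unit range so that a fixed step size converges.

```python
def fit_hybrid(x_train, y_train, x_test, sarima,
               alpha=1.0, beta=1.0, lr=0.05, steps=2000):
    """Gradient descent on an assumed three-term squared-error loss."""
    w, b = 0.0, 0.0
    p = list(sarima)  # start the test predictions at the SARIMA forecasts
    for _ in range(steps):
        # Residuals of the three loss components.
        r_train = [w * x + b - y for x, y in zip(x_train, y_train)]
        r_test = [w * z + b - pj for z, pj in zip(x_test, p)]
        r_fuse = [pj - s for pj, s in zip(p, sarima)]
        # Gradients with respect to w, b, and each test prediction.
        grad_w = (2 * sum(r * x for r, x in zip(r_train, x_train))
                  + 2 * alpha * sum(r * z for r, z in zip(r_test, x_test)))
        grad_b = 2 * sum(r_train) + 2 * alpha * sum(r_test)
        grad_p = [-2 * alpha * rt + 2 * beta * rf
                  for rt, rf in zip(r_test, r_fuse)]
        w -= lr * grad_w
        b -= lr * grad_b
        p = [pj - lr * g for pj, g in zip(p, grad_p)]
    return w, b, p


# Made-up, rescaled data for illustration only.
w, b, p = fit_hybrid(
    x_train=[0.0, 0.5, 1.0], y_train=[1.0, 2.0, 3.0],
    x_test=[0.25, 0.75], sarima=[1.6, 2.4],
)
print(w, b, p)
```

Under this reading the loss is a sum of squared linear residuals, so it is convex and gradient descent with a small enough step size reaches the global minimum. The fused predictions `p` end up between the regression output and the SARIMA forecasts, with `alpha` and `beta` controlling the balance.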
Performance Comparisons
[Figure: model comparison by Mean Absolute Percentage Error (MAPE)]
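MAPE averages the absolute prediction error relative to the actual value and expresses it as a percentage. A minimal sketch follows, with made-up numbers; the function name and the choice to reject zero actual values are assumptions of this example.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical passenger counts for illustration only.
score = mape(actual=[100, 200], predicted=[110, 180])
print(f"MAPE = {score:.1f}%")
```

Because each error is divided by the actual value, MAPE weights a miss of 100 passengers more heavily in a quiet period than in a busy one.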
An Ensemble Model to Combine OPL and SVR
Conclusions
Social media data can signal public gathering events.
Tweet hashtags can be very useful for event identification.
Tweet rates (number of tweets, number of users) can greatly assist in predicting passenger flow under event occurrences.
The study fills the gap between day-to-day passenger flow prediction and prediction of abruptly changing volumes under non-recurrent events.
Thank you!