Published by James Norman. Modified over 6 years ago.
Forecasting the Future using Diverse Social Media Sources
Katherine Porterfield, Dustin Arendt, Nathan Hodas and Svitlana Volkova
Data Sciences and Analytics, National Security Directorate

MOTIVATION
Social media has been used to build large-scale predictive and forecasting analytics, for example for influenza, events, and elections.
Existing analytics do not compare how well results generalize across social media signals, models, tasks, and metrics.
Goal: extensively evaluate the predictive power of social media signals and contrast performance across predictive models, forecasting tasks, geographic locations, and evaluation metrics.

FORECASTING VARIABLES

WEATHER, TWITTER AND FLICKR DATA
Weather data [1]: Global Summary of the Day (GSOD)
Twitter data [2]: 171M tweets from 25 U.S. and 6 international locations, each within a 25-mile radius of a military base
Flickr data [3]: geo-tags, timestamps, and user tags

IMAGE FEATURE EXTRACTOR
Weather features were extracted from the Flickr data by classifying images into four weather categories: cloudy, clear, snow, and storm.
Feature extractor: ResNet [4] (He et al., 2016) + ImageNet [5] (Deng et al., 2009)

FORECASTING MODELS
ML: AdaBoost Regressor
DL: LSTM models; the training window is 12 days, forecasting up to 7 days ahead
Model inputs:
Text + Image Ensemble
Text + Image Merged
ImageOnly: 2,048-dim vectors
TextOnly:
  Tweet content + affects: what emotions users express
  Behavior: how people interact
  Tweets: what people talk about
  Tweet style: how people talk
WeatherOnly (Y from Y)

Example task: weather forecast
Forecasting variables: max daily temperature (C), visibility (miles), dew point temperature (C), wind speed
Metrics: Pearson, RMSE, MAPE, RMSPE, R2. These evaluate model accuracy, robustness, and the ability to forecast upward and downward weather tendencies.

FORECASTING RESULTS
Contrasting the predictive power of social media signals:
#1 Models learned from content and affects significantly outperform models learned from content only, behavior, or style.
#2 The TextOnly model outperformed the ImageOnly, ensemble, and merged image + text models.

Model: Pearson, RMSE, R2
Weather: .92, 3.02, .84
ImageOnly: .44, 7.57, .01
TextOnly: .74, 5.16, .54
Merge I + T: .68, 5.87, .40
Ensemble I + T: .73, .28

Contrasting model performance across the forecasting tasks and geolocations:
#3 Max daily temperature and dew-point temperature are easier to predict than wind speed and visibility.
#4 Performance varies across geolocations.
#5 ML models (AdaBoost) outperform DL models (LSTM) when trained on the same data across locations.
#6 Locations with more tweets yield better forecasts.
#7 Higher variance in a weather variable leads to higher error.
#8 Social media signals yield results comparable to the upper-bound weather models.

REFERENCES
[1] GSOD weather data
[2] Volkova et al. "Uncovering the Relationships Between Military Community Health and Affects Expressed in Social Media." EPJ Data Science, 2017.
[3] Flickr data
[4] ResNet: He, Kaiming, et al. "Deep residual learning for image recognition." CVPR 2016.
[5] ImageNet: Deng, Jia, et al. "ImageNet: A large-scale hierarchical image database." CVPR 2009.
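The poster names a 12-day training window, a forecast horizon of up to 7 days, and five evaluation metrics, but gives no formulas or code. The following is a minimal sketch of how such windows and metrics might be computed, not the authors' implementation. All function names are hypothetical. It assumes R2 is the coefficient of determination (the table's R2 values are not simply the squared Pearson values, which is consistent with that reading), and that MAPE and RMSPE are reported in percent.

```python
import math


def make_windows(series, window=12, horizon=7):
    """Pair each `window`-day history with the `horizon` days that follow it."""
    pairs = []
    last_start = len(series) - window - horizon
    for start in range(last_start + 1):
        history = series[start:start + window]
        target = series[start + window:start + window + horizon]
        pairs.append((history, target))
    return pairs


def _mean(values):
    return sum(values) / len(values)


def pearson(y_true, y_pred):
    """Pearson correlation; undefined (division by zero) for a constant series."""
    mt, mp = _mean(y_true), _mean(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the forecast variable."""
    return math.sqrt(_mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def mape(y_true, y_pred):
    """Mean absolute percentage error (percent); true values must be nonzero."""
    return 100.0 * _mean([abs((t - p) / t) for t, p in zip(y_true, y_pred)])


def rmspe(y_true, y_pred):
    """Root mean squared percentage error (percent); true values must be nonzero."""
    return 100.0 * math.sqrt(_mean([((t - p) / t) ** 2 for t, p in zip(y_true, y_pred)]))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mt = _mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mt) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up daily max temperatures, only to exercise the functions.
    temps = [20.0 + (day % 5) for day in range(30)]
    pairs = make_windows(temps)
    history, target = pairs[0]
    naive = [history[-1]] * len(target)  # persistence baseline: repeat last value
    print(len(pairs), rmse(target, naive), mape(target, naive))
```

Note that the percentage-based metrics break down when a true value is zero, which can happen for temperatures in Celsius; how the poster handles that case is not stated.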