CS 315 – Web Search and Data Mining. Overview The power of crowdsourcing Predicting flu outbreaks Predicting “the present” through Google Insights! Predicting.

Slides:



Advertisements
Similar presentations
Predicting the Present
Advertisements

Spreadsheet Modeling & Decision Analysis
Decomposition Method.
From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series Brendan O’Connor Ramnath Balasubramanyan Bryan R. Routledge Noah A. Smith Carnegie.
The Crystal Ball Forecasting Elections in the United States.
Just Google It: Can Internet Search Terms Help Explain Movements in Retail Sales? Daniel Ayoubkhani (ONS) & Matthew Swannell (ONS)
Panagiotis T. Metaxas CS, Wellesley College & CRCS, Harvard University Joint work with Eni Mustafaraj and Dani Gayo-Avello.
Forecasting Demand for Services
Barometric models These models identify patterns among variables over time. That is, you try to find time series variables that “signal” future changes.
Chapter 12 - Forecasting Forecasting is important in the business decision-making process in which a current choice or decision has future implications:
Data Sources The most sophisticated forecasting model will fail if it is applied to unreliable data Data should be reliable and accurate Data should be.
1 Spreadsheet Modeling & Decision Analysis: A Practical Introduction to Management Science, 3e by Cliff Ragsdale.
Operations Management R. Dan Reid & Nada R. Sanders
Part II – TIME SERIES ANALYSIS C2 Simple Time Series Methods & Moving Averages © Angel A. Juan & Carles Serrat - UPC 2007/2008.
Google Flu Trends Terminology –Influenza = flu –ILI = influenza like illness CDC ILI time series –Weekly –1-2 week publication lag Predicting it using.
MANAGERIAL ECONOMICS 12th Edition
Time Series and Forecasting
Fall, 2012 EMBA 512 Demand Forecasting Boise State University 1 Demand Forecasting.
Forecasting with Twitter data Presented by : Thusitha Chandrapala MARTA ARIAS, ARGIMIRO ARRATIA, and RAMON XURIGUERA.
Coletto, Lucchese, Orlando, Perego ELECTORAL PREDICTIONS WITH TWITTER: A MACHINE-LEARNING APPROACH M. Coletto 1,3, C. Lucchese 1, S. Orlando 2, and R.
© 2003 Prentice-Hall, Inc.Chap 12-1 Business Statistics: A First Course (3 rd Edition) Chapter 12 Time-Series Forecasting.
Douglas Economic Outlook GROSS DOMESTIC PRODUCT Seasonally Adjusted Annual Rate.
THE ECONOMY AND THE VOTERS: 2010 Kathleen A. Frankovic Hawaii Economic Association August 26, 2010.
{ Consumer Price Index Evan Creedon, Catherine Cropp, Luke Forker, Ned Moore, and Eric Hostvedt.
© 2002 Prentice-Hall, Inc.Chap 13-1 Statistics for Managers using Microsoft Excel 3 rd Edition Chapter 13 Time Series Analysis.
Time Series “The Art of Forecasting”. What Is Forecasting? Process of predicting a future event Underlying basis of all business decisions –Production.
The Forecast Process Dr. Mohammed Alahmed
1 © 2009 The Conference Board, Inc. Job information is entered here! Trusted Insights for Business Worldwide
Nowcasting RDU with trends Based on Durham Paper By Ramy Khorshed.
Chapter 5 Demand Forecasting.
1 Spreadsheet Modeling & Decision Analysis: A Practical Introduction to Management Science, 3e by Cliff Ragsdale.
Forecasting supply chain requirements
Cook’s Tour of American Politics and Economics Published February 13, 2013 Updated August 26, 2015 National Journal Presentation Credits Contributors:
Predicting the Future With Social Media. Introduction Goal – How buzz and attention is created for different movies and how that changes over time.
Copyright © 2015 McGraw-Hill Education. All rights reserved. No reproduction or distribution without the prior written consent of McGraw-Hill Education.
Copyright © 2014, 2011 Pearson Education, Inc. 1 Chapter 27 Time Series.
New Homes Sales (Measure of Housing Activity) Web: Monthly revisions which can cover the preceding three months, so look at.
Query trends CS 349 Presentation December 2 nd, 2008 Catherine Grevet.
Time Series Analysis and Forecasting
Chapter 5 Demand Forecasting 1. 1.Importance of Forecasting  Helps planning for long-term growth  Helps in gauging the economic activity (auto sales,
© 2000 Prentice-Hall, Inc. Chap The Least Squares Linear Trend Model Year Coded X Sales
Demand Management and Forecasting Module IV. Two Approaches in Demand Management Active approach to influence demand Passive approach to respond to changing.
Detecting Influenza Outbreaks by Analyzing Twitter Messages By Aron Culotta Jedsada Chartree 02/28/11.
© 1999 Prentice-Hall, Inc. Chap Chapter Topics Component Factors of the Time-Series Model Smoothing of Data Series  Moving Averages  Exponential.
1 Chapter 5 Demand Forecasting. 2 1.Importance of Forecasting  Helps planning for long-term growth  Helps in gauging the economic activity (auto sales,
Welcome to MM305 Unit 5 Seminar Prof Greg Forecasting.
Demand Forecasting Prof. Ravikesh Srivastava Lecture-11.
Irwin/McGraw-Hill © Andrew F. Siegel, 1997 and l Chapter 14 l Time Series: Understanding Changes over Time.
DEPARTMENT OF MECHANICAL ENGINEERING VII-SEMESTER PRODUCTION TECHNOLOGY-II 1 CHAPTER NO.4 FORECASTING.
Forecasting Demand for Services. Learning Objectives l Recommend the appropriate forecasting model for a given situation. l Conduct a Delphi forecasting.
Forecasting is the art and science of predicting future events.
Charlie Cook’s Tour of American Politics and Economics February 23, 2016 First Published: February 13, 2013 Producer: Alexander Perry With Contributions.
Forecasting Demand. Problems with Forecasts Forecasts are Usually Wrong. Every Forecast Should Include an Estimate of Error. Forecasts are More Accurate.
Managerial Decision Modeling 6 th edition Cliff T. Ragsdale.
Statistics for Business and Economics Module 2: Regression and time series analysis Spring 2010 Lecture 7: Time Series Analysis and Forecasting 1 Priyantha.
Forecas ting Copyright © 2015 McGraw-Hill Education. All rights reserved. No reproduction or distribution without the prior written consent of McGraw-Hill.
Forecast 2 Linear trend Forecast error Seasonal demand.
Economics 173 Business Statistics Lecture 28 © Fall 2001, Professor J. Petry
Forecasting. Model with indicator variables The choice of a forecasting technique depends on the components identified in the time series. The techniques.
Forecasting Quantitative Methods. READ FIRST Outline Define Forecasting The Three Time Frames of Forecasting Forms of Forecast Movement Forecasting Approaches.
Time Series Forecasting Trends and Seasons and Time Series Models PBS Chapters 13.1 and 13.2 © 2009 W.H. Freeman and Company.
Welcome to MM305 Unit 5 Seminar Dr. Bob Forecasting.
Lecture 9 Forecasting. Introduction to Forecasting * * * * * * * * o o o o o o o o Model 1Model 2 Which model performs better? There are many forecasting.
BUS 308 Week 4 Problem Set Check this A+ tutorial guideline at Problem Set Week Four.
Online Conditional Outlier Detection in Nonstationary Time Series
Forecasting with a Trend
Demand Management and Forecasting
Statistics for Managers using Microsoft Excel 3rd Edition
FORCASTING AND DEMAND PLANNING
Chapter 8 Supplement Forecasting.
Presentation transcript:

CS 315 – Web Search and Data Mining

Overview The power of crowdsourcing Predicting flu outbreaks Predicting “the present” through Google Insights! Predicting movie success! Predicting elections! Predicting elections? What can (and cannot) be predicted How (not to) predict

Tracking Seasonal Flu through the CDC.gov Map taken on April 18 -> Based on reports from Hospitals Takes a couple of weeks to record

google.org/fl utrends/us Map taken on April 18 -> Based on keywords being searched It is updated immediately Data can be downloaded, studied

Why does it work so well? “close relationship between how many people search for flu-related topics and how many people actually have flu symptoms”

Google Trends predicts flu outbreak!

Observing the crowd It makes sense: People search about things they want to be informed about, including flu symptoms Another example: Which day of the week there are the most queries with the term “hangover” in?

Observing the crowd It makes sense: People search about things they want to be informed about, including flu symptoms Another example: Which day of the week there are the most queries with the term “hangover” in? “Civil war” what do you expect to see?

Predicting “the future" Sample data Not identical when repeated Preserve privacy Normalized data Peak at 100% You can disambiguate Apple in computer & electronics Apple in food & drink Downloadable Must be logged in Geography Category Time window

10 Autoregressive: value at time t depends on Value at time t-1 Seasonal adjustment: value at time t depends on Value at time t-12 Transfer function: value at time t depends on other contemporaneous or lagging variables Seasonal autoregressive transfer model: Value at time t depends on Value at time t-12 (seasonality)‏ Value at time t-1 (recent behavior)‏ Other lagging or contemporaneous variables (such as Google Trends data)‏ Typical question of interest How much more accurate forecasts can you get from additional variables over and above the accuracy you get with the history of the time series itself? Basic Econometrics Forecasting Models

Method: Fit other data as best you can, then add Trends data, improve prediction Model: Y t = * Y t - 1 – * us * us96.2 – * AvgP t – 1 Y t : New house sold at t-th month AvgP t – 1 : Average Sales Price of New One-Family Houses Sold at (t-1)-th month us378.1 : Google Trend of vertical id = 378 (Rental Listings & Referrals ) at t-th month 1 st week us96.2 : Google Trend of vertical id = 96 (Real Estate Agent) at t-th month 2 nd week Analysis and Forecasting July 2008 Actual = 515K Predicted = K Z-score = 2.53 August 2008 Prediction = K

Google Trends “can predict the present”

Predicted with Google Trends Home sales Movie box-office success Product sales (e.g., video games) Travel to Hong Kong Unemployment rates …Consumer behavior, in general? (Goel paper) Is there anything that could NOT be predicted with Google Trends? Is Twitter chat volume as good?

Twitter Predicts Movie Box-Office Sales!

Movie buzz creates tweets… The rate at which movie tweets are generated can be used to build a powerful model for predicting movie box-office revenue, (better than “gold-standard” Hollywood Stock Exch.) Tweet-rate(movie) = tweets(movie)/hour Predictions (linear regression): 7-days before release data thent: #theaters playing HSX index

Twitter monitors Poll Sentiment (!) For more information, see “oconnor – tweets to polls AAPOR panel.ppt”

Smoothed (15 days) comparisons SentimentRatio(”jobs”)

US Presidential elections not predicted 2008 elections SR(“obama”) and SR(“mccain”) sentiment do not correlate But, “obama” and “mccain” volume: r =.79,.74 (!) Simple indicator of election news? 2009 job approval SR(“obama”): r =.72 Looks easier: simple decline

In the meantime, in Germany…

Twitter can Predict Elections (?!) For more info, see “icwsm2010_Tumasjan-Predicting elections with Twitter.pdf”

Not so fast, speedy… It seems that they forgot the party with the biggest tweet share…

Maybe Google Trends can predict US Elections…

Can Google Trends predict elections? 2008 US Congressional Elections Data Collection 2010 US Congressional Elections Data Collection The Competitors for Prediction:

US congressional elections 2008 & Total Races House Races Senate Races3233 Highly contested61125 Democrats Republicans “landslide win”DemocratsRepublicans

Prediction of All races (unfair to Google-trends)

Prediction of races where one candidate had no G-trends visibility

Prediction of races where both candidates had G-trends visibility

What about the one success case?

Conclusions Google Trends: bad predictor of election results Google Trends: Good Predictor of election defeat! But what about other Social Media? What do YOU think?

High G-trends may be bad news! Liberal activists openly collaborate to Google-bomb search results of political opponents in 2006 Conservative activists launch a Tweeter-bomb in Jan Liberal activists try again unsuccessfully in 2010