Eurostat WebDataNet Conference 2015, Salamanca, 26th–28th May 2015. Fernando Reis, Big Data Task-Force, European Commission (Eurostat). Web activity evidence.


1 Eurostat WebDataNet Conference 2015, Salamanca, 26th–28th May 2015. Fernando Reis, Big Data Task-Force, European Commission (Eurostat). Web activity evidence to increase timeliness of official statistics

2 Eurostat Official statistics. Census-taking relief (“Altar of Domitius Ahenobarbus”), Rome, Italy, ca. 100 B.C.E.

3 Eurostat What is the role of official statistics today? '… To provide an indispensable element in the information system of a democratic society, serving the government, the economy and the public with data about the economic, demographic, social and environmental situation …' [Fundamental Principles of Official Statistics; principle 1 on relevance, impartiality and equal access]

4 Eurostat My definition of big data. Data deluge: larger, faster, more (a.k.a. Volume, Velocity, Variety). Everything is data: text, sound, images, video. Analytics: predictive analytics (e.g. Google Translate, voice recognition, suggestion systems, health applications). The new data product par excellence. Official statistics example: chances of getting a new job. An emergent market.

5 Eurostat Past experiences. 2005: association between web activity and unemployment identified. 2006: Google Trends. 2008: Google Flu Trends (GFT). 2009: GFT underestimated official figures; 1st revision of GFT model. 2013: GFT overestimated flu peak values; 2nd revision of GFT model. 2014: backlash against big data.

6 Eurostat Data Source: Google Trends (www.google.com/trends).

7 Eurostat Weekly influenza-like illness (ILI) surveillance and Google Flu Trends (GFT) search query estimates, June 2003–March 2013 Olson DR, Konty KJ, Paladini M, Viboud C, et al. (2013) Reassessing Google Flu Trends Data for Detection of Seasonal and Pandemic Influenza: A Comparative Epidemiological Study at Three Geographic Scales. PLoS Comput Biol 9(10) License: Creative Commons CC0 public domain dedication
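The comparison between surveillance data and GFT estimates can be quantified with simple error measures. The following is a minimal sketch in Python, not the method used by Olson et al.; the function names and the weekly values are made up for illustration, and it assumes the official figures are never zero.

```python
def relative_errors(estimates, official):
    """Signed relative error of each web-based estimate against the official figure."""
    if len(estimates) != len(official):
        raise ValueError("series must have the same length")
    return [(e - o) / o for e, o in zip(estimates, official)]


def peak_ratio(estimates, official):
    """Estimated peak divided by official peak (above 1 means the peak was overestimated)."""
    return max(estimates) / max(official)


# Hypothetical weekly ILI shares (percent of visits); NOT real surveillance data.
official_ili = [1.0, 2.0, 4.0, 2.0, 1.0]
web_estimate = [1.0, 2.5, 8.0, 3.0, 1.0]

errors = relative_errors(web_estimate, official_ili)
mean_error = sum(errors) / len(errors)  # positive: overestimation on average
ratio = peak_ratio(web_estimate, official_ili)

print(f"mean relative error: {mean_error:.2f}, peak ratio: {ratio:.1f}")
```

A signed error keeps the direction of the miss visible, which matters here because GFT erred in both directions (underestimation in 2009, overestimation in 2013).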


11 Eurostat Source: Financial Times Magazine (2014).

12 Eurostat Lessons from GFT. Premature release of a statistical product can harm its reputation. Avoid big data hubris. Frequent changes to Google's search algorithms affect the validity of models. We need transparency and replicability: the GFT search terms are unknown, and Google Trends (GT) is based on a sample whose sampling methodology is unknown.

13 Eurostat Other sources of web activity. Wikipedia page views: flu. Twitter: international and internal migration flows. Possibly others: visits to particular websites.

14 Eurostat How to introduce web activity data in official flash estimates? Launch a larger-scale, balanced study. Negative results are normally not published. Purpose: guide the decision on investment.

15 Eurostat How to introduce web activity data in official flash estimates? Diversification and assessment of the web activity data sources. National statistical institutes (NSIs) lack control of the source: it is a black box, and there is no way to guarantee that there was no manipulation. Breaks in series. Lack of continuity. Diversify the sources. Revision of prediction models. Accreditation and certification.
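Two of the points above, diversifying the sources and watching for breaks in series, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source names, the values, the split point and the threshold of two standard deviations are all hypothetical, and a production system would use a proper structural-break test.

```python
from statistics import mean, pstdev


def zscores(series):
    """Standardise a series so that sources on different scales can be combined."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def composite(sources):
    """Average the standardised series of several sources, period by period."""
    standardised = [zscores(s) for s in sources.values()]
    return [mean(period) for period in zip(*standardised)]


def level_shift(series, split, threshold=2.0):
    """Flag a possible break: the mean from `split` onwards differs from the
    earlier mean by more than `threshold` pre-split standard deviations."""
    before, after = series[:split], series[split:]
    spread = pstdev(before)
    if spread == 0:
        return mean(after) != mean(before)
    return abs(mean(after) - mean(before)) / spread > threshold


# Hypothetical indicators from two sources; NOT real data.
sources = {
    "search_index": [10, 11, 10, 11, 30, 31, 30, 31],  # jumps at period 4
    "page_views": [100, 110, 100, 110, 100, 110, 100, 110],
}

flags = {name: level_shift(s, split=4) for name, s in sources.items()}
combined = composite(sources)
print(flags)
```

With several sources, a break flagged in one series but not in the others points to a change in that source (for example a change in how the provider computes it) rather than in the phenomenon being measured.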

16 Eurostat How to introduce web activity data in official flash estimates? Integration of web activity data with traditional official statistics sources. Official statistics should not simply reproduce what others can do, but should instead make use of their specific comparative advantages. We are the original producers of the official data and know their details. Use more detail than what is published. Traditional methods (surveys).
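One simple way to combine the two kinds of source is to start from the last published official figure and adjust it by the change observed in the web indicator. The sketch below is one possible baseline, not the model proposed in the talk; the function names, the unemployment figures and the search index are hypothetical, and it assumes the web indicator varies over the history (otherwise the slope is undefined).

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (single regressor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def diffs(series):
    """Period-on-period changes."""
    return [b - a for a, b in zip(series, series[1:])]


def nowcast(official_history, web_history, web_now):
    """Estimate the not-yet-published official value.

    Regress past changes in the official series on past changes in the web
    indicator, then apply the fitted relation to the latest web change.
    """
    intercept, slope = fit_line(diffs(web_history), diffs(official_history))
    web_change = web_now - web_history[-1]
    return official_history[-1] + intercept + slope * web_change


# Hypothetical monthly unemployment rate (%) and search index; NOT real data.
official = [8.0, 8.2, 8.1, 8.5, 8.4]
web = [50, 54, 52, 60, 58]

estimate = nowcast(official, web, web_now=62)
print(f"flash estimate: {estimate:.2f}")
```

Anchoring the estimate on the last official value means the web indicator only has to explain the change, so a drift in the level of the web series does less damage than in a model built on web data alone.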

17 Eurostat How to introduce web activity data in official flash estimates? Research on the relation between web activity and the phenomena being predicted. Remember the lesson from GFT: do not confuse web activity with the phenomenon itself.

18 Eurostat How to introduce web activity data in official flash estimates? Joint effort on the development of appropriate prediction models. Learn from each other. Transparency. International comparability.

19 Eurostat Thank you for your attention. Fernando Reis, Eurostat Task Force on Big Data. https://github.com/reisfe/ https://twitter.com/reisfe/ https://linkedin.com/in/reisfe/ fernando.reis@ec.europa.eu

