Internal WP7 meeting, Warsaw, June 12-13, 2017
LESSONS LEARNT FROM PILOT SURVEYS: IRELAND, NETHERLANDS, POLAND, PORTUGAL, UK, WITH ROUND TABLE DISCUSSION
WHAT DO WE EXPECT AFTER THIS SESSION?
- Additional information on the pilots conducted (especially Ireland and Poland).
- An overview of issues and obstacles related to the pilots conducted.
- A list of problems that we have to tackle when implementing the pilots in other countries.
AGENDA
- Agriculture Use Case: IE – current state and issues; PL – issues
- Comments
- Tourism Use Case: NL, PL – issues
- Population Use Case: PL – how to prepare a good training dataset; UK experience
- Comments
- Round table
AGRICULTURE – JOHN (IE)
- Presentation
- Comments
TOURISM – ISSUES
- Data sources to collect: agreements
- New data source: flight movements
- Sustainability of data sources
- The flight archive is expensive (ca. $800 for a full one-month archive)
- We scrape the data with a robot, still respecting robots.txt
- Comments?
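As an illustration of the robots.txt point, here is a minimal sketch of how a scraping robot could check which pages it is allowed to fetch before collecting flight data. It uses only Python's standard `urllib.robotparser`. The user agent name, the sample robots.txt rules and the example URLs are all hypothetical and are not taken from the pilot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user agent name for the scraping robot (an assumption).
USER_AGENT = "wp7-pilot-bot"

# Hypothetical robots.txt content, inlined so the sketch runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt):
    """Build a parser from robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser, urls, user_agent=USER_AGENT):
    """Keep only the URLs that robots.txt allows this user agent to fetch."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Hypothetical candidate pages on an example domain.
candidates = [
    "https://example.org/flights/2017-06-12",
    "https://example.org/private/archive",
]
to_fetch = allowed_urls(parser, candidates)

# Requested pause between requests, if the site declares one.
delay = parser.crawl_delay(USER_AGENT)
```

Against a live site, the parser would typically be filled with `set_url(...)` followed by `read()` instead of an inlined string, and the robot would wait `delay` seconds between requests.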
POPULATION/PL – LIFE SATISFACTION: HOW DOES IT WORK?
[Diagram, steps (1)-(3): Twitter data, data extracting (Tweepy), training dataset (labels, feature vectors), machine learning algorithm (Sklearn), predictive model, result set]
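The flow from training dataset to result set can be sketched with scikit-learn roughly as follows. This is only an illustration under stated assumptions: the slide does not say which vectorizer or classifier the pilot used, so `TfidfVectorizer` and `LogisticRegression` are stand-ins, and the tiny English training set and its label names are invented for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical, hand-labelled training dataset (texts and labels are invented).
train_texts = [
    "what a wonderful day, I love my job",
    "great weekend with my family, so happy",
    "feeling fantastic after a long walk",
    "terrible day, everything went wrong",
    "I hate waiting, so tired and sad",
    "awful traffic again, really unhappy",
]
train_labels = [
    "satisfied", "satisfied", "satisfied",
    "dissatisfied", "dissatisfied", "dissatisfied",
]

# Text -> feature vectors -> classifier. Both components are assumptions;
# the pilot's actual choices are not stated on the slide.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

# Training: labels plus feature vectors produce the predictive model.
model.fit(train_texts, train_labels)

# Applying the predictive model to new tweets yields the result set.
new_tweets = ["so happy with this wonderful weekend", "sad and tired, awful day"]
result_set = list(model.predict(new_tweets))
```

In practice the new tweets would come from the data extraction step rather than a hard-coded list, and the model would be evaluated on held-out labelled tweets before use.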
POPULATION – DATA COLLECTION TIMELINE WITH TWEEPY (ABOUT 20 THOUSAND TWEETS PER HOUR IN POLISH)
- The structure of the training dataset is critical: disproportions between attributes may lead to wrong conclusions.
- We have to maintain and modify the training dataset all the time.
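One simple way to spot disproportions in a training dataset is to tabulate the share of each label within each attribute value. The sketch below does this with the standard library only. The row layout (a list of dicts with a `label` key), the attribute name `region` and the sample rows are assumptions made for illustration.

```python
from collections import Counter


def label_shares(rows, attribute):
    """Return {attribute value: {label: share of rows}} for a list of dict rows."""
    counts = {}
    for row in rows:
        counts.setdefault(row[attribute], Counter())[row["label"]] += 1
    return {
        value: {label: n / sum(counter.values()) for label, n in counter.items()}
        for value, counter in counts.items()
    }


# Hypothetical training rows: region A is mostly "satisfied", region B the opposite.
rows = (
    [{"region": "A", "label": "satisfied"}] * 3
    + [{"region": "A", "label": "dissatisfied"}] * 1
    + [{"region": "B", "label": "satisfied"}] * 1
    + [{"region": "B", "label": "dissatisfied"}] * 3
)

shares = label_shares(rows, "region")
```

If the shares differ strongly between attribute values, as in this invented example, a model may learn the attribute rather than the sentiment, which is one way a skewed training set leads to wrong conclusions.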
POPULATION - ISSUES
- Representativeness: Twitter popularity in your country (e.g., Poland: 20 thousand tweets per hour; worldwide: 400 million tweets a day in 2013)
- Daily life satisfaction (value added): how many tweets a day can you collect?
- Concentrate only on the text and remove usernames; lemmatization and stemming may not work
- Code page (UTF-8, cp1250 (windows-1250) vs. ISO-8859-2)
- Precision of ML: 0.49–0.80
- Retweets
- Attributes for the structure of the population: region/gender?
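The code page, username and retweet issues can be illustrated with a short preprocessing sketch. It assumes the Windows code page meant here is windows-1250 (Python codec `cp1250`), which encodes some Polish letters (ą, ś, ź) differently from ISO-8859-2. The helper names, the UTF-8-first fallback strategy and the "RT @" prefix test for retweets are illustrative assumptions, not the pilot's actual procedure.

```python
import re

# Matches Twitter-style @usernames (letters, digits, underscore).
USERNAME = re.compile(r"@\w+")


def decode_tweet(raw, fallback="cp1250"):
    """Decode bytes as UTF-8 first; otherwise use a legacy code page.

    The fallback code page is an assumption: cp1250 and ISO-8859-2 cannot be
    told apart reliably from the bytes alone, so it must be chosen per source.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback, errors="replace")


def is_retweet(text):
    """Heuristic (an assumption): classic retweets start with 'RT @'."""
    return text.startswith("RT @")


def clean_text(text):
    """Remove usernames and collapse whitespace, keeping only the text."""
    return " ".join(USERNAME.sub(" ", text).split())


sample = "@jan_kowalski Dziś super dzień @anna"

# The same text yields different bytes in the two legacy code pages.
as_cp1250 = sample.encode("cp1250")
as_latin2 = sample.encode("iso-8859-2")

decoded = decode_tweet(as_cp1250)  # not valid UTF-8, so cp1250 is used
cleaned = clean_text(decoded)
```

Decoding `as_latin2` with the cp1250 fallback would silently produce wrong characters instead of raising an error, which is why the code page of each source needs to be known rather than guessed.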
DISCUSSION
- Any other issues to discuss?
- More questions for a better understanding of the topic?
- Round table: the most relevant issues when applying the pilot in your country