Internal WP7 meeting Warsaw, June 12-13, 2017


LESSONS LEARNT FROM PILOT SURVEYS: IRELAND, NETHERLANDS, POLAND, PORTUGAL, UK, WITH ROUND TABLE DISCUSSION

WHAT DO WE EXPECT AFTER THIS SESSION?
- Additional information on the pilots conducted (especially Ireland and Poland).
- Overview of issues and obstacles related to the pilots conducted.
- List of problems that we have to tackle when implementing pilots in other countries.

AGENDA
- Agriculture Use Case: IE – current state and issues, PL – issues
- Comments
- Tourism Use Case: NL, PL – issues
- Population Use Case: PL – how to prepare a good training dataset, UK experience
- Comments
- Round table

AGRICULTURE – JOHN, IE PRESENTATION
COMMENTS

TOURISM – ISSUES
- Data sources to collect – agreements
- New data source – flight movements
- Sustainability of data sources
- An archive of flights is expensive (ca. $800 for a full one-month archive)
- We scrape the data with a robot, while still respecting robots.txt
COMMENTS?
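
The robots.txt point above can be illustrated with a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the user-agent name `wp7-pilot-bot`, and the `example.org` URLs are hypothetical placeholders, not the actual flight data source used in the pilot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real scraper would download it
# from the target site before requesting any page.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

# Hypothetical user-agent name for the scraping robot.
USER_AGENT = "wp7-pilot-bot"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule checker (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_scrape(parser: RobotFileParser, url: str) -> bool:
    """Return True only if the rules allow our robot to fetch the URL."""
    return parser.can_fetch(USER_AGENT, url)


parser = build_parser(EXAMPLE_ROBOTS_TXT)
allowed = may_scrape(parser, "https://example.org/flights/2017-06-12")
blocked = may_scrape(parser, "https://example.org/private/archive")
delay_seconds = parser.crawl_delay(USER_AGENT)  # pause between requests, if any
```

A scraper built this way would skip every URL for which `may_scrape` returns False and wait `delay_seconds` between requests when the site declares a crawl delay.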

POPULATION/PL – LIFE SATISFACTION. HOW DOES IT WORK?
[Diagram with three numbered steps, (1) to (3). Labels: Twitter data; Tweepy; Sklearn; data extracting; training dataset; labels; feature vectors; machine learning algorithm; predictive model; result set]
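
The flow named on this slide (labelled training dataset, feature vectors, machine learning algorithm, predictive model, result set) might look roughly like the sketch below in scikit-learn. The slide does not say which vectorizer or algorithm the pilot used, so `TfidfVectorizer` and `MultinomialNB` are assumptions, and the English training sentences are made-up toy data rather than real Polish tweets. Collecting tweets with Tweepy is left out because it requires API credentials.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training dataset (hypothetical): text plus a manually assigned label.
TRAIN_TEXTS = [
    "great day happy",
    "love this wonderful morning",
    "happy and satisfied with life",
    "terrible day sad",
    "hate this awful traffic",
    "sad and tired of everything",
]
TRAIN_LABELS = [
    "positive", "positive", "positive",
    "negative", "negative", "negative",
]


def train_model(texts, labels):
    """Turn texts into feature vectors and fit a classifier on the labels."""
    model = make_pipeline(TfidfVectorizer(), MultinomialNB())  # assumed choices
    model.fit(texts, labels)
    return model


# Predictive model built from the training dataset.
model = train_model(TRAIN_TEXTS, TRAIN_LABELS)

# Result set: predicted labels for new, unlabelled texts.
predictions = list(model.predict(["happy wonderful day", "sad awful day"]))
```

Wrapping the vectorizer and the classifier in one pipeline keeps the feature extraction identical at training time and at prediction time.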

POPULATION – DATA COLLECTING TIMELINE WITH TWEEPY (ABOUT 20 THOUS. TWEETS / HOUR IN POLISH)
- The structure of the training dataset is critical: disproportions across attributes may lead to wrong conclusions
- We have to maintain and modify the training dataset all the time
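
One simple way to watch for the disproportion mentioned above is to check the label shares each time the training dataset is modified. This is a minimal sketch; the label names and the `max_ratio` threshold of 3.0 are hypothetical and would need to be chosen for the actual dataset.

```python
from collections import Counter


def label_shares(labels):
    """Return the share of each label in the training dataset."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def is_imbalanced(labels, max_ratio=3.0):
    """Flag the dataset when the largest class exceeds max_ratio times the smallest.

    max_ratio is an illustrative threshold, not a value from the pilot.
    """
    counts = Counter(labels)
    if not counts:
        return False
    return max(counts.values()) / min(counts.values()) > max_ratio


# Hypothetical label column of a training dataset.
skewed_labels = ["satisfied"] * 8 + ["dissatisfied"] * 2
balanced_labels = ["satisfied"] * 5 + ["dissatisfied"] * 5

skewed_shares = label_shares(skewed_labels)
```

The same check can be repeated per attribute (for example per region or per gender, where those are available) rather than only on the target label.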

POPULATION – ISSUES
- Representativeness – Twitter popularity in your country (e.g., Poland: 20 thous. tweets per hour; worldwide: 400 million tweets a day in 2013)
- Daily life satisfaction (value added) – how many tweets a day can you collect?
- Concentrate only on text, remove usernames; lemmatization and stemming may not work
- Code page (UTF-8, cp1250 (Windows-1250) vs. ISO-8859-2)
- Precision of ML – 0.49 to 0.80
- Retweets
- Attributes for the structure of population – region/gender?
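
Three of the issues above (code pages, usernames, retweets) concern text preprocessing and can be sketched with the standard library alone. The fallback order of encodings, the regular expressions, and the sample texts are assumptions for illustration. Detecting retweets by the `RT @user:` prefix is a text-based heuristic, not the pilot's documented method.

```python
import re

MENTION_RE = re.compile(r"@\w+")
RETWEET_RE = re.compile(r"^RT\s+@\w+:?\s*")


def decode_bytes(raw, encodings=("utf-8", "cp1250", "iso-8859-2")):
    """Try each code page in turn (assumed order) and return the first success.

    cp1250 and ISO-8859-2 place some Polish letters (e.g. ą, ś, ź) at
    different byte values, so a wrong guess decodes without error but
    yields garbled text.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return raw.decode("utf-8", errors="replace")


def is_retweet(text):
    """Heuristic: classic retweets start with 'RT @username'."""
    return RETWEET_RE.match(text) is not None


def clean_tweet(text):
    """Drop the retweet prefix and usernames, then normalise whitespace."""
    text = RETWEET_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    return " ".join(text.split())


SAMPLE = "zażółć gęślą jaźń"  # Polish pangram fragment with diacritics
from_utf8 = decode_bytes(SAMPLE.encode("utf-8"))
from_cp1250 = decode_bytes(SAMPLE.encode("cp1250"))
from_iso = decode_bytes(SAMPLE.encode("iso-8859-2"))  # misread as cp1250

cleaned = clean_tweet("RT @jan_kowalski: Piękny dzień!")
```

In this sketch `from_iso` comes back garbled, because the ISO-8859-2 bytes are also valid cp1250. Guessing the code page from the bytes alone is unreliable, so the encoding should be recorded at collection time where possible.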

DISCUSSION
- Any other issues to discuss?
- More questions to better understand the topic?
- Round table – the most relevant issues when applying the pilot in your country