New data sources (such as Big Data) and Traditional Sources Work Package 2.

Context & scenarios

Integration of data sources that traditionally have not been used by most organizations:
- Mobile phone data
- Social networks
- Smart sensors
- Satellite imagery
- Web pages
- Credit card transactions

New or more frequent updates of statistics:
- Scanner data … can update Consumer Price Indices
- Mobile phones, Twitter … mobility surveys, tourist surveys

Challenges & issues

Goal: how to use new sources of data for the purpose of integrating them with traditional statistical datasets, namely administrative sources, surveys and statistical registers.

Features:
- Variety of site structures, which allows experimenting with different kinds of techniques for processing Internet-scraped data.
- The number of Web sites, which is very large.

Issues:
- Definition of the statistical framework for using the Internet-scraped data, e.g. population, unit characteristics, quality metadata, etc.
- Entity extraction and recognition from the Internet-scraped data, according to the defined statistical framework.
- Object matching between Internet-scraped data and traditional statistical datasets.
- Design of statistical analyses starting from the result of the object matching activity.

All the issues will be investigated both theoretically and experimentally on a specific domain, namely enterprise Web sites.
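The entity extraction issue can be illustrated with a minimal sketch. It is not the work package's method: it only assumes that a scraped enterprise page arrives as an HTML string and that the statistical framework asks for two hypothetical unit characteristics, contact e-mail addresses and a yes/no flag for e-commerce wording. The keyword list, the function name `extract_entities` and the sample page are all invented for illustration.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


# Simple e-mail pattern; good enough for a sketch, not a full RFC 5322 parser.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Hypothetical keyword list; a real framework would define its own indicators.
ECOMMERCE_TERMS = ("cart", "checkout", "buy now")


def extract_entities(html):
    """Return the assumed unit characteristics found in one scraped page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts)
    lowered = text.lower()
    return {
        "emails": sorted(set(EMAIL_RE.findall(text))),
        "mentions_ecommerce": any(term in lowered for term in ECOMMERCE_TERMS),
    }


# Invented sample page standing in for Internet-scraped data.
SAMPLE_HTML = """
<html><head><title>Acme Tools</title><script>var x = "cart";</script></head>
<body><h1>Acme Tools S.r.l.</h1>
<p>Contact: info@acme-tools.example</p>
<a href="/shop">Add to cart</a></body></html>
"""

result = extract_entities(SAMPLE_HTML)
print(result)
```

Keyword and pattern rules like these are the simplest option; given the variety of site structures noted above, text mining or learned classifiers may be needed in practice.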

Activities

- Definition of the statistical framework and boundaries of the work.
- Design and test of the integration between Internet-scraped data and traditional statistical datasets:
  - Identification of web sites.
  - Identification of traditional statistical datasets to integrate with them.
  - Entity extraction and recognition.
  - Object matching activities will be carried out.
- Design and evaluation of statistical outputs obtained from the integration/combination/fusion of Internet-scraped data and traditional statistical datasets.
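The object matching activity can be sketched as approximate name matching between scraped sites and a statistical register. This is an illustrative assumption, not the technique the work package prescribes: it uses the standard library's `difflib.SequenceMatcher`, and the names `normalize` and `match_sites_to_register`, the legal-form stop list, the 0.85 threshold and the sample records are all hypothetical.

```python
import re
from difflib import SequenceMatcher

# Hypothetical stop list of legal forms removed before comparison.
LEGAL_FORMS = {"srl", "spa", "ltd", "inc", "gmbh"}


def normalize(name):
    """Lowercase, drop punctuation and legal-form tokens."""
    tokens = re.sub(r"[^a-z0-9 ]", "", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_FORMS)


def match_sites_to_register(scraped, register, threshold=0.85):
    """Map each scraped site to the best-scoring register unit, or None.

    scraped:  {site_id: enterprise name found on the site}
    register: {unit_id: enterprise name in the traditional dataset}
    """
    matches = {}
    for site_id, site_name in scraped.items():
        best_id, best_score = None, 0.0
        for unit_id, unit_name in register.items():
            score = SequenceMatcher(
                None, normalize(site_name), normalize(unit_name)
            ).ratio()
            if score > best_score:
                best_id, best_score = unit_id, score
        # Accept the link only when the similarity clears the assumed threshold.
        matches[site_id] = best_id if best_score >= threshold else None
    return matches


# Invented records standing in for scraped data and a business register.
scraped = {
    "acme-tools.example": "Acme Tools S.r.l.",
    "unknown.example": "Zebra Consulting",
}
register = {"R001": "ACME TOOLS SRL", "R002": "Beta Logistics SpA"}

matches = match_sites_to_register(scraped, register)
print(matches)
```

Comparing every site with every register unit is quadratic, so a register of realistic size would likely need blocking (for example on postcode or domain) and additional matching variables such as address or VAT number.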