What is data? Wietse Dol, LEI-WUR 13 November 2012, 9.40 – 10.25, C435 Forumgebouw.

LEI: Agricultural Economic Research Institute
- Part of Wageningen University & Research center (WUR)
- Part of the Social Science Group (SSG) within WUR
- The research part of WUR/SSG, based in The Hague (advising the Ministry of Agriculture)
- Consultancy (applied research) for ministries, the EU, local government, industry, …
- Collecting data (farm data: FADN), building models, and providing agricultural content specialists

University vs. research center
- University: teaching, publications, new theory and technology
- Research center:
  ● applied work/consultancy
  ● reusing things from the past (e.g. yearly publications)
  ● sharing knowledge (how to become a content specialist) and teaching small groups
  ● working in groups (different disciplines)
  ● working in (inter)national groups with many different disciplines

Wietse Dol
- PhD in Econometrics
- 10 years at the University of Groningen (econometrics, sampling theory)
- 18 years at LEI (many different departments)
- Data and models, i.e. use/reuse and quality; troubleshooter; statistical methods; ICT; user interfacing
- Not an IT guy but a researcher (I build software because I use it myself)
- Many model projects and user interfaces for models (not only for LEI)
- Currently: data, data quality, …

Data, lifecycle and data management

Data is anything and everything
- Research data: data collected, observed or created for the purpose of analysis, to produce and validate original research results.
- Anything can become the subject of research …

Primary vs. secondary data
- Primary data: you collect them yourself, targeted to answer/validate your own questions.
- Secondary data: not yours.
  ● Quality of the data
  ● Meta-information is crucial
- There is a growing need for secondary data (primary data are expensive and take a lot of time to collect).

Production data
- Meta-information: source, version, dimensions, definitions, etc. Without proper meta-information you may use the wrong data:
  ● Is FR (France) with or without the DOM (overseas departments)?
  ● Is the production in tons or in euros?
  ● Does the year start on 1 January and end on 31 December?
  ● What is the definition of "tomato"?

Product   Country   Year   Production
Tomato    NL
Wheat     BE
Sugar     FR
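
The questions on this slide can be made explicit by attaching meta-information to every series and checking it before two series are combined. The sketch below is only an illustration. The record type `SeriesMeta`, its fields and the function `incompatibilities` are hypothetical, and the two sources are invented placeholders, not real FAO or Eurostat metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeriesMeta:
    """Hypothetical meta-information record attached to one data series."""
    source: str        # who published the series
    version: str       # release or download date
    unit: str          # e.g. "tons" or "EUR"
    territory: str     # e.g. "FR incl. DOM" or "FR excl. DOM"
    year_start: str    # first day of the reference year, e.g. "01-01"
    product_def: str   # which definition of the product is used


# Fields that change the meaning of the numbers; source and version may
# differ between two series without making them incomparable.
MEANING_FIELDS = ("unit", "territory", "year_start", "product_def")


def incompatibilities(a: SeriesMeta, b: SeriesMeta) -> list[str]:
    """Return the meaning-changing fields on which two series disagree."""
    return [f for f in MEANING_FIELDS if getattr(a, f) != getattr(b, f)]


a = SeriesMeta("source A", "2012-10", "tons", "FR incl. DOM", "01-01", "fresh tomatoes")
b = SeriesMeta("source B", "2012-09", "tons", "FR excl. DOM", "01-01", "fresh tomatoes")
print(incompatibilities(a, b))  # ['territory']
```

An empty list does not prove the series are comparable. It only means that none of the recorded fields disagree.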

DCC Curation Lifecycle Model

CREATE & MANAGE DATA: RESEARCH DATA LIFECYCLE

Data
- How to get the data, filter it and store it
- Quality checks on the data
- How to make it available to others
- What scientific actions are performed on the data
- Curate, preserve, version, …

Types of databases according to MetaBase
- Statistical database
- Scientific database
- Meta-database

Statistical database
- Databases provided by international organizations like the EU, FAO and OECD are in general statistical databases:
  ● Data are stored as they are received
  ● Data are consistent within their own domain
  ● No aggregations are made when underlying data are missing
  ● Little attention is paid to data checking
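
The rule "no aggregations are made when underlying data are missing" can be sketched as a sum that refuses to produce a total when any member is absent. The function name `aggregate` and the country figures are hypothetical and serve only to show the behaviour.

```python
from typing import Optional


def aggregate(values: dict[str, Optional[float]], members: list[str]) -> Optional[float]:
    """Sum the members of an aggregate, statistical-database style.

    If any member is absent or None, no aggregate is produced (None is
    returned) instead of silently summing the members that happen to exist.
    """
    parts = [values.get(m) for m in members]
    if any(p is None for p in parts):
        return None
    return sum(parts)


production = {"BE": 1.5, "NL": 2.0, "LU": None}  # hypothetical figures
print(aggregate(production, ["BE", "NL"]))        # 3.5
print(aggregate(production, ["BE", "NL", "LU"]))  # None
```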

Scientific versus statistical database
- Problems with statistical databases:
  ● Different definitions of territories and commodities
  ● Typing errors
  ● Missing data
  ● Breaks in series
- Scientific database:
  ● Problems solved
  ● Transparency (original data sources and underlying assumptions are kept)
  ● Essential for modeling and research
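
Before such problems can be solved, they have to be found. A minimal screening sketch follows, assuming a yearly series stored as a `{year: value}` dict. The function `flag_problems` and the threshold `max_ratio = 5.0` are hypothetical choices, not part of MetaBase. The function reports missing years and large jumps between consecutive observed years. A flagged jump may be a typing error or a break in series, and a content specialist still has to decide which.

```python
def flag_problems(series: dict[int, float], first: int, last: int,
                  max_ratio: float = 5.0) -> dict[str, list[int]]:
    """Flag missing years and suspicious jumps in a yearly series.

    A jump is flagged when a value differs from the previous observed
    value by more than a factor `max_ratio` (in either direction).
    """
    missing = [y for y in range(first, last + 1) if y not in series]
    jumps = []
    prev = None
    for year in range(first, last + 1):
        if year not in series:
            continue  # gaps are reported as missing, not as jumps
        value = series[year]
        if prev is not None and prev > 0 and value > 0:
            if max(value / prev, prev / value) > max_ratio:
                jumps.append(year)
        prev = value
    return {"missing": missing, "jumps": jumps}


series = {2000: 100.0, 2001: 105.0, 2003: 1100.0, 2004: 1120.0}  # hypothetical
print(flag_problems(series, 2000, 2004))
# {'missing': [2002], 'jumps': [2003]}
```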

Structural design of a scientific database
- Key words for the structural design (HarDFACTS project for IPTS, 2007, carried out by vTI/LEI):
  ● Transparent
  ● Harmonised
  ● Complete
  ● Consistent
- HarDFACTS: Harmonised Database for Agricultural Commodity Time Series

Transparent
- Original data from the statistical databases are stored
- Complete and consistent data are stored
- Original and completed data can be compared
- Calculation procedures are stored and can be repeated
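
One way to read these four points in code is to never overwrite a received value. Instead, store the completed value next to it, together with a label naming the procedure that produced it. The sketch assumes this layout. `Observation`, `fill` and `changed` are hypothetical names, and the estimates are dummy lambdas standing in for a real completion procedure.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Observation:
    """Hypothetical record in which the received value is never overwritten."""
    original: Optional[float]   # as received from the statistical database
    completed: Optional[float]  # value used in the scientific database
    method: str                 # label of the procedure that produced `completed`


def fill(original: Optional[float], estimate: Callable[[], float], method: str) -> Observation:
    """Keep a received value as-is; otherwise store a labelled estimate."""
    if original is not None:
        return Observation(original, original, "original")
    return Observation(None, estimate(), method)


def changed(observations: dict[int, Observation]) -> dict[int, str]:
    """Compare original and completed data: which years were filled, and how."""
    return {year: o.method for year, o in observations.items() if o.method != "original"}


obs = {
    2000: fill(10.0, lambda: 0.0, "interpolation"),
    2001: fill(None, lambda: 11.0, "interpolation"),  # hypothetical estimate
    2002: fill(12.0, lambda: 0.0, "interpolation"),
}
print(changed(obs))  # {2001: 'interpolation'}
```

The method label only names the procedure. To make calculations repeatable, the procedure itself (code and parameters) would also have to be stored under that name.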

Harmonised
- The definition used here: bring the different international databases together in one framework and link the data through a unique coding system (key words are classifications and tree structures)
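
A unique coding system can be sketched as two tables:
- a concordance that maps each (source, source-specific code) pair to one common code;
- a tree that places each common code under its parent.

The codes, sources and tree in the sketch are hypothetical placeholders, not an actual classification. The function `roll_up` assumes the input holds leaf codes only.

```python
# Hypothetical concordance: (source, source-specific code) -> common code.
CONCORDANCE = {
    ("source_A", "tomatoes"): "TOMATO",
    ("source_B", "TOM01"): "TOMATO",
    ("source_A", "wheat"): "WHEAT",
    ("source_B", "WHT01"): "WHEAT",
}

# Hypothetical tree structure: child -> parent in the common classification.
PARENT = {
    "TOMATO": "VEGETABLES",
    "WHEAT": "CEREALS",
    "VEGETABLES": "CROPS",
    "CEREALS": "CROPS",
}


def harmonise(source: str, code: str) -> str:
    """Translate a source-specific code into the common coding system."""
    return CONCORDANCE[(source, code)]


def ancestors(code: str) -> list[str]:
    """All higher levels of a code in the tree, nearest first."""
    out = []
    while code in PARENT:
        code = PARENT[code]
        out.append(code)
    return out


def roll_up(leaf_values: dict[str, float]) -> dict[str, float]:
    """Add every leaf value to the leaf itself and to each of its ancestors."""
    totals: dict[str, float] = {}
    for code, value in leaf_values.items():
        for node in [code, *ancestors(code)]:
            totals[node] = totals.get(node, 0.0) + value
    return totals


data = {harmonise("source_B", "TOM01"): 2.0, harmonise("source_A", "wheat"): 3.0}
print(roll_up(data))
# {'TOMATO': 2.0, 'VEGETABLES': 2.0, 'CROPS': 5.0, 'WHEAT': 3.0, 'CEREALS': 3.0}
```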

Complete
- The definition used in MetaBase: econometric procedures are proposed to complete the new (time) series in the database:
  ● Trend estimates
  ● Interpolation
  ● Correlation and regression with other variables (e.g. TRAMO: Time series Regression with ARIMA noise, Missing observations and Outliers)
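
The two simplest procedures in this list can be sketched in a few lines, assuming a yearly series stored as a `{year: value}` dict. The function names are hypothetical. The sketch covers:
- `interpolate`: linear interpolation, which needs an observed year on both sides of the gap;
- `trend_estimate`: an ordinary least-squares linear trend, which can also extrapolate.

TRAMO is a different, ARIMA-based method and is not reproduced here.

```python
def interpolate(series: dict[int, float], year: int) -> float:
    """Linear interpolation between the nearest observed years around `year`."""
    before = max(y for y in series if y < year)  # ValueError if no earlier year
    after = min(y for y in series if y > year)   # ValueError if no later year
    weight = (year - before) / (after - before)
    return series[before] + weight * (series[after] - series[before])


def trend_estimate(series: dict[int, float], year: int) -> float:
    """Least-squares linear trend fitted on the observed years, evaluated at `year`."""
    years = list(series)
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(series[y] for y in years) / n
    sxx = sum((y - mean_x) ** 2 for y in years)
    sxy = sum((y - mean_x) * (series[y] - mean_y) for y in years)
    slope = sxy / sxx
    return mean_y + slope * (year - mean_x)


s = {2000: 10.0, 2001: 12.0, 2003: 16.0}  # hypothetical series with 2002 missing
print(interpolate(s, 2002))     # 14.0
print(trend_estimate(s, 2004))  # ≈ 18.0 (the points lie on a straight line)
```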

Consistent
- The definition used here: the interrelationships of the data in the database hold across classifications (time, territories and variables).
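
A basic consistency check compares each stored total with the sum of its parts. Parts can be member territories of a region, or quarters of a year. The sketch below is hypothetical: the function name `inconsistent_totals`, the codes, the figures and the tolerance are illustrative assumptions.

```python
import math


def inconsistent_totals(data: dict[str, float], hierarchy: dict[str, list[str]],
                        rel_tol: float = 1e-6) -> list[str]:
    """Return the totals whose stored value differs from the sum of their parts.

    Totals with a missing part are skipped here; those are a completeness
    problem rather than a consistency problem.
    """
    bad = []
    for total, parts in hierarchy.items():
        if total in data and all(p in data for p in parts):
            if not math.isclose(data[total], sum(data[p] for p in parts), rel_tol=rel_tol):
                bad.append(total)
    return bad


hierarchy = {
    "BENELUX": ["BE", "NL", "LU"],                      # territories
    "Y2011": ["Y2011Q1", "Y2011Q2", "Y2011Q3", "Y2011Q4"],  # time
}
data = {  # hypothetical figures
    "BE": 1.0, "NL": 2.0, "LU": 0.5, "BENELUX": 3.5,
    "Y2011Q1": 2.0, "Y2011Q2": 3.0, "Y2011Q3": 3.0, "Y2011Q4": 1.0, "Y2011": 10.0,
}
print(inconsistent_totals(data, hierarchy))  # ['Y2011']
```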

MetaBase

1. Many different data sources (e.g. FAO, Eurostat), all in the same user interface (SDMX, NetCDF)
2. Find data alternatives using meta-information
3. Search data content (e.g. oilseed)
4. All content easily available in research software (R/GAMS)
5. Recodings, aggregations and concordances are all implemented in GAMS
6. Statistical methods in GAMS and R
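
Points 2 and 3 (finding data alternatives and searching content through meta-information) can be illustrated with a plain keyword search over a catalogue of dataset descriptions. This is not how MetaBase itself is implemented, which is not described here. `Dataset`, `search` and the catalogue entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical catalogue entry holding a dataset's meta-information."""
    source: str
    title: str
    keywords: tuple[str, ...]


def search(catalogue: list[Dataset], term: str) -> list[Dataset]:
    """Case-insensitive search of titles and keywords."""
    t = term.lower()
    return [d for d in catalogue
            if t in d.title.lower() or any(t in k.lower() for k in d.keywords)]


catalogue = [
    Dataset("source A", "Crop production", ("oilseed", "cereals")),
    Dataset("source B", "Oilseed balance sheets", ("supply", "use")),
    Dataset("source C", "Livestock numbers", ("cattle", "pigs")),
]
print([d.source for d in search(catalogue, "oilseed")])  # ['source A', 'source B']
```

Two hits from different sources are exactly the "data alternatives" of point 2. Their meta-information then still has to be compared before either one is used.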

Thank you for your attention! Or send an