What is data? Wietse Dol, LEI-WUR 13 November 2012, 9.40 – 10.25, C435 Forumgebouw.

1 What is data? Wietse Dol, LEI-WUR (W.Dol@wur.nl) 13 November 2012, 9.40 – 10.25, C435 Forumgebouw

2 LEI: Agricultural Economic Research Institute
• Part of Wageningen University & Research center (WUR)
• Part of the Social Science Group (SSG) within WUR
• The research part of WUR/SSG, located in The Hague, advising the Ministry of Agriculture
• Consultancy (applied research) for ministries, the EU, local governments, industry, …
• Data collection (farm data: FADN), model building, and agricultural content specialists

3 University vs. research center
• University: teaching, publications, new theory and technology
• Research center:
  ● applied work/consultancy
  ● reusing work from the past (e.g. yearly publications)
  ● sharing knowledge (how to become a content specialist) and teaching small groups
  ● working in groups (different disciplines)
  ● working in (inter)national groups with many different disciplines

4 Wietse Dol
• PhD in Econometrics
• 10 years at the University of Groningen (econometrics, sampling theory)
• 18 years at LEI (many different departments)
• Data and models, i.e. use/reuse and quality; troubleshooter; statistical methods; ICT; user interfaces
• Not an IT guy but a researcher (I build software because I use it myself)
• Many model projects and user interfaces for models (not only for LEI)
• Currently: data, data quality, …

5

6 Data, lifecycle and data management
http://datalib.edina.ac.uk/mantra/researchdataexplained.html
http://www.dcc.ac.uk/resources/curation-lifecycle-model
http://www.data-archive.ac.uk/create-manage/life-cycle

7 Data is anything and everything
Research data: data that are collected, observed, or created for the purpose of analysis, to produce and validate original research results. Anything can become the interest of research …

8 Primary vs. secondary data
• Primary data: data you collect yourself, targeted to answer/validate your own questions.
• Secondary data: data that are not yours (collected by others).
• Quality of data: meta-information is crucial.
• There is a growing need for secondary data (primary data are expensive and take a lot of time to collect).

9 Production data
Meta-information (source, version, dimensions, definitions, etc.): without proper meta-information you use the wrong data.
• Is FR with or without DOM (the French overseas departments)?
• Is production in tons or in euros?
• Does the year start on 1-1 and end on 31-12?
• What is the definition of "Tomato"?

Product   Country   Year   Production
Tomato    NL        2005   325
Wheat     BE        1999   100
Sugar     FR        2003   450
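
The questions on this slide can be made concrete by storing the meta-information with every value and checking definitions before two values are compared. Below is a minimal Python sketch of that idea. The class names, field names and definition strings are hypothetical illustrations. They are not part of any LEI or MetaBase system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeriesMeta:
    """Hypothetical meta-information record attached to a data value."""
    source: str         # e.g. "FAO" or a national statistics office
    version: str        # release or download date of the source
    unit: str           # "tonnes" vs "EUR"
    territory_def: str  # e.g. "FR incl. DOM" vs "FR excl. DOM"
    year_def: str       # e.g. "calendar year (1-1 to 31-12)" vs "marketing year"
    product_def: str    # what the product name actually covers


@dataclass(frozen=True)
class Observation:
    product: str
    country: str
    year: int
    production: float
    meta: SeriesMeta


# Fields that must agree before two values may be compared or combined.
DEFINITION_FIELDS = ("unit", "territory_def", "year_def", "product_def")


def definition_conflicts(a: Observation, b: Observation) -> list[str]:
    """Return the definition fields on which two observations disagree."""
    return [name for name in DEFINITION_FIELDS
            if getattr(a.meta, name) != getattr(b.meta, name)]


# Illustrative only: the same number (450) can mean different things.
src_a = SeriesMeta("source A", "2012", "tonnes", "FR incl. DOM", "calendar year", "refined sugar")
src_b = SeriesMeta("source B", "2012", "tonnes", "FR excl. DOM", "calendar year", "refined sugar")
obs_a = Observation("Sugar", "FR", 2003, 450.0, src_a)
obs_b = Observation("Sugar", "FR", 2003, 450.0, src_b)
print(definition_conflicts(obs_a, obs_b))  # ['territory_def']
```

The two values are numerically identical, but the check shows that they measure different territories.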

10 DCC Curation Lifecycle Model

11 CREATE & MANAGE DATA: RESEARCH DATA LIFECYCLE

12 Data
• How to get the data, filter it and store it
• Quality checks on the data
• How to make it available to others
• Which scientific operations are performed on the data
• Curate, preserve, version, …
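
The "quality checks on the data" step can be sketched as a set of simple rules run over incoming records. Below is a minimal Python example, assuming records shaped like the production table (product, country, year, production). The rule set and the default year bounds are placeholder assumptions, not rules taken from the presentation.

```python
def quality_checks(rows: list[dict], min_year: int = 1900,
                   max_year: int = 2012) -> list[tuple[int, str]]:
    """Run simple, hypothetical quality rules; return (row index, problem) pairs."""
    issues: list[tuple[int, str]] = []
    seen: set[tuple] = set()
    for i, row in enumerate(rows):
        key = (row.get("product"), row.get("country"), row.get("year"))
        if None in key:
            issues.append((i, "missing key field"))
            continue
        if key in seen:
            issues.append((i, "duplicate key"))
        seen.add(key)
        if not min_year <= row["year"] <= max_year:
            issues.append((i, "year out of range"))
        value = row.get("production")
        if value is None:
            issues.append((i, "missing value"))
        elif value < 0:
            issues.append((i, "negative production"))
    return issues


# Rows 1-3 are deliberately corrupted copies of the slide's table.
rows = [
    {"product": "Tomato", "country": "NL", "year": 2005, "production": 325},
    {"product": "Wheat", "country": "BE", "year": 1999, "production": -100},
    {"product": "Tomato", "country": "NL", "year": 2005, "production": 325},
    {"product": "Sugar", "country": "FR", "year": 2003, "production": None},
]
print(quality_checks(rows))
# [(1, 'negative production'), (2, 'duplicate key'), (3, 'missing value')]
```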

13 Types of databases according to MetaBase
• Statistical database
• Scientific database
• Meta-database

14 Statistical database
• Databases provided by international organizations such as the EU, FAO and OECD are generally statistical databases:
  ● Data are stored as they are received
  ● Data are consistent within their own domain
  ● No aggregations are made when underlying data are missing
  ● Little attention is paid to data checking
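
The rule "no aggregations are made when underlying data are missing" can be expressed in a few lines. This is a minimal Python sketch. The function name and the example values are hypothetical and do not come from any FAO, Eurostat or OECD dataset.

```python
from typing import Mapping, Optional, Sequence


def aggregate(values: Mapping[str, Optional[float]],
              members: Sequence[str]) -> Optional[float]:
    """Sum the members of an aggregate; return None if any member is missing."""
    total = 0.0
    for member in members:
        value = values.get(member)
        if value is None:
            return None  # no aggregation when underlying data are missing
        total += value
    return total


production = {"NL": 325.0, "BE": 100.0, "LU": None}  # illustrative values
print(aggregate(production, ["NL", "BE"]))        # 425.0
print(aggregate(production, ["NL", "BE", "LU"]))  # None
```

A scientific database would instead complete the missing member value first, and only then aggregate.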

15 Scientific versus statistical database
• Problems with statistical databases:
  ● Different definitions of territories and commodities
  ● Typing errors
  ● Missing data
  ● Breaks in series
• Scientific database:
  ● These problems are solved
  ● Transparency (original data sources and underlying assumptions are kept)
  ● Essential for modeling and research
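
Two of these problems, typing errors and breaks in series, often show up as sudden jumps between consecutive years. A typing error such as an extra digit gives a jump followed by a reversal. A break in series gives one persistent shift. Below is a minimal Python screening sketch. The factor-of-3 threshold is an arbitrary assumption, and a flagged year only means "inspect this", not "this is wrong".

```python
from typing import Optional


def flag_jumps(series: dict[int, Optional[float]], max_ratio: float = 3.0) -> list[int]:
    """Flag years whose value differs from the previous available year
    by more than a factor max_ratio (up or down)."""
    flagged = []
    years = sorted(y for y, v in series.items() if v is not None)
    for prev, cur in zip(years, years[1:]):
        a, b = series[prev], series[cur]
        if a <= 0 or b <= 0:
            # ratios are meaningless for zero or negative values: review manually
            flagged.append(cur)
            continue
        ratio = b / a
        if ratio > max_ratio or ratio < 1.0 / max_ratio:
            flagged.append(cur)
    return flagged


# Illustrative series: an extra zero in 2002 (typing error) ...
print(flag_jumps({2000: 100.0, 2001: 105.0, 2002: 1050.0, 2003: 110.0}))  # [2002, 2003]
# ... versus a lasting level shift from 2002 (break in series).
print(flag_jumps({2000: 100.0, 2001: 104.0, 2002: 400.0, 2003: 410.0}))   # [2002]
```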

16 Structural design of a scientific database
• Key words for the structural design, from the HarDFACTS project (Harmonised Database for Agricultural Commodity Time Series; IPTS, 2007; carried out by vTI/LEI):
  ● Transparent
  ● Harmonised
  ● Complete
  ● Consistent

17 Transparent
• Original data from the statistical database are stored
• Complete and consistent data are stored
• Original and completed data can be compared
• Calculation procedures are stored and can be repeated
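
One way to honour all four points is to keep three things for every cell: the value as received, the final value, and the name of the procedure that produced it. Comparing original and completed data then becomes a simple filter. Below is a minimal Python sketch. The `Cell` type, the procedure names and the 2006 value are hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import Optional

Key = tuple[str, str, int]  # (product, country, year)


@dataclass(frozen=True)
class Cell:
    original: Optional[float]  # value exactly as received (None = missing)
    final: float               # complete and consistent value
    procedure: str             # name of the stored, repeatable calculation


def changed_cells(table: dict[Key, Cell]) -> dict[Key, Cell]:
    """Return cells where the final value differs from, or fills in, the original."""
    return {key: cell for key, cell in table.items() if cell.original != cell.final}


table = {
    ("Tomato", "NL", 2005): Cell(325.0, 325.0, "as received"),
    ("Tomato", "NL", 2006): Cell(None, 330.0, "linear interpolation"),  # illustrative
}
for key, cell in changed_cells(table).items():
    print(key, cell.original, "->", cell.final, f"({cell.procedure})")
```

Because the original value is never overwritten, any completion step can be audited, or rerun with a different procedure later.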

18 Harmonised
• The definition used here: bring the different international databases together in one framework and link the data through a unique coding system (key concepts: classifications and tree structures)
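
A unique coding system has two parts. The first is a lookup from each source's own codes to one harmonised code. The second is a tree that places each harmonised code in a classification. Below is a minimal Python sketch. All source codes ("fao-item-1", "esta-code-1", …) and tree entries are hypothetical placeholders. They are not real FAO or Eurostat codes.

```python
# Hypothetical concordance: (source database, source code) -> harmonised code.
TO_HARMONISED = {
    ("FAO", "fao-item-1"): "TOMATO",
    ("Eurostat", "esta-code-1"): "TOMATO",
    ("FAO", "fao-item-2"): "WHEAT",
}

# Hypothetical classification tree: child code -> parent code.
PARENT = {
    "TOMATO": "VEGETABLES",
    "WHEAT": "CEREALS",
    "VEGETABLES": "CROPS",
    "CEREALS": "CROPS",
}


def harmonise(database: str, code: str) -> str:
    """Translate a source-specific code into the unique harmonised code."""
    try:
        return TO_HARMONISED[(database, code)]
    except KeyError:
        raise KeyError(f"no harmonised code for {code!r} from {database}") from None


def path_to_root(code: str) -> list[str]:
    """Return the code followed by all of its ancestors in the classification tree."""
    path = [code]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


print(harmonise("Eurostat", "esta-code-1"))          # TOMATO
print(path_to_root(harmonise("FAO", "fao-item-1")))  # ['TOMATO', 'VEGETABLES', 'CROPS']
```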

19 Complete
• The definition used in MetaBase: econometric procedures are proposed to complete the new (time) series in the database:
  ● Trend estimates
  ● Interpolation
  ● Correlation and regression with other variables (e.g. TRAMO: Time series Regression with ARIMA noise, Missing observations and Outliers)
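
The first two methods on this slide can be sketched simply. Gaps inside the observed range are filled by linear interpolation. Gaps at the start or end are filled from an ordinary least-squares linear trend. This Python example is a simplified stand-in for illustration only. It is not TRAMO and not the MetaBase procedure.

```python
from typing import Optional


def complete_series(series: dict[int, Optional[float]]) -> dict[int, float]:
    """Fill interior gaps by linear interpolation and outer gaps by a linear trend."""
    known = sorted((y, v) for y, v in series.items() if v is not None)
    if len(known) < 2:
        raise ValueError("need at least two observed years")
    n = len(known)
    y_mean = sum(y for y, _ in known) / n
    v_mean = sum(v for _, v in known) / n
    # ordinary least-squares slope of value on year
    slope = (sum((y - y_mean) * (v - v_mean) for y, v in known)
             / sum((y - y_mean) ** 2 for y, _ in known))

    completed = {}
    for year in sorted(series):
        value = series[year]
        if value is not None:
            completed[year] = float(value)
            continue
        before = [(y, v) for y, v in known if y < year]
        after = [(y, v) for y, v in known if y > year]
        if before and after:  # gap inside the observed range: interpolate
            (y0, v0), (y1, v1) = before[-1], after[0]
            completed[year] = v0 + (v1 - v0) * (year - y0) / (y1 - y0)
        else:  # gap at the start or end: use the trend line
            completed[year] = v_mean + slope * (year - y_mean)
    return completed


print(complete_series({2000: 100.0, 2001: None, 2002: 120.0, 2003: None}))
# {2000: 100.0, 2001: 110.0, 2002: 120.0, 2003: 130.0}
```

Under the "Transparent" principle, the completed values would be stored next to the originals together with the method used.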

20 Consistent
• The definition used here: the interrelationships of the data in the database hold across classifications (time, territories and variables).
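
A typical consistency rule is that a parent in a classification tree equals the sum of its children. The same check applies along a commodity tree, a territory tree (member states summing to a region) or time (months summing to a year). Below is a minimal Python sketch. The tree, the values and the tolerance are hypothetical.

```python
import math


def inconsistencies(values: dict[str, float],
                    children: dict[str, list[str]],
                    rel_tol: float = 1e-6) -> dict[str, tuple[float, float]]:
    """Return {parent: (parent value, sum of children)} where the two disagree."""
    problems = {}
    for parent, kids in children.items():
        if parent not in values or any(k not in values for k in kids):
            continue  # an incomplete branch cannot be checked
        total = sum(values[k] for k in kids)
        if not math.isclose(values[parent], total, rel_tol=rel_tol):
            problems[parent] = (values[parent], total)
    return problems


# Hypothetical commodity tree and values
children = {"CEREALS": ["WHEAT", "BARLEY"], "CROPS": ["CEREALS", "VEGETABLES"]}
values = {"CEREALS": 300.0, "WHEAT": 100.0, "BARLEY": 150.0,
          "VEGETABLES": 50.0, "CROPS": 350.0}
print(inconsistencies(values, children))  # {'CEREALS': (300.0, 250.0)}
```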

21 MetaBase

22
1. Many different data sources (e.g. FAO, Eurostat), all in the same user interface (SDMX, NetCDF)
2. Find data alternatives using meta-information
3. Search data content (e.g. oilseed)
4. All content easily available in research software (R/GAMS)
5. Recodings, aggregations and concordances are all implemented in GAMS
6. Statistical methods in GAMS and R
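
The slide states that recodings, aggregations and concordances are implemented in GAMS. As a language-neutral illustration only, the Python sketch below shows one common way to express all three with a single weighted concordance. Each source code maps to one or more target codes with shares that sum to 1. A share of 1.0 is a plain recode, several sources with the same target form an aggregation, and fractional shares split a code. The codes and shares are hypothetical. This is not the MetaBase GAMS implementation.

```python
import math
from collections import defaultdict


def apply_concordance(data: dict[str, float],
                      concordance: dict[str, list[tuple[str, float]]]) -> dict[str, float]:
    """Recode data from source codes to target codes using weighted links."""
    result: dict[str, float] = defaultdict(float)
    for source_code, value in data.items():
        links = concordance[source_code]
        # shares must sum to 1 so that totals are preserved
        if not math.isclose(sum(share for _, share in links), 1.0):
            raise ValueError(f"shares for {source_code!r} do not sum to 1")
        for target_code, share in links:
            result[target_code] += value * share
    return dict(result)


# Hypothetical codes: "a" maps entirely to X; "b" is split 40/60 over X and Y.
concordance = {"a": [("X", 1.0)], "b": [("X", 0.4), ("Y", 0.6)]}
print(apply_concordance({"a": 100.0, "b": 50.0}, concordance))
```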

23 Thank you for your attention! Or send an email: Wietse.Dol@wur.nl

