Published by Molly Bates. Modified over 10 years ago.
1
A Data Restore Model for Reproducibility in Computational Statistics
Daniel Bahls, ZBW, I-Know 2013, Graz, Austria
ZBW is a member of the Leibniz Association
2
Outline
1. Motivation: Repeatability in Empirical Research
2. Our Approach: The Data Restore Model
3. Outlook: Status of this Work / Next Steps
3
Repeatability in Science
- Fundamental criterion: verification is the job of the community
- Experiments must lead to the same findings for different researchers under certain constant parameters
- Further: robustness (w.r.t. measuring errors, etc.)
- Repeatability vs. reproducibility vs. verifiability
4
Repeatability in Economics and the infamous case of Rogoff and Reinhart
5
Improving Review Processes
- Justin Wolfers, Betsey Stevenson, economists at University of Michigan
... so we need access to the data
If we try it all on our own and cannot reproduce the results, what does it mean?
6
McCullough: Experiences & Recommendations
7
McCullough: Requirements & Experiences
8
McCullough: Requirements & Experiences
9
Sweave: Literate Programming for Statistics
10
Sweave: Literate Programming for Statistics
11
Data Publishing in Economics / Social Sciences
- Different disciplines have different challenges
- Characteristics of empirical research:
  - sensitive / protected data
  - distributed external data sources
- Data sharing: submit data bundles to 3rd-party repositories?
12
Data Management: The Black Box Approach
[Diagram: a data set copy (some resource bundle), marked "?"; labels: data review, curation, legal situation, re-use, transparency, repeatability]
13
Statistical Data on the Semantic Web
14
Outline
1. Motivation: Repeatability in Empirical Research
2. Our Approach: The Data Restore Model
3. Outlook: Status of this Work / Next Steps
15
Data Restore Model
[Diagram labels: spreadsheet, obs, data set]
16
Data Restore Model
[Diagram labels: spreadsheet, obs, data set]
17
Diagram labels:
- DataSet (type: UserDataSet)
- Data Items (type: Data Items from own survey)
- includesData: external dataset
- buildScript
No gaps, trust, incentive
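The labels on this slide (DataSet, includesData, buildScript) suggest that a user's data set keeps its own survey items, only references external data, and is rebuilt by a script. The following is a minimal Python sketch of that reading, not the author's implementation. The class names `ExternalDataRef` and `UserDataSet`, the `restore` method, the `fetch` callback, the `merge_by_region` script, and all row values are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass(frozen=True)
class ExternalDataRef:
    """Hypothetical pointer to a data set that stays with its provider."""
    source: str
    dataset: str
    version: str


@dataclass
class UserDataSet:
    """Hypothetical model: own items + references + a build script."""
    own_items: List[Row]                  # data items from the own survey
    includes_data: List[ExternalDataRef]  # references, not copies
    build_script: Callable[[List[Row], Dict[ExternalDataRef, List[Row]]], List[Row]]

    def restore(self, fetch: Callable[[ExternalDataRef], List[Row]]) -> List[Row]:
        """Re-fetch every referenced data set, then re-run the build script."""
        external = {ref: fetch(ref) for ref in self.includes_data}
        return self.build_script(self.own_items, external)


def merge_by_region(own: List[Row], external: Dict[ExternalDataRef, List[Row]]) -> List[Row]:
    """Toy build script: attach an external 'income' value to each own row."""
    lookup = {row["region"]: row["income"] for rows in external.values() for row in rows}
    return [{**row, "income": lookup[row["region"]]} for row in own]


# Made-up stand-in for an archive; a real one would authenticate the client.
household = ExternalDataRef("EuroStat", "Household XZ", "0.2")
fake_archive = {household: [{"region": "AT", "income": 100}, {"region": "DE", "income": 120}]}

data_set = UserDataSet(
    own_items=[{"region": "AT", "answer": 3}],
    includes_data=[household],
    build_script=merge_by_region,
)
restored = data_set.restore(lambda ref: fake_archive[ref])
print(restored)
```

In this sketch only the references and the build script would need to be published alongside the own survey items; the external rows are fetched again at restore time, so protected data never has to be copied into the bundle.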
18
Source: EuroStat
Dataset: Household XZ
Version: 0.2
Published: Jan 2009
[read more]
19
Integration with Research Environments
21
Review and Re-use
[Diagram labels: Client; Source Code Repository; Archive A, Archive B, Archive C, Archive D; DOI; Code and Data Templates; Authenticate & Request Data]
22
Data Infrastructure Concept
- One source per data set: transparency, curation by highest expertise
- Data protection: make data publishing possible for all scenarios
- Data and code integration: one-click solution, no manual effort for replication attempts
- Precise citation: traceable data provenance
23
Incentives for the Research Community
- Transparency increases trust: no gaps, trust, incentive
- Easy re-use: the research models applied live longer
- More impact: more citations
24
Incentives for the Research Community
- Material for tutorials: students learn computational research in practice
- Research is more efficient: easier to understand and pick up the research of others
- Secured knowledge: replication attempts in different research environments and contexts
- Discussion, inspiration, innovation
- Non-findings may get more recognition
25
Outline
1. Motivation: Repeatability in Empirical Research
2. Our Approach: The Data Restore Model
3. Outlook: Status of this Work / Next Steps
26
What we are currently working on
The Rogoff and Reinhart / Herndon case
- apply the Data Restore Model
- add semantic data documentation (partly available as RDF already)
- model with the Data and Code ontology
27
Data and Code Ontology
[Diagram labels: Data and Code; System Environment; Resources; HW; SW; Replication Attempts; Experiment Setup; Maven; Make; Build; Virtualisation; Emulation; Linked Science; Social Media; Data References; Semantic Coding?]
28
What we are currently working on
The Koenker Zeileis case
- Model relations between Data and Code instances
[Diagram labels: protected; public use file; figures; data set; transformation by code]
29
Data Access and Retrieval
30
Next Steps
1. Challenge, Goals, Requirements
2. The Data Restore Model
3. Semantic Linkup / Data Annotation
4. Data Retrieval and Reuse
5. System Architecture
6. Validation / Evaluation
31
Thank you
Daniel Bahls, ZBW
d.bahls@zbw.eu
32
So there are still gaps
Examples:
- A data set is titled "EU Unemployment statistics 2012, EuroStat": age class? seasonal adjustments?
- Executing the code does not produce the results: wrong data? system environment? error? (cf. Herndon's replication of the Rogoff/Reinhart research)
- The DOI does not specify the file format
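One way to narrow down the "wrong data? system environment? error?" question from this slide is to publish a fingerprint of the data and the runtime next to the results, so a replicator can compare it with the local setup. The sketch below is an assumption about how that could look, not part of the presented model. The functions `fingerprint` and `diagnose`, the recorded fields, and the sample bytes are hypothetical; only standard-library calls are used.

```python
import hashlib
import platform


def fingerprint(data: bytes) -> dict:
    """Record what the analysis ran on: a data checksum and the runtime."""
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "python": platform.python_version(),
        "system": platform.system(),
    }


def diagnose(published: dict, local: dict) -> str:
    """Give a first hint at why a replication attempt may differ."""
    if published["sha256"] != local["sha256"]:
        return "wrong data"
    if (published["python"], published["system"]) != (local["python"], local["system"]):
        return "system environment"
    # Same data, same recorded environment: look at the code or analysis itself.
    return "possible error in code or analysis"


# Made-up example bytes standing in for a published and a local data file.
published = fingerprint(b"country,debt\nA,90\n")
local_copy = fingerprint(b"country,debt\nA,91\n")
print(diagnose(published, local_copy))
```

A checksum only shows that the bytes differ; it does not answer documentation gaps such as age class or seasonal adjustment, which would still need semantic annotation.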
33
Data and Code Ontology
[Diagram labels: observation; string value; spo; data ref; default value; for_stata; for_spss]
34
Proxy Relations
Such a relationship can be stated within the semantic model.
[Diagram: Dataset for economic growth (GDP or the like) and Dataset for Aluminium Price Index, connected via hasProxyRel]
Describes the proxy relation:
- details on correlation
- best practices
- frequency of use
- ...
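A proxy relation of this kind can be written down as subject-predicate-object statements. The sketch below uses plain Python tuples instead of an RDF library and is only one possible reading of the slide. `hasProxyRel` comes from the slide; the `ex:` prefix, the relation node `ex:proxyRel1`, the properties `ex:proxyFor` and `ex:bestPractice`, the direction of the links, and the placeholder literal are assumptions made for illustration.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical statements; only "hasProxyRel" is named on the slide.
triples: List[Triple] = [
    ("ex:AluminiumPriceIndex", "ex:hasProxyRel", "ex:proxyRel1"),
    ("ex:proxyRel1", "ex:proxyFor", "ex:EconomicGrowth"),
    ("ex:proxyRel1", "ex:bestPractice", "(placeholder text)"),
]


def subjects(predicate: str, obj: str) -> List[str]:
    """Return every subject that has the given predicate and object."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def proxies_for(target: str) -> List[str]:
    """Find data sets that are declared as a proxy for the target data set."""
    return [
        dataset
        for relation in subjects("ex:proxyFor", target)
        for dataset in subjects("ex:hasProxyRel", relation)
    ]


print(proxies_for("ex:EconomicGrowth"))
```

Giving the relation its own node (`ex:proxyRel1` here) leaves room to attach the descriptive details listed on the slide, such as correlation details or frequency of use, to the relation rather than to either data set.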