1 ZBW is a member of the Leibniz Association (Leibniz-Gemeinschaft).
Statistical Research Data on the Semantic Web
SWIB 2012, Cologne, Germany
Daniel Bahls, Leibniz Information Centre for Economics (ZBW)

2 Outline
1. Introduction
2. Research data in economics and scientific practices
3. Thoughts on data representation
4. Repeatability of research results
5. Outlook
6. Data access and retrieval
7. Proxies and empirical models

3 MaWiFo Project: Management of Economic Research Data

4 "What researchers want" (Source: Feijen, 2011)
- Tools and services must be in tune with researchers' workflows, which are often discipline-specific.
- They must be easy to use.
- "Cafeteria model": researchers can pick and choose from a set of tools and services.
- Benefits must be clearly visible, not in three years' time, but now.

5 Research Data as Bibliographic Artefacts
- Re-use: data sharing gives more opportunities for research.
- Citation: data acquisition and assignment of persistent identifiers.
- Transparency: reproducibility is a fundamental criterion of good scientific practice.

6 Research data in economics and scientific practices
- Target group: researchers in economics
- Community building for knowledge exchange between economists, data librarians and computer scientists
- Interviews on data management, sharing, sources, publishing and processing

7 What does research data look like in economics?

8 Interviews with Researchers in Economics
- Sources: data agencies, statistical offices, trusted institutes and researchers, own surveys and studies
- Data management: local file system, backup server, DVD, external hard disk, ...
- Processing: SPSS, Stata, Matlab, ..., programming languages, high-performance computing; execution times range from seconds to minutes to hours
- Sharing: within teams, with trusted colleagues, on request (?)
- Publishing: practiced sometimes, as zip files; not included in the review process

9 Particular Findings
- Research is driven (to some extent) by the availability of data.
- Some research is based on external data, some on self-conducted studies.
- Data sets are combined and merged.
- On average, an estimated 66% of the data comes from external sources.
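When own survey data is combined with external data, it helps to keep track of where each value came from, since later slides build on referencing individual data items. The following is a minimal sketch, assuming records are dicts keyed by a shared key such as (region, period); the function name `merge_with_provenance` and the source labels are hypothetical and not taken from the talk.

```python
def merge_with_provenance(own, external, own_source, external_source):
    """Inner-join two record collections on their keys.

    `own` and `external` map a key (e.g. (region, period)) to a dict of fields.
    Every merged field is stored as (value, source), so the origin of each
    value stays visible after merging.
    """
    merged = {}
    for key in own.keys() & external.keys():
        row = {}
        for field, value in own[key].items():
            row[field] = (value, own_source)
        for field, value in external[key].items():
            if field in row:
                # Refuse to overwrite silently: conflicting fields need curation.
                raise ValueError(f"field {field!r} present in both sources for {key!r}")
            row[field] = (value, external_source)
        merged[key] = row
    return merged


if __name__ == "__main__":
    own = {("Cardiff", "2005-7"): {"survey_score": 3}}  # hypothetical survey field
    external = {("Cardiff", "2005-7"): {"life_expectancy": 83.7}}
    print(merge_with_provenance(own, external, "own survey", "StatsWales 003311"))
```

Raising on conflicting field names is a deliberate choice in this sketch: a silent overwrite would hide exactly the kind of merging decision that should stay visible for review.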

10 Particular Findings
- Data usage rights, e.g. Thomson Reuters Datastream: may the data be copied to a third party?
- Data protection: on-site access, virtual access
- Sample data to understand the structure; analysis scripts
- Aggregation: is protection maintained?

11 Thoughts on Data Representation
Goals: data review, curation, transparency, re-use, repeatability.
Often, the legal situation does not allow publishing the entire data set as it was used.

12 Interim Conclusion
A model based on copying is insufficient. We suggest fine-grained referencing:
- single data items must be referenceable (merging, curation)
- highly distributable (distributed data sources)
- extensible (heterogeneous long-tail data, curation)
- an LOD-based approach
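One way to make every single data item referenceable is to derive its URI deterministically from the dataset identifier and the item's dimension values. This is a sketch under assumptions: the base URI `http://example.org/data/`, the `key=value;...` path layout and the function name `item_uri` are hypothetical choices, not the URI scheme used in the project.

```python
from urllib.parse import quote

# Hypothetical base namespace; the talk does not specify a URI scheme.
BASE = "http://example.org/data/"


def item_uri(dataset_id: str, dims: dict[str, str]) -> str:
    """Build a deterministic URI for one data item (one table cell).

    Dimensions are sorted by name, so the same cell always yields the same
    URI regardless of dict ordering. All parts are percent-encoded.
    """
    parts = [f"{quote(k, safe='')}={quote(v, safe='')}" for k, v in sorted(dims.items())]
    return BASE + quote(dataset_id, safe="") + "/" + ";".join(parts)


if __name__ == "__main__":
    print(item_uri("003311", {"time": "2005-7", "region": "Cardiff", "gender": "Female"}))
```

Note that this scheme alone does not address curation or versioning: a corrected value would keep the same URI, which is one of the open technical challenges raised later in the talk.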

13 Diagram: a DataSet of type UserDataSet comprises data items from the researcher's own survey and, via includesData, data items from an external dataset.
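The diagram can be written down as RDF triples. A minimal sketch: `UserDataSet` and `includesData` are the slide's terms, but their namespace is not given, so `http://example.org/ns#` is a hypothetical stand-in, and `hasItem` (linking own-survey items) is a hypothetical property introduced only for illustration.

```python
# Hypothetical namespace: the talk names UserDataSet and includesData
# but does not give their namespace. "hasItem" is invented for this sketch.
EX = "http://example.org/ns#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def user_dataset_triples(dataset_uri, own_item_uris, external_item_uris):
    """Describe a researcher's data set as (subject, predicate, object) triples.

    Own-survey items are attached with the hypothetical ex:hasItem,
    items taken from an external data set with ex:includesData.
    """
    triples = [(dataset_uri, RDF_TYPE, EX + "UserDataSet")]
    for item in own_item_uris:
        triples.append((dataset_uri, EX + "hasItem", item))
    for item in external_item_uris:
        triples.append((dataset_uri, EX + "includesData", item))
    return triples


def to_ntriples(triples):
    """Serialise URI-only triples in N-Triples syntax, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    t = user_dataset_triples("http://example.org/ds/1",
                             ["http://example.org/own/1"],
                             ["http://example.org/ext/a"])
    print(to_ntriples(t))
```

Because external items are only referenced by URI, the user data set can point at them without copying their values.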

14 RDF Representation for Statistical Data
Source: Data Cube vocabulary. Our example uses StatsWales: Life Expectancy, Dataset 003311.

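15 Model: a DataSet has Dimensions (each with a label); each data Item has a DimValue per dimension and carries its value via a data property.
Example: data item X has rdf:value 83.7; its values are 2005-7 on dimension A (label "time"), Cardiff on dimension B (label "region") and Female on dimension C (label "gender").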
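The example data item can be expressed with the RDF Data Cube vocabulary (`qb:Observation`, `qb:dataSet`). A sketch under assumptions: the dimension properties `ex:time`, `ex:region` and `ex:gender` live in a hypothetical namespace, the observation and dataset URIs are made up, and the measured value uses `rdf:value` as on the slide rather than a dedicated measure property.

```python
QB = "http://purl.org/linked-data/cube#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/ns#"  # hypothetical namespace for the dimension properties


def observation_ntriples(obs_uri, dataset_uri, dims, value):
    """Return N-Triples lines describing one data item as a qb:Observation.

    Dimension values are written as plain string literals under hypothetical
    ex: properties; the value uses rdf:value as on the slide. This sketch
    does not escape quotes inside literals.
    """
    lines = [
        f"<{obs_uri}> <{RDF}type> <{QB}Observation> .",
        f"<{obs_uri}> <{QB}dataSet> <{dataset_uri}> .",
    ]
    for dim, dim_value in dims.items():
        lines.append(f'<{obs_uri}> <{EX}{dim}> "{dim_value}" .')
    lines.append(f'<{obs_uri}> <{RDF}value> "{value}"^^<{XSD}decimal> .')
    return lines


if __name__ == "__main__":
    for line in observation_ntriples(
        "http://example.org/obs/X",
        "http://example.org/ds/003311",
        {"time": "2005-7", "region": "Cardiff", "gender": "Female"},
        83.7,
    ):
        print(line)
```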

16 RDF Representation for Statistical Data
Using the semantic model, data can be referenced at a very detailed level without the data itself having to be public: in the example of slide 15 (data item X with time 2005-7, region Cardiff, gender Female), single information items such as the value 83.7 itself can be omitted as protected, yet the data item is still referenceable.
Challenge: stable URIs are required for every single data item.
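Omitting a protected value while keeping the item referenceable amounts to filtering statements by predicate before publishing. A minimal sketch, assuming triples are (subject, predicate, object) tuples and that only `rdf:value` is protected; the function names `public_view` and `referenceable` and the example URIs are hypothetical.

```python
RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"


def public_view(triples, protected_predicates=frozenset({RDF_VALUE})):
    """Drop statements whose predicate is protected; keep everything else.

    The subject URI of a redacted item still appears in its remaining
    statements (dimension values etc.), so it can still be referenced.
    """
    return [t for t in triples if t[1] not in protected_predicates]


def referenceable(triples):
    """Return the set of subject URIs that remain described."""
    return {s for s, _, _ in triples}


if __name__ == "__main__":
    x = "http://example.org/obs/X"  # hypothetical item URI
    full = [
        (x, "http://example.org/ns#region", "Cardiff"),
        (x, RDF_VALUE, "83.7"),
    ]
    print(public_view(full))
```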

17 SCOVO (Statistical Core Vocabulary)

18 RDF Data Cube Vocabulary (QB)
Source: http://publishing-statistical-data.googlecode.com/svn/trunk/specs/src/main/html/cube.html

19 Repeatability of research results
Aggregation and data cleaning (?): missing values, seasonal adjustment, purchasing power adjustment, plausibility tests, basket analyses, ...
Interesting read: McCullough, B. D. (2007). Got Replicability? The Journal of Money, Credit and Banking Archive. Econ Journal Watch, 4, 326-337.
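Cleaning and adjustment steps are repeatable only if they are recorded explicitly, together with their parameters. A minimal sketch, assuming each step is a plain function over a list of values; the step names, the PPP factor of 2.0 and the function `run_pipeline` are hypothetical illustrations, not the procedure of any study discussed in the talk.

```python
def drop_missing(values):
    """Cleaning step: remove missing observations (None)."""
    return [v for v in values if v is not None]


def ppp_adjust(values, factor):
    """Adjustment step: divide by a purchasing-power-parity conversion factor."""
    return [v / factor for v in values]


def run_pipeline(values, steps):
    """Apply (name, function, params) steps in order and log each one.

    The log records every step with its parameters and the number of
    remaining values, so the same sequence can be re-run and reviewed.
    """
    log = []
    for name, fn, params in steps:
        values = fn(values, **params)
        log.append((name, dict(params), len(values)))
    return values, log


if __name__ == "__main__":
    steps = [
        ("drop_missing", drop_missing, {}),
        ("ppp_adjust", ppp_adjust, {"factor": 2.0}),  # hypothetical factor
    ]
    print(run_pipeline([10.0, None, 20.0], steps))
```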

20 Repeatability of research results
Current practice:
- scripts ("do-files")
- working copies of data
- parameters are changed so that the effect can be shown clearly
- no overall build process

21 A build script for empirical analyses: Maven-like, Ant-like
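The core of a Maven- or Ant-like build is a dependency graph of targets, executed in an order where every input is produced before it is used. A sketch under assumptions: the target names below are a hypothetical analysis, and the standard library's `graphlib.TopologicalSorter` (Python 3.9+) stands in for a real build tool.

```python
from graphlib import TopologicalSorter

# Hypothetical build description for one empirical analysis:
# each target maps to the set of targets it depends on.
BUILD = {
    "fetch_external": set(),
    "own_survey": set(),
    "merge": {"fetch_external", "own_survey"},
    "clean": {"merge"},
    "estimate": {"clean"},
    "tables": {"estimate"},
}


def build_order(build, goal):
    """Return the targets needed for `goal`, dependencies first.

    As with a Maven or Ant goal, only the goal and its transitive
    dependencies are scheduled; unrelated targets are skipped.
    """
    needed, stack = set(), [goal]
    while stack:
        target = stack.pop()
        if target not in needed:
            needed.add(target)
            stack.extend(build[target])
    subgraph = {t: build[t] for t in needed}
    # TopologicalSorter expects node -> predecessors and raises CycleError on cycles.
    return list(TopologicalSorter(subgraph).static_order())


if __name__ == "__main__":
    print(build_order(BUILD, "tables"))
```

Independent targets such as `fetch_external` and `own_survey` may appear in either order; only the dependency constraints are guaranteed.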

22 Diagram: the model of slide 13 (a DataSet of type UserDataSet with data items from the researcher's own survey and, via includesData, from an external dataset), now additionally linked to a buildScript.
No gaps, trust, incentive.
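"No gaps" can be read as a checkable property: walking back from a result, every artefact must be either a registered source data set or produced by the build script. A minimal sketch under that reading; the provenance map, file names and the function `find_gaps` are hypothetical.

```python
def find_gaps(derived_from, sources, result):
    """Walk back from `result` and return artefacts with no known origin.

    derived_from: artefact -> list of inputs it was built from (by the build script)
    sources:      set of registered, referenceable source data sets
    An artefact that is neither a source nor produced by the build is a gap.
    """
    gaps, seen, stack = set(), set(), [result]
    while stack:
        artefact = stack.pop()
        if artefact in seen:
            continue
        seen.add(artefact)
        if artefact in sources:
            continue
        if artefact not in derived_from:
            gaps.add(artefact)
            continue
        stack.extend(derived_from[artefact])
    return gaps


if __name__ == "__main__":
    derived = {
        "table1": ["estimates"],
        "estimates": ["clean.csv"],
        "clean.csv": ["survey", "external", "manual_fix.xls"],
    }
    print(find_gaps(derived, {"survey", "external"}, "table1"))
```

Here the hand-edited spreadsheet would be reported as a gap, which is the kind of undocumented step that undermines trust in a result.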

23 Communication & Architecture
Diagram: a client, a digital library and archives A to D. A DOI leads to the reference model; the client authenticates and requests the data from the archives.
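The client side of this architecture can be sketched as: take the reference model (a list of item references per archive), send one authenticated request per archive, and reassemble the data set. Everything here is hypothetical: archives are in-memory objects, authentication is reduced to a token check, and the names `Archive` and `reconstruct` are invented for illustration.

```python
from collections import defaultdict


class Archive:
    """In-memory stand-in for a data archive that checks access tokens."""

    def __init__(self, items, allowed_tokens):
        self.items = items  # item id -> value
        self.allowed_tokens = set(allowed_tokens)

    def request(self, token, item_ids):
        if token not in self.allowed_tokens:
            raise PermissionError("authentication failed")
        return {item_id: self.items[item_id] for item_id in item_ids}


def reconstruct(reference_model, archives, token):
    """Rebuild a data set from a reference model of (archive name, item id) pairs.

    Items are grouped per archive so each archive receives one authenticated
    request, mirroring the 'Authenticate & Request Data' step.
    """
    by_archive = defaultdict(list)
    for archive_name, item_id in reference_model:
        by_archive[archive_name].append(item_id)
    data = {}
    for archive_name, item_ids in by_archive.items():
        for item_id, value in archives[archive_name].request(token, item_ids).items():
            data[(archive_name, item_id)] = value
    return data
```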

24 Open Challenges (practical)
- Researchers in economics would love to re-use data from others.
- Researchers in economics hesitate to share their data.
- Competitive advantage: "We put too much effort into data production, so we want to be the ones to publish on it." "The code discloses too much of our know-how."
- Incentives needed: data citation; trust in research results (no gaps from data sources to results)

25 Open Challenges (technical)
- Precise referencing: a unique URI for every data item / table cell? How to handle curation and data versioning?
- Maven-like build scripts: how to specify entire system environments and software modules?
- Vocabulary extensions: specific data needs a specific description; where do the necessary rdf:Properties come from?

26 Summing up
- Reference model for exact reconstruction of research data sets
- Build scripts and dependency management for repeatability
- Transparency of data sources and processes: "executable paper", learning from others, data reviews, ...
- Rerun analyses with curated values or with the latest data

27 Thank you

