UKDS DDI Overview Darren Bell Repository Architect Schloss Dagstuhl 17 Oct 2016.


1 UKDS DDI Overview
Darren Bell, Repository Architect
Schloss Dagstuhl, 17 Oct 2016

2 UKDS DDI in a nutshell
- 7000 datasets (mostly microdata) disseminated as SPSS & STATA Zip bundles
- Approx 600 of these datasets in Nesstar Webview in DDI1.2.1 (created in Nesstar Publisher)
- Study level metadata for all datasets delivered as DDI2.5 directly through Discover portal and indirectly through OAI provider
- Pilot for DDI3.1 in 2012 but little progress since then
- Our data is still locked away in binary formats and doing nothing for the next 5 years is not an option.
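As a rough illustration of the "indirectly through OAI provider" route, harvesting study-level metadata over OAI-PMH comes down to a ListRecords request. The sketch below only builds the request URL. The endpoint and the metadata prefix are placeholders, since the slides give neither; `verb`, `metadataPrefix` and `from` are standard OAI-PMH request parameters.

```python
from urllib.parse import urlencode

# Hypothetical values: the slides do not name the endpoint or the prefix.
BASE_URL = "https://oai.example.org/oai"
DDI_PREFIX = "ddi"


def list_records_url(base_url, metadata_prefix, from_date=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Optional selective harvesting by datestamp.
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


url = list_records_url(BASE_URL, DDI_PREFIX, from_date="2016-01-01")
print(url)
```

A real harvester would also need to follow resumption tokens, which this sketch leaves out.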

3 Barriers to DDI adoption
- DDI-L perceived as too difficult for mere mortals with a bloated and over-engineered specification
- DDI4: Little understanding of RDF and UML constructs within the Archive at the moment, but that is changing
- No obvious appetite among producers for supplying DDI-L (except ONS perhaps)
- Not clear how DDI could support some administrative metadata and processes e.g. Licensing, IPR and granular Access criteria
- From Big Data perspective, too domain-specific (although we are here with high hopes for DDI4 being able to address this)
- Support for qualitative data feels underwhelming (we use in-house QuDEx)

4 Incentives to DDI adoption
- Enter the Big Data Network Support Team at the UKDA
- 3 year roadmap to migrate to a Hadoop open-source ecosystem
- Business-as-usual will continue with SPSS, SQL, .NET and DDI as an output format
- But finally, we are considering DDI4 at the storage layer
- Difficulties with ODI Certification forced us to rethink data management from the ground up.
- The “semantic data lake” is more than a buzzword and allows us to normalise and analyse multiple data shapes at massive scale in Hadoop

5 Interoperability challenges
- “Traditional” microdata, tabular and cubes: yes, we know what to do with that (probably...)
- Long, thin and boring e.g. energy sensor data or log data
- Data in motion e.g. social media feeds or log data
- Unstructured data e.g. video, audio (more in quali arena?)
- Biomedical and other domains in unfamiliar formats
- Other schemas include the DataCite XSD (OpenAIRE)
- We want to flatten all of this in our “data lake”
- All our developers want is JSON, JSON and more JSON. They hate XML.
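To make "flatten all of this" a little more concrete, here is a minimal sketch of one possible approach: collapsing a nested metadata record into a single-level JSON object with dotted keys. The record and its field names are made up for illustration, and the slides do not say how the flattening would actually be done.

```python
import json


def flatten(value, prefix=""):
    """Flatten nested dicts and lists into one dict with dotted keys."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}.{key}" if prefix else key))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}.{index}" if prefix else str(index)))
    else:
        # Leaf value: store it under the path built so far.
        flat[prefix] = value
    return flat


# Hypothetical record, not taken from any UKDS dataset.
record = {
    "study": {"title": "Example Study", "keywords": ["energy", "sensors"]},
    "variable": {"name": "kwh", "type": "numeric"},
}

flat = flatten(record)
print(json.dumps(flat, indent=2))
```

Dotted keys keep the original path recoverable, which matters if the flat form ever has to be mapped back to a structured schema.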

6 What we think are our options, in descending order of masochism
- Do our own super lightweight profile with DC and SKOS starting from first principles at the datapoint. Completely decouple the semantics.
- Using GSIM as a basis, design a domain agnostic schema
- DDI4, hence being here
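The first option could look something like the JSON-LD-style sketch below: each datapoint carries only its value and a pointer to a SKOS concept, while the concept itself (the semantics) is described separately. This is an assumption about what such a profile might contain, not the UKDS design. The URIs under example.org and the choice of properties are illustrative; the namespace URIs for Dublin Core terms, SKOS and RDF are the standard ones.

```python
import json

# Standard namespace URIs for Dublin Core terms, SKOS and RDF.
CONTEXT = {
    "dcterms": "http://purl.org/dc/terms/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def make_concept(uri, label):
    """Describe the meaning of a measure once, as a SKOS concept."""
    return {"@id": uri, "@type": "skos:Concept", "skos:prefLabel": label}


def make_datapoint(value, dataset_uri, concept_uri):
    """A datapoint holds a value and references; the semantics live elsewhere."""
    return {
        "rdf:value": value,
        "dcterms:isPartOf": {"@id": dataset_uri},
        "dcterms:subject": {"@id": concept_uri},
    }


# Hypothetical identifiers for illustration only.
concept = make_concept("https://example.org/concept/kwh", "Energy use (kWh)")
datapoint = make_datapoint(3.2, "https://example.org/dataset/1", concept["@id"])

document = {"@context": CONTEXT, "@graph": [concept, datapoint]}
print(json.dumps(document, indent=2))
```

Because the datapoint only references the concept by URI, the concept scheme can change without touching the stored values.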

7 Questions dbell@essex.ac.uk

