When Open Access meets Open Data Christopher Gutteridge
Christopher Gutteridge Systems developer at the University of Southampton for 18 years. (was) Lead developer on EPrints v2 & 3 Founder data.ac.uk Winner 2012 THE Higher Award for Outstanding ICT Innovation of the year (for data.southampton.ac.uk)
data.southampton.ac.uk Provides datasets for anything and everything we can that’s not a secret Buildings, rooms, people, courses, services, events… Provide tools and webpages based on this data
Gaining Value from Open Data Consuming Publishing Augmenting Aggregating Reducing barriers for your own staff
BIG DATA
Big Data Data which is too BIG to be processed by normal methods.
Big Data Data which is too Rapid to be processed by normal methods.
Big Data Data which is too Heterogeneous to be processed by normal methods.
Needless heterogeneity can derail research Too much effort to align datasets. Not possible to make reusable tools CC-BY photo by Flickr user duncanh1
Research Dataset Metadata Research Output Data Subject-specific metadata Bibliographic Metadata Catalogue Metadata
Research Dataset Metadata spreadsheet of results Methodology, data format Title, Author, Institution Last modified, Uploaded by
Institutional (Data) Repositories Metadata Can be interrogated Research Output Data Subject-specific metadata Bibliographic Metadata Catalogue Metadata Data Can only be downloaded
Subject Specific Repositories Metadata Can be interrogated Research Output Data Subject-specific metadata Bibliographic Metadata Catalogue Metadata Data Can only be downloaded
Open data provides a compromise Distributed storage in various repositories Services aggregate subject-specific metadata to provide enhanced discovery of open datasets Multiple competing discovery services can help drive innovation Discoverability of subject-specific metadata is essential
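As a rough illustration of the aggregation idea on this slide, the sketch below merges records harvested from several repositories into one index keyed on subject-specific metadata. The repository names, record fields and the `build_index` and `find` helpers are all hypothetical; real repositories expose metadata in their own formats.

```python
from collections import defaultdict

# Hypothetical records, as a discovery service might harvest them from
# separate repositories. Field names are assumptions for illustration only.
HARVESTED = {
    "repo-a.example.ac.uk": [
        {"title": "Tide gauge readings",
         "subject": {"instrument": "tide gauge", "format": "csv"}},
        {"title": "Wave buoy logs",
         "subject": {"instrument": "wave buoy", "format": "netcdf"}},
    ],
    "repo-b.example.ac.uk": [
        {"title": "Harbour tide levels",
         "subject": {"instrument": "tide gauge", "format": "xlsx"}},
    ],
}


def build_index(harvested):
    """Map each (field, value) pair of subject-specific metadata to datasets."""
    index = defaultdict(list)
    for repo, records in harvested.items():
        for record in records:
            for field, value in record["subject"].items():
                index[(field, value.lower())].append((repo, record["title"]))
    return index


def find(index, field, value):
    """Return (repository, title) pairs whose metadata matches field=value."""
    return index.get((field, value.lower()), [])


index = build_index(HARVESTED)
matches = find(index, "instrument", "Tide gauge")
for repo, title in matches:
    print(f"{title} (held at {repo})")
```

The data stays where it was deposited; only the metadata is copied into the index, so several competing discovery services could build different indexes over the same repositories.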
Automatically discovers equipment data from all .ac.uk sites 2769 websites Automation massively reduces staffing costs Low effort for institutions: a third just provide a well-structured spreadsheet! Not a single point of failure
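The "well-structured spreadsheet" route can be pictured with a minimal sketch: each institution publishes a CSV, and the aggregator reads every one it can while skipping any that fail, so one broken source does not stop the rest being listed. The column names, site names and sample rows are assumptions made for illustration, not the format the real service consumes.

```python
import csv
import io

# Hypothetical spreadsheets; in practice each would be fetched from an
# institution's own website. Column names are assumed for this sketch.
SOURCES = {
    "alpha.example.ac.uk":
        "Name,Location,Contact\n"
        "Mass spectrometer,Building 5,lab@alpha.example.ac.uk\n",
    "beta.example.ac.uk":
        "this is not,a usable\nspreadsheet\n",
    "gamma.example.ac.uk":
        "Name,Location,Contact\n"
        "Electron microscope,Room 2.01,em@gamma.example.ac.uk\n",
}

REQUIRED_COLUMNS = {"Name", "Location", "Contact"}


def parse_equipment(text):
    """Parse one institution's CSV; raise ValueError if columns are missing."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


def aggregate(sources):
    """Combine every readable source; record failures instead of stopping."""
    combined, failed = [], []
    for site, text in sources.items():
        try:
            rows = parse_equipment(text)
        except ValueError:
            failed.append(site)
            continue
        # Tag each row with the site it came from.
        combined.extend({**row, "Source": site} for row in rows)
    return combined, failed


equipment, failed_sites = aggregate(SOURCES)
print(len(equipment), "items;", "skipped:", failed_sites)
```

Keeping a list of skipped sites, rather than raising, is one simple way to let an automated harvest run unattended across thousands of websites.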
.ac.uk
Greater than the sum of its parts…
$ ./generate-world Demo --postcode PO381NL --size 250
Scratch 1 Photo by Flickr user Krissen, CC-BY-NC-ND
Scratch 2 Enabling researchers to scratch an itch
Discoverable for researchers Does data about X exist? How do I get it? How do I interpret it? Datasets need to be discoverable in ways appropriate to the research area
Leading a horse to water Academics are starting to feel the pressure to publish data. This means many will be begrudgingly doing the minimum required.
Making them want to drink Esteem benefits Give academic credit for dataset citations Immediate benefits Provide analysis tools in the repository Automated analysis tools Publishing data becomes part of the workflow
Automation Automated checking of reproducibility Find all the datasets my new method could analyse Autodetect new candidate datasets, auto-generate and publish analysis An immediate benefit to the data creator Who gains the research credit if there are automated steps?
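"Find all the datasets my new method could analyse" only works if datasets describe their structure in a machine-readable way. A minimal sketch, assuming a hypothetical catalogue in which each dataset lists its file format and column names, and a method that declares what it needs:

```python
# Hypothetical catalogue entries; 'columns' stands in for whatever
# machine-readable description of a dataset's structure is available.
CATALOGUE = [
    {"id": "ds-1", "format": "csv", "columns": ["time", "temperature", "depth"]},
    {"id": "ds-2", "format": "csv", "columns": ["time", "salinity"]},
    {"id": "ds-3", "format": "pdf", "columns": []},
    {"id": "ds-4", "format": "csv", "columns": ["depth", "time", "temperature", "site"]},
]

# What the (hypothetical) new method needs as input.
METHOD_REQUIREMENTS = {"format": "csv", "columns": {"time", "temperature"}}


def can_analyse(dataset, requirements):
    """True if the dataset has the right format and all required columns."""
    return (dataset["format"] == requirements["format"]
            and requirements["columns"] <= set(dataset["columns"]))


def new_candidates(catalogue, requirements, already_analysed):
    """List ids of suitable datasets that have not been analysed yet."""
    return [d["id"] for d in catalogue
            if d["id"] not in already_analysed and can_analyse(d, requirements)]


candidates = new_candidates(CATALOGUE, METHOD_REQUIREMENTS, already_analysed={"ds-1"})
print(candidates)
```

Run on a schedule, a check like this could pick up newly published datasets and hand them to the analysis step, which is where the question of who gets the research credit arises.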
Summary We should be fostering interoperability of data Make data discoverable for researchers Help them satisfy their curiosity One size does not fit all But some solutions will fit many cases
How does what we are doing make research better?