Automatically Calculating the Adherence to License Requirements

Slides:



Advertisements
Similar presentations
OntoBlog: Informal Knowledge Management by Semantic Blogging Aman Shakya 1, Vilas Wuwongse 2, Hideaki Takeda 1, Ikki Ohmukai 1 1 National Institute of.
Advertisements

1 Introduction to Database Management Systems Lila Rao Graham.
Methodologies for Web Information System Design
ÆKOS: A new paradigm for discovery and access to complex ecological data David Turner, Paul Chinnick, Andrew Graham, Matt Schneider, Craig Walker Logos.
Citation and Recognition of contributions using Semantic Provenance Knowledge Captured in the OPeNDAP Software Framework Patrick West 1
Presented by DOI Create: TERN as a use-case Siddeswara Guru
A socio-technical model for content sharing
Provenance Metadata for Shared Product Model Databases Etiel Petrinja, Vlado Stankovski & Žiga Turk University of Ljubljana Faculty of Civil and Geodetic.
Implementing DOIs for Data DataPool supporting institutional service development Simon Coles, Andrew Milsted, Wendy White Jisc Managing Research Data Programme.
References: [1] [2] [3] Acknowledgments:
Figure 1: Location of the bioregions being assessed Objectives The objectives of the bioregional assessments are to:
Ocean Observatories Initiative Data Management (DM) Subsystem Overview Michael Meisinger September 29, 2009.
Modeling and Representing National Climate Assessment Information using Linked Data Jin Guang Zheng 1 Curt Tilmes 2
Citation and Recognition of contributions using Semantic Provenance Knowledge Captured in the OPeNDAP Software Framework Patrick West 1
Deepcarbon.net Xiaogang Ma, Patrick West, John Erickson, Stephan Zednik, Yu Chen, Han Wang, Hao Zhong, Peter Fox Tetherless World Constellation Rensselaer.
Find Research Data b2find.eudat.eu B2FIND User Training How to find data objects and collections using EUDAT’s B2FIND This work is licensed.
IESR, A Registry of Collections and Services: Using the DCMI Collection Description Profile in Practice Ann Apps MIMAS, The University of Manchester, UK.
Aalto Research Data Management Policy Ella Bingham 8 April 2016 This work is licensed under the Creative Commons Attribution 4.0 International License.
Open Science and Research – Services for Research Data Management © 2014 OKM ATT 2014–2017 initiative Licenced under.
TWC Adoption* of RDA DTR and PIT in the Deep Carbon Observatory Data Portal Xiaogang Ma, John Erickson, Patrick West, Stephan Zednik, Peter Fox, & the.
Introduction: Databases and Database Systems Lecture # 1 June 19,2012 National University of Computer and Emerging Sciences.
Knowledge Representation Part I Ontology Jan Pettersen Nytun Knowledge Representation Part I, JPN, UiA1.
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement number Gry Henriksen.
Databases and Database User ch1 Define Database? A database is a collection of related data.1 By data, we mean known facts that can be recorded and that.
The CUAHSI Hydrologic Information System Spatial Data Publication Platform David Tarboton, Jeff Horsburgh, David Maidment, Dan Ames, Jon Goodall, Richard.
Workforce Repository & Planning Tool
NRF Open Access Statement
Exploring Creative Commons Licenses for Scholarly Metadata Records
Environment Live Data and Services to GEO Assessment
Software Project Configuration Management
Data Reuse Fitness Assessment Using Provenance
OGF PGI – EDGI Security Use Case and Requirements
Agreeing about agreements: modelling social contracts, people and data
PLOS Facilitating Text & Data Mining The Role the Publisher Can Play
EOSC MODEL Pasquale Pagano CNR - ISTI
Running a Privacy Impact Assessment (PIA)
Pasquale Pagano CNR – ISTI (Pisa, Italy)
Introduction to Database Management Systems
FAIR Metadata RDA 10 Luiz Olavo Bonino – - September 21, 2017.
Knowledge Representation Part I Ontology
Data Type Registries #2 12 Month Status Larry Lannom, Tobias Weigel Date Location TBD? CC BY-SA 4.0.
Data Ingestion in ENES and collaboration with RDA
Building a CMMI Data Infrastructure
Ontology From Wikipedia, the free encyclopedia
Introduction to Databases
Xiaogang Ma, John Erickson, Patrick West, Stephan Zednik, Peter Fox,
CREATIVE COMMONS FOR CULTURAL HERITAGE
Data Foundation and Terminology (DFT) Vocabulary Development Session
W. Christopher Lenhardt
Research data finder Etsin
SMART GROUND platform overview
Horizon 2020: Open data pilots and lessons learnt
Impacts of coal seam gas extraction on water resources in Australia
Nicholas Car Scientific Data Platforms & Policy
Four Levels of Data from Ricardo’s Database Illuminated
EUDAT B2FIND A Cross-Discipline Metadata Service and Discovery Portal
Software Architecture
Creating a Culture of Open Data in Academia
Introduction to the PRISM Framework
United Nations Statistics Division
Introduction to Databases
Lecture 06:Software Maintenance
Chapter 2 Database Environment Pearson Education © 2014.
Bird of Feather Session
W3C Recommendation 17 December 2013 徐江
Agenda Software development (SD) & Software development methodologies (SDM) Orthogonal views of the software OOSD Methodology Why an Object Orientation?
A Research Data Catalogue supporting Blue Growth: the BlueBRIDGE case
Australian and New Zealand Metadata Working Group
Rebecca Nyman Service Designer Gordon Williamson Product Manager
Database management systems
Presentation transcript:

Automatically Calculating the Adherence to License Requirements Nicholas J. Car Geoscience Australia nicholas.car@ga.gov.au Matthew P. Stenson CSIRO matthew.stenson@csiro.au Do you agree to something undefined and untestable before viewing this presentation? I agree Continue

Automatically Calculating the Adherence to License Requirements Nicholas J. Car Geoscience Australia nicholas.car@ga.gov.au Matthew P. Stenson CSIRO matthew.stenson@csiro.au Do you agree to something undefined and untestable before viewing this presentation? X I agree Continue

Motivation Reduce the management effort in long-tail data repositories Reduce risk of license condition breaches Handle licence entailment better Currently handled poorly, all or nothing

About the authors Both apply computer science & engineering to data management < 2015 Both part of a government research institute: CSIRO 2015 Both built a data sharing repo described here: BA Repo 2016 Nicholas moves to a data custodian govt. department: GA Nicholas Car Matthew Stenson

Objectives As much as possible, automatically handle datasets according to their license conditions Prompt for user actions for un-handlable conditions

Bioregional Assessments Programme Multidisciplinary program of scientific work, assessing impacts of coal and coal seam gas mining on water resources. Several thousand datasets both collected and generated by the BAP, stored long-term, enabling reuse over time. ‘Derived’ datasets – generated by BAP – licenced for open access and reuse – CC-BY 3.0. ‘Source’ datasets from mining companies, community groups and government agencies have a range of licences and restrictions. http://www.bioregionalassessments.gov.au

Bioregional Assessments Programme Original dataset licensing methodology Manually collect licence information for each Source dataset Store the info in the catalog of datasets Handle license conditions based on classes Present a stacked attribution statement for each dataset, based on its ancestors: © CSIRO 2011 © Commonwealth of Australia (Bureau of Meteorology) 2012 © Commonwealth of Australia (Bioregional Assessments) 2015 Resulted in 60+ distinct licences

Bioregional Assessments Programme Original dataset licensing methodology Manually collect licence information for each Source dataset Store the info in the catalog of datasets Handle license conditions based on classes Present a stacked attribution statement for each dataset, based on its ancestors: © CSIRO 2011 © Commonwealth of Australia (Bureau of Meteorology) 2012 © Commonwealth of Australia (Bioregional Assessments) 2015 Most restrictive wins Usually all or nothing

Bioregional Assessments Programme Original dataset licensing methodology Manually collect licence information for each Source dataset Store the info in the catalog of datasets Handle license conditions based on classes Present a stacked attribution statement for each dataset, based on its ancestors: © CSIRO 2011 © Commonwealth of Australia (Bureau of Meteorology) 2012 © Commonwealth of Australia (Bioregional Assessments) 2015 Was recorded manually Requires a knowledge of dataset provenance Could be automatic, could be useful in other ways

Hypothesis By implementing a faceted license handling system, we can more automatically deliver datasets according to their particular conditions

Methods - Provenance Provenance: “informatics uses the term 'provenance' to mean the lineage of data, as per data provenance, with research in the last decade extending the conceptual model of causality and relation to include processes that act on data and agents that are responsible for those processes” Wikipedia, https://en.wikipedia.org/wiki/Provenance#Computer_science Provenance graph of dataset is the point of truth regarding dataset relations and therefore dependent licence relations I’m using the PROV Data Model1 1 https://www.w3.org/TR/prov-dm/ dataset dataset dataset dataset prov:wasDerivedFrom

Methods - Provenance Associate licenses, as distinct objects, with datasets Reuse! licence licence dataset dataset dataset dataset licence

Methods - Provenance req req req Associate license requirements as distinct objects with licenses Reuse! licence licence dataset dataset dataset dataset licence req

Methods - Provenance req req req Calculate requirements automatically licence licence dataset dataset dataset dataset licence req

Faceted License model Requirements The Creative Commons licence information model1 has a “Requirement” class that a licence can point to Requirement class is “an action that may or may not be requested of you” This is a sub-license unit 1 http://creativecommons.org/ns cc: Requirement cc:requires cc:Licence dc:license dcat:dataset

Faceted License model Requirements BA defined a series of Requirement instances1 1 http://data.bioregionalassessments .gov.au/id/requirement/ cc: Requirement cc:requires cc:Licence dc:license dcat:dataset

ba: RequirementResolution Faceted License model Requirements Resolution BA defined a series of Requirement Resolutions These resolve particular Requirements RRs are an Entity, like a dataset ba:resolves cc: Requirement ba: RequirementResolution cc:requires cc:Licence dc:license dcat:dataset

ba: RequirementResolution Faceted License model Generating Requirements Resolutions In PROV1 Entities are generated by Activities This suits our needs 1 https://www.w3.org/TR/prov-dm/ ba:resolves cc: Requirement ba: RequirementResolution cc:requires prov: generated cc:Licence prov:Activity dc:license dcat:dataset

Faceted license model

Faceted license model ba:resolves cc:requires prov:generated Org: Organization cc: Requirement ba: RequirementResolution cc:requires prov:generated odrs:RightsStatement cc:Licence prov:Activity dc:license dcat:dataset

A real BA example

A real BA example

A real BA example

BA Findings Handling Requirements individually and citations separately reduces licences from 100s (later 60+) to about 10 There are very few distinct Requirements Perhaps 10 Most Requirements can be resolved by data store actions Show notices Limit access Some Requirements can be resolved only by one-time deliberate action Removing/lumping/de-identifying/averaging Resolution status of all Requirements for all datasets can be calculated easily using a provenance graph + metadata

Ongoing Work Data Centre/Agency-based and ongoing GA is applying this approach “forever” Implementing registers for multiple uses Licenses Requirements Organizations Requirement Resolutions Activities that generate RRs Systems that perform the Activities

Future Work Further normalise Requirements to cater for identical actions, different data (e.g. attribution) Make a project-independent licencing data model Ontology Extension to PROV & CC Publish real Requirement & RequirementResolution objects As demonstrators For others to use directly (e.g. “show notice”) Have this reviewed by lawyers Modelling other forms of Agreement: see “Agreeing about Agreements” in the session “Getting the incentives right”