Using RMap to Describe Distributed Works as Linked Data Graphs: Outcomes and Preservation Implications iPres. Bern, Switzerland, October 5, 2016 Karen.



Using RMap to Describe Distributed Works as Linked Data Graphs: Outcomes and Preservation Implications iPres. Bern, Switzerland, October 5, 2016 Karen Hanson, Data Conservancy

View of a scholarly work. [Diagram: authors, an article, and the other articles it cites.] The concept of a scholarly work has shifted. It used to be focused mostly on articles, and citation lists would point to other articles.

View of a scholarly work. [Diagram: the article is now linked not only to its authors and the articles it cites but also to webpages, code, a dataset, a big dataset and its developer, and an organization, through relationships such as "cites", "by", "output of", "from" and "used".] Now more and more articles cite webpages, which are much more fragile. Also, under pressure from funders and publishers, authors need to provide more evidence for reproducibility. So the map is starting to get complex, and its components are published in different places at different times. Note that this map is oriented towards a scientific research example. It is easy to imagine the distributed components of works described in other sessions of this conference, such as interactive art pieces and computer games, which are also expressed through many components.

Scholarly works continually evolve. Adding to the complexity are the ideas laid out in this OCLC report. Digital objects are produced throughout the research process, and this does not end at the time of publication. A static, linear citation list at the end of a published article is a constraint, given that a scholarly work is distributed in time and space. Lavoie, Brian, Eric Childress, Ricky Erway, Ixchel Faniel, Constance Malpas, Jennifer Schaffner, and Titia van der Werf. 2014. The Evolving Scholarly Record. Dublin, Ohio: OCLC Research.

How can we capture this evolving context?

Maps of resources. [Diagram: the same network, now including a named author (Robert Smith), two code resources, a dataset, a big dataset and its developer, cited articles, webpages and an organization.] You can think of this diagram as a map of resources: different kinds of identifiers, distributed across different kinds of repositories.

21st Century DiSCO (Distributed Scholarly Compound Object): encode as RDF (linked data). One technology that is good at describing maps of resources is the Resource Description Framework (RDF). RMap lets you store this map as an RDF graph. We call these graphs "DiSCOs", short for Distributed Scholarly Compound Objects.
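To make "a map of resources stored as an RDF graph" concrete, here is a minimal sketch in plain Python. It holds the map as a set of (subject, predicate, object) triples and writes them out in N-Triples syntax. Every URI and the `EX` relationship vocabulary are hypothetical placeholders, not identifiers used by RMap or any publisher.

```python
# Minimal sketch: a map of scholarly resources held as RDF-style triples.
# Every URI below is a hypothetical placeholder, not a real identifier.
EX = "http://example.org/terms/"  # hypothetical relationship vocabulary

ARTICLE = "http://example.org/article/1"
DATASET = "http://example.org/dataset/1"
CODE = "http://example.org/code/1"
WEBPAGE = "http://example.org/webpage/1"

resource_map = {
    (ARTICLE, EX + "cites", WEBPAGE),
    (ARTICLE, EX + "cites", DATASET),
    (CODE, EX + "used", DATASET),
}


def to_ntriples(triples):
    """Serialize (subject, predicate, object) URI triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(triples))


print(to_ntriples(resource_map))
```

A set of tuples is enough to show the idea. Real systems would use an RDF library that also handles literals, blank nodes and other serializations.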

RMap in a nutshell. A REST API service for managing and retrieving maps of the relationships among distributed scholarly works. Just to summarize…

As RDF the graph looks like this. If you submit this to RMap, it assigns an identifier that you can use to retrieve the graph later. This is difficult for the human eye to parse, so…
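RMap itself mints the identifier when a graph is submitted. The toy class below only imitates that submit-then-retrieve behaviour in memory, to show the shape of the interaction. The class name, its methods and the identifier pattern (modelled on the `ark:/00000/` placeholder used on a later slide) are invented for illustration and are not the RMap REST API.

```python
import itertools


class ToyDiscoStore:
    """Hypothetical in-memory stand-in for a service that stores graphs and
    mints identifiers for them (not the real RMap API)."""

    def __init__(self, prefix="ark:/00000/"):
        self._prefix = prefix
        self._counter = itertools.count(1)
        self._graphs = {}

    def submit(self, triples):
        """Store an immutable copy of the graph and return a new identifier."""
        disco_id = f"{self._prefix}disco{next(self._counter)}"
        self._graphs[disco_id] = frozenset(triples)
        return disco_id

    def get(self, disco_id):
        """Retrieve a previously submitted graph by its identifier."""
        return self._graphs[disco_id]


store = ToyDiscoStore()
disco_id = store.submit({("http://example.org/article/1",
                          "http://example.org/terms/cites",
                          "http://example.org/webpage/1")})
print(disco_id)  # ark:/00000/disco1
```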

Here is the pretty version. This is our web GUI representation of a DiSCO.

Some features of a DiSCO

Describes an aggregation of 1 or more scholarly resources. A DiSCO describes an aggregation of one or more scholarly resources, in this case an article and a dataset.

Based on OAI-ORE*, but simpler. This is the simplest DiSCO you could create; the slide's callouts label its parts as namespaces, DiSCO PID, type and aggregated resources:
@prefix ore: <> .
@prefix rmap: <> .
<ark:/00000/akfjlsdkf> a rmap:DiSCO ;
    ore:aggregates <>, <> .
The identifier is generated by RMap.
* Open Archives Initiative Object Reuse and Exchange
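The same minimal DiSCO can be sketched as plain triples. The sketch uses the OAI-ORE vocabulary namespace and the standard `rdf:type` URI. The slide's own namespace URIs are not preserved, so `RMAP` here is a hypothetical stand-in and not RMap's actual namespace.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ORE = "http://www.openarchives.org/ore/terms/"  # OAI-ORE vocabulary namespace
RMAP = "http://example.org/rmap#"  # hypothetical stand-in for the RMap namespace


def minimal_disco(disco_id, aggregated):
    """Return the triples of the simplest DiSCO: a type plus ore:aggregates links."""
    if not aggregated:
        raise ValueError("a DiSCO aggregates one or more resources")
    triples = {(disco_id, RDF_TYPE, RMAP + "DiSCO")}
    triples |= {(disco_id, ORE + "aggregates", uri) for uri in aggregated}
    return triples


disco = minimal_disco("ark:/00000/akfjlsdkf",
                      ["http://example.org/article/1", "http://example.org/dataset/1"])
print(len(disco))  # 3
```

In RMap the identifier is minted by the service; passing it in here just keeps the sketch self-contained.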

Can include additional assertions encoded as RDF. In addition to this basic information, you can add as many additional assertions as you like. They just need to be encoded as RDF and form a connected graph with the aggregated resources. You may use them to define the types of the aggregated resources, describe their exact relationships, and add any other relevant context. There is no restriction on the ontologies used.
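The "connected graph" rule can be checked with a multi-source breadth-first search. Start from the aggregated resources, follow triples in either direction, and confirm that every node the extra assertions mention is reachable. This is a sketch of the rule as stated on the slide, not RMap's own validation code, and the example URIs are hypothetical.

```python
from collections import defaultdict, deque


def is_connected_to_aggregates(aggregated, assertions):
    """True if every node mentioned in the extra assertions can be reached from
    some aggregated resource, treating each triple as an undirected edge."""
    neighbours = defaultdict(set)
    mentioned = set()
    for s, _p, o in assertions:
        neighbours[s].add(o)
        neighbours[o].add(s)
        mentioned.update((s, o))
    seen = set(aggregated)
    queue = deque(aggregated)
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return mentioned <= seen


AGG = ["http://example.org/article/1", "http://example.org/dataset/1"]
good = {
    ("http://example.org/dataset/1", "http://example.org/terms/supplementOf",
     "http://example.org/article/1"),
    ("http://example.org/article/1", "http://example.org/terms/creator",
     "http://example.org/person/1"),
}
bad = good | {("http://example.org/x", "http://example.org/terms/related",
               "http://example.org/y")}
print(is_connected_to_aggregates(AGG, good))  # True
print(is_connected_to_aggregates(AGG, bad))   # False
```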

Provenance and status captured in RMap Events. Each time a DiSCO is created or updated, provenance is captured in an RMap Event.
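As a rough illustration of provenance capture, the sketch below appends one immutable event record per change. The field names (`event_type`, `agent`, `target`, `timestamp`) and the example values are hypothetical. They show the idea of an event log but are not the structure of an actual RMap Event.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ToyEvent:
    """Hypothetical provenance record; field names are illustrative only."""
    event_type: str  # e.g. "creation" or "update"
    agent: str       # URI of whoever made the change
    target: str      # identifier of the DiSCO affected
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


log = []


def record(event_type, agent, target):
    """Append a provenance event to the log and return it."""
    event = ToyEvent(event_type, agent, target)
    log.append(event)
    return event


record("creation", "http://example.org/agent/publisher", "ark:/00000/disco1")
record("update", "http://example.org/agent/publisher", "ark:/00000/disco2")
print([e.event_type for e in log])  # ['creation', 'update']
```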

DiSCOs are versionable and have a status. [Diagram: DiSCO V1 (inactive), DiSCO V2 (inactive) and DiSCO V3 (active), with "Get Latest" pointing to V3.] You can always retrieve previous versions or ask the system for the latest one. Each version gets a new identifier. When you tell the system to create a new version, the previous version is marked as inactive.
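The versioning behaviour described above can be sketched in a few lines. Each new version gets its own identifier and becomes ACTIVE, the previous one is marked INACTIVE, and every version stays retrievable. The class and the sequential identifier pattern are hypothetical. RMap's real identifiers are opaque and do not encode a version number.

```python
class ToyVersionedDisco:
    """Hypothetical sketch of DiSCO versioning: each version gets a new
    identifier, and creating a new version marks the previous one INACTIVE."""

    def __init__(self):
        self.versions = []  # identifiers in creation order
        self.status = {}    # identifier -> "ACTIVE" or "INACTIVE"
        self.graphs = {}    # identifier -> frozenset of triples (kept for old versions too)

    def new_version(self, triples):
        disco_id = f"ark:/00000/disco-v{len(self.versions) + 1}"
        if self.versions:
            self.status[self.versions[-1]] = "INACTIVE"
        self.versions.append(disco_id)
        self.status[disco_id] = "ACTIVE"
        self.graphs[disco_id] = frozenset(triples)
        return disco_id

    def latest(self):
        """Return the identifier of the most recent (active) version."""
        return self.versions[-1]


d = ToyVersionedDisco()
for n in range(3):
    d.new_version({("http://example.org/article/1",
                    "http://example.org/terms/note", str(n))})
print(d.latest(), [d.status[v] for v in d.versions])
# ark:/00000/disco-v3 ['INACTIVE', 'INACTIVE', 'ACTIVE']
```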

The ideal version of a DiSCO. The diagram I showed earlier is an ideal version of a DiSCO. If you are purely harvesting metadata, you don't see this in the wild.

Relationship metadata is scattered, incomplete, and created at different times. Instead, the metadata, like the underlying objects, is scattered, incomplete, and created at different times. It is difficult to coordinate the proliferation of this context.

DiSCOs can be connected through a shared URI. [Diagram: publisher metadata linking an article to its creators (agents), to the journal it is partOf, and to an article it is citedBy.] In the case of RMap, it's OK if this metadata is piecemeal, because linked data allows you to join it up based on shared identifiers. So if a publisher gives you this view…

DiSCOs can be connected through shared URIs. [Diagram: DataCite metadata describing a dataset and its creators (agents); the dataset is a supplementOf the article and is linked by derivedFrom and cites to a big dataset and a webpage.] And then DataCite gives you this view…

Combined view. [Diagram: the publisher and DataCite graphs merged into one, joined at the URIs they share, such as the article's.] Through shared URIs they join together.
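Because RDF nodes are identified by URIs, joining two views is just a set union of their triples. Any URI that appears in both becomes the point where the graphs meet, and no mapping step is needed. The URIs and predicate names below are hypothetical and only mirror the labels on the slides.

```python
T = "http://example.org/terms/"  # hypothetical predicate namespace
ARTICLE = "http://example.org/article/1"

publisher_disco = {
    (ARTICLE, T + "creator", "http://example.org/person/1"),
    (ARTICLE, T + "partOf", "http://example.org/journal/1"),
}
datacite_disco = {
    ("http://example.org/dataset/1", T + "supplementOf", ARTICLE),
    ("http://example.org/dataset/1", T + "derivedFrom", "http://example.org/bigdataset/1"),
}

# The shared ARTICLE URI joins the two graphs with no extra work.
combined = publisher_disco | datacite_disco


def links_touching(graph, uri):
    """All triples in which `uri` is the subject or the object."""
    return {t for t in graph if t[0] == uri or t[2] == uri}


print(len(links_touching(combined, ARTICLE)))  # 3
```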

RMap allows you to view all links for a resource. [Diagram: the combined links around a single article, including its creators, journal, citing article and supplementary dataset.] You can ask RMap for everything it knows about a particular URI, and it will return the combination of all the DiSCOs that mention it.
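A lookup of "everything known about a URI" can be sketched over a collection of DiSCOs. It returns both the matching triples and the list of DiSCOs they came from, much like the DiSCO list the GUI shows beside the visual. The function and the example identifiers are hypothetical and do not reflect how RMap indexes its data.

```python
def everything_about(discos, uri):
    """Given {disco_id: set of triples}, return the sorted ids of the DiSCOs that
    mention `uri` and the combined triples in which it appears."""
    contributing, triples = [], set()
    for disco_id, graph in discos.items():
        hits = {t for t in graph if uri in (t[0], t[2])}
        if hits:
            contributing.append(disco_id)
            triples |= hits
    return sorted(contributing), triples


T = "http://example.org/terms/"  # hypothetical predicate namespace
ARTICLE = "http://example.org/article/1"
discos = {
    "ark:/00000/pub": {(ARTICLE, T + "partOf", "http://example.org/journal/1")},
    "ark:/00000/datacite": {("http://example.org/dataset/1", T + "supplementOf", ARTICLE)},
    "ark:/00000/other": {("http://example.org/x", T + "related", "http://example.org/y")},
}
ids, links = everything_about(discos, ARTICLE)
print(ids, len(links))  # ['ark:/00000/datacite', 'ark:/00000/pub'] 2
```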

This is how it looks through the RMap GUI. Notice the list of DiSCOs on the right that make up this visualization.

Preservation use cases So what does this have to do with preservation?

This map represents a preservation challenge. [Diagram: the distributed map of authors, article, code, dataset, big dataset and developer, cited articles, websites and organization.] This map of distributed resources represents a preservation challenge.

Preservation is often distributed, asynchronous and incomplete, with no repository having full context. [Diagram: pieces of the map held separately by an institutional archive, by e.g. Portico, and by e.g. the Internet Archive.] Currently there are repositories that specialize in preserving specific types of material, and that's unlikely to change any time soon. It is unclear what their coverage is, and each holds only a piece of the work's context.

RMap can help add context around each research object. [Diagram: the full map of code, dataset, authors, article, big dataset, developer, cited articles, websites and organization, spanning the institutional archive, e.g. Portico, and e.g. the Internet Archive.] What RMap could do is help fill in that context for each piece and help determine how these works fit together. In fact, the RMap DiSCOs could be valuable to preserve as well.

Expose the location of preserved copies. [Diagram: a preserved article that isFormatOf the article, hasParts files, and has a location (Portico) and a status.] Another possible preservation use case would be to give archives an opportunity to identify which parts of the map are being preserved. For example, Portico could provide graphs showing which articles it holds preserved copies of and which files are included.
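An archive's statement about one preserved copy could look like the triples produced below. The predicate names mirror the slide's labels (isFormatOf, location, status, hasParts) but sit in a hypothetical namespace, and the status value is an invented example. This is not Portico's actual vocabulary.

```python
T = "http://example.org/terms/"  # hypothetical predicate namespace


def preserved_copy_triples(copy_uri, original_uri, archive_uri, status, files):
    """Triples an archive might assert about one preserved copy, mirroring the
    slide's isFormatOf / location / status / hasParts labels."""
    triples = {
        (copy_uri, T + "isFormatOf", original_uri),
        (copy_uri, T + "location", archive_uri),
        (copy_uri, T + "status", status),  # a plain string value, e.g. "preserved"
    }
    triples |= {(copy_uri, T + "hasParts", f) for f in files}
    return triples


copy_triples = preserved_copy_triples(
    "http://example.org/archive/copy/1",
    "http://example.org/article/1",
    "http://example.org/archive",
    "preserved",
    ["http://example.org/archive/copy/1/article.pdf",
     "http://example.org/archive/copy/1/article.xml"],
)
print(len(copy_triples))  # 5
```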

It could support discovery of preservation gaps. [Diagram: the distributed map, showing which pieces are held by the institutional archive, e.g. Portico, and e.g. the Internet Archive, and which are held nowhere.] This could help identify possible preservation gaps in these distributed works.
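Once archives publish statements about what they hold, finding gaps is a set difference: the resources in the map minus those that some preserved copy `isFormatOf`. The sketch assumes a hypothetical predicate URI and example resources. It treats every node in the map as something to preserve; a real check would probably filter by resource type first, for example to exclude people and organizations.

```python
IS_FORMAT_OF = "http://example.org/terms/isFormatOf"  # hypothetical predicate
T = "http://example.org/terms/"                        # hypothetical predicate namespace


def preservation_gaps(map_triples, archive_triples):
    """Resources that appear in the map but have no known preserved copy."""
    resources = {s for s, _, _ in map_triples} | {o for _, _, o in map_triples}
    preserved = {o for _, p, o in archive_triples if p == IS_FORMAT_OF}
    return sorted(resources - preserved)


work_map = {
    ("http://example.org/article/1", T + "cites", "http://example.org/webpage/1"),
    ("http://example.org/code/1", T + "used", "http://example.org/dataset/1"),
}
archive_statements = {
    ("http://example.org/archive/copy/1", IS_FORMAT_OF, "http://example.org/article/1"),
    ("http://example.org/archive/copy/2", IS_FORMAT_OF, "http://example.org/dataset/1"),
}
print(preservation_gaps(work_map, archive_statements))
# ['http://example.org/code/1', 'http://example.org/webpage/1']
```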

… perhaps be used to trigger preservation events. [Diagram: the combined publisher and DataCite graph, with a "WARC it!" callout on the cited webpage.] RMap data could also be used to trigger preservation events. Say a map is created with a website reference; this could trigger a web archive to preserve it. That archive could then add its own DiSCO containing the new archived link.
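A trigger of this kind could scan incoming graphs for webpages that have no archived copy recorded yet and queue them for capture. The class and predicate URIs (`WebPage`, `archivedAs`) are hypothetical placeholders, and how a web archive would actually receive the request is left open.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
WEBPAGE = "http://example.org/terms/WebPage"         # hypothetical class
ARCHIVED_AS = "http://example.org/terms/archivedAs"  # hypothetical predicate


def pages_to_archive(triples):
    """Webpages typed in the graph that have no archived copy recorded yet."""
    pages = {s for s, p, o in triples if p == RDF_TYPE and o == WEBPAGE}
    archived = {s for s, p, _ in triples if p == ARCHIVED_AS}
    return sorted(pages - archived)


graph = {
    ("http://example.org/page/a", RDF_TYPE, WEBPAGE),
    ("http://example.org/page/b", RDF_TYPE, WEBPAGE),
    ("http://example.org/page/b", ARCHIVED_AS, "http://example.org/wayback/page-b"),
}
print(pages_to_archive(graph))  # ['http://example.org/page/a']
```

After capturing a page, the archive could submit its own DiSCO containing an `ARCHIVED_AS` triple, and the page would then drop out of this list.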

Distribute context metadata across repositories. [Diagram: preservation services, publishers, researchers, registries, funders, data repositories and academic institutions.] Finally, RMap could be used more generally to help harmonize metadata across multiple platforms.

Status: working prototype; REST API, web GUI and harvesting tools; code is on GitHub; public sandboxes available (see tech wiki for docs); work continues… Here is the current status. There was some buzz at the conference about overlaying dinosaurs onto long-tail graphs… my graph doesn't have a long tail, so I've chosen a pterodactyl.

Acknowledgements. RMap is funded by the Alfred P. Sloan Foundation. Thanks to RMap project colleagues: JHU: Sayeed Choudhury (PI), Aaron Birkland, Tim DiLauro; Portico: Kate Wittenberg (PI), Sheila Morrissey, Jabin White, Vinay Cheruku, Amy Kirchhoff, John Meyer, Stephanie Orphan, Joseph Rogowski; IEEE: Gerry Grenier (PI), Mark Donoghue, Renny Guida, Ken Rawson, Ken Moore. (general info) (tech wiki) @rmapproject | @karenhansn