Technical Challenges in the Preservation of Linked Data Carlo Meghini ISTI CNR, Pisa APA Conference Launch of the Centre of Excellence Brussels 22-23 October.

Presentation transcript:

Technical Challenges in the Preservation of Linked Data Carlo Meghini ISTI CNR, Pisa APA Conference Launch of the Centre of Excellence Brussels October 2014

Cultural Heritage

Outline Linked Data Digital Preservation PRELIDA Challenges in preserving Linked Data Conclusions

The web The web consists of two main ingredients: a knowledge base, where knowledge is expressed informally (text) or pictorially (images, videos, graphics) and is embedded in structures such as hypertexts (HTML documents); and a mechanism to access knowledge by retrieving the structure that contains it. Conceptually, the web is based on a few simple notions. Resource: anything that has an identity and undergoes a series of states; a web resource is a structure accessible on the web. URI: a string of characters that uniquely identifies a resource. State: the way a resource is at a certain time. Representation: data that encode the state of a resource; the same state can be encoded by many different representations.

How it works A human can access knowledge using a web browser in a few steps: 1. the user gives a URI to the browser; 2. the browser asks the web server to retrieve a representation of the state of the resource identified by the given URI; 3. the web server complies and delivers the representation to the client; 4. the client displays the obtained representation to the user.
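
The notions of resource, URI, state and representation, and the lookup steps above, can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the `Resource` and `Server` classes, the `represent` function and the `http://example.org/page` URI are hypothetical names that do not come from the slides, and no real HTTP traffic is involved.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A resource: identified by a URI, passing through a series of states."""
    uri: str
    states: list = field(default_factory=list)  # chronological order

    def current_state(self):
        return self.states[-1]


def represent(state, media_type):
    """Encode one state as a representation; one state allows many encodings."""
    if media_type == "text/html":
        return "<html><body>" + state + "</body></html>"
    if media_type == "text/plain":
        return state
    raise ValueError("unsupported media type: " + media_type)


class Server:
    """Hypothetical stand-in for a web server (steps 2 and 3 of the lookup)."""

    def __init__(self):
        self.resources = {}

    def publish(self, resource):
        self.resources[resource.uri] = resource

    def get(self, uri, media_type="text/html"):
        # Find the resource by its URI, then encode its current state.
        return represent(self.resources[uri].current_state(), media_type)


server = Server()
page = Resource("http://example.org/page", ["first version"])
server.publish(page)
page.states.append("second version")  # the resource changes state over time

html = server.get("http://example.org/page")
text = server.get("http://example.org/page", "text/plain")
```

The same URI yields different representations (`html`, `text`) of the same state, and a later lookup would reflect whatever state the resource is in at that time.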

The web stack Based on this simple mechanism, the web has developed into a sophisticated platform for accessing services via a variety of devices:

The semantic web The semantic web is a parallel web that differs from the original web only in the way knowledge is represented. The knowledge found on the semantic web is formally represented, that is, expressed in a formal language having: a machine-readable notation; a formal syntax that is strongly coupled with the web architecture; a formal semantics that provides a query-based access mechanism. The semantic web started as a vision by the inventor of the web: Tim Berners-Lee, James Hendler, and Ora Lassila. The Semantic Web. Scientific American Magazine. The vision is becoming reality via Linked Data.

Linked Data Linked Data are data that follow 4 recommendations: 1. Use URIs as names for things. 2. Use HTTP URIs so that people can look up those names. 3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL). 4. Include links to other URIs so that they can discover more things. Ingredients: language: URIs, RDF, SPARQL; mechanics: HTTP look up.
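
The "HTTP look up" ingredient can be illustrated by preparing a request that asks for an RDF representation through the standard `Accept` header. This is a minimal sketch: the `rdf_lookup_request` helper and the `http://example.org/resource/MonaLisa` URI are assumptions made for illustration, and the request is only constructed here, not sent.

```python
from urllib.request import Request


def rdf_lookup_request(uri, media_type="text/turtle"):
    """Build (but do not send) an HTTP GET asking for an RDF representation."""
    if not uri.startswith(("http://", "https://")):
        # Recommendation 2: names should be HTTP URIs so they can be looked up.
        raise ValueError("Linked Data names should be HTTP URIs: " + uri)
    # The Accept header tells the server which representation is wanted.
    return Request(uri, headers={"Accept": media_type})


# Hypothetical URI, used only as an example.
req = rdf_lookup_request("http://example.org/resource/MonaLisa")
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup; whether a given server honours the `Accept` header depends on how the dataset is published.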

The Semantic Web stack

RDF The Extensible Markup Language (XML) is a language for pure notation, giving a set of simple rules to represent data structures. The Resource Description Framework (RDF) is a contemporary version of semantic nets, allowing one to express very simple statements that can be visualized as a directed, labelled graph.
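
To make the "directed, labelled graph" reading concrete, RDF statements can be sketched as (subject, property, object) triples held in a plain Python set. The `http://example.org/` namespace and the `creator` and `locatedIn` properties are hypothetical and chosen only for illustration; a real application would more likely use an RDF library.

```python
EX = "http://example.org/"  # hypothetical namespace

# Each triple is one edge: subject --property--> object.
triples = {
    (EX + "MonaLisa", EX + "creator", EX + "Leonardo"),
    (EX + "MonaLisa", EX + "locatedIn", EX + "Louvre"),
    (EX + "Louvre", EX + "locatedIn", EX + "Paris"),
}


def nodes(graph):
    """All nodes of the graph: every subject and every object."""
    return {s for s, _, _ in graph} | {o for _, _, o in graph}


def outgoing(graph, node):
    """Sorted (label, target) pairs for the edges leaving a node."""
    return sorted((p, o) for s, p, o in graph if s == node)


mona_lisa_edges = outgoing(triples, EX + "MonaLisa")
```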

SPARQL RDF is endowed with a query language that allows knowledge to be extracted from graphs: SPARQL. Example question: which are the individuals that are listened to by someone, and which class do they belong to? SELECT distinct ?ind ?cl FROM WHERE { ?ind rdf:type ?cl. ?x ex:listen ?ind. }

Query answering Query answering is graph matching:

Vocabularies The nodes in an RDF graph are URIs of individuals which are grouped in homogeneous sets that are called vocabularies. There are known vocabularies giving URIs for: place names (such as TGN) people names (such as VIAF) concept names (such as ACM Classification scheme) etcetera

Vocabularies The labels in an RDF graph are URIs of properties, which capture relations between individuals. Properties are also grouped in vocabularies, such as Dublin Core, CIDOC CRM, etcetera. If a property vocabulary includes axioms, then it is called an ontology.

Ontologies An ontology can address any domain of discourse: social ontologies: person, fatherOf, motherOf, friendOf, …; space ontologies: point, region, containedIn, …; literary ontologies: text, citation, cites, … Axioms give the semantics of the relations in the ontology: social axiom: fatherOf is disjoint from motherOf; space axiom: containedIn is transitive; literary axiom: a citation relates a text to a work. In the semantic web stack, ontologies are expressed by using the Web Ontology Language (OWL).
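
The effect of two of these axioms can be sketched directly on sets of pairs: transitivity lets new containedIn facts be derived, and disjointness lets inconsistent data be detected. This is a hand-written illustration, not how an OWL reasoner is implemented, and the sample individuals (`Louvre`, `Paris`, `France`, `Tom`, `Sue`, `Ann`) are invented.

```python
def transitive_closure(pairs):
    """Add (a, d) whenever (a, b) and (b, d) are present, until nothing changes."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def disjointness_violations(rel1, rel2):
    """Pairs asserted in both relations, which a disjointness axiom forbids."""
    return set(rel1) & set(rel2)


# Invented sample data.
contained_in = {("Louvre", "Paris"), ("Paris", "France")}
father_of = {("Tom", "Ann")}
mother_of = {("Sue", "Ann"), ("Tom", "Ann")}  # second pair is deliberately wrong

inferred = transitive_closure(contained_in)
conflicts = disjointness_violations(father_of, mother_of)
```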

Cultural heritage The CH sector is buying massively into the semantic web languages and technologies for expressing: descriptions of CH artifacts; vocabularies used in these descriptions; ontologies providing properties for these descriptions. The Semantic Web languages satisfy the requirements of being easy to use, tightly coupled with the web, defined in a community-based process, and rich in open-source technologies.

An RDF description of Mona Lisa

A better description

Web Data of Increasing Standardization
Not all linked data is open and not all open data is linked!
★ Available on the web (whatever format) but with an open license, to be Open Data
★★ Available as machine-readable structured data (e.g. Excel instead of an image scan of a table)
★★★ As (2), plus non-proprietary format (e.g. CSV instead of Excel)
★★★★ As (3), plus using open standards from W3C (RDF and SPARQL) to identify things through dereferenceable HTTP URIs, to ensure effective access
★★★★★ As all the above, plus establishing links between data of different sources
File format recommendations (on a scale of 0-5):
csv ★★★
xls ★
pdf ★
doc ★
xml ★★★★
rdf ★★★★★
shp ★★★
ods ★★
tiff ★
jpeg ★
json ★★★
txt ★
html ★★
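
Because each star level builds on the previous one, the scheme can be read as a cumulative checklist. The sketch below encodes that reading; the `star_rating` function and its parameter names are assumptions introduced for illustration, and it does not reproduce the per-format recommendations listed above, which are the author's own judgement.

```python
def star_rating(open_licence_on_web, machine_readable, non_proprietary,
                w3c_standards_and_uris, linked_to_other_data):
    """Count how many consecutive star criteria a dataset meets, from the first."""
    criteria = [
        open_licence_on_web,      # 1 star
        machine_readable,         # 2 stars
        non_proprietary,          # 3 stars
        w3c_standards_and_uris,   # 4 stars
        linked_to_other_data,     # 5 stars
    ]
    stars = 0
    for met in criteria:
        if not met:
            break  # a level only counts if all lower levels are met
        stars += 1
    return stars


open_csv = star_rating(True, True, True, False, False)
scanned_table = star_rating(True, False, True, True, True)
```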

The LOD Cloud Media Government Geo Publications User-generated Life sciences Cross-domain

Outline Linked Data Digital Preservation PRELIDA Challenges in preserving Linked Data Conclusions

Digital Preservation Digital preservation is a formal endeavor to ensure that digital information of continuing value remains accessible and usable. It involves planning, resource allocation, and application of preservation methods and technologies, and it combines policies, strategies and actions to ensure access to reformatted and "born-digital" content, regardless of the challenges of media failure and technological change. The goal of digital preservation is the accurate rendering and usability of authenticated content over time

Digital Preservation Persistence: the data survive the process that creates them. Preservation: the data survive the technological and ontological changes that occur since they were persisted.

The OAIS Reference Model

Outline Linked Data Digital Preservation PRELIDA Challenges in preserving Linked Data Conclusions

PRELIDA PREserving LInked DAta. FP7 Coordination and support action. ICT Digital Preservation. Start date: January 1st, 2013. Duration: 24 months. Funding: 770k.

Beneficiaries Consiglio Nazionale delle Ricerche (Coord.) Alliance for Permanent Access University of Huddersfield Universitaet Innsbruck Europeana STI

Objectives Bridge the LD and DP communities by: making the LD community aware of the existing DP results; making the DP community aware of the challenges posed by LD, which stem from intrinsic features of Linked Data, including their structuring, interlinking, dynamicity and distribution.

Specific Objectives collect, organize and publish use cases related to the long-term access to LD; create a comprehensive state of the art on LD and DP technologies; set up a technology observatory; bring together scientists and stakeholders for identifying relevant challenges and paths for addressing them in the near future.

Specific Objectives perform a gap analysis between needs and tools; create a roadmap setting out the research agenda for preserving Linked Data; draw the attention of standardization bodies.

Workshops Opening workshop (June 25-27, 2013) – presentations – discussions – final report. Midterm workshop (April 2-4, 2014) – help define the scientific structure. Consolidation & dissemination workshop (October ) – present results.

PRELIDA in action

Outline Linked Data Digital Preservation PRELIDA Challenges in preserving Linked Data Conclusions

Good news Making a SIP out of a LD dataset – Representation Information: plenty of ontologies and vocabularies – Structure Information: lots of standards on encoding LD – Provenance: W3C PROV – Reference: URIs – Context: Links! – … and the W3C to oversee all this

Challenges LD are formal knowledge – formal knowledge is for us both the content and the PDI for preserving objects (viz. OAIS information model), but how do we preserve it? The world changes; our knowledge of the world changes; the language that we use to express our knowledge of the world changes – how do we communicate a message via a changing language?

Challenges LD depend on the web infrastructure for dereferencing HTTP URIs – how do we make sure the web will keep going? LD are distributed in nature – how do we manage the preservation of the interdependencies amongst datasets?
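
One modest first step towards managing such interdependencies is to list which external hosts a dataset links to, since those are the places its URIs would need to keep dereferencing. The sketch below does this for a set of triples; the `external_dependencies` helper and all hosts (`data.example.org`, `vocab.example.net`, `places.example.com`) are hypothetical, and a real analysis would also have to deal with redirects, versioning and vocabularies hosted elsewhere.

```python
from urllib.parse import urlparse


def external_dependencies(triples, home_host):
    """Hosts other than home_host that appear in any term of any triple."""
    hosts = set()
    for triple in triples:
        for term in triple:
            host = urlparse(term).netloc  # empty string for plain literals
            if host and host != home_host:
                hosts.add(host)
    return hosts


# Invented dataset published at data.example.org.
dataset = {
    ("http://data.example.org/MonaLisa",
     "http://vocab.example.net/creator",
     "http://data.example.org/Leonardo"),
    ("http://data.example.org/MonaLisa",
     "http://vocab.example.net/locatedIn",
     "http://places.example.com/Paris"),
    ("http://data.example.org/MonaLisa",
     "http://vocab.example.net/title",
     "Mona Lisa"),
}

depends_on = external_dependencies(dataset, "data.example.org")
```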

Challenges LD are accessible in many ways: – SPARQL end-points – RDF dumps – RDF dumps plus incremental updates – RDFa – microdata etc. Which format is best to preserve?

Challenges LD come with: – semantics – calculi that are sound and complete w.r.t. the semantics – inference engines that are sound and complete w.r.t. calculi Which is best to preserve?

Challenges Preservation requires the expression and recording of several kinds of metadata about the preserved objects. For preserving LD such metadata should be associated with RDF triples, and at the moment there is no obvious way (apart from reification) to express metadata about RDF triples. – quadruples – nested triples
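
The difference between reification and quadruples can be sketched with plain tuples. Reification describes a statement through the standard `rdf:Statement`, `rdf:subject`, `rdf:predicate` and `rdf:object` terms and hangs metadata on the statement node; a quadruple adds a fourth element naming a graph, and metadata is attached to that name. The `reify` and `as_quad` helpers, the `ex:` names and the `ex:ingestedOn` property are hypothetical and used only for illustration.

```python
def reify(triple, statement_id, metadata):
    """Describe a triple as an rdf:Statement node and attach metadata to it."""
    s, p, o = triple
    described = [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", s),
        (statement_id, "rdf:predicate", p),
        (statement_id, "rdf:object", o),
    ]
    # Metadata becomes ordinary triples about the statement node.
    described += [(statement_id, key, value) for key, value in metadata.items()]
    return described


def as_quad(triple, graph_name):
    """Extend a triple with a fourth element naming the graph it belongs to."""
    return triple + (graph_name,)


fact = ("ex:MonaLisa", "ex:creator", "ex:Leonardo")

# Reification: four describing triples, plus one per metadata item.
reified = reify(fact, "ex:stmt1", {"ex:ingestedOn": "2014-10-22"})

# Quadruple: one quad, plus one triple about the named graph.
quad = as_quad(fact, "ex:graph1")
graph_metadata = ("ex:graph1", "ex:ingestedOn", "2014-10-22")
```

The sketch also shows the cost trade-off: reification multiplies the number of triples per annotated statement, while quadruples keep the statement intact but annotate at the granularity of the named graph.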

Outline Linked Data Digital Preservation PRELIDA Challenges in preserving Linked Data Conclusions