Technical Challenges in the Preservation of Linked Data Carlo Meghini ISTI CNR, Pisa APA Conference Launch of the Centre of Excellence Brussels 22-23 October.

1 Technical Challenges in the Preservation of Linked Data Carlo Meghini ISTI CNR, Pisa APA Conference Launch of the Centre of Excellence Brussels 22-23 October 2014

2 Cultural Heritage

3 Outline Linked Data Digital Preservation PRELIDA Challenges in preserving Linked Data Conclusions

5 The web The web consists of two main ingredients: a knowledge base, where knowledge is expressed informally (text) or pictorially (images, videos, graphics) and is embedded in structures such as hypertexts (HTML documents); and a mechanism to access knowledge by getting the structure that contains it. Conceptually, the web is based on a few simple notions: resource: anything that has an identity and undergoes a series of states (a web resource is a structure accessible on the web); URI: a string of characters that uniquely identifies a resource; state: the way a resource is at a certain time; representation: data that encode the state of a resource (one state can be encoded by many different representations).

6 How it works A human can access knowledge using a web browser in a few steps: 1. the user gives a URI to the browser; 2. the browser asks the web server to retrieve a representation of the state of the resource identified by the given URI; 3. the web server complies and delivers the representation to the client; 4. the client displays the obtained representation to the user.
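The notions of resource, URI, state and representation, and the four access steps, can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: `Resource`, `ToyServer`, the `example.org` URI and the two media types are hypothetical names chosen here, and no real HTTP traffic is involved.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A resource has an identity (its URI) and a state that changes over time."""
    uri: str
    state: dict = field(default_factory=dict)

    def represent(self, media_type: str) -> str:
        """Encode the current state; one state can have several representations."""
        pairs = sorted(self.state.items())
        if media_type == "text/plain":
            return ", ".join(f"{k}: {v}" for k, v in pairs)
        if media_type == "text/html":
            items = "".join(f"<li>{k}: {v}</li>" for k, v in pairs)
            return f"<ul>{items}</ul>"
        raise ValueError(f"no representation available for {media_type}")


class ToyServer:
    """Hypothetical stand-in for a web server: maps URIs to resources."""

    def __init__(self):
        self.resources = {}

    def publish(self, resource: Resource) -> None:
        self.resources[resource.uri] = resource

    def get(self, uri: str, accept: str = "text/plain") -> str:
        # Steps 2 and 3: look up the resource and deliver a representation.
        return self.resources[uri].represent(accept)


server = ToyServer()
page = Resource("http://example.org/mona-lisa", {"title": "Mona Lisa"})
server.publish(page)

first = server.get(page.uri)          # URI in, representation out
page.state["location"] = "Louvre"     # the resource moves to a new state
second = server.get(page.uri)         # same URI, different representation
as_html = server.get(page.uri, accept="text/html")

# Step 4: the client shows what it obtained.
print(first)
print(second)
print(as_html)
```

The URI stays fixed while the state, and therefore the representation, changes; this is the property that later makes preservation of web resources non-trivial.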

7 The web stack Based on this simple mechanism, the web has developed into a sophisticated platform for accessing services via a variety of devices.

8 The semantic web The semantic web is a parallel web that differs from the original web only in the way knowledge is represented. The knowledge found on the semantic web is formally represented, that is, expressed in a formal language having: a machine-readable notation; a formal syntax that is strongly coupled with the web architecture; a formal semantics that provides a query-based access mechanism. The semantic web started as a vision by the inventor of the web: Tim Berners-Lee, James Hendler, and Ora Lassila. The semantic web. Scientific American Magazine, 2001. The vision is becoming reality via Linked Data.

9 Linked Data Linked Data are data that follow 4 recommendations: 1. Use URIs as names for things. 2. Use HTTP URIs so that people can look up those names. 3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL). 4. Include links to other URIs so that they can discover more things. Ingredients: language: URIs, RDF, SPARQL; mechanics: HTTP lookup.

10 The Semantic Web stack

11 RDF The Extensible Markup Language (XML) is a language for pure notation, giving a set of simple rules to represent data structures. The Resource Description Framework (RDF) is a contemporary version of semantic nets, allowing one to express very simple statements that can be visualized as a directed, labelled graph.
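A minimal sketch of the "statements as a directed, labelled graph" idea, using plain Python tuples rather than an RDF library. The `example.org` namespace and the resource names (`monaLisa`, `creator`, `leonardo`) are assumptions made for illustration; only the `rdf:type` URI is a real W3C identifier.

```python
EX = "http://example.org/"  # hypothetical namespace, not from the slides
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Each statement is a (subject, predicate, object) triple:
# subject and object are nodes, the predicate labels the edge between them.
triples = [
    (EX + "monaLisa", RDF_TYPE, EX + "Painting"),
    (EX + "monaLisa", EX + "creator", EX + "leonardo"),
    (EX + "leonardo", RDF_TYPE, EX + "Person"),
]


def to_ntriples(statements):
    """Serialize URI-only triples in N-Triples syntax, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in statements)


def outgoing_edges(statements, node):
    """Return the labelled edges (predicate, object) leaving a node."""
    return [(p, o) for s, p, o in statements if s == node]


print(to_ntriples(triples))
print(outgoing_edges(triples, EX + "monaLisa"))
```

Literals and blank nodes are left out here to keep the sketch short; a real dataset would need both.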

12 SPARQL RDF is endowed with a query language that allows knowledge to be extracted from graphs: SPARQL. Example question: which are the individuals that are listened to by someone, and which class do they belong to? SELECT distinct ?ind ?cl FROM WHERE { ?ind rdf:type ?cl. ?x ex:listen ?ind. }

13 Query answering Query answering is graph matching:
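To make the graph-matching view concrete, here is a small sketch that evaluates the two triple patterns of the SPARQL example from the previous slide against an in-memory graph. It is a naive matcher written for illustration, not how a SPARQL engine is implemented; the sample data (`ex:beatles`, `ex:anna`, and so on) is invented, and prefixed names are treated as opaque strings.

```python
def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the same value everywhere it occurs.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def match_bgp(patterns, graph):
    """Find all variable bindings under which every pattern occurs in the graph."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Invented sample graph.
graph = {
    ("ex:beatles", "rdf:type", "ex:Band"),
    ("ex:mozart", "rdf:type", "ex:Composer"),
    ("ex:salieri", "rdf:type", "ex:Composer"),
    ("ex:anna", "ex:listen", "ex:beatles"),
    ("ex:bob", "ex:listen", "ex:beatles"),
    ("ex:bob", "ex:listen", "ex:mozart"),
}

# WHERE { ?ind rdf:type ?cl . ?x ex:listen ?ind . }
query = [("?ind", "rdf:type", "?cl"), ("?x", "ex:listen", "?ind")]

# SELECT DISTINCT ?ind ?cl: project two variables and drop duplicates.
answers = sorted({(b["?ind"], b["?cl"]) for b in match_bgp(query, graph)})
print(answers)
```

`ex:salieri` has a class but no listener, so it does not appear among the answers; `ex:beatles` has two listeners but is returned once because of the DISTINCT projection.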

14 Vocabularies The nodes in an RDF graph are URIs of individuals, which are grouped in homogeneous sets called vocabularies. There are well-known vocabularies giving URIs for: place names (such as TGN); people names (such as VIAF); concept names (such as the ACM Classification scheme); et cetera.

15 Vocabularies The labels in an RDF graph are URIs of properties, which capture relations between individuals. Properties are also grouped in vocabularies, such as Dublin Core, CIDOC CRM, et cetera. If a property vocabulary includes axioms, then it is called an ontology.

16 Ontologies An ontology can address any domain of discourse: social ontologies: person, fatherOf, motherOf, friendOf, …; space ontologies: point, region, containedIn, …; literary ontologies: text, citation, cites, … Axioms give the semantics of the relations in the ontology: social axiom: fatherOf is disjoint from motherOf; space axiom: containedIn is transitive; literary axiom: a citation relates a text to a work. In the semantic web stack, ontologies are expressed by using the Web Ontology Language (OWL).
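What an axiom buys can be sketched in a few lines: transitivity lets new containedIn facts be derived, and disjointness lets inconsistent data be detected. The following is an illustrative sketch over plain Python sets, not an OWL reasoner; the place and person names are invented.

```python
def transitive_closure(pairs):
    """Add (a, c) whenever (a, b) and (b, c) are present, until nothing changes."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def disjointness_violations(relation_a, relation_b):
    """Pairs that occur in both relations, which a disjointness axiom forbids."""
    return relation_a & relation_b


# Space axiom: containedIn is transitive (sample facts are invented).
contained_in = {("Louvre", "Paris"), ("Paris", "France"), ("France", "Europe")}
inferred = transitive_closure(contained_in)
print(("Louvre", "Europe") in inferred)

# Social axiom: fatherOf is disjoint from motherOf.
father_of = {("tom", "ann"), ("sam", "joe")}
mother_of = {("mary", "ann"), ("sam", "joe")}  # ("sam", "joe") is a data error
print(disjointness_violations(father_of, mother_of))
```

The three asserted containedIn facts yield three further derived ones, which illustrates why preserving the axioms matters as much as preserving the data: without them the derived knowledge is lost.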

17 Cultural heritage The CH sector is buying massively into the semantic web languages and technologies for expressing: descriptions of CH artifacts; vocabularies used in these descriptions; ontologies providing properties for these descriptions. The Semantic Web languages satisfy the requirements of being easy to use, tightly coupled with the web, defined in a community-based process, and rich in open-source technologies.

18 An RDF description of Mona Lisa

19 A better description

20 Web Data of Increasing Standardization Not all linked data is open and not all open data is linked!
★ Available on the web (whatever format) but with an open license, to be Open Data
★★ Available as machine-readable structured data (e.g. Excel instead of an image scan of a table)
★★★ as (2), plus non-proprietary format (e.g. CSV instead of Excel)
★★★★ as (3), plus using open standards from W3C (RDF and SPARQL) to identify things through dereferenceable HTTP URIs, to ensure effective access
★★★★★ as all the above, plus establishing links between data of different sources
File format recommendations (on a scale of 0-5):
csv ★★★
xls ★
pdf ★
doc ★
xml ★★★★
rdf ★★★★★
shp ★★★
ods ★★
tiff ★
jpeg ★
json ★★★
txt ★
html ★★
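Because each star level presupposes the previous ones, the scheme can be read as a cumulative checklist. The sketch below encodes that reading; the criterion names and the `star_rating` function are hypothetical, chosen for illustration, and the rating of a real dataset would of course require human judgment about each criterion.

```python
# Criteria in the order of the star levels above (names are illustrative).
CRITERIA = [
    "open_license",            # 1 star: on the web with an open license
    "machine_readable",        # 2 stars: structured data
    "non_proprietary_format",  # 3 stars: e.g. CSV instead of Excel
    "w3c_standards",           # 4 stars: RDF, SPARQL, dereferenceable HTTP URIs
    "linked_to_other_data",    # 5 stars: links to other sources
]


def star_rating(dataset):
    """Count how many consecutive criteria hold, starting from the first."""
    stars = 0
    for criterion in CRITERIA:
        if not dataset.get(criterion, False):
            break
        stars += 1
    return stars


scanned_table = {"open_license": True}
open_csv = {"open_license": True, "machine_readable": True,
            "non_proprietary_format": True}
closed_rdf = {"machine_readable": True, "non_proprietary_format": True,
              "w3c_standards": True, "linked_to_other_data": True}

print(star_rating(scanned_table), star_rating(open_csv), star_rating(closed_rdf))
```

The last case reflects the slogan at the top of the slide: linked data without an open license earns no stars on this scale, however well it is linked.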

21 The LOD Cloud Media Government Geo Publications User-generated Life sciences Cross-domain

22 Outline Linked Data Digital Preservation PRELIDA Challenges in preserving Linked Data Conclusions

23 Digital Preservation Digital preservation is a formal endeavor to ensure that digital information of continuing value remains accessible and usable. It involves planning, resource allocation, and application of preservation methods and technologies, and it combines policies, strategies and actions to ensure access to reformatted and "born-digital" content, regardless of the challenges of media failure and technological change. The goal of digital preservation is the accurate rendering and usability of authenticated content over time

24 Digital Preservation Persistence: the data survive the process that creates them. Preservation: the data survive the technological and ontological changes that occur after they were persisted.

25 The OAIS Reference Model

28 Outline Linked Data Digital Preservation PRELIDA Challenges in preserving Linked Data Conclusions

29 PRELIDA PREserving LInked DAta. FP7 Coordination and support action, ICT-2011.4.3 Digital Preservation. Start date: January 1st, 2013. Duration: 24 months. Funding: 770k.

30 Beneficiaries Consiglio Nazionale delle Ricerche (Coord.) Alliance for Permanent Access University of Huddersfield Universitaet Innsbruck Europeana STI

31 Objectives Bridge the LD and DP communities by: making the LD community aware of the existing DP results; making the DP community aware of the challenges posed by LD, which stem from intrinsic features of Linked Data, including their structuring, interlinking, dynamicity and distribution.

32 Specific Objectives collect, organize and publish use cases related to the long-term access to LD; create a comprehensive state of the art on LD and DP technologies; set up a technology observatory; bring together scientists and stakeholders for identifying relevant challenges and paths for addressing them in the near future.

33 Specific Objectives perform a gap analysis between needs and tools; create a roadmap setting the research agenda for preserving linked data; draw the attention of standardization bodies.

34 Workshops Opening workshop (June 25-27, 2013): presentations, discussions, final report. Midterm workshop (April 2-4, 2014): help define the scientific structure. Consolidation & dissemination workshop (October 17-18, 2014): present results.

35 PRELIDA in action

36 Outline Linked Data Digital Preservation PRELIDA Challenges in preserving Linked Data Conclusions

37 Good news Making a SIP out of a LD dataset: Representation Information: plenty of ontologies and vocabularies; Structure Information: lots of standards on encoding LD; Provenance: W3C PROV; Reference: URIs; Context: links! … and the W3C to oversee all this.

38 Challenges LD are formal knowledge. Formal knowledge is for us both the content and the PDI for preserving objects (viz. the OAIS information model), but how do we preserve it? The world changes; our knowledge of the world changes; the language that we use to express our knowledge of the world changes. How do we communicate a message via a changing language?

39 Challenges LD depend on the web infrastructure for dereferencing HTTP URIs: how do we make sure the web will keep going? LD are distributed in nature: how do we manage the preservation of the interdependencies amongst datasets?

40 Challenges LD are accessible in many ways: SPARQL endpoints; RDF dumps; RDF dumps plus incremental updates; RDFa; microdata; etc. Which format is best to preserve?

41 Challenges LD come with: semantics; calculi that are sound and complete w.r.t. the semantics; inference engines that are sound and complete w.r.t. the calculi. Which is best to preserve?

42 Challenges Preservation requires the expression and recording of several kinds of metadata about the preserved objects. For preserving LD, such metadata should be associated with RDF triples, and at the moment there is no obvious way (apart from reification) to express metadata about RDF triples. Possible alternatives: quadruples; nested triples.
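The difference between reification and quadruples can be sketched with plain tuples. The `rdf:Statement`, `rdf:subject`, `rdf:predicate` and `rdf:object` terms are the standard RDF reification vocabulary; everything under `example.org`, including the `ingestedOn` property and the date value, is a hypothetical placeholder for whatever preservation metadata needs recording, and literals are simplified to plain strings.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"  # hypothetical namespace

triple = (EX + "monaLisa", EX + "creator", EX + "leonardo")


def reify(statement, statement_uri):
    """Describe a triple with four triples about a new rdf:Statement node."""
    s, p, o = statement
    return [
        (statement_uri, RDF + "type", RDF + "Statement"),
        (statement_uri, RDF + "subject", s),
        (statement_uri, RDF + "predicate", p),
        (statement_uri, RDF + "object", o),
    ]


# Option 1, reification: metadata hangs off the statement node.
stmt = EX + "statement1"
reified = reify(triple, stmt)
reified.append((stmt, EX + "ingestedOn", "2014-10-22"))

# Option 2, quadruples: a fourth element names the graph the triple belongs to,
# and metadata is stated once about that graph.
graph_name = EX + "graph1"
quads = [triple + (graph_name,)]
graph_metadata = [(graph_name, EX + "ingestedOn", "2014-10-22")]


def metadata_for(statement, quads, graph_metadata):
    """Collect metadata of every named graph that contains the statement."""
    graphs = {g for s, p, o, g in quads if (s, p, o) == statement}
    return [(p, o) for g, p, o in graph_metadata if g in graphs]


print(len(reified))
print(metadata_for(triple, quads, graph_metadata))
```

Reification needs five triples to annotate one, whereas the quad approach keeps the original triple intact and shares one annotation across all triples of a graph, at the cost of annotating at graph rather than triple granularity.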

43 Outline Linked Data Digital Preservation PRELIDA Challenges in preserving Linked Data Conclusions

