Published by Roland Patrick. Modified over 8 years ago.
1
Preservation Through Evolution Management: The DIACHRON Approach
DIACHRON Final Dissemination Workshop, 24.03.2016
Giorgos Flouris (FORTH), fgeo@ics.forth.gr
2
Preservation and Evolution Management
Two sides of the same coin
  – Understanding evolution allows preservation
Preservation through change detection
  – Terminology changes (e.g., Yugoslavia)
  – Modelling changes (e.g., Pluto is a Planet)
Trace back our understanding at a given point in time by “reverse engineering” changes
Equivalent to keeping the old versions, but:
  – Cheaper (in terms of space)
  – Helps understand (not just access) older versions
3
DIACHRON Architecture
4
Change Detection
5
Change Detection Challenges
Change detection for evolution management
  – Identifying changes between versions
Challenges
  – Going beyond simple “delta” solutions
  – High-level deltas
  – More intuitive lists of changes
  – Without loss of formal rigor
6
Additional Challenges in DIACHRON
Change detection challenges (in DIACHRON)
  1. Diverse data models
  2. Dynamic datasets
  3. Recoverable versions
  4. Changes as first-class citizens
  5. Cross-snapshot queries
7
Change Detection in DIACHRON
[Figure. Labels: Pilot dataset, DIACHRON Version 1; Pilot dataset, DIACHRON Version 2; Change]
8
Defining Changes: Layers
What a naïve diff will report …
  Add (Rec, diachron:subject, EFO_001927)
  Add (Rec, diachron:hasRecordAttribute, rAtt1)
  Add (rAtt1, diachron:predicate, rdfs:subClassOf)
  Add (rAtt1, diachron:object, ObsoleteClass)
… what the pilot expects …
  Add_SuperClass (EFO_001927, ObsoleteClass)
… or better still …
  Mark_as_Obsolete (EFO_001927)
Layers
  – Low-level: universal
  – Simple: model-specific
  – Complex: user-specific
9
Change Hierarchy: Low-level (1/3)
Low-level changes
  – DIACHRON model, for internal use
  – Fixed: Add, Delete
  – Just additions and deletions of triples
  – Simple set difference
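The "simple set difference" behind low-level changes can be sketched in a few lines. This is an illustration only, not DIACHRON code: triples are modelled as plain Python tuples, the names `low_level_delta` and `LowLevelDelta` are hypothetical, and the toy version `v1` is invented. The four added triples are the ones from the naïve-diff example.

```python
from typing import NamedTuple, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class LowLevelDelta(NamedTuple):
    added: Set[Triple]    # triples present only in the new version
    deleted: Set[Triple]  # triples present only in the old version


def low_level_delta(old: Set[Triple], new: Set[Triple]) -> LowLevelDelta:
    """Low-level changes are two set differences over the triples."""
    return LowLevelDelta(added=new - old, deleted=old - new)


# Hypothetical toy versions (assumption: not taken from a real pilot dataset).
v1 = {("Rec0", "diachron:subject", "EFO_000001")}
v2 = v1 | {
    ("Rec", "diachron:subject", "EFO_001927"),
    ("Rec", "diachron:hasRecordAttribute", "rAtt1"),
    ("rAtt1", "diachron:predicate", "rdfs:subClassOf"),
    ("rAtt1", "diachron:object", "ObsoleteClass"),
}

delta = low_level_delta(v1, v2)
for triple in sorted(delta.added):
    print("Add", triple)
for triple in sorted(delta.deleted):
    print("Delete", triple)
```

Swapping the arguments yields the inverse delta: what was added becomes deleted, and vice versa.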
10
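Change Hierarchy: Simple (2/3)
Pilot terminology:
  – Add_SuperClass, Add_Dimension
Fixed, pre-defined
Composed of low-level changes
Partitioning of low-level changes into simple changes is perfect
  – Complete and unambiguous (every low-level change belongs to exactly one simple change)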
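As a rough illustration of how a simple change groups low-level ones, the sketch below recognises Add_SuperClass from the record/attribute pattern that appears in the naïve-diff example. It is a simplification under stated assumptions: triples are Python tuples, the function name `detect_add_superclass` is hypothetical, and the check is reduced to "the pattern holds in the new version but not in the old one".

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

SUBJECT = "diachron:subject"
HAS_ATTR = "diachron:hasRecordAttribute"
PREDICATE = "diachron:predicate"
OBJECT = "diachron:object"
SUBCLASS_OF = "rdfs:subClassOf"


def detect_add_superclass(old: Set[Triple], new: Set[Triple]) -> Set[Tuple[str, str, str]]:
    """Report Add_SuperClass(cls, parent) for subclass attributes new in `new`."""
    changes = set()
    for record, pred, attr in new:
        # Only record attributes that describe an rdfs:subClassOf statement.
        if pred != HAS_ATTR or (attr, PREDICATE, SUBCLASS_OF) not in new:
            continue
        classes = [o for s, p, o in new if s == record and p == SUBJECT]
        parents = [o for s, p, o in new if s == attr and p == OBJECT]
        for cls in classes:
            for parent in parents:
                pattern = {
                    (record, HAS_ATTR, attr),
                    (attr, PREDICATE, SUBCLASS_OF),
                    (attr, OBJECT, parent),
                }
                # The change exists only if the pattern was absent before.
                if not pattern <= old:
                    changes.add(("Add_SuperClass", cls, parent))
    return changes


old_version: Set[Triple] = set()
new_version: Set[Triple] = {
    ("Rec", SUBJECT, "EFO_001927"),
    ("Rec", HAS_ATTR, "rAtt1"),
    ("rAtt1", PREDICATE, SUBCLASS_OF),
    ("rAtt1", OBJECT, "ObsoleteClass"),
}
detected = detect_add_superclass(old_version, new_version)
print(detected)
```

Here four low-level additions collapse into one Add_SuperClass (EFO_001927, ObsoleteClass), which is the kind of report the pilot expects.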
11
Change Hierarchy: Complex (3/3)
Pilot terminology:
  – Add_Synonym, Mark_As_Obsolete
Totally custom, pilot-specific (defined at run-time)
12
Detecting Changes
Based on SPARQL queries
Add_SuperClass (simple)
  INSERT INTO {
    ?asc a co:Add_Superclass;
         co:asc_p1 ?a;
         co:asc_p2 ?b.
  }
  WHERE {
    GRAPH {
      ?r diachron:subject ?a;
         diachron:hasRecordAttribute ?ratt.
      ?ratt diachron:predicate rdfs:subClassOf;
            diachron:object ?b.
    }
    FILTER NOT EXISTS {
      GRAPH {
        ?r diachron:hasRecordAttribute ?ratt.
        ?ratt diachron:predicate rdfs:subClassOf;
              diachron:object ?b.
      }
    }
    FILTER NOT EXISTS {
      GRAPH {
        {?assoc1 co:new_value ?a.} UNION {?assoc2 co:new_value ?b.}
      }
    }
    BIND(IRI('v1') as ?v1).
    BIND(IRI('v2') as ?v2).
    BIND(concat(str(?a), str(?b), str(?v1), str(?v2)) as ?url).
    BIND(IRI(CONCAT('http://asc/',SHA1(?url))) AS ?asc).
  }
Mark_as_Obsolete (complex)
  INSERT INTO {
    ?mao a co:Mark_As_Obsolete;
         co:mao_p1 ?a;
         co:mao_p2 ?x;
         co:consumes ?asc;
         co:consumes ?al.
  }
  WHERE {
    GRAPH {
      ?asc a co:Add_Superclass;
           co:asc_p1 ?asc1;
           co:asc_p2 ?asc2.
      FILTER NOT EXISTS { ?mao co:consumes ?asc. }.
      FILTER (?asc2 = ).
      BIND(?asc1 as ?a).
      OPTIONAL {
        ?al a co:Add_Label;
            co:al_p1 ?al1;
            co:al_p2 ?al2.
        FILTER NOT EXISTS { ?mao co:consumes ?al. }.
        FILTER(?al1 = ?asc1).
        FILTER(regex(str(?al2), 'obsolete_')).
      }
      BIND(concat(str(?a), str(?x)) as ?url).
      filter ('v1'=?v1).
      filter ('v2'=?v2).
      BIND(IRI(CONCAT('http://mao/',SHA1(?url))) AS ?mao).
    }
  }
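The logic of the Mark_as_Obsolete query can be paraphrased outside SPARQL. The sketch below is a loose Python rendering, not the DIACHRON implementation. Several things in it are assumptions: the class compared against `?asc2` is not visible in the query as transcribed, so `obsolete_class` is a parameter; the second parameter of the complex change is taken to be the matching label; and the type and function names are hypothetical. As in the query, each simple change is consumed at most once and the change IRI is minted from a SHA-1 hash.

```python
import hashlib
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SimpleChange:
    kind: str  # e.g. "Add_SuperClass" or "Add_Label"
    p1: str    # first parameter (the class the change is about)
    p2: str    # second parameter (superclass or label)


@dataclass
class ComplexChange:
    iri: str
    kind: str
    params: Tuple[str, str]
    consumes: List[SimpleChange]


def detect_mark_as_obsolete(simple_changes, obsolete_class="ObsoleteClass"):
    """Build Mark_As_Obsolete changes from already detected simple changes."""
    consumed = set()
    result = []
    for asc in simple_changes:
        if asc.kind != "Add_SuperClass" or asc.p2 != obsolete_class or asc in consumed:
            continue
        parts, label = [asc], ""
        # Optional part: an unconsumed Add_Label on the same class whose
        # label contains 'obsolete_'.
        for al in simple_changes:
            if (al.kind == "Add_Label" and al not in consumed
                    and al.p1 == asc.p1 and re.search("obsolete_", al.p2)):
                parts.append(al)
                label = al.p2
                break
        consumed.update(parts)
        digest = hashlib.sha1((asc.p1 + label).encode("utf-8")).hexdigest()
        result.append(ComplexChange("http://mao/" + digest, "Mark_As_Obsolete",
                                    (asc.p1, label), parts))
    return result


# Hypothetical input: the label text and the second class are invented.
changes = [
    SimpleChange("Add_SuperClass", "EFO_001927", "ObsoleteClass"),
    SimpleChange("Add_Label", "EFO_001927", "obsolete_example"),
    SimpleChange("Add_SuperClass", "EFO_000400", "SomeOtherClass"),
]
found = detect_mark_as_obsolete(changes)
for change in found:
    print(change.kind, change.params, len(change.consumes))
```

Because the hash depends only on the change parameters, running the detection twice on the same input yields the same IRIs.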
13
Representing Changes: Motivation
Interesting motivating query
  – Return all countries for which the unemployment rate of their capital city increased faster than the average increase of the country as a whole, in the last 5 versions
Requires
  – Access to both the changes and the data
  – Access to multiple versions
  – Changes are first-class citizens
Necessary for preservation
14
Representing Changes: Ontology
[Figure: ontology of changes, drawn at schema level and data level. Labels in the figure: D/changes/App1/schema; Data; D/changes/v1-v2; Change; Simple_Change; Complex_Change; Add SuperClass (asc_p1, asc_p2); Mark as Obsolete; SC 1; EFO_001927; ObsoleteClass; sparql_info (INSERT …)]
15
Putting it All Together
DIACHRON data model contains all versions as well as changes
  – In a compact form (ontology of changes)
Detection based on SPARQL queries
  – Provided at deployment time (for simple)
  – Generated at creation time (for complex)
Recoverability
  – Allows moving back and forth between versions (important for preservation, and also for archiving)
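Recoverability can be illustrated with the low-level delta alone: given one version and the delta, the other version follows in either direction. The sketch assumes versions are sets of triples (as Python tuples). The function names are hypothetical, and the real system may recover versions differently.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def apply_forward(old: Set[Triple], added: Set[Triple], deleted: Set[Triple]) -> Set[Triple]:
    """Move from the older version to the newer one."""
    return (old - deleted) | added


def apply_backward(new: Set[Triple], added: Set[Triple], deleted: Set[Triple]) -> Set[Triple]:
    """Move from the newer version back to the older one."""
    return (new - added) | deleted


# Hypothetical toy versions.
v1 = {("a", "rdfs:label", "old name"), ("a", "rdf:type", "C")}
v2 = {("a", "rdfs:label", "new name"), ("a", "rdf:type", "C")}

# The delta is the pair of set differences between the two versions.
added, deleted = v2 - v1, v1 - v2

recovered_v2 = apply_forward(v1, added, deleted)
recovered_v1 = apply_backward(v2, added, deleted)
print(recovered_v2 == v2, recovered_v1 == v1)
```

Storing one version plus the deltas is what makes this cheaper in space than keeping every version in full.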
16
Summary of Changes
Problem
  – Lots of changes in a single version pair
  – Look at only a subset of the delta
  – Need for more intuitive deltas
Solution
  – Pinpoint locations in the ontology where “important” changes happened
  – Assessment strategies for “change summaries”: number of changes, change of centrality/relevance, importance of position, hybrid strategies
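The simplest of the assessment strategies, number of changes, could look like the sketch below. It assumes each detected change is reduced to a (kind, entity) pair and ranks entities by how many changes touch them. The function name and the sample data are hypothetical, and the other strategies (centrality, position, hybrid) are not covered.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def summarize_by_count(changes: Iterable[Tuple[str, str]], top_k: int = 3) -> List[Tuple[str, int]]:
    """Rank entities by the number of changes that affect them."""
    counts = Counter(entity for _kind, entity in changes)
    return counts.most_common(top_k)


# Hypothetical detected changes as (change kind, affected entity).
detected_changes = [
    ("Add_SuperClass", "EFO_001927"),
    ("Add_Label", "EFO_001927"),
    ("Add_Synonym", "EFO_000400"),
    ("Add_Label", "EFO_001927"),
]

summary = summarize_by_count(detected_changes, top_k=2)
for entity, n in summary:
    print(entity, n)
```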
17
D2V Demo
D2V tool for:
  – Creating and managing complex changes
  – Visualizing the evolution history of a dataset
Demonstration video
  – https://www.youtube.com/watch?v=oY7qBBfcHYg
  – http://www.diachron-fp7.eu/videos.html
Online (live) demo
  – http://www.diachron-fp7.eu/demos.html
18
Conclusion
Main DIACHRON message
  – (Linked) data preservation is related to evolution management
DIACHRON challenges
  1. Diverse data models
  2. Dynamic datasets
  3. Recoverable versions
  4. Changes as first-class citizens
  5. Cross-snapshot queries
Solutions
  – DIACHRON data model (#1)
  – Appropriate change definition and detection (#2, #3)
  – Changes and data represented at the same level (#4, #5)
  – Work with high potential (e.g., summaries)