Published by Nickolas Grant. Modified over 9 years ago.
1
University of Crete, Department of Computer Science
ΗΥ-561 Web Data Management
XML Data Archiving
Konstantinos Kouratoras
2
What is the problem?
ΗΥ-561 XML Data Archiving – Konstantinos Kouratoras, Slide 1
- Most research focuses on the current content of databases
- Updates usually overwrite the existing state
- Research is needed on database history
- Scientific evidence is lost
- The data on which findings were based cannot be verified
3
Why is this interesting?
- The history of the data matters for scientific research
  - SWISS-PROT (protein sequences)
  - OMIM (human genes and genetic disorders)
- Curating these databases takes a great deal of manual labour
- They change continuously
- Users need access to old versions
4
First Approach
- Object matching across versions
- Change descriptions
- Archive space
- History-efficient queries
5
Proposed technique (1/2)
Based on:
- Hierarchical data
- Key-structured databases
- Accretive databases (data is mostly added over time and rarely deleted)
6
Proposed technique (2/2)
- Merge all versions into one hierarchy
- Each element is stored only once
- Timestamps
  - Record the sequence of versions in which an element appears
  - Stored compactly as time intervals
  - Inherited: an element without its own timestamp inherits its parent's
- Keys identify elements across versions
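A minimal Python sketch of the timestamp idea, assuming versions are numbered 1, 2, 3, … and that a node without a timestamp of its own inherits its parent's. The names `Node`, `to_intervals` and `effective_timestamp` are hypothetical and not taken from the archiver itself.

```python
from dataclasses import dataclass, field
from typing import Optional


def to_intervals(versions: set[int]) -> list[tuple[int, int]]:
    """Compress a set of version numbers into sorted, inclusive intervals."""
    intervals: list[tuple[int, int]] = []
    for v in sorted(versions):
        if intervals and intervals[-1][1] == v - 1:
            intervals[-1] = (intervals[-1][0], v)  # v extends the current run
        else:
            intervals.append((v, v))  # gap: start a new run
    return intervals


@dataclass
class Node:
    tag: str
    timestamp: Optional[set[int]] = None  # None means "inherit from the parent"
    children: list["Node"] = field(default_factory=list)


def effective_timestamp(path: list[Node]) -> set[int]:
    """Timestamp of the last node on a root-to-node path, applying inheritance."""
    for node in reversed(path):
        if node.timestamp is not None:
            return node.timestamp
    raise ValueError("the root must carry an explicit timestamp")


root = Node("root", timestamp={1, 2, 3, 4, 5})
emp = Node("emp", timestamp={1, 2, 4, 5})  # absent from version 3
name = Node("name")                         # no timestamp: inherits emp's
root.children.append(emp)
emp.children.append(name)
print(to_intervals(effective_timestamp([root, emp, name])))  # [(1, 2), (4, 5)]
```

Storing intervals instead of explicit version lists keeps the timestamp small when an element survives many consecutive versions, which is the common case for accretive data.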
7
Example
8
XML Model (1/3)
- Node values
  - T-node: its data value
  - A-node: its attribute name and attribute value
  - E-node (internal node): its tag name, the list of values of its E and T children, and the set of values of its A children
- Node value equality
  - Two nodes are value-equal if they agree on their values
- Path expression
  - A sequence of node names
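The value definitions can be sketched in Python as below. `TNode`, `ANode`, `ENode`, `value` and `value_equal` are hypothetical names; the sketch only encodes the rule that E and T children form an ordered list while A children form an unordered set, so attribute order does not affect equality but child order does.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class TNode:
    text: str  # data value


@dataclass
class ANode:
    name: str   # attribute name
    value: str  # attribute value


@dataclass
class ENode:
    tag: str
    children: list[Union["ENode", TNode]] = field(default_factory=list)  # ordered E/T children
    attributes: list[ANode] = field(default_factory=list)                # A children


def value(node):
    """Hashable value of a node, following the T/A/E definitions on the slide."""
    if isinstance(node, TNode):
        return ("T", node.text)
    if isinstance(node, ANode):
        return ("A", node.name, node.value)
    # E-node: tag name, *list* of E/T child values, *set* of A child values
    return (
        "E",
        node.tag,
        tuple(value(c) for c in node.children),
        frozenset(value(a) for a in node.attributes),
    )


def value_equal(n1, n2) -> bool:
    """Two nodes are value-equal when they agree on their values."""
    return value(n1) == value(n2)


a = ENode("emp", [ENode("name", [TNode("Jane")])], [ANode("x", "1"), ANode("y", "2")])
b = ENode("emp", [ENode("name", [TNode("Jane")])], [ANode("y", "2"), ANode("x", "1")])
print(value_equal(a, b))  # True: only the attribute order differs
```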
9
XML Model (2/3)
- Key
  - A pair of path expressions (Q, {P1, …, Pk})
  - Q identifies the target set of nodes; {P1, …, Pk} are the key paths: two nodes in Q that agree on the values of P1, …, Pk must be the same node
- Relative key
  - The key description depends on the key of an ancestor node
  - Analogous to weak entities
10
XML Model (3/3)
Keys for the previous example:
- (/, (db, {})): at most one db element at the root
- (/db, (address, {})): at most one address element under the db node
- (/db, (emp, {id})): every employee within a db element is uniquely identified by its id subelement
- (/db/emp, (name, {})), (/db/emp, (sal, {})), (/db/emp, (tel, {})): at most one name, sal and tel node for each employee
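A sketch of checking these relative keys with Python's `xml.etree.ElementTree`, assuming the context path Q is an absolute path of plain tag names and each key path is a single child element whose text is the key value. The sample document and the helper names (`document`, `contexts`, `satisfies`) are hypothetical, since the example figure is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance of the db/emp example.
SAMPLE = """<db>
  <address>1 Main St</address>
  <emp><id>1</id><name>Ann</name><sal>50</sal><tel>123</tel></emp>
  <emp><id>2</id><name>Bob</name><sal>60</sal></emp>
</db>"""

# The keys from the slide, written as (Q, target tag, key paths).
KEYS = [
    ("/", "db", []),
    ("/db", "address", []),
    ("/db", "emp", ["id"]),
    ("/db/emp", "name", []),
    ("/db/emp", "sal", []),
    ("/db/emp", "tel", []),
]


def document(xml_text):
    """Wrap the root element in a synthetic document node so that Q = '/' has a context."""
    doc = ET.Element("#document")
    doc.append(ET.fromstring(xml_text))
    return doc


def contexts(doc, q):
    """Nodes reached by the absolute path q (assumed to be plain tag names)."""
    path = q.strip("/")
    return [doc] if not path else doc.findall(path)


def satisfies(doc, q, target, key_paths):
    """Within each context node, target children must differ on their key-path values."""
    for ctx in contexts(doc, q):
        seen = set()
        for node in ctx.findall(target):
            key = tuple(node.findtext(p) for p in key_paths)  # () when key_paths is empty
            if key in seen:
                return False
            seen.add(key)
    return True


doc = document(SAMPLE)
print(all(satisfies(doc, *k) for k in KEYS))  # True
```

An empty key-path set yields the same key () for every target node, which is exactly the "at most one" reading of keys such as (/db, (address, {})).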
11
Components (1/4)
Archiver components overview
[Figure: archiver components. Inputs: archive, keys, new version. Stages: Annotate Keys, then Nested Merge (timestamps). Output: new archive.]
12
Components (2/4)
- Annotate keys
  - Annotate elements with their key values
  - Keyed nodes become uniquely identified
  - A node is identified by the path from the root to the node
[Figure: key annotation]
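A minimal sketch of key annotation, assuming the key specification is given as a hypothetical table from (context tag path, element tag) to key paths, mirroring the keys of the example. Each keyed element is labelled with its tag plus its key values (e.g. `emp{id=1}`), and the chain of labels from the root identifies the node. This is an illustration, not the archiver's actual annotation format.

```python
import xml.etree.ElementTree as ET

# Hypothetical key specification: (context tag path, element tag) -> key paths.
KEYS = {
    ("/", "db"): [],
    ("/db", "address"): [],
    ("/db", "emp"): ["id"],
    ("/db/emp", "name"): [],
    ("/db/emp", "sal"): [],
    ("/db/emp", "tel"): [],
}


def annotate(elem, tag_path="", key_path="", out=None):
    """Map each keyed element's key-annotated root-to-node path to the element."""
    if out is None:
        out = {}
    spec = KEYS.get((tag_path or "/", elem.tag))
    if spec is None:
        return out  # not covered by a key: left unannotated (subtree skipped in this sketch)
    label = elem.tag
    if spec:  # append the key values, e.g. emp{id=1}
        label += "{" + ",".join(f"{p}={elem.findtext(p)}" for p in spec) + "}"
    this_path = f"{key_path}/{label}"
    out[this_path] = elem  # the keys guarantee this path is unique
    for child in elem:
        annotate(child, f"{tag_path}/{elem.tag}", this_path, out)
    return out


root = ET.fromstring("<db><emp><id>1</id><name>Ann</name></emp>"
                     "<emp><id>2</id><name>Bob</name></emp></db>")
for path in annotate(root):
    print(path)
# /db
# /db/emp{id=1}
# /db/emp{id=1}/name
# /db/emp{id=2}
# /db/emp{id=2}/name
```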
13
Components (3/4)
- Nested merge
  - Identify corresponding elements in the archive and the new version
  - Merge corresponding elements
  - Update their sets of timestamps
  - Nodes with no corresponding node are simply added
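The nested merge steps can be sketched in Python as follows, assuming each version has already been key-annotated into a nested dict of labels (as in the annotation step) and, as a simplification, that text values are children labelled by the value itself. The archive stores an explicit timestamp set at every node; a real archive would omit timestamps that equal the parent's and rely on inheritance. All names here are hypothetical.

```python
def new_archive():
    return {"t": set(), "children": {}}


def merge(archive_children, version_children, v):
    """Nested merge of one version's children into the archive's children (in place)."""
    for label, sub in version_children.items():
        node = archive_children.get(label)  # corresponding element, matched by key label
        if node is None:
            # no corresponding element in the archive: simply add it
            node = {"t": set(), "children": {}}
            archive_children[label] = node
        node["t"].add(v)  # record that this element exists in version v
        merge(node["children"], sub, v)


def archive_version(archive, version, v):
    archive["t"].add(v)
    merge(archive["children"], version, v)


v1 = {"db": {"emp{id=1}": {"name": {"Ann": {}}}}}
v2 = {"db": {"emp{id=1}": {"name": {"Ann": {}}}, "emp{id=2}": {"name": {"Bob": {}}}}}
v3 = {"db": {"emp{id=2}": {"name": {"Bob": {}}}}}

archive = new_archive()
for v, version in enumerate([v1, v2, v3], start=1):
    archive_version(archive, version, v)

db = archive["children"]["db"]
print(sorted(db["children"]["emp{id=1}"]["t"]))  # [1, 2]
print(sorted(db["children"]["emp{id=2}"]["t"]))  # [2, 3]
```

Elements missing from a version are never deleted; their timestamp simply stops growing, so every past version remains recoverable from the single merged hierarchy.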
14
Components (4/4)
15
Experimental Results (1/2)
- Competing techniques
  - Incremental diff
  - Cumulative diff
- Compression methods
  - gzip (text)
  - XMill (XML)
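To make the two diff baselines concrete, here is a small sketch using Python's `difflib`, assuming the usual definitions: an incremental diff stores the first version plus the diff between each pair of consecutive versions, while a cumulative diff stores the first version plus the diff from the first version to each later one. Cost is measured here in added or deleted lines, which is an assumption of this sketch, not the metric used in the experiments.

```python
import difflib


def changed_lines(a, b):
    """Number of lines added or deleted when turning a into b (ndiff '+ ' / '- ' lines)."""
    return sum(1 for d in difflib.ndiff(a, b) if d.startswith(("+ ", "- ")))


def incremental_cost(versions):
    """Store version 1 in full, then diff(v[i-1], v[i]) for every later version."""
    return len(versions[0]) + sum(
        changed_lines(prev, cur) for prev, cur in zip(versions, versions[1:])
    )


def cumulative_cost(versions):
    """Store version 1 in full, then diff(v[1], v[i]) for every later version."""
    return len(versions[0]) + sum(changed_lines(versions[0], cur) for cur in versions[1:])


# Hypothetical accretive history: each version appends one entry.
versions = [["a", "b"], ["a", "b", "c"], ["a", "b", "c", "d"]]
print(incremental_cost(versions), cumulative_cost(versions))  # 4 5
```

For accretive data, each cumulative diff repeats every addition made since the first version, so it grows faster than the incremental diff; in exchange, any version can be rebuilt from one diff instead of a chain of them.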
16
Experimental Results (2/2)
17
Efficient Retrievals (1/2)
- Version retrieval
  - For each node x, a binary tree with the children of x as leaves
  - Tree nodes store a timestamp and an archive offset
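A sketch of how such a per-node tree could speed up version retrieval, under the assumption (not stated on the slide) that each internal tree node stores the union of the timestamps of the leaves below it, while each leaf stores a child's timestamp and archive offset. Retrieving version v then skips any subtree whose timestamp does not contain v. `TreeNode`, `build` and `offsets_at` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TreeNode:
    versions: frozenset[int]        # union of the timestamps of the leaves below
    offset: Optional[int] = None    # archive offset of the child (leaves only)
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None


def build(children):
    """children: non-empty list of (timestamp set, archive offset) in document order."""
    if len(children) == 1:
        ts, off = children[0]
        return TreeNode(frozenset(ts), off)
    mid = len(children) // 2
    left, right = build(children[:mid]), build(children[mid:])
    return TreeNode(left.versions | right.versions, None, left, right)


def offsets_at(tree, v):
    """Archive offsets of the children present in version v, pruning subtrees without v."""
    if tree is None or v not in tree.versions:
        return []
    if tree.left is None and tree.right is None:
        return [tree.offset]
    return offsets_at(tree.left, v) + offsets_at(tree.right, v)


tree = build([({1, 2}, 0), ({2, 3}, 10), ({3}, 20), ({1}, 30)])
print(offsets_at(tree, 1))  # [0, 30]
```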
18
Efficient Retrievals (2/2)
- Temporal history retrieval
  - Find the keyed node x
  - Collect the set of its keyed children, each with its archive offset and timestamp offset
  - Sort the list
  - Repeat for each keyed node
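The steps above can be sketched over an in-memory archive shaped like the nested-merge output (each node holds a timestamp set `t` and keyed `children`). Offsets are omitted: this sketch walks Python dicts rather than an archive file, and it assumes the list is sorted by key label. `find`, `history` and the sample archive are hypothetical.

```python
def to_intervals(versions):
    """Compress a set of version numbers into sorted, inclusive intervals."""
    intervals = []
    for v in sorted(versions):
        if intervals and intervals[-1][1] == v - 1:
            intervals[-1] = (intervals[-1][0], v)
        else:
            intervals.append((v, v))
    return intervals


def find(archive, labels):
    """Locate the keyed node x by following its key-annotated labels from the root."""
    node = archive
    for label in labels:
        node = node["children"][label]  # KeyError if x never existed
    return node


def history(node, path):
    """(key path, intervals) for node and, recursively, each keyed child in sorted key order."""
    result = [(path, to_intervals(node["t"]))]
    for label in sorted(node["children"]):
        result.extend(history(node["children"][label], f"{path}/{label}"))
    return result


# Hypothetical archive in the shape produced by a nested merge.
archive = {"t": {1, 2, 3}, "children": {
    "db": {"t": {1, 2, 3}, "children": {
        "emp{id=2}": {"t": {2, 3}, "children": {}},
        "emp{id=1}": {"t": {1, 2}, "children": {
            "sal": {"t": {1, 2}, "children": {}}}},
    }}}}

for path, intervals in history(find(archive, ["db"]), "/db"):
    print(path, intervals)
# /db [(1, 3)]
# /db/emp{id=1} [(1, 2)]
# /db/emp{id=1}/sal [(1, 2)]
# /db/emp{id=2} [(2, 3)]
```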
19
Conclusion
- Efficient archiving technique
- Meaningful change descriptions
- Space overhead comparable to the diff approach
  - OMIM archive over one year:
    - less than 1.12 times the size of the last version
    - less than 1.08 times the size of the incremental diff
    - 40% compression with an XML compression tool
- Works well with XML compression
- Basic operations in a single pass
- XML output (usable by further tools)
20
Xarch (1/2)
- Archiving tool
- Extends the archiving technique
  - Sorts elements by key
  - Uses external merge sort
- Query language
  - Version retrieval
  - History tracking
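Sorting by key pays off during merging: once the children of an archive node and of a version node are both sorted by key, corresponding elements can be paired in one linear pass. The sketch below shows a textbook external merge sort (sorted runs followed by a k-way merge with `heapq.merge`) and such a one-pass alignment. It is not Xarch's code; `external_sort`, `align` and `run_size` are hypothetical, and runs stay in memory here where a real external sort would write them to disk.

```python
import heapq
from itertools import islice


def external_sort(items, key, run_size):
    """Cut the input into sorted runs of at most run_size items, then k-way merge them."""
    it = iter(items)
    runs = []
    while run := list(islice(it, run_size)):
        runs.append(sorted(run, key=key))  # in a real external sort: spill this run to disk
    return list(heapq.merge(*runs, key=key))


def align(a_keys, v_keys):
    """One linear pass over two key-sorted lists: (key, in_archive, in_version) triples."""
    i = j = 0
    out = []
    while i < len(a_keys) or j < len(v_keys):
        if j == len(v_keys) or (i < len(a_keys) and a_keys[i] < v_keys[j]):
            out.append((a_keys[i], True, False))  # only in the archive
            i += 1
        elif i == len(a_keys) or v_keys[j] < a_keys[i]:
            out.append((v_keys[j], False, True))  # new in this version
            j += 1
        else:
            out.append((a_keys[i], True, True))   # corresponding elements
            i += 1
            j += 1
    return out


print(external_sort([3, 1, 2, 5, 4], key=lambda k: k, run_size=2))  # [1, 2, 3, 4, 5]
print(align([1, 3, 5], [2, 3, 6]))
# [(1, True, False), (2, False, True), (3, True, True), (5, True, False), (6, False, True)]
```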
21
Xarch (2/2)
Query language example