Version Management for XML Documents Copy-Based vs Edit-Based Schemes Shu-Yao Chien Computer Science Department University of California, Los Angeles


1 Version Management for XML Documents Copy-Based vs Edit-Based Schemes Shu-Yao Chien Computer Science Department University of California, Los Angeles csy@cs.ucla.edu Vassilis J. Tsotras Department of Computer Science and Engineering University of California, Riverside tsotras@cs.ucr.edu Carlo Zaniolo Computer Science Department University of California, Los Angeles zaniolo@cs.ucla.edu

2 The Problem Managing (storing, querying) multiple versions of documents is important for content providers and cooperative work. Temporal DBs: transaction time; CAD/OO applications. Web/XML changes/unifies everything. Traditional schemes (RCS, SCCS): not optimized for secondary store, with no temporal clustering. DB-oriented approaches: not optimized for retrieval of complete documents. Transport level: exchange and processing (browser side) of multiversion documents are also critical; storage and exchange representations need to be reconciled.

3 Version Management: Approaches Time stamping of objects. Store all snapshots: fast retrieval, excessive storage. Edit-based schemes store the deltas: minimal storage but slow retrieval. Traditionally line-oriented DIFF, but semistructured objects in Lorel. Our scheme: Usefulness Based Copy Control (UBCC) - Separate edit scripts from the objects. - Temporal clustering of objects using page usefulness.

4 Example: an Evolving XML Document [Figure: the XML text of VERSION 1, whose objects are numbered 1 to 8 in document order, and of VERSION 2, whose objects are numbered 1 to 9.]

5 Temporal Clustering by Page Usefulness Usefulness: the percentage of a page occupied by objects from the current version; the rest is occupied by 'dead' objects from previous versions. We set a minimum usefulness requirement, e.g. 50%. When the usefulness of a page falls below this minimum, we copy its live objects to a new page.
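The usefulness test on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Page` class, the `needs_copy` helper, and the assumption that all objects have equal size (so usefulness is a simple slot count) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Page:
    # Hypothetical page model: `capacity` equal-sized object slots.
    capacity: int
    # Object id -> True if the object is still alive in the current version.
    objects: Dict[str, bool] = field(default_factory=dict)

    def usefulness(self) -> float:
        """Fraction of the page occupied by objects of the current version."""
        live = sum(1 for alive in self.objects.values() if alive)
        return live / self.capacity


def needs_copy(page: Page, u_min: float = 0.5) -> bool:
    """True when usefulness has fallen below the minimum requirement."""
    return page.usefulness() < u_min


# A four-slot page on which two objects have died.
page = Page(capacity=4, objects={"O5": True, "O6": True, "O7": False, "O8": False})
print(page.usefulness(), needs_copy(page, u_min=0.7))
```

With a 70% minimum, this page would have its two live objects copied to a new page; with the 50% minimum it would be left alone, because its usefulness has not fallen below the threshold.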

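6 Maintaining Page Usefulness above 70% by Copying Live Objects [Figure: VERSION 1 stores objects O1 to O8 on pages P1 and P2. After the VERSION 2 deletions, U(P1) = 75% and U(P2) = 50% < U_min = 70%, so the live objects O5 and O6 are copied to a new page P3 together with the new objects O9 and O10, giving U(P3) = 100%.]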

7 Usefulness Based Copy Control (UBCC) [Figure: VERSION 1 stores root, ch A, sec D, sec E, ch B, sec F, sec G, sec H on pages P1 and P2. VERSION 2 edit script: INS(sec J), DEL, INS(sec G'), DEL, INS(ch K), INS(sec L).] STEP 1: Determine page usefulness for copying: U(P1) = 75%, U(P2) = 50% < U_min = 70%. STEP 2: Append new/copied objects into new pages in their logical order: P3 holds sec J, the copied ch B and sec F, and sec G'; P4 holds ch K and sec L; U(P3) = 100%, U(P4) = 100%.
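The two steps on this slide can be traced with a small Python sketch. It is a simplified, single-step illustration under stated assumptions: pages are plain lists with a fixed capacity of four equal-sized objects, the function name `ubcc_step` is hypothetical, and pages already copied in an earlier version are not tracked.

```python
from typing import List


def ubcc_step(pages: List[List[str]], new_version: List[str],
              capacity: int = 4, u_min: float = 0.7) -> List[List[str]]:
    """Return the new pages to append for `new_version` (objects in logical order)."""
    live = set(new_version)
    stored = {obj for page in pages for obj in page}

    # STEP 1: find pages whose usefulness fell below u_min; their live
    # objects must be copied.
    to_copy = set()
    for page in pages:
        alive = [obj for obj in page if obj in live]
        if len(alive) / capacity < u_min:
            to_copy.update(alive)

    # STEP 2: write new and copied objects in logical order, filling
    # fresh pages one after another.
    to_write = [obj for obj in new_version if obj not in stored or obj in to_copy]
    return [to_write[i:i + capacity] for i in range(0, len(to_write), capacity)]


p1 = ["root", "ch A", "sec D", "sec E"]
p2 = ["ch B", "sec F", "sec G", "sec H"]
version2 = ["root", "ch A", "sec J", "sec E", "ch B", "sec F", "sec G'", "ch K", "sec L"]
print(ubcc_step([p1, p2], version2))
```

On the slide's data, P1 stays at 75% and is left untouched, while P2 drops to 50%, so ch B and sec F are rewritten next to the newly inserted objects.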

8 Document Object Order [Figure: pages P1 and P2 hold the VERSION 1 objects; P3 holds sec J (3), ch B (5), sec F (6), sec G' (7); P4 holds ch K (8), sec L (9).] Version 2 objects are not stored in sequence; hence, we use the edit script. VERSION 2 = (root 1, sec A 2, sec J 3, sec E 4, ch B 5, sec F 6, sec G' 7, ch K 8, sec L 9)
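A short Python sketch shows how a version could be read back in document order once its objects are scattered across pages. The page layout follows the slide; the `reconstruct` function and the rule that the most recently written copy of an object wins are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def reconstruct(pages: Dict[str, List[str]], order: List[str]) -> List[Tuple[str, str]]:
    """Pair each object of a version, in logical order, with the page to read it from."""
    location = {}
    for page_id, objects in pages.items():
        for obj in objects:
            # Later pages overwrite earlier ones, so a copied object is
            # read from its newest page.
            location[obj] = page_id
    return [(obj, location[obj]) for obj in order]


pages = {
    "P1": ["root", "sec A", "sec D", "sec E"],
    "P2": ["ch B", "sec F", "sec G", "sec H"],
    "P3": ["sec J", "ch B", "sec F", "sec G'"],
    "P4": ["ch K", "sec L"],
}
version2 = ["root", "sec A", "sec J", "sec E", "ch B", "sec F", "sec G'", "ch K", "sec L"]

plan = reconstruct(pages, version2)
pages_read = sorted({page_id for _, page_id in plan})
print(plan)
print(pages_read)
```

In this sketch Version 2 is rebuilt from P1, P3, and P4 only; P2 is never read, which is the effect that copying live objects off low-usefulness pages is meant to achieve.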

9 Beyond Edit-Based Versioning The UBCC scheme achieves good storage and retrieval efficiency, but it is not suitable at the transport level or for queries on content. Thus, we propose a copy-based model which: –explores shared elements –needs no edit script –yields a simple XML representation of the document history

10 The XML Version Model (XVM) XVM is a list of version nodes. Each version node is an ordered tree consisting of four types of nodes: –element node –attribute node –text node –copy record node. Minimal extensions to the XPath data model: the copy record node is actually a link.
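The node types listed above can be modeled with a few Python dataclasses. This is only a sketch: the class and field names are hypothetical, and the `lookup` helper assumes that a tree address such as V1.2.1 means "version 1, second child, first child" with 1-based positions.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Text:
    value: str


@dataclass
class Attribute:
    name: str
    value: str


@dataclass
class CopyRecord:
    ref: str  # tree address of the shared subtree, e.g. "V1.2.1"


@dataclass
class Element:
    tag: str
    children: List["Node"] = field(default_factory=list)


Node = Union[Element, Attribute, Text, CopyRecord]


@dataclass
class Version:
    children: List[Node] = field(default_factory=list)


def lookup(history: List[Version], ref: str) -> Node:
    """Follow a tree address through the list of version nodes."""
    parts = ref.lstrip("V").split(".")
    node = history[int(parts[0]) - 1]      # version number selects the version node
    for step in parts[1:]:
        node = node.children[int(step) - 1]  # each step selects the n-th child
    return node


v1 = Version([Element("chapter", [Element("section", [Text("Scope")]),
                                  Element("section", [Text("Concepts")])])])
v2 = Version([CopyRecord("V1.1")])
print(lookup([v1, v2], "V1.1.2"))
```

Because a copy record stores only an address, a version that reuses an unchanged subtree costs one small node rather than a full copy of that subtree.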

11 Copy-Based XML Version Model (XVM) [Figure legend: V = version node, E = element node, T = text node, A = attribute node, C = copy record node. A copy record holds a tree address reference, e.g. V1.2.1.]

12 XVM --- Example V1: chapter “Intro”, chapter “Tutorial”, section “Scope”, section “Concepts”, section “Context”. Changes from V1 to V2: 1. DELETE chapter “Tutorial”. 2. INSERT chapter “Second Ex”. V2: copy record C (V1.1), chapter “Second Ex” with section “Test Data”. Changes from V2 to V3: 1. UPDATE the textual content of chapter “Second Ex”. 2. COPY the “Concepts” section and insert it after section “Test Data”. V3: copy record C (V2.1), chapter “Second Ex” with copy records C (V2.2.1) and C (V2.1.2).

13 XVM Version Retrieval --- Example [Figure: V3 is retrieved by following its copy records V2.1, V2.2.1, and V2.1.2, and in turn V2's copy record V1.1, back to the elements stored in V1 and V2.]
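Retrieval by following copy records can be sketched in Python on the example from the previous slides. Everything below is illustrative: nodes are plain dictionaries, the function names are hypothetical, tree addresses are assumed to use 1-based child positions, and the placement of the sections under the V1 chapters (in particular "Context" under "Tutorial") is an assumption, since the slide figure does not survive in this transcript.

```python
def resolve(history, ref):
    """Find the node at a tree address such as 'V2.1.2'."""
    version, *steps = ref.split(".")
    children, node = history[version], None
    for step in steps:
        node = children[int(step) - 1]
        if "copy" in node:  # the address passes through a copy record
            node = resolve(history, node["copy"])
        children = node.get("children", [])
    return node


def materialize(history, node):
    """Expand a node, replacing every copy record by the subtree it links to."""
    if "copy" in node:
        return materialize(history, resolve(history, node["copy"]))
    kids = [materialize(history, child) for child in node.get("children", [])]
    return {**node, "children": kids}


def retrieve(history, version):
    return [materialize(history, node) for node in history[version]]


def outline(nodes):
    return [(n["title"], outline(n["children"])) for n in nodes]


history = {
    "V1": [
        {"tag": "chapter", "title": "Intro", "children": [
            {"tag": "section", "title": "Scope"},
            {"tag": "section", "title": "Concepts"}]},
        {"tag": "chapter", "title": "Tutorial", "children": [
            {"tag": "section", "title": "Context"}]},
    ],
    "V2": [
        {"copy": "V1.1"},
        {"tag": "chapter", "title": "Second Ex", "children": [
            {"tag": "section", "title": "Test Data"}]},
    ],
    "V3": [
        {"copy": "V2.1"},
        {"tag": "chapter", "title": "Second Ex", "children": [
            {"copy": "V2.2.1"}, {"copy": "V2.1.2"}]},
    ],
}

print(outline(retrieve(history, "V3")))
```

Retrieving V3 chases V2.1 to V1.1 for chapter "Intro", and resolves V2.1.2 through that same copy record to reach the "Concepts" section, so no edit script has to be replayed.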

14 XVM Benefits Transport level: represent XVM as an XML document, with its DTD automatically generated from the document DTD. Storage level: we extended the usefulness-based temporal clustering scheme to XVM.

15 XVM Implementation --- Use XML to Represent XVM DTD Transformation : –Define three new elements :, and. –For each element in the original DTD add to its content model a CopyRecord as an alternate. Example : Original DTD... Version DTD...

16 Performance and Storage Cost

17 Conclusion UBCC is efficient at the storage level. The copy-based scheme is effective as both a storage representation and a transport representation. Our current research focuses on efficient evaluation of queries on versions: –content queries, –snapshot queries, –history queries.

