Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu Presented by Claudia Levin.

Introduction. The paper investigates: an incremental validation algorithm for XML documents with respect to a DTD (Document Type Definition), running in O(m log n) for m updates to a document of size n; an incremental validation algorithm for XML Schema, running in O(m log^2 n). Both use an auxiliary structure of size O(n).

Example of an XML document. [Markup not preserved; surviving element text: Honda, 92, BMW.]

An XML document as a labeled ordered tree. [Figure: tree rooted at Dealer with children UsedCars and NewCars; Ad nodes with Model and Year children; leaf values Honda, 92, Subaru, 99, BMW, Mazda.]

Abstraction of Document Type Definitions (DTDs). The basic mechanism for specifying the type of XML documents. Example: root: dealer; dealer → UC NC; UC → ad*; NC → ad*; ad → model (year | ε); model → ε; year → ε.
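As an illustration of what validity against this DTD means, here is a minimal from-scratch checker in Python. It is only a sketch: the tree representation (a `(label, children)` tuple), the names `DTD`, `valid_subtree` and `satisfies`, and the trick of writing each content model as a regular expression over space-terminated labels are assumptions made for this example, not part of the presentation.

```python
import re

# Content models from the slide, written as regular expressions over
# child labels. Every label is followed by one space so that labels
# cannot run together (an encoding chosen for this sketch).
DTD = {
    "dealer": r"UC NC ",
    "UC": r"(ad )*",
    "NC": r"(ad )*",
    "ad": r"model (year )?",
    "model": r"",
    "year": r"",
}
ROOT = "dealer"


def valid_subtree(node):
    """Check that every node's child sequence matches its content model."""
    label, children = node
    if label not in DTD:
        return False
    word = "".join(child[0] + " " for child in children)
    if re.fullmatch(DTD[label], word) is None:
        return False
    return all(valid_subtree(child) for child in children)


def satisfies(tree):
    """A tree satisfies the DTD if its root label is ROOT and all nodes are valid."""
    return tree[0] == ROOT and valid_subtree(tree)


ad_full = ("ad", [("model", []), ("year", [])])
ad_short = ("ad", [("model", [])])
good = ("dealer", [("UC", [ad_full, ad_short]), ("NC", [ad_short])])
bad = ("dealer", [("NC", [ad_short]), ("UC", [])])  # children in the wrong order
```

This checker revisits the whole tree on every call; the incremental algorithms in the rest of the talk aim to avoid exactly that.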

Specialized DTD abstraction (XML Schema). A specialized DTD is a 4-tuple ‹Σ, Σ^t, d, μ› where Σ is a finite alphabet of labels, Σ^t is a finite alphabet of types, d is a DTD over Σ^t, and μ is a mapping from Σ^t to Σ.

Specialized DTD (XML Schema) example. root: d^t; d^t → UC^t NC^t, μ(d^t) = dealer; UC^t → (ad^u)*, μ(UC^t) = UC; NC^t → (ad^n)*, μ(NC^t) = NC; ad^u → m^t y^t, μ(ad^u) = ad; ad^n → m^t, μ(ad^n) = ad; m^t → ε, μ(m^t) = model; y^t → ε, μ(y^t) = year.

Specialized DTD example. [Figure: the dealer tree with each node annotated by its type: Dealer with d^t, UC with UC^t, NC with NC^t, Ad with ad^u or ad^n, Model with m^t, Year with y^t.]

Incremental Validation Problem. Given a specialized DTD τ, a tree T ∈ sat(τ), and a sequence of updates to T yielding another tree T', we wish to efficiently check whether T' ∈ sat(τ). An auxiliary structure associated with T is used and maintained to help in the validation.

Update types: (1) replace the current label of a specified node by another label; (2) insert a new leaf node after a specified node; (3) insert a new leaf node as the first child of a specified node; (4) delete a specified leaf node.

Node label renaming: u(a_i, b). [Figure: node r with children … a_{i-1}, a_i, a_{i+1} … and nodes c_1, c_2, …, c_n below; a_i is relabeled b.]

New node insertion: insert a_i. [Figure: node r with children … a_{i-1}, a_i, a_{i+1} ….]

Node deletion: delete a_i. [Figure: node r with children … a_{i-1}, a_i, a_{i+1} ….]

Warmup: incremental validation of strings. Check the validity of a string a_1 … a_n with respect to an NFA N = ‹Σ, Q, Q_0, F, δ› after a sequence of element renamings u(a_{i_1}, b_1) … u(a_{i_m}, b_m), where i_1 < i_2 < … < i_m. Validating the new string from scratch by running it through N takes O(n |Q|^2 log|Q|).
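The from-scratch baseline can be sketched as a plain subset simulation of the NFA. The example automaton below (it accepts strings over {a, b} that end in "ab") and the names `DELTA`, `START`, `FINAL`, `run` and `accepts` are assumptions chosen for illustration; they do not come from the slides.

```python
# Hypothetical example NFA: accepts strings over {a, b} ending in "ab".
# DELTA maps (state, symbol) to the set of successor states.
DELTA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
START = {0}
FINAL = {2}


def run(states, word):
    """Return the set of states reachable from `states` by reading `word`."""
    current = set(states)
    for sym in word:
        current = {t for s in current for t in DELTA.get((s, sym), ())}
    return current


def accepts(word):
    """Validate a string from scratch: one pass over all n symbols."""
    return bool(run(START, word) & FINAL)
```

Every call reads the entire string, which is the cost the incremental approach tries to avoid when only a few positions change.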

Incremental validation of strings (the first attempt). Consider a single renaming u(i, b) for 1 ≤ i ≤ n. Let Pre(i) = δ(q_0, a_1 … a_{i-1}) and Post(i) = {s | δ(s, a_{i+1} … a_n) ∩ F ≠ ∅}. The updated string is valid iff there are s_1 ∈ Pre(i) and s_2 ∈ Post(i) with s_2 ∈ δ(s_1, b).
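The Pre/Post test for a single renaming can be sketched as follows. The example NFA (strings ending in "ab"), the 0-based positions, and all function names are assumptions made for this sketch. For brevity it recomputes the Pre and Post tables on every call, whereas the idea on the slide is to have them available in advance.

```python
# Hypothetical example NFA: accepts strings over {a, b} ending in "ab".
DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
STATES = {0, 1, 2}
START = {0}
FINAL = {2}


def step(states, sym):
    """One NFA step from a set of states on a single symbol."""
    return {t for s in states for t in DELTA.get((s, sym), ())}


def pre_sets(word):
    """pre[i] = states reachable from START after reading word[:i]."""
    pre = [set(START)]
    for sym in word:
        pre.append(step(pre[-1], sym))
    return pre


def post_sets(word):
    """post[i] = states from which reading word[i:] can reach a final state."""
    post = [set(FINAL)]
    for sym in reversed(word):
        post.append({s for s in STATES if step({s}, sym) & post[-1]})
    post.reverse()
    return post


def valid_after_rename(word, i, b):
    """Is the string still accepted after replacing word[i] (0-based) by b?"""
    pre, post = pre_sets(word), post_sets(word)
    return bool(step(pre[i], b) & post[i + 1])
```

Once the tables exist, the test itself only looks at Pre, Post and the single symbol b, but both tables go stale after the update, which is presumably why the slide calls this a first attempt.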

Definition of the transition relation. For each i, j with 1 ≤ i ≤ j ≤ n: T_{i,j} = {‹p,q› | p, q ∈ Q, q ∈ δ(p, a_i … a_j)}. For a symbol b: δ_b = {‹r,s› | r, s ∈ Q, s ∈ δ(r, b)}.

Checking validity with the transition relation. The updated string a_1 … a_{i_1-1} b_1 a_{i_1+1} … a_{i_m-1} b_m a_{i_m+1} … a_n is valid iff ‹q_0, f› ∈ T_{1,i_1-1} ∘ δ_{b_1} ∘ T_{i_1+1,i_2-1} ∘ … ∘ δ_{b_m} ∘ T_{i_m+1,n} for some f ∈ F. The time complexity here is O(m |Q|^2 log|Q|).
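The composition test can be sketched by representing each relation as a set of state pairs. The example NFA, the 0-based positions, and the names `sym_rel`, `compose`, `segment_rel` and `valid_after_renames` are assumptions for illustration. Note that this sketch recomputes the segment relations on the fly; the stated bound relies on the T_{i,j} being available without rescanning the string.

```python
# Hypothetical example NFA: accepts strings over {a, b} ending in "ab".
DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
STATES = {0, 1, 2}
START = {0}
FINAL = {2}
IDENTITY = {(q, q) for q in STATES}  # relation of the empty string


def sym_rel(sym):
    """delta_b as a set of (source, target) state pairs."""
    return {(p, q) for (p, s), targets in DELTA.items() if s == sym for q in targets}


def compose(r, s):
    """Relational composition: (p, q) with (p, x) in r and (x, q) in s."""
    return {(p, q) for (p, x) in r for (y, q) in s if x == y}


def segment_rel(segment):
    """Transition relation of a substring, built symbol by symbol."""
    rel = set(IDENTITY)
    for sym in segment:
        rel = compose(rel, sym_rel(sym))
    return rel


def valid_after_renames(word, renames):
    """renames: list of (position, new_symbol), 0-based, strictly increasing."""
    rel, prev = set(IDENTITY), 0
    for i, b in renames:
        rel = compose(rel, segment_rel(word[prev:i]))  # untouched stretch
        rel = compose(rel, sym_rel(b))                 # renamed position
        prev = i + 1
    rel = compose(rel, segment_rel(word[prev:]))
    return any((q0, f) in rel for q0 in START for f in FINAL)
```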

Divide-and-conquer validation with a transition relation tree. Validates a sequence of m renamings to a string of length n. The time taken is O(m |Q|^2 log|Q| log n). The auxiliary structure size is O(|Q|^2 n).

Transition relation tree example. [Figure: balanced binary tree with root T_{1,8}; second level T_{1,4}, T_{5,8}; third level T_{1,2}, T_{3,4}, T_{5,6}, T_{7,8}; leaves T_{1,1} … T_{8,8} over the symbols a_1 … a_8.] The number of nodes in the tree rooted at T_{1,n} is 2n-1. Its depth is log n.

Label renaming with the transition relation tree. Consider a_1 … a_n ∈ L(N) and a sequence of renamings u(i_1, b_1), …, u(i_m, b_m), where i_1 < i_2 < … < i_m. The updated string is a_1 … a_{i_1-1} b_1 a_{i_1+1} … a_{i_m-1} b_m a_{i_m+1} … a_n. The relations T_{i,j} affected by the updates are those lying on the path from a changed leaf to the root of T_n. The number of relations changed is at most m log n.
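A transition relation tree with renaming can be sketched as a segment tree whose nodes store composed relations. The class name `RelationTree`, the half-open interval keys, the example NFA, and the 0-based positions are assumptions made for this sketch; it handles renamings only, not inserts or deletes.

```python
# Hypothetical example NFA: accepts strings over {a, b} ending in "ab".
DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
START = {0}
FINAL = {2}


def sym_rel(sym):
    """delta_b as a set of (source, target) state pairs."""
    return {(p, q) for (p, s), targets in DELTA.items() if s == sym for q in targets}


def compose(r, s):
    """Relational composition of two sets of state pairs."""
    return {(p, q) for (p, x) in r for (y, q) in s if x == y}


class RelationTree:
    """Stores the relation of word[lo:hi] for every interval of a balanced split."""

    def __init__(self, word):
        self.n = len(word)  # assumed to be at least 1
        self.rel = {}
        self._build(word, 0, self.n)

    def _build(self, word, lo, hi):
        if hi - lo == 1:
            self.rel[(lo, hi)] = sym_rel(word[lo])
            return
        mid = (lo + hi) // 2
        self._build(word, lo, mid)
        self._build(word, mid, hi)
        self.rel[(lo, hi)] = compose(self.rel[(lo, mid)], self.rel[(mid, hi)])

    def rename(self, i, b):
        """Replace the symbol at 0-based position i; touch only the leaf-to-root path."""
        lo, hi, path = 0, self.n, []
        while hi - lo > 1:
            path.append((lo, hi))
            mid = (lo + hi) // 2
            lo, hi = (lo, mid) if i < mid else (mid, hi)
        self.rel[(lo, hi)] = sym_rel(b)
        for lo, hi in reversed(path):
            mid = (lo + hi) // 2
            self.rel[(lo, hi)] = compose(self.rel[(lo, mid)], self.rel[(mid, hi)])

    def valid(self):
        root = self.rel[(0, self.n)]
        return any((q0, f) in root for q0 in START for f in FINAL)


tree = RelationTree("aaaaaaab")
```

Each call to `rename` recomputes one relation per tree level, which matches the slide's count of at most m log n changed relations for m renamings.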

Label renaming by the divide-and-conquer approach in O(log n): u(a_3, b). [Figure: the transition relation tree over a_1 … a_8; renaming a_3 to b affects only the relations on the path from the leaf T_{3,3} through T_{3,4} and T_{1,4} to the root T_{1,8}.]

Dealing with inserts and deletes: why B-trees? Inserts and deletes change the positions of nodes in the string. The length of the string and the set of relevant intervals used to construct T_n are now dynamic. The tree must remain balanced, with depth O(log n).

B-trees. Each node has 3 cells. A cell is either empty or contains a set T_s corresponding to some subsequence s of the string. At most one of the 3 cells in a node can be empty. Each nonempty cell is either at a leaf or has one node as a child.

B-trees for dealing with inserts and deletes in O(log n). [Figure: root node with cells T_sa, T_sb, T_sc; child nodes with cells T_s1, T_s2; T_s3, T_s5, T_s6; and T_s7, T_s9; leaves n_1, n_2, n_3, n_5, n_6, n_7, n_9.] T_sa = T_s1 ∘ T_s2 and T_sb = T_s3 ∘ T_s5 ∘ T_s6.

Validation with B-trees with respect to an NFA N = ‹Σ, Q, Q_0, F, δ›. When the tree T for the updated string has been computed, check that for some f ∈ F, ‹q_0, f› belongs to the composition of the sets T_s in the cells of the root node of T. The cost of this check is O(|Q|^2 log|Q|).

Insertion into a transition relation tree: insertion of nodes n_4 and n_8. [Figure: the tree before insertion, with root cells T_sa, T_sb, T_sc and child cells T_s1, T_s2; T_s3, T_s5, T_s6; and T_s7, T_s9; n_4 and n_8 are the leaves to be inserted.]

Insertion into a transition relation tree: insertion of nodes n_4 and n_8. [Figure: the tree after insertion, with root cells T_se, T_sf; middle nodes with cells T_sa, T_sb' and T_sb'', T_sc; lower nodes with cells T_s1, T_s2; T_s3, T_s4; T_s5, T_s6; and T_s7, T_s8, T_s9.]

B-tree validation algorithm costs. Renaming: the update propagates from the leaf to the root, giving O(log n) updates. Insertion or deletion: may involve splits and merges of the cells all the way to the root. The worst-case complexity is O(|Q|^2 log|Q| log n).

Incremental DTD validation. [Figure: a tree in which a node labeled d, with rule d → r(d), has children a_1 … a_{i-1}, a_i, a_{i+1} … a_n; a label is renamed to b.]

Incremental DTD validation. The auxiliary structure maintained: for each sequence of siblings in the tree, the transition relations T_s of the divide-and-conquer algorithm are precomputed. The auxiliary structure size is at most O(|Σ| |d|^2 |T|), where |T| is the size of T and |d| = max{|r_a| : a → r_a ∈ d}. The total validation time is O(m |Σ| |d|^2 log|d| log|T|).

Specialized DTDs: a first attempt. Tree T is valid iff root(d) ∈ types(root(T)). [Figure: tree with root r and nodes v, a_{i-1}, a_i, a_{i+1}, c_1 … c_n, annotated with the sets types(r), types(v), types(a_i), types(a_n); a label is renamed to b.]
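The validity condition root(d) ∈ types(root(T)) can be illustrated with a bottom-up computation of type sets for the dealer example. This is a brute-force sketch, not the incremental algorithm: it tries every combination of child types, and the ASCII type names (`dt`, `UCt`, `adu`, and so on), the regular-expression encoding of the rules, and the function names are all assumptions made for this example.

```python
import re
from itertools import product

# Rules of the example specialized DTD, as regular expressions over
# space-terminated type names (an encoding chosen for this sketch).
RULES = {
    "dt": r"UCt NCt ",
    "UCt": r"(adu )*",
    "NCt": r"(adn )*",
    "adu": r"mt yt ",
    "adn": r"mt ",
    "mt": r"",
    "yt": r"",
}
# The mapping mu from types to labels.
MU = {"dt": "dealer", "UCt": "UC", "NCt": "NC", "adu": "ad",
      "adn": "ad", "mt": "model", "yt": "year"}
ROOT_TYPE = "dt"


def types(node):
    """Set of types t with mu(t) = label whose rule accepts some typing of the children."""
    label, children = node
    child_types = [types(child) for child in children]
    result = set()
    for t, regex in RULES.items():
        if MU[t] != label:
            continue
        # Brute force: try each way of picking one type per child.
        for choice in product(*child_types):
            word = "".join(c + " " for c in choice)
            if re.fullmatch(regex, word):
                result.add(t)
                break
    return result


def satisfies(tree):
    return ROOT_TYPE in types(tree)


used_ad = ("ad", [("model", []), ("year", [])])
new_ad = ("ad", [("model", [])])
good = ("dealer", [("UC", [used_ad, used_ad]), ("NC", [new_ad])])
bad = ("dealer", [("UC", [new_ad]), ("NC", [new_ad])])  # used-car ad lacks a year
```

The tree `bad` would pass the plain DTD from the earlier slide, where ad → model (year | ε), but fails here because an ad under UC must have type ad^u and therefore a year.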

Specialized DTDs: a first attempt. The auxiliary structure size is the same as for DTDs, at most O(|Σ| |d|^2 |T|), where |T| is the size of T and |d| = max{|r_a| : a → r_a ∈ d}. The total validation time for a DTD is O(m |Σ| |d|^2 log|d| log|T|). The total validation time for a specialized DTD is O(m |Σ^t| |d|^2 log|d| depth(T) log|T|).

Binary tree encoding of an unranked tree. [Figure: an unranked tree with nodes labeled a through k and its binary encoding, in which # marks padding leaves.]
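One common way to encode an unranked tree as a binary tree is the first-child/next-sibling scheme, sketched below. It is an assumption that this matches the encoding drawn on the slide in every detail; the tuple representation and the names `encode` and `enc` are also chosen only for this example.

```python
def encode(siblings):
    """Encode a list of sibling subtrees as one binary tree.

    Each node becomes (label, left, right): left encodes its children,
    right encodes its following siblings, and "#" pads missing branches.
    """
    if not siblings:
        return "#"
    (label, children), rest = siblings[0], siblings[1:]
    return (label, encode(children), encode(rest))


def enc(tree):
    """Encode a single unranked tree given as (label, [children])."""
    return encode([tree])


small = ("a", [("b", []), ("c", [])])
encoded = enc(small)
```

In this scheme a sequence of siblings in the original tree becomes a chain of right children in the encoding.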

This is one of the standard encodings in the literature (F. Neven. Automata, Logic and XML. In Computer Science Logic, 2002). Lemma: for each specialized DTD τ = ‹Σ, Σ^t, d, μ› there exists a BNTA A_τ over Σ_# whose number of states is O(|Σ^t| |d|) such that L(A_τ) = {enc(T) | T ∈ sat(τ)}.

Principal lines. [Figure: the binary encoding of the example tree, padded with # leaves, with its principal lines marked.]

From BNTA to NFA on principal lines. [Figure: the binary encoding of the example tree with its principal lines; surviving labels: T_c, bd, T_j, T_g, f.]

From BNTA to NFA on principal lines. [Figure: nodes a, b, d, e, f, i, h, c, k, j, g arranged along principal lines; surviving labels: T_c, bd, T_j, T_g, f.]

NFA construction. We construct an NFA N that accepts the string a_n … a_1 iff the BNTA A = ‹Σ_#, Q, Q_0, q_f, δ› accepts enc(T). Let N = ‹Σ', Q, q_0, F', δ'›, where Σ' = {#} ∪ (Q × Σ) ∪ (Σ × Q), F' = {q_f}, and: δ'(#, q_0) = Q_0; δ'(‹a,S›, q) = ∪_{q' ∈ S} δ(a, q, q') for a ∈ Σ; δ'(q, ‹a,S›) = ∪_{q' ∈ S} δ(a, q', q) for a ∈ Σ.

Line rearrangement for insertions and deletions. [Figure: nodes v and v' with lines l, l_0, l', l''.]

Complexity results. Given a sequence of m updates, for the DTD abstraction of XML we get: the auxiliary structure size is at most O(|Σ| |d|^2 |T|), where |T| is the size of T and |d| = max{|r_a| : a → r_a ∈ d}; the total validation time is O(m |Σ| |d|^2 log|d| log|T|).

Complexity results. Given a sequence of m updates, for specialized DTDs (XML Schema) we get: the auxiliary structure size is at most O(|Σ| |d|^2 |T|); the total validation time is O(m |Σ^t|^2 |d|^2 log(|Σ^t| |d|) log^2 |T|).
