1
Incremental Validation of XML Documents
Yannis Papakonstantinou, Victor Vianu
Presented by Claudia Levin
2
Introduction
This presentation investigates, for m updates to a document of size n:
An incremental validation algorithm for XML documents typed by a DTD (Document Type Definition), running in O(m log n).
An incremental validation algorithm for XML Schema, running in O(m log^2 n).
Both use an auxiliary structure of size O(n).
3
Example of an XML document
[XML listing; the element tags were lost in extraction. Surviving text content: Honda, 92, BMW]
4
An XML document as a labeled ordered tree
[Figure: tree rooted at Dealer with children UsedCars and NewCars; Ad nodes below them with Model and Year children; leaf values Honda, 92, Subaru, 99, BMW, Mazda]
5
Abstraction of Document Type Definitions (DTDs)
The basic mechanism for specifying the type of XML documents.
root: dealer
dealer → UC NC
UC → ad*
NC → ad*
ad → model (year | ε)
model → ε
year → ε
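The DTD abstraction above can be checked against a labeled ordered tree with a short recursive function. The following is a minimal sketch, not the incremental algorithm of the talk: it assumes trees are `(label, children)` pairs and encodes each rule as a Python regular expression over child labels, each followed by one space. The names `DTD`, `ROOT`, `valid` and `satisfies` are illustrative.

```python
import re

# Assumed encoding: label -> regex over the word of child labels,
# where every child label is written followed by a single space.
DTD = {
    "dealer": r"UC NC ",
    "UC": r"(ad )*",
    "NC": r"(ad )*",
    "ad": r"model (year )?",
    "model": r"",
    "year": r"",
}
ROOT = "dealer"


def valid(node):
    """Check that the children of every node match the rule for its label."""
    label, children = node
    word = "".join(child[0] + " " for child in children)
    rule = DTD.get(label)
    if rule is None or re.fullmatch(rule, word) is None:
        return False
    return all(valid(child) for child in children)


def satisfies(tree):
    """A tree satisfies the DTD if its root is labeled ROOT and every node is valid."""
    return tree[0] == ROOT and valid(tree)


ad_full = ("ad", [("model", []), ("year", [])])
ad_short = ("ad", [("model", [])])
tree = ("dealer", [("UC", [ad_full, ad_full]), ("NC", [ad_short])])
swapped = ("dealer", [("NC", []), ("UC", [])])
```

Here `satisfies(tree)` holds, while `swapped` is rejected because the rule for dealer fixes the order UC NC.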
6
Specialized DTD abstraction (XML Schema)
A specialized DTD is a 4-tuple ‹Σ, Σ^t, d, μ› where Σ is a finite alphabet of labels, Σ^t is a finite alphabet of types, d is a DTD over Σ^t, and μ is a mapping from Σ^t to Σ.
7
Specialized DTD (XML Schema) example
root: d^t
d^t → UC^t NC^t      μ(d^t) = dealer
UC^t → ad^u*         μ(UC^t) = UC
NC^t → ad^n*         μ(NC^t) = NC
ad^u → m^t y^t       μ(ad^u) = ad
ad^n → m^t           μ(ad^n) = ad
m^t → ε              μ(m^t) = model
y^t → ε              μ(y^t) = year
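To see how types and labels interact in this example, one can compute, bottom-up, the set of types each node may take and accept the tree when the root type is among the types of the root. The sketch below is a brute-force illustration under stated assumptions: trees are `(label, children)` pairs, type names such as `ad_u` stand for ad^u, and every combination of child types is enumerated, which is exponential and only meant for small examples.

```python
import re
from itertools import product

# Assumed encoding: type -> regex over child types (each followed by a space).
RULES = {
    "d_t": r"UC_t NC_t ",
    "UC_t": r"(ad_u )*",
    "NC_t": r"(ad_n )*",
    "ad_u": r"m_t y_t ",
    "ad_n": r"m_t ",
    "m_t": r"",
    "y_t": r"",
}
# The mapping mu from types to labels.
MU = {"d_t": "dealer", "UC_t": "UC", "NC_t": "NC",
      "ad_u": "ad", "ad_n": "ad", "m_t": "model", "y_t": "year"}
ROOT_TYPE = "d_t"


def types(node):
    """Return every type t with mu(t) == label whose rule matches some typing of the children."""
    label, children = node
    child_types = [types(child) for child in children]
    result = set()
    for t, rule in RULES.items():
        if MU[t] != label:
            continue
        # Try each way of picking one type per child (brute force).
        for choice in product(*child_types):
            word = "".join(c + " " for c in choice)
            if re.fullmatch(rule, word):
                result.add(t)
                break
    return result


def satisfies(tree):
    return ROOT_TYPE in types(tree)


used_ad = ("ad", [("model", []), ("year", [])])
new_ad = ("ad", [("model", [])])
good = ("dealer", [("UC", [used_ad]), ("NC", [new_ad])])
bad = ("dealer", [("UC", [new_ad]), ("NC", [new_ad])])
```

The tree `bad` is rejected even though it satisfies the plain DTD: an ad under UC must have type ad^u, which requires a year.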
8
Specialized DTD example
[Figure: the Dealer tree (UC, NC, Ad, Model, Year) with each node annotated by its type: d^t, UC^t, NC^t, ad^u, ad^n, m^t, y^t]
9
Incremental Validation Problem
Given a specialized DTD τ, a tree T ∈ sat(τ), and a sequence of updates to T yielding another tree T', we wish to efficiently check whether T' ∈ sat(τ).
An auxiliary structure associated with T is used and maintained to help in the validation.
10
Update types
Replace the current label of a specified node by another label;
Insert a new leaf node after a specified node;
Insert a new leaf node as the first child of a specified node;
Delete a specified leaf node.
11
Node label renaming: u(a_i, b)
[Figure: node r with children …, a_{i-1}, a_i, a_{i+1}, …; a_i has children c_1, c_2, …, c_n and is relabeled b]
12
Inserting a new node: insert a_i
[Figure: node r with children …, a_{i-1}, a_i, a_{i+1}, …; the new leaf a_i is inserted between a_{i-1} and a_{i+1}]
13
Deleting a node: delete a_i
[Figure: node r with children …, a_{i-1}, a_i, a_{i+1}, …; the leaf a_i is removed]
14
Warmup: incremental validation of strings
Check the validity of a string a_1 … a_n with respect to an NFA N = ‹Σ, Q, Q_0, F, δ› after a sequence of element renames u(a_{i_1}, b_1) … u(a_{i_m}, b_m), where i_1 < i_2 < … < i_m.
Validating the new string from scratch by running it through N takes O(n |Q|^2 log |Q|).
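As a baseline for the warmup, the from-scratch approach simply applies the renamings and reruns the NFA over the whole string. A minimal sketch, assuming a toy NFA over {a, b} that accepts strings ending in "ab"; the automaton and the names `DELTA`, `START`, `FINAL`, `accepts` and `rename` are illustrative and not from the talk.

```python
# Hypothetical toy NFA: accepts exactly the strings over {a, b} that end in "ab".
DELTA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
START = {0}
FINAL = {2}


def step(states, symbol):
    """All states reachable from `states` by reading one symbol."""
    out = set()
    for q in states:
        out |= DELTA.get((q, symbol), set())
    return out


def accepts(word):
    """Run the NFA over the whole word; cost grows linearly with len(word)."""
    states = set(START)
    for symbol in word:
        states = step(states, symbol)
    return bool(states & FINAL)


def rename(word, updates):
    """Apply renamings u(i, b), given as 1-based (position, new_label) pairs."""
    chars = list(word)
    for i, b in updates:
        chars[i - 1] = b
    return "".join(chars)
```

Each revalidation with `accepts(rename(word, updates))` touches all n positions, which is the cost the incremental algorithms aim to avoid.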
15
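Incremental validation of strings (the first attempt)
Consider a single renaming u(i, b) for 1 ≤ i ≤ n.
Pre(i) = δ(q_0, a_1 … a_{i-1})
Post(i) = {s | δ(s, a_{i+1} … a_n) ∩ F ≠ ∅}
The updated string is valid iff some s_1 ∈ Pre(i) and s_2 ∈ Post(i) satisfy s_2 ∈ δ(s_1, b).
[Figure: a b-transition from a state s_1 in Pre(i) to a state s_2 in Post(i)]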
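The Pre/Post test for a single renaming can be written down directly. This is a small sketch under assumptions: it reuses a hypothetical toy NFA accepting strings that end in "ab", and it recomputes Pre(i) and Post(i) on demand rather than storing them, so it illustrates the validity condition only, not an efficient maintenance scheme.

```python
# Hypothetical toy NFA: accepts strings over {a, b} that end in "ab".
STATES = {0, 1, 2}
DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
START = {0}
FINAL = {2}


def step(states, symbol):
    out = set()
    for q in states:
        out |= DELTA.get((q, symbol), set())
    return out


def run(states, word):
    """States reachable from `states` after reading all of `word`."""
    for symbol in word:
        states = step(states, symbol)
    return states


def pre(word, i):
    """Pre(i): states reachable from the start states on a_1 ... a_{i-1}."""
    return run(set(START), word[: i - 1])


def post(word, i):
    """Post(i): states from which a_{i+1} ... a_n leads to a final state."""
    return {s for s in STATES if run({s}, word[i:]) & FINAL}


def valid_after_rename(word, i, b):
    """The renamed string is valid iff b links some Pre(i) state to some Post(i) state."""
    return bool(step(pre(word, i), b) & post(word, i))
```

For example, `valid_after_rename("aab", 3, "a")` is false (the result "aaa" does not end in "ab"), while `valid_after_rename("aab", 1, "b")` is true.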
16
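Definition of Transition Relation
For each i, j with 1 ≤ i ≤ j ≤ n:
T_{i,j} = {‹p,q› | p, q ∈ Q, q ∈ δ(p, a_i … a_j)}
δ_b = {‹r,s› | r, s ∈ Q, s ∈ δ(r, b)}
[Figure: a path from state p to state q reading a_i, a_{i+1}, …, a_j, and a single b-transition from r to s]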
17
Checking validity with Transition Relations
The updated string a_1 … a_{i_1-1} b_1 a_{i_1+1} … a_{i_m-1} b_m a_{i_m+1} … a_n is valid iff the composition
T_{1,i_1-1} ∘ δ_{b_1} ∘ T_{i_1+1,i_2-1} ∘ … ∘ δ_{b_m} ∘ T_{i_m+1,n}
contains ‹q_0, f› for some f ∈ F.
Time complexity here is O(m |Q|^2 log |Q|).
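The composition test can be sketched with relations stored as sets of state pairs. The code below is illustrative only: it assumes a hypothetical toy NFA accepting strings that end in "ab", computes the relations for untouched blocks on the fly (the talk assumes they are available from the auxiliary structure), and treats an empty block as the identity relation.

```python
# Hypothetical toy NFA: accepts strings over {a, b} that end in "ab".
STATES = (0, 1, 2)
DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
START = {0}
FINAL = {2}
IDENTITY = {(q, q) for q in STATES}


def delta_rel(b):
    """delta_b: pairs (r, s) such that s is reachable from r by reading b."""
    return {(r, s) for (r, sym), targets in DELTA.items() if sym == b for s in targets}


def compose(R, S):
    """Relational composition: (p, q) whenever (p, x) in R and (x, q) in S."""
    return {(p, q) for (p, x) in R for (y, q) in S if x == y}


def relation(word):
    """Transition relation of a block; the empty block gives the identity."""
    rel = IDENTITY
    for sym in word:
        rel = compose(rel, delta_rel(sym))
    return rel


def valid_after_renames(word, updates):
    """updates: 1-based (position, new_label) pairs with increasing positions."""
    rel = IDENTITY
    prev = 0  # last position already covered
    for i, b in updates:
        rel = compose(rel, relation(word[prev:i - 1]))  # untouched block before i
        rel = compose(rel, delta_rel(b))                # the renamed position
        prev = i
    rel = compose(rel, relation(word[prev:]))           # untouched tail
    return any((q0, f) in rel for q0 in START for f in FINAL)
```

With precomputed block relations, only about 2m + 1 compositions are needed, independent of the string length.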
18
Divide-and-conquer validation with Transition Relation Tree
Validates a sequence of m renamings to a string of length n.
The time taken is O(m |Q|^2 log |Q| log n).
The auxiliary structure size is O(|Q|^2 n).
19
Transition Relation Tree example
[Figure: balanced binary tree over a_1 … a_8 with root T_{1,8}; second level T_{1,4}, T_{5,8}; third level T_{1,2}, T_{3,4}, T_{5,6}, T_{7,8}; leaves T_{1,1}, …, T_{8,8}]
The number of nodes in the tree rooted at T_{1,n} is 2n-1. Its depth is log n.
20
Label renaming with Transition Relation Tree
Consider a_1 … a_n ∈ L(N) and a sequence of renames u(i_1, b_1), …, u(i_m, b_m), where i_1 < i_2 < … < i_m.
The updated string is a_1 … a_{i_1-1} b_1 a_{i_1+1} … a_{i_m-1} b_m a_{i_m+1} … a_n.
The relations T_{i,j} affected by the updates are those lying on the path from a changed leaf to the root of T_n.
The number of relations changed is at most m log n.
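The path-to-root update can be illustrated with an array-based balanced binary tree whose nodes store transition relations. This is a sketch under assumptions, not the talk's exact structure: the NFA is a hypothetical toy automaton accepting strings that end in "ab", the string length is padded to a power of two with identity relations, and the class name `RelationTree` is illustrative.

```python
# Hypothetical toy NFA: accepts strings over {a, b} that end in "ab".
STATES = (0, 1, 2)
DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
START = {0}
FINAL = {2}
IDENTITY = {(q, q) for q in STATES}


def delta_rel(b):
    return {(r, s) for (r, sym), targets in DELTA.items() if sym == b for s in targets}


def compose(R, S):
    return {(p, q) for (p, x) in R for (y, q) in S if x == y}


class RelationTree:
    """Balanced binary tree whose node covering positions i..j stores T_{i,j}."""

    def __init__(self, word):
        self.size = 1
        while self.size < len(word):
            self.size *= 2
        # Heap layout: node k has children 2k and 2k+1; leaves start at self.size.
        # Padding leaves hold the identity, which is neutral for composition.
        self.rel = [IDENTITY] * (2 * self.size)
        for i, sym in enumerate(word):
            self.rel[self.size + i] = delta_rel(sym)
        for k in range(self.size - 1, 0, -1):
            self.rel[k] = compose(self.rel[2 * k], self.rel[2 * k + 1])

    def rename(self, i, b):
        """Apply u(i, b) (1-based); recompute only the leaf-to-root path.

        Returns the number of relations that were recomputed."""
        k = self.size + i - 1
        self.rel[k] = delta_rel(b)
        touched = 1
        k //= 2
        while k >= 1:
            self.rel[k] = compose(self.rel[2 * k], self.rel[2 * k + 1])
            touched += 1
            k //= 2
        return touched

    def valid(self):
        """The string is accepted iff the root relation links a start state to a final state."""
        return any((q0, f) in self.rel[1] for q0 in START for f in FINAL)
```

For a string of length 8, a single `rename` recomputes 4 relations (the leaf plus its three ancestors), after which `valid()` reads the answer off the root.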
21
Label renaming by the divide-and-conquer approach in O(log n)
[Figure: renaming u(a_3, b) in the tree over a_1 … a_8; the leaf a_3 becomes b and the relations on the path T_{3,3}, T_{3,4}, T_{1,4}, T_{1,8} are recomputed]
22
Dealing with inserts and deletes: why B-trees?
Inserts and deletes change the positions of the nodes in the string.
The length of the string and the set of relevant intervals used to construct T_n are now dynamic.
The tree must remain balanced, with depth O(log n).
23
B-trees
Each node has 3 cells.
A cell is either empty or contains a relation T_s corresponding to some subsequence s of the string.
At most one of the 3 cells in a node can be empty.
Each nonempty cell is either at a leaf or has one node as a child.
24
B-trees for dealing with inserts and deletes in O(log n)
[Figure: root node with cells T_sa, T_sb, T_sc; child nodes with cells (T_s1, T_s2), (T_s3, T_s5, T_s6), (T_s7, T_s9), over the string nodes n_1, n_2, n_3, n_5, n_6, n_7, n_9]
T_sa = T_s1 ∘ T_s2
T_sb = T_s3 ∘ T_s5 ∘ T_s6
25
Validation with B-trees with respect to NFA N = ‹Σ, Q, Q_0, F, δ›
When the tree T for the updated string has been computed, check that for some f ∈ F, ‹q_0, f› belongs to the composition of the sets T_s in the cells of the root node of T.
The cost of checking is O(|Q|^2 log |Q|).
26
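Insertion into a Transition Relation Tree: insertion of nodes n_4 and n_8
[Figure, before the insertion: root cells T_sa, T_sb, T_sc; child nodes (T_s1, T_s2), (T_s3, T_s5, T_s6), (T_s7, T_s9); the new string nodes n_4 and n_8 are about to be added among n_1, n_2, n_3, n_5, n_6, n_7, n_9]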
27
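Insertion into a Transition Relation Tree: insertion of nodes n_4 and n_8
[Figure, after the insertion: new root cells T_se, T_sf; intermediate nodes (T_sa, T_sb') and (T_sb'', T_sc); leaf-level nodes (T_s1, T_s2), (T_s3, T_s4), (T_s5, T_s6), (T_s7, T_s8, T_s9), over the string nodes n_1 … n_9]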
28
B-tree validation algorithm costs
Renaming: the update propagates from the leaf to the root, giving O(log n) updates.
Insertion or deletion: may involve splits and merges of the cells all the way to the root.
The worst-case complexity is O(|Q|^2 log |Q| log n).
29
Incremental DTD validation
d → r(d)
[Figure: root labeled d with children a_1, …, a_{i-1}, a_i, a_{i+1}, …, a_n; node v with children c_1, c_2, c_3, c_4, … is relabeled b]
30
Incremental DTD validation
The auxiliary structure maintained: for each sequence of siblings in the tree, the transition relations T_s of the divide-and-conquer algorithm are precomputed.
The auxiliary structure size is at most O(|Σ| |d|^2 |T|), where |T| is the size of T and |d| = max{|r_a| : a → r_a ∈ d}.
The total validation time is O(m |Σ| |d|^2 log |d| log |T|).
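For plain DTDs, the validity of a node depends only on its own label and the labels of its children, so a renaming can only invalidate two sibling words: the children of the renamed node and the children of its parent. The sketch below shows just this locality, assuming the dealer DTD from the earlier slide and a tree that was valid before the update. It rematches each word with a regular expression instead of using the precomputed relations T_s, so it does not achieve the logarithmic bound; the names `Node`, `content_ok` and `rename_and_check` are illustrative.

```python
import re

# Assumed encoding of the dealer DTD: label -> regex over child labels,
# each child label followed by a single space.
DTD = {
    "dealer": r"UC NC ",
    "UC": r"(ad )*",
    "NC": r"(ad )*",
    "ad": r"model (year )?",
    "model": r"",
    "year": r"",
}
ROOT = "dealer"


class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def content_ok(node):
    """Check the word of child labels of `node` against the rule for its label."""
    word = "".join(child.label + " " for child in node.children)
    rule = DTD.get(node.label)
    return rule is not None and re.fullmatch(rule, word) is not None


def rename_and_check(node, new_label):
    """Relabel `node` in a previously valid tree and recheck only what can change."""
    node.label = new_label
    if not content_ok(node):          # children of the renamed node
        return False
    if node.parent is None:           # the root must keep the root label
        return new_label == ROOT
    return content_ok(node.parent)    # sibling word containing the renamed node


year = Node("year")
used = Node("UC", [Node("ad", [Node("model"), year])])
dealer = Node("dealer", [used, Node("NC", [Node("ad", [Node("model")])])])
```

Renaming `year` to model yields the sibling word "model model", which the rule for ad rejects; no other node of the tree needs to be inspected.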
31
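Specialized DTDs: a first attempt
Tree T is valid iff root(d) ∈ types(root(T)).
[Figure: tree with root r, node v, siblings …, a_{i-1}, a_i, a_{i+1}, …, a_n and children c_1, c_2, c_3, c_4, …, c_n, each node annotated with its set of types: types(r), types(v), types(a_i), types(a_n); one node is relabeled b]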
32
Specialized DTDs: a first attempt
The auxiliary structure size is the same as for DTDs, at most O(|Σ| |d|^2 |T|), where |T| is the size of T and |d| = max{|r_a| : a → r_a ∈ d}.
The total validation time for a DTD is O(m |Σ| |d|^2 log |d| log |T|).
The total validation time for a specialized DTD is O(m |Σ^t| |d|^2 log |d| depth(T) log |T|).
33
Binary tree encoding of an unranked tree
[Figure: an unranked tree over the nodes a, b, c, d, e, f, g, h, i, j, k, shown next to its binary encoding, in which # marks a missing child]
34
One of the standard encodings in the literature (F. Neven. Automata, Logic and XML. In Computer Science Logic, 2002)
Lemma: For each specialized DTD τ = ‹Σ, Σ^t, d, μ› there exists a BNTA (bottom-up non-deterministic tree automaton) A over Σ_# whose number of states is O(|Σ^t| |d|) such that L(A) = {enc(T) | T ∈ sat(τ)}.
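A common way to encode an unranked tree as a binary one is the first-child/next-sibling convention: the left child of a node is its first child, the right child is its next sibling, and # fills the missing positions. The sketch below assumes that convention (the slide's figure could not be recovered, so this may differ in detail from the encoding used there) and represents unranked nodes as `(label, children)` pairs and binary nodes as `(label, left, right)` triples.

```python
def enc(node, siblings=()):
    """Encode `node`, followed by its right siblings, as a binary tree.

    Left child  = encoding of the first child (with its siblings attached),
    right child = encoding of the next sibling, "#" where none exists.
    """
    label, children = node
    left = enc(children[0], children[1:]) if children else "#"
    right = enc(siblings[0], siblings[1:]) if siblings else "#"
    return (label, left, right)


def count_labeled(binary):
    """Number of non-# nodes in an encoded tree."""
    if binary == "#":
        return 0
    _, left, right = binary
    return 1 + count_labeled(left) + count_labeled(right)


# Example: a has children b and c; c has the single child d.
tree = ("a", [("b", []), ("c", [("d", [])])])
encoded = enc(tree)
```

Here `encoded` is `("a", ("b", "#", ("c", ("d", "#", "#"), "#")), "#")`: every node of the unranked tree appears exactly once, and sibling order turns into a chain of right children.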
35
Principal lines
[Figure: the binary encoding of the example tree (nodes a … k with # leaves), with its principal lines highlighted]
36
From BNTA to NFA on principal lines
[Figure: the binary encoding of the example tree with principal lines marked; annotations T_c, b d, T_j and T_g, f]
37
From BNTA to NFA on principal lines
[Figure: the principal line a b d e f i h with side nodes c, k, j, g; annotations T_c, b d, T_j and T_g, f]
38
NFA construction
We construct an NFA N which accepts the string a_n … a_1 iff the BNTA A = ‹Σ_#, Q, Q_0, q_f, δ› accepts enc(T).
Let N = ‹Σ', Q, q_0, F', δ'›, where Σ' = {#} ∪ (Q × Σ) ∪ (Σ × Q), F' = {q_f}, and
δ'(#, q_0) = Q_0;
δ'(‹a,S›, q) = ∪_{q' ∈ S} δ(a, q, q') for a ∈ Σ;
δ'(q, ‹a,S›) = ∪_{q' ∈ S} δ(a, q', q) for a ∈ Σ.
39
Line rearrangement for insertions and deletions
[Figure: nodes v and v' and the lines l, l_0, l', l'' before and after the rearrangement]
40
Complexity Results
Given a sequence of m updates, for the DTD abstraction of XML we get:
The auxiliary structure size is at most O(|Σ| |d|^2 |T|), where |T| is the size of T and |d| = max{|r_a| : a → r_a ∈ d}.
The total validation time is O(m |Σ| |d|^2 log |d| log |T|).
41
Complexity Results
Given a sequence of m updates, for specialized DTDs (XML Schema) we get:
The auxiliary structure size is at most O(|Σ| |d|^2 |T|).
The total validation time is O(m |Σ^t|^2 |d|^2 log(|Σ^t| |d|) log^2 |T|).