Download presentation
Presentation is loading. Please wait.
1
Structure and Value Synopses for XML Data Graphs
Neoklis Polyzotis (UW-Madison) Minos Garofalakis (Bell Labs)
2
Path Expressions //author[book/year>2000]/paper r author author
title year title title 1999 2002
3
Path Expressions //author[book/year>2000]/paper r author author
title year title title 1999 2002
4
Path Expressions //author[book/year>2000]/paper r author author
title year title title 1999 2002
5
Path Expressions //author[book/year>2000]/paper r author author
title year title title 1999 2002
6
Path Expressions Efficient evaluationPath Selectivity
//author[book/year>2000]/paper r author author book paper book paper paper year title year title title 1999 2002 Efficient evaluationPath Selectivity Need to estimate true selectivities
7
Contribution XSKETCH synopses Structure + Value synopses
Graph structured XML Data Branching PEs with value predicates Low estimation error/Low storage
8
Outline Preliminaries XSKETCH Synopses Construction
Experimental Results Conclusions
9
XML Data Model XML Document XML Data Graph ρ0 A1 A2 PB3
A: Author, PB: Publisher B: Book, N:Name, P:Paper N4 B5 P6 P7 N8 B9 V4 V8 T: Title, E:Editor T10 T11 T12 T13 E14 V10 V11 V12 V13 V14 XML Document XML Data Graph Graph nodes Elements+Attributes+Values Graph edges Nesting + Reference (ID/IDREF)
10
Path Expressions XPath expressions Result is a set Simple: A/P/T
Branching: A[B]/P/T Values: A/P/T[=v11] Result is a set ρ0 A1 A2 PB3 N4 B5 P6 P7 N8 B9 V4 V8 T10 T11 T12 T13 E14 V10 V11 V12 V13 V14
11
Path Expressions XPath expressions Result is a set Simple: A/P/T
Branching: A[B]/P/T Values: A/P/T[=v11] Result is a set ρ0 A1 A2 PB3 N4 B5 P6 P7 N8 B9 V4 V8 T10 T11 T12 T13 E14 V10 V11 V12 V13 V14
12
Path Expressions XPath expressions Result is a set Simple: A/P/T
Branching: A[B]/P/T Values: A/P/T[=v11] Result is a set ρ0 A1 A2 PB3 N4 B5 P6 P7 N8 B9 V4 V8 T10 T11 T12 T13 E14 V10 V11 V12 V13 V14
13
Path Expressions XPath expressions Result is a set Simple: A/P/T
Branching: A[B]/P/T Values: A/P/T[=v11] Result is a set ρ0 A1 A2 PB3 N4 B5 P6 P7 N8 B9 V4 V8 T10 T11 T12 T13 E14 V10 V11 V12 V13 V14
14
Path Expressions XPath expressions Result is a set Simple: A/P/T
Branching: A[B]/P/T Values: A/P/T[=v11] Result is a set ρ0 A1 A2 PB3 N4 B5 P6 P7 N8 B9 V4 V8 T10 T11 T12 T13 E14 V10 V11 V12 V13 V14
15
Estimation Problem Size of result set for a PE Challenges:
Structural Correlations Example: paper/author book/author Value to Value Correlations Example: author[name=v1]/paper[title=v2] Path to Value Correlations Example: paper/author[=v] book/author[=v]
16
Estimation Problem Size of result set for a PE Challenges:
Structural Correlations Example: paper/author book/author Value to Value Correlations Example: author[name=v1]/paper[title=v2] Path to Value Correlations Example: paper/author[=v] book/author[=v]
17
Estimation Problem Size of result set for a PE Challenges:
Structural Correlations Example: paper/author book/author Value to Value Correlations Example: author[name=v1]/paper[title=v2] Path to Value Correlations Example: paper/author[=v] book/author[=v]
18
Estimation Problem Size of result set for a PE Challenges:
Structural Correlations Example: paper/author book/author Value to Value Correlations Example: author[name=v1]/paper[title=v2] Path to Value Correlations Example: paper/author[=v] book/author[=v]
19
Outline Preliminaries XSKETCH Synopses Construction
Synopses Model Estimation Construction Experimental Results Conclusions
20
Synopsis Model Graph Synopsis Edge Stability Information
Value Summaries
21
Graph Synopsis Set of elements (same tag) Summary Node
Document Graph Synopsis ρ0 ρ(1) A1 A2 PB3 A(2) PB(1) N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) V4 V8 T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 V14 Set of elements (same tag) Summary Node Document Edge Summary Edge
22
Backward Edge Stability
Document Graph Synopsis ρ0 ρ(1) b b A1 A2 PB3 A(2) PB(1) b b N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) b b b V4 V8 T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 V14 ubv: all elements in v have a parent in u
23
Forward Edge Stability
Document Graph Synopsis ρ0 ρ(1) f f A1 A2 PB3 A(2) PB(1) f f f N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) f f V4 V8 T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 V14 ufv: all elements in u have a child in v
24
Value Summaries Summarize values “under” synopsis nodes
Document Graph Synopsis ρ0 ρ(1) b/f b/f A1 A2 PB3 A(2) PB(1) b/f f b/f N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) b/f b/f b V4 V8 VN T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 V14 VT VT VE Summarize values “under” synopsis nodes Implementation dependent on values E.g., histograms, pruned suffix trees,…
25
Path to Value Correlations
Document Graph Synopsis ρ0 ρ(1) b/f b/f A1 A2 PB3 A(2) PB(1) b/f f b/f N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) b/f b/f b V4 V8 VN T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 V14 VT VT VE One histogram per summary node
26
Value to Value Correlations
Document Graph Synopsis ρ0 ρ(1) b/f b/f A1 A2 PB3 A(2) PB(1) b/f f b/f N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) b/f b/f b V4 V8 VN T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 V14 VT VT VE Multi-dimensional histograms Correlations within stable neighborhood
27
Value to Value Correlations
Document Graph Synopsis ρ0 ρ(1) b/f b/f A1 A2 PB3 A(2) PB(1) b/f f b/f N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) b/f b/f b V4 V8 VN T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 V14 VT VT VE,T Multi-dimensional histograms Correlations within stable neighborhood
28
Value to Value Correlations
Document Graph Synopsis ρ0 ρ(1) b/f b/f A1 A2 PB3 A(2) PB(1) b/f f b/f N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) b/f b/f b V4 V8 VN T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 V14 VT VT VE Multi-dimensional histograms Correlations within stable neighborhood
29
Outline Preliminaries XSKETCH Synopses Construction
Synopses Model Estimation Construction Experimental Results Conclusions
30
Estimation ρ0 ρ(1) A1 A2 PB3 A(2) PB(1) N4 B5 P6 P7 N8 B9 N(2) P(2)
Document Graph Synopsis ρ0 ρ(1) b/f b/f A1 A2 PB3 A(2) PB(1) b/f f b/f N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) b/f b/f b V4 V8 VN T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 V14 VT VT VE
31
B[E=v1]/T[=v2] 2 x f(B[E=v1]/T[=v2])
Estimation Document Graph Synopsis ρ0 ρ(1) b/f b/f A1 A2 PB3 A(2) PB(1) b/f f b/f N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) b/f b/f b V4 V8 VN T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 V14 VT VT VE B[E=v1]/T[=v2] 2 x f(B[E=v1]/T[=v2])
32
B[E=v1]/T[=v2] 2 x f(B[E=v1]/T[=v2])
Estimation Document Graph Synopsis ρ0 ρ(1) b/f b/f A1 A2 PB3 A(2) PB(1) b/f f b/f N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) b/f b/f b V4 V8 VN T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 V14 VT VT VE B[E=v1]/T[=v2] 2 x f(B[E=v1]/T[=v2])
33
B[E=v1]/T[=v2] 2 x f(B[E=v1]/T[=v2])
Estimation Document Graph Synopsis ρ0 ρ(1) b/f b/f A1 A2 PB3 A(2) PB(1) b/f f b/f N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) b/f b/f b V4 V8 VN T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 V14 VT VT VE B[E=v1]/T[=v2] 2 x f(B[E=v1]/T[=v2])
34
B[E=v1]/T[=v2] 2 x f(B/T[=v2]) x f(E=v1|B) x f(B[E])
Estimation Document Graph Synopsis ρ0 ρ(1) b/f b/f A1 A2 PB3 A(2) PB(1) b/f f b/f N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) b/f b/f b V4 V8 VN T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 V14 VT VT VE B[E=v1]/T[=v2] 2 x f(B/T[=v2]) x f(E=v1|B) x f(B[E])
35
Estimation Model Break path in stable sub-paths
Derive correlation scopes Apply statistical assumptions Independence Uniformity
36
Outline Preliminaries XSKETCH Synopses Construction
Experimental Results Conclusions
37
Coarsest Synopsis ρ0 ρ(1) A1 A2 PB3 A(2) PB(1) N4 B5 P6 P7 N8 B9 N(2)
Document Graph Coarsest Synopsis ρ0 ρ(1) b/f b/f A1 A2 PB3 A(2) PB(1) b/f f b/f N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) f f b V4 V8 VN T10 T11 T12 T13 E14 T(4) E(1) V10 V11 V12 V13 V14 VT VE All document paths, but also false paths High estimation error Small size
38
Perfect Synopsis ρ0 A1 A2 PB3 N4 B5 P6 P7 N8 B9 V4 V8 T10 T11 T12 T13
Document Graph Perfect Synopsis ρ0 ρ(0) b/f b/f b/f A1 A2 PB3 A(1) A(1) PB(1) b/f b/f b/f b/f b/f b/f b/f N4 B5 P6 P7 N8 B9 N(1) B(1) P(1) P(1) N(1) B(1) b/f b/f b/f b/f b/f b/f V4 V8 VN VN T10 T11 T12 T13 E14 T(1) T(1) T(1) T(1) E(1) V10 V11 V12 V13 V14 VT VT VT VT VE All document paths and no false paths Zero estimation error Large size
39
Construction Optimal XSKETCH: NP-Hard Forward Selection Algorithm
Refinements Successively refine coarsest summary Selection criterion: marginal gains
40
Construction Step … … r … …
41
Construction Step … Path Sample P … r … E=error(P) E’=error(P) …
S=size() S’=size()
42
Construction Step … Path Sample P … r … E=error(P) E’=error(P) …
S=size() S’=size() gain(r)=(E-E’)/(S’-S)
43
Refinements Structural Refinements Value Refinements
backward-stabilize forward-stabilize backward-split Value Refinements value-expand value-remove value-refine
44
Outline Preliminaries XSKETCH Synopses Construction
Experimental Results Conclusions
45
Implementation Single-dimensional histograms Integer values
Strings hashed to integers Construction: max-diff(V,A)
46
Datasets Elements Coarsest Summary (KB) Perfect Summary (MB) IMDB
102,755 7.8 1.9 XMark 87,480 4.1 3.3
47
Workload 1000 Positive Pes Similar results with negative PEs
Biased random sample from document Path Length: 2-5 500 contain range predicates Predicates: random, 10% of value domain Similar results with negative PEs
48
Accuracy Metric Average Absolute Relative Error
49
Results – IMDB (Branching)
Branches: 0-2 Avg. Result Count: 478(Predicates)/1901(No Predicates) (2%)
50
Results – IMDB (Simple)
Branches: 0 Avg. Result Count: 483(Predicates)/933(No Predicates)
51
Conclusions Path selectivity estimation is important XSKETCH synopses
Branching PEs with predicates Graph-structured data Model: graph synopsis+stability+value summaries Efficient forward selection algorithm Experimental Results Accurate synopses with small space requirements Effective construction algorithm
52
Overflow Slides
53
Path Expressions XPath expressions Result is a set Simple: A/P/T
Branching: A[B]/P/T Values: A[N=v8]/P/T A/P/T[=v11] Result is a set ρ0 A1 A2 PB3 N4 B5 P6 P7 N8 B9 V4 V8 T10 T11 T12 T13 E14 V10 V11 V12 V13 V14
54
Estimation (a) ρ0 ρ(1) A1 A2 PB3 A(2) PB(1) N4 B5 P6 P7 N8 B9 N(2)
Document Graph Synopsis ρ0 ρ(1) b/f b/f A1 A2 PB3 A(2) PB(1) b/f f b/f N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) b/f b/f b V4 V8 VN T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 V14 VT VT VE A/P/T[=v] 2 x f(T=v|A/P/T)
55
A/B/T[=v] 2 x f(T=v|B/T) x f(A/B | B/T[=v])
Estimation (c) Document Graph Synopsis ρ0 ρ(1) b/f b/f A1 A2 PB3 A(2) PB(1) b/f f b/f N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) b/f b/f b V4 V8 VN T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 V14 VT VT VE A/B/T[=v] 2 x f(T=v|B/T) x f(A/B | B/T[=v])
56
A/B/T[=v] 2 x f(T=v|B/T) x f(A/B)
Estimation (c) Document Graph Synopsis ρ0 ρ(1) b/f b/f A1 A2 PB3 A(2) PB(1) b/f f b/f N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) b/f b/f b V4 V8 VN T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 V14 VT VT VE A/B/T[=v] 2 x f(T=v|B/T) x f(A/B)
57
Value-expand example ρ0 ρ(1) A1 A2 PB3 A(2) PB(1) N4 B5 P6 P7 N8 B9
Document Graph Synopsis ρ0 ρ(1) b/f b/f A1 A2 PB3 A(2) PB(1) b/f f b/f N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) b/f b/f b V4 V8 VN T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 VT VT A[N=v1]/P/T[=v2] 2 x f(T=v2) x f(N=v1)
58
Value-expand example ρ0 ρ(1) A1 A2 PB3 A(2) PB(1) N4 B5 P6 P7 N8 B9
Document Graph Synopsis ρ0 ρ(1) b/f b/f A1 A2 PB3 A(2) PB(1) b/f f b/f N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) b/f b/f b V4 V8 VN T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 VT,N VT A[N=v1]/P/T[=v2] 2 x f(T=v2 and N=v1)
59
Results – XMark (Branching)
Path length: Branches: 0-2 Avg. Result Count: 254(Predicates)/1057(No Predicates)
60
Results – XMark (Simple)
Path length: Branches: 0 Avg. Result Count: 302(Predicates)/771(No Predicates)
61
} Synopsis Model Graph Synopsis Stability Information
Value Distribution Information XSKETCH structural synopsis
62
XSKETCH Structural Synopses
Previous Work Branching PEs Graph structured XML data Low estimation error/Small size Values? Path to value correlations Value to value correlations
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.