Structure and Value Synopses for XML Data Graphs

Slides:

Advertisements

Similar presentations

Arnd Christian König Venkatesh Ganti Rares Vernica Microsoft Research Entity Categorization Over Large Document Collections.

Advertisements

A Privacy Preserving Index for Range Queries

Incremental Maintenance of XML Structural Indexes Ke Yi 1, Hao He 1, Ioana Stanoi 2 and Jun Yang 1 1 Department of Computer Science, Duke University 2.

Fast Algorithms For Hierarchical Range Histogram Constructions

STHoles: A Multidimensional Workload-Aware Histogram Nicolas Bruno* Columbia University Luis Gravano* Columbia University Surajit Chaudhuri Microsoft Research.

Bloom Based Filters for Hierarchical Data Georgia Koloniari and Evaggelia Pitoura University of Ioannina, Greece.

Yoshiharu Ishikawa (Nagoya University) Yoji Machida (University of Tsukuba) Hiroyuki Kitagawa (University of Tsukuba) A Dynamic Mobility Histogram Construction.

Estimating the Selectivity of XML Path Expressions for Internet Scale Applications Ashraf Aboulnaga Alaa R. Alameldeen Jeffrey F. Naughton Computer Sciences.

Paper by: A. Balmin, T. Eliaz, J. Hornibrook, L. Lim, G. M. Lohman, D. Simmen, M. Wang, C. Zhang Slides and Presentation By: Justin Weaver.

Deterministic Wavelet Thresholding for Maximum-Error Metrics Minos Garofalakis Bell Laboratories Lucent Technologies 600 Mountain Avenue Murray Hill, NJ.

Suggestion of Promising Result Types for XML Keyword Search Joint work with Jianxin Li, Chengfei Liu and Rui Zhou ( Swinburne University of Technology,

Graph-Based Synopses for Relational Selectivity Estimation Joshua Spiegel and Neoklis Polyzotis University of California, Santa Cruz.

Approximate XML Query Answers Alkis Polyzotis (UC Santa Cruz) Minos Garofalakis (Bell Labs) Yannis Ioannidis (U. of Athens, Hellas)

Presented by Ozgur D. Sahin. Outline Introduction Neighborhood Functions ANF Algorithm Modifications Experimental Results Data Mining using ANF Conclusions.

Chapter 3: Data Storage and Access Methods

Aggregating Information in Peer-to-Peer Systems for Improved Join and Leave Distributed Computing Group Keno Albrecht Ruedi Arnold Michael Gähwiler Roger.

Estimating Set Expression Cardinalities over Data Streams Sumit Ganguly Minos Garofalakis Rajeev Rastogi Internet Management Research Department Bell Labs,

Liang Jin and Chen Li VLDB’2005 Supported by NSF CAREER Award IIS Selectivity Estimation for Fuzzy String Predicates in Large Data Sets.

Dependency-Based Histogram Synopses for High-dimensional Data Amol Deshpande, UC Berkeley Minos Garofalakis, Bell Labs Rajeev Rastogi, Bell Labs.

1 Wavelet synopses with Error Guarantees Minos Garofalakis Phillip B. Gibbons Information Sciences Research Center Bell Labs, Lucent Technologies Murray.

Depth Estimation for Ranking Query Optimization Karl Schnaitter, UC Santa Cruz Joshua Spiegel, BEA Systems, Inc. Neoklis Polyzotis, UC Santa Cruz.

Approximate XML Query Answers Alkis Polyzotis (UC Santa Cruz) Minos Garofalakis (Bell Labs) Yannis Ioannidis (U. of Athens, Hellas)

Hashed Samples Selectivity Estimators for Set Similarity Selection Queries.

1 XPath XPath became a W3C Recommendation 16. November 1999 XPath is a language for finding information in an XML document XPath is used to navigate through.

XPathLearner: An On-Line Self- Tuning Markov Histogram for XML Path Selectivity Estimation Authors: Lipyeow Lim, Min Wang, Sriram Padmanabhan, Jeffrey.

EN : Adv. Storage and TP Systems Cost-Based Query Optimization.

Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze.

Network Characterization via Random Walks B. Ribeiro, D. Towsley UMass-Amherst.

Constructing Optimal Wavelet Synopses Dimitris Sacharidis Timos Sellis

Skewing: An Efficient Alternative to Lookahead for Decision Tree Induction David PageSoumya Ray Department of Biostatistics and Medical Informatics Department.

Fateme Shirazi Spring Statistical structures for Internet-scale data management Authors: Nikos Ntarmos, Peter Triantafillou, G. Weikum.

Graph Indexing: A Frequent Structure- based Approach Alicia Cosenza November 26 th, 2007.

A Novel Approach for Approximate Aggregations Over Arrays SSDBM 2015 June 29 th, San Diego, California 1 Yi Wang, Yu Su, Gagan Agrawal The Ohio State University.

Mining Social Network for Personalized Prioritization Language Techonology Institute School of Computer Science Carnegie Mellon University Shinjae.

Xiangnan Kong,Philip S. Yu Multi-Label Feature Selection for Graph Classification Department of Computer Science University of Illinois at Chicago.

BNCOD07Indexing & Searching XML Documents based on Content and Structure Synopses1 Indexing and Searching XML Documents based on Content and Structure.

Histograms for Selectivity Estimation

School of Computer Science 1 Information Extraction with HMM Structures Learned by Stochastic Optimization Dayne Freitag and Andrew McCallum Presented.

1 Approximate XML Query Answers Presenter: Hongyu Guo Authors: N. polyzotis, M. Garofalakis, Y. Ioannidis.

Evaluation of gene-expression clustering via mutual information distance measure Ido Priness, Oded Maimon and Irad Ben-Gal BMC Bioinformatics, 2007.

SF-Tree: An Efficient and Flexible Structure for Estimating Selectivity of Simple Path Expressions with Accuracy Guarantee Ho Wai Shing.

Dimensionality Reduction in Unsupervised Learning of Conditional Gaussian Networks Authors: Pegna, J.M., Lozano, J.A., Larragnaga, P., and Inza, I. In.

Deriving Relation Keys from XML Keys by Qing Wang, Hongwei Wu, Jianchang Xiao, Aoying Zhou, Junmei Zhou Reviewed by Chris Ying Zhu, Cong Wang, Max Wang,

Histograms for Selectivity Estimation, Part II Speaker: Ho Wai Shing Global Optimization of Histograms.

XCluster Synopses for Structured XML Content Alkis Polyzotis (UC Santa Cruz) Minos Garofalakis (Intel Research, Berkeley)

APEX: An Adaptive Path Index for XML data Chin-Wan Chung, Jun-Ki Min, Kyuseok Shim SIGMOD 2002 Presentation: M.S.3 HyunSuk Jung Data Warehousing Lab. In.

Effective Anomaly Detection with Scarce Training Data Presenter: 葉倚任 Author: W. Robertson, F. Maggi, C. Kruegel and G. Vigna NDSS

A Simulation-Based Study of Overlay Routing Performance CS 268 Course Project Andrey Ermolinskiy, Hovig Bayandorian, Daniel Chen.

1 Efficient Processing of XML Twig Patterns with Parent Child Edges: A Look-ahead Approach Presenter: Qi He.

A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation Yee W. Teh, David Newman and Max Welling Published on NIPS 2006 Discussion.

AQAX: Approximate Query Answering for XML Josh Spiegel, M. Pontikakis, S. Budalakoti, N. Polyzotis Univ. of California Santa Cruz.

Confidence Intervals about a Population Proportion

By A. Aboulnaga, A. R. Alameldeen and J. F. Naughton Vldb’01

Tree-Pattern Aggregation for Scalable XML Data Dissemination

Updating SF-Tree Speaker: Ho Wai Shing.

A paper on Join Synopses for Approximate Query Answering

Efficient Filtering of XML Documents with XPath Expressions

RE-Tree: An Efficient Index Structure for Regular Expressions

Probabilistic Data Management

Query-Friendly Compression of Graph Streams

Communication and Memory Efficient Parallel Decision Tree Construction

Shape Segmentation and Applications in Sensor Networks

Efficient Distribution-based Feature Search in Multi-field Datasets Ohio State University (Shen) Problem: How to efficiently search for distribution-based.

What to do when you don’t know anything know nothing

XML indexing – A(k) indices

Incremental Maintenance of XML Structural Indexes

DATABASE HISTOGRAMS E0 261 Jayant Haritsa

Tree-Pattern Similarity Estimation for Scalable Content-based Routing

Liang Jin (UC Irvine) Nick Koudas (AT&T Labs Research)

CoXML: A Cooperative XML Query Answering System

Presentation transcript:

Structure and Value Synopses for XML Data Graphs Neoklis Polyzotis (UW-Madison) Minos Garofalakis (Bell Labs)

Path Expressions //author[book/year>2000]/paper r author author title year title title 1999 2002

Path Expressions //author[book/year>2000]/paper r author author title year title title 1999 2002

Path Expressions //author[book/year>2000]/paper r author author title year title title 1999 2002

Path Expressions //author[book/year>2000]/paper r author author title year title title 1999 2002

Path Expressions Efficient evaluationPath Selectivity //author[book/year>2000]/paper r author author book paper book paper paper year title year title title 1999 2002 Efficient evaluationPath Selectivity Need to estimate true selectivities

Contribution XSKETCH synopses Structure + Value synopses Graph structured XML Data Branching PEs with value predicates Low estimation error/Low storage

Outline Preliminaries XSKETCH Synopses Construction Experimental Results Conclusions

XML Data Model XML Document  XML Data Graph ρ0 A1 A2 PB3 A: Author, PB: Publisher B: Book, N:Name, P:Paper N4 B5 P6 P7 N8 B9 V4 V8 T: Title, E:Editor T10 T11 T12 T13 E14 V10 V11 V12 V13 V14 XML Document  XML Data Graph Graph nodes  Elements+Attributes+Values Graph edges  Nesting + Reference (ID/IDREF)

Path Expressions XPath expressions Result is a set Simple: A/P/T Branching: A[B]/P/T Values: A/P/T[=v11] Result is a set ρ0 A1 A2 PB3 N4 B5 P6 P7 N8 B9 V4 V8 T10 T11 T12 T13 E14 V10 V11 V12 V13 V14

Path Expressions XPath expressions Result is a set Simple: A/P/T Branching: A[B]/P/T Values: A/P/T[=v11] Result is a set ρ0 A1 A2 PB3 N4 B5 P6 P7 N8 B9 V4 V8 T10 T11 T12 T13 E14 V10 V11 V12 V13 V14

Path Expressions XPath expressions Result is a set Simple: A/P/T Branching: A[B]/P/T Values: A/P/T[=v11] Result is a set ρ0 A1 A2 PB3 N4 B5 P6 P7 N8 B9 V4 V8 T10 T11 T12 T13 E14 V10 V11 V12 V13 V14

Path Expressions XPath expressions Result is a set Simple: A/P/T Branching: A[B]/P/T Values: A/P/T[=v11] Result is a set ρ0 A1 A2 PB3 N4 B5 P6 P7 N8 B9 V4 V8 T10 T11 T12 T13 E14 V10 V11 V12 V13 V14

Path Expressions XPath expressions Result is a set Simple: A/P/T Branching: A[B]/P/T Values: A/P/T[=v11] Result is a set ρ0 A1 A2 PB3 N4 B5 P6 P7 N8 B9 V4 V8 T10 T11 T12 T13 E14 V10 V11 V12 V13 V14

Estimation Problem Size of result set for a PE Challenges: Structural Correlations Example: paper/author book/author Value to Value Correlations Example: author[name=v1]/paper[title=v2] Path to Value Correlations Example: paper/author[=v] book/author[=v]

Estimation Problem Size of result set for a PE Challenges: Structural Correlations Example: paper/author book/author Value to Value Correlations Example: author[name=v1]/paper[title=v2] Path to Value Correlations Example: paper/author[=v] book/author[=v]

Estimation Problem Size of result set for a PE Challenges: Structural Correlations Example: paper/author book/author Value to Value Correlations Example: author[name=v1]/paper[title=v2] Path to Value Correlations Example: paper/author[=v] book/author[=v]

Estimation Problem Size of result set for a PE Challenges: Structural Correlations Example: paper/author book/author Value to Value Correlations Example: author[name=v1]/paper[title=v2] Path to Value Correlations Example: paper/author[=v] book/author[=v]

Outline Preliminaries XSKETCH Synopses Construction Synopses Model Estimation Construction Experimental Results Conclusions

Synopsis Model Graph Synopsis Edge Stability Information Value Summaries

Graph Synopsis Set of elements (same tag)  Summary Node Document Graph Synopsis ρ0 ρ(1) A1 A2 PB3 A(2) PB(1) N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) V4 V8 T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 V14 Set of elements (same tag)  Summary Node Document Edge  Summary Edge

Backward Edge Stability Document Graph Synopsis ρ0 ρ(1) b b A1 A2 PB3 A(2) PB(1) b b N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) b b b V4 V8 T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 V14 ubv: all elements in v have a parent in u

Forward Edge Stability Document Graph Synopsis ρ0 ρ(1) f f A1 A2 PB3 A(2) PB(1) f f f N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) f f V4 V8 T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 V14 ufv: all elements in u have a child in v

Value Summaries Summarize values “under” synopsis nodes Document Graph Synopsis ρ0 ρ(1) b/f b/f A1 A2 PB3 A(2) PB(1) b/f f b/f N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) b/f b/f b V4 V8 VN T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 V14 VT VT VE Summarize values “under” synopsis nodes Implementation dependent on values E.g., histograms, pruned suffix trees,…

Path to Value Correlations Document Graph Synopsis ρ0 ρ(1) b/f b/f A1 A2 PB3 A(2) PB(1) b/f f b/f N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) b/f b/f b V4 V8 VN T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 V14 VT VT VE One histogram per summary node

Value to Value Correlations Document Graph Synopsis ρ0 ρ(1) b/f b/f A1 A2 PB3 A(2) PB(1) b/f f b/f N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) b/f b/f b V4 V8 VN T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 V14 VT VT VE Multi-dimensional histograms Correlations within stable neighborhood

Value to Value Correlations Document Graph Synopsis ρ0 ρ(1) b/f b/f A1 A2 PB3 A(2) PB(1) b/f f b/f N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) b/f b/f b V4 V8 VN T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 V14 VT VT VE,T Multi-dimensional histograms Correlations within stable neighborhood

Value to Value Correlations Document Graph Synopsis ρ0 ρ(1) b/f b/f A1 A2 PB3 A(2) PB(1) b/f f b/f N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) b/f b/f b V4 V8 VN T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 V14 VT VT VE Multi-dimensional histograms Correlations within stable neighborhood

Outline Preliminaries XSKETCH Synopses Construction Synopses Model Estimation Construction Experimental Results Conclusions

Estimation ρ0 ρ(1) A1 A2 PB3 A(2) PB(1) N4 B5 P6 P7 N8 B9 N(2) P(2) Document Graph Synopsis ρ0 ρ(1) b/f b/f A1 A2 PB3 A(2) PB(1) b/f f b/f N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) b/f b/f b V4 V8 VN T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 V14 VT VT VE

B[E=v1]/T[=v2]  2 x f(B[E=v1]/T[=v2]) Estimation Document Graph Synopsis ρ0 ρ(1) b/f b/f A1 A2 PB3 A(2) PB(1) b/f f b/f N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) b/f b/f b V4 V8 VN T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 V14 VT VT VE B[E=v1]/T[=v2]  2 x f(B[E=v1]/T[=v2])

B[E=v1]/T[=v2]  2 x f(B[E=v1]/T[=v2]) Estimation Document Graph Synopsis ρ0 ρ(1) b/f b/f A1 A2 PB3 A(2) PB(1) b/f f b/f N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) b/f b/f b V4 V8 VN T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 V14 VT VT VE B[E=v1]/T[=v2]  2 x f(B[E=v1]/T[=v2])

B[E=v1]/T[=v2]  2 x f(B[E=v1]/T[=v2]) Estimation Document Graph Synopsis ρ0 ρ(1) b/f b/f A1 A2 PB3 A(2) PB(1) b/f f b/f N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) b/f b/f b V4 V8 VN T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 V14 VT VT VE B[E=v1]/T[=v2]  2 x f(B[E=v1]/T[=v2])

B[E=v1]/T[=v2]  2 x f(B/T[=v2]) x f(E=v1|B) x f(B[E]) Estimation Document Graph Synopsis ρ0 ρ(1) b/f b/f A1 A2 PB3 A(2) PB(1) b/f f b/f N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) b/f b/f b V4 V8 VN T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 V14 VT VT VE B[E=v1]/T[=v2]  2 x f(B/T[=v2]) x f(E=v1|B) x f(B[E])

Estimation Model Break path in stable sub-paths Derive correlation scopes Apply statistical assumptions Independence Uniformity

Outline Preliminaries XSKETCH Synopses Construction Experimental Results Conclusions

Coarsest Synopsis ρ0 ρ(1) A1 A2 PB3 A(2) PB(1) N4 B5 P6 P7 N8 B9 N(2) Document Graph Coarsest Synopsis ρ0 ρ(1) b/f b/f A1 A2 PB3 A(2) PB(1) b/f f b/f N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) f f b V4 V8 VN T10 T11 T12 T13 E14 T(4) E(1) V10 V11 V12 V13 V14 VT VE All document paths, but also false paths High estimation error Small size

Perfect Synopsis ρ0 A1 A2 PB3 N4 B5 P6 P7 N8 B9 V4 V8 T10 T11 T12 T13 Document Graph Perfect Synopsis ρ0 ρ(0) b/f b/f b/f A1 A2 PB3 A(1) A(1) PB(1) b/f b/f b/f b/f b/f b/f b/f N4 B5 P6 P7 N8 B9 N(1) B(1) P(1) P(1) N(1) B(1) b/f b/f b/f b/f b/f b/f V4 V8 VN VN T10 T11 T12 T13 E14 T(1) T(1) T(1) T(1) E(1) V10 V11 V12 V13 V14 VT VT VT VT VE All document paths and no false paths Zero estimation error Large size

Construction Optimal XSKETCH: NP-Hard Forward Selection Algorithm Refinements Successively refine coarsest summary Selection criterion: marginal gains

Construction Step … … r … …

Construction Step … Path Sample P … r … E=error(P) E’=error(P) … S=size() S’=size()

Construction Step … Path Sample P … r … E=error(P) E’=error(P) … S=size() S’=size() gain(r)=(E-E’)/(S’-S)

Refinements Structural Refinements Value Refinements backward-stabilize forward-stabilize backward-split Value Refinements value-expand value-remove value-refine

Outline Preliminaries XSKETCH Synopses Construction Experimental Results Conclusions

Implementation Single-dimensional histograms Integer values Strings hashed to integers Construction: max-diff(V,A)

Datasets Elements Coarsest Summary (KB) Perfect Summary (MB) IMDB 102,755 7.8 1.9 XMark 87,480 4.1 3.3

Workload 1000 Positive Pes Similar results with negative PEs Biased random sample from document Path Length: 2-5 500 contain range predicates Predicates: random, 10% of value domain Similar results with negative PEs

Accuracy Metric Average Absolute Relative Error

Results – IMDB (Branching) Branches: 0-2 Avg. Result Count: 478(Predicates)/1901(No Predicates) (2%)

Results – IMDB (Simple) Branches: 0 Avg. Result Count: 483(Predicates)/933(No Predicates)

Conclusions Path selectivity estimation is important XSKETCH synopses Branching PEs with predicates Graph-structured data Model: graph synopsis+stability+value summaries Efficient forward selection algorithm Experimental Results Accurate synopses with small space requirements Effective construction algorithm

Overflow Slides

Path Expressions XPath expressions Result is a set Simple: A/P/T Branching: A[B]/P/T Values: A[N=v8]/P/T A/P/T[=v11] Result is a set ρ0 A1 A2 PB3 N4 B5 P6 P7 N8 B9 V4 V8 T10 T11 T12 T13 E14 V10 V11 V12 V13 V14

Estimation (a) ρ0 ρ(1) A1 A2 PB3 A(2) PB(1) N4 B5 P6 P7 N8 B9 N(2) Document Graph Synopsis ρ0 ρ(1) b/f b/f A1 A2 PB3 A(2) PB(1) b/f f b/f N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) b/f b/f b V4 V8 VN T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 V14 VT VT VE A/P/T[=v]  2 x f(T=v|A/P/T)

A/B/T[=v]  2 x f(T=v|B/T) x f(A/B | B/T[=v]) Estimation (c) Document Graph Synopsis ρ0 ρ(1) b/f b/f A1 A2 PB3 A(2) PB(1) b/f f b/f N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) b/f b/f b V4 V8 VN T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 V14 VT VT VE A/B/T[=v]  2 x f(T=v|B/T) x f(A/B | B/T[=v])

A/B/T[=v]  2 x f(T=v|B/T) x f(A/B) Estimation (c) Document Graph Synopsis ρ0 ρ(1) b/f b/f A1 A2 PB3 A(2) PB(1) b/f f b/f N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) b/f b/f b V4 V8 VN T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 V14 VT VT VE A/B/T[=v]  2 x f(T=v|B/T) x f(A/B)

Value-expand example ρ0 ρ(1) A1 A2 PB3 A(2) PB(1) N4 B5 P6 P7 N8 B9 Document Graph Synopsis ρ0 ρ(1) b/f b/f A1 A2 PB3 A(2) PB(1) b/f f b/f N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) b/f b/f b V4 V8 VN T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 VT VT A[N=v1]/P/T[=v2]  2 x f(T=v2) x f(N=v1)

Value-expand example ρ0 ρ(1) A1 A2 PB3 A(2) PB(1) N4 B5 P6 P7 N8 B9 Document Graph Synopsis ρ0 ρ(1) b/f b/f A1 A2 PB3 A(2) PB(1) b/f f b/f N4 B5 P6 P7 N8 B9 N(2) P(2) B(2) b/f b/f b V4 V8 VN T10 T11 T12 T13 E14 T(2) T(2) E(1) V10 V11 V12 V13 VT,N VT A[N=v1]/P/T[=v2]  2 x f(T=v2 and N=v1)

Results – XMark (Branching) Path length: 2-5 Branches: 0-2 Avg. Result Count: 254(Predicates)/1057(No Predicates)

Results – XMark (Simple) Path length: 2-5 Branches: 0 Avg. Result Count: 302(Predicates)/771(No Predicates)

} Synopsis Model Graph Synopsis Stability Information Value Distribution Information XSKETCH structural synopsis

XSKETCH Structural Synopses Previous Work Branching PEs Graph structured XML data Low estimation error/Small size Values? Path to value correlations Value to value correlations