The Complexity of XPath Evaluation. Paper by: Georg Gottlob, Christoph Koch, Reinhard Pichler. Presented by: Royi Ronen.


Introduction
All major XPath evaluation algorithms run in exponential time in the worst case.
Paper's main goals:
– Prove that the "XPath problem" is P-complete.
– Prove that other related problems are LOGCFL-complete.

XPath – Quick Reminder
XPath is a query language for XML documents.
Navigating through a document: /descendant::a/child::b selects nodes named "b" whose parent is named "a".
Testing nodes: a predicate, for example /descendant::a/child::b[@c = 3], requires that b's attribute c equals 3.
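The two kinds of steps above can be tried out with Python's standard library. This is only a sketch: `xml.etree.ElementTree` supports a limited subset of XPath in abbreviated syntax, so `.//a/b` stands in for `descendant::a/child::b` relative to the root element, and attribute values are compared as strings. The sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: two <b> children under <a>, one <b> elsewhere.
doc = ET.fromstring(
    '<r>'
    '<a><b c="3"/><b c="4"/></a>'
    '<x><b c="3"/></x>'
    '</r>'
)

# Navigation: every <b> whose parent is an <a> below the root.
all_b = doc.findall(".//a/b")

# Node test: the same path, keeping only <b> elements whose attribute c is "3".
b_with_c3 = doc.findall(".//a/b[@c='3']")

print(len(all_b), len(b_with_c3))
```

The `<b c="3"/>` under `<x>` is not selected by either expression, because its parent is not named "a".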

Sketch: How P-Completeness is proven
In order to prove P-completeness of a problem, we have to prove:
– Membership in P;
– P-hardness.
[Figure: diagram with regions labeled P, P-Complete and P-Hard.]

XPath is P-Complete
Sketch:
1. Membership of XPath in P is already proven (by the same authors).
2. P-hardness of XPath will be proven by reduction from the monotone circuit problem (which is known to be P-complete) to Core XPath (a subset of XPath with its main features).
Why is it enough? Core XPath is a fragment of XPath, so hardness of the fragment carries over to the full language.

Monotone Boolean Circuit Problem
A monotone Boolean circuit is a circuit with many inputs and one output that uses the following Boolean gates only:
– AND
– OR
– DUMMY
Given a circuit and its inputs, solving the problem means stating the output.
The problem is P-complete.
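To make the problem concrete, here is a minimal sketch of evaluating such a circuit. The encoding is an assumption made for illustration (it is not taken from the paper): gates are listed in order, each one naming its kind and the 0-based positions of earlier inputs or gates that feed it, and the last gate is the output.

```python
from typing import List, Tuple

# (kind, positions of the earlier inputs/gates feeding this gate)
Gate = Tuple[str, List[int]]


def eval_monotone_circuit(inputs: List[bool], gates: List[Gate]) -> bool:
    """Evaluate the gates in order and return the value of the last one."""
    values = list(inputs)  # positions 0..M-1 hold the circuit inputs
    for kind, preds in gates:
        operands = [values[i] for i in preds]
        if kind == "AND":
            values.append(all(operands))
        elif kind == "OR":
            values.append(any(operands))
        elif kind == "DUMMY":
            values.append(operands[0])  # passes its single input through
        else:
            raise ValueError(f"not a monotone gate: {kind}")
    return values[-1]


# Hypothetical example circuit: (x0 AND x2) OR x1, followed by a DUMMY gate.
example_gates = [("AND", [0, 2]), ("OR", [1, 3]), ("DUMMY", [4])]
print(eval_monotone_circuit([True, False, True], example_gates))
```

Each gate is visited once, which is why the problem is in P; the sequential dependence of each gate on earlier ones is what makes it a natural P-complete problem.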

A Monotone Boolean Circuit
[Figure: Item 3 in the handout.]

Core XPath - Definition
XPath has many features and is inconvenient for theoretical treatment. Therefore Core XPath, a subset of XPath with its main features, is defined by the following grammar (Item 1 in the handout):
locpath ::= '/' locpath | locpath '/' locpath | locpath '|' locpath | locstep.
locstep ::= axis '::' ntst '[' bexpr ']' ... '[' bexpr ']'.
bexpr ::= bexpr 'and' bexpr | bexpr 'or' bexpr | 'not(' bexpr ')' | locpath.
axis ::= 'self' | 'child' | 'parent' | 'descendant' | 'descendant-or-self' | 'ancestor' | 'ancestor-or-self' | 'following' | 'following-sibling' | 'preceding' | 'preceding-sibling'.

The Corresponding Languages
The paper shows direct reductions between the problems. We will show the same reduction, but between the corresponding languages, since this is the methodology used in the Technion Computability course. The proofs are equivalent.

The Corresponding Languages
L-Core XPath: {(Q,D) | Q is a Core XPath query, D is a valid document, and Q yields a non-empty result when run on D}
L-Monotone Circuit: {(C,I) | C is a monotone circuit, I is a set of inputs to C, and C evaluates to 1 when run on I}

The Reduction
Reduction is our tool to prove that one language is at least as hard as another.
Here we will show: L-Circuit is reducible to L-Core XPath. This proves that L-Core XPath is at least as hard as L-Circuit, and therefore P-hard.
We have to build (Q,D) that yields a non-empty result iff (C,I) evaluates to 1.

The Circuit, Layered
An equivalent monotone circuit in which only one non-dummy gate exists in every layer (Item 4 in the handout).
The gates are ordered; data can flow only from lower-indexed to higher-indexed gates.

Q and D
D is built as follows:
– M inputs (here M = 4).
– N non-input gates (here N = 5).
– A total of 2(M+N)+1 nodes.
Nodes are tagged from the alphabet {0, 1, G, R, $I_i$, $O_i$}, where i is from {1, 2, …, N}.

Tagging Rules
– $V_1, \dots, V_M$ are each tagged with their input value, i.e. 0 or 1.
– $V_{M+N}$ is tagged R; every $V_i$ is tagged G (including $V_{M+N}$).
– If gate $G_i$ is an input to gate $G_{M+k}$ (i < M+k), $I_k$ is added to $V_i$ and $O_k$ to $V_{M+k}$.
– $V'_1, \dots, V'_M$ are tagged $I_i$ and $O_i$, where i is in {1, …, N}.
– $V'_{M+i}$ is tagged $I_k$ and $O_k$, where k is in {i, …, N}.
These tags will be used by the query.
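The tagging rules can be written down as a short procedure. The sketch below computes only the tag sets, not the tree shape (which is given in the figures). The function name, the input encoding, and the choice to give V_0 no tags are assumptions made for illustration; gate indices are 1-based as on the slides.

```python
from typing import Dict, List, Set, Tuple


def tag_nodes(inputs: List[int],
              gates: List[List[int]]) -> Tuple[Dict[int, Set[str]], Dict[int, Set[str]]]:
    """Return the tag sets of V_1..V_{M+N} and V'_1..V'_{M+N}.

    inputs     -- the 0/1 values of the input gates G_1..G_M
    gates[k-1] -- indices i of the gates feeding G_{M+k} (each i < M+k)
    """
    M, N = len(inputs), len(gates)
    v: Dict[int, Set[str]] = {i: {"G"} for i in range(1, M + N + 1)}
    for i, bit in enumerate(inputs, start=1):
        v[i].add(str(bit))               # input value, "0" or "1"
    v[M + N].add("R")                    # only the last gate gets R
    for k, preds in enumerate(gates, start=1):
        v[M + k].add(f"O{k}")            # G_{M+k} is the output of step k
        for i in preds:
            if not 1 <= i < M + k:
                raise ValueError("gates may only read lower-indexed gates")
            v[i].add(f"I{k}")            # G_i is an input of G_{M+k}
    v_prime: Dict[int, Set[str]] = {}
    for i in range(1, M + 1):            # V'_1..V'_M: all I_k and O_k
        v_prime[i] = {f"{t}{k}" for k in range(1, N + 1) for t in "IO"}
    for i in range(1, N + 1):            # V'_{M+i}: I_k and O_k for k >= i
        v_prime[M + i] = {f"{t}{k}" for k in range(i, N + 1) for t in "IO"}
    return v, v_prime


# Hypothetical circuit: inputs G_1 = 1, G_2 = 0, and one gate G_3 fed by both.
v, v_prime = tag_nodes([1, 0], [[1, 2]])
print(v)
print(v_prime)
```

Counting V_0 as well, this example has 1 + 3 + 3 = 7 nodes, matching 2(M+N)+1 for M = 2 and N = 1.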

A Simple Example
[Figure: the document D for a small circuit, showing nodes $V_0$, $V_1$ to $V_3$ and $V'_1$ to $V'_3$ with tags drawn from G, R, $I_1$ and $O_1$.]

The Query
The reduction can be achieved in logarithmic space. The query in the output of the reduction is:
Q := /descendant-or-self::*[T(R) and $\varphi_N$]
$\varphi_k$ := descendant-or-self::*[T($O_k$) and parent::*[$\psi_k$]]
$\psi_k$ := not(child::*[T($I_k$) and not($\pi_k$)]), if $G_{M+k}$ is an AND gate
$\psi_k$ := child::*[T($I_k$) and ($\pi_k$)], if $G_{M+k}$ is an OR/DUMMY gate
$\pi_k$ := ancestor-or-self::*[T(G) and $\varphi_{k-1}$]
$\varphi_0$ := T(1) (end of recursion)
$\psi_k$ evaluates the gate by selecting $V_0$ iff all (one of) the gate's inputs are (is) 1 and the gate is "AND" ("OR"). $\pi_k$ pushes results down.
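The recursive definition can be unrolled mechanically into one query string. The sketch below assumes the sub-queries are named phi, psi and pi and nest as follows: the descendant-or-self step (phi) wraps the AND or OR/DUMMY child step (psi), which wraps the ancestor-or-self step (pi), which wraps the previous phi, ending in T(1). These names and the function `build_query` are assumptions for illustration. T(...) is the slides' label-test notation, so the result is not executable XPath as-is.

```python
from typing import List


def build_query(gate_kinds: List[str]) -> str:
    """Unroll the recursion; gate_kinds[k-1] is the kind of gate G_{M+k}."""
    phi = "T(1)"  # end of the recursion
    for k, kind in enumerate(gate_kinds, start=1):
        # Push the previous results down to the tagged children.
        pi = f"ancestor-or-self::*[T(G) and {phi}]"
        if kind == "AND":
            # No child tagged I_k may fail: all inputs of the gate are true.
            psi = f"not(child::*[T(I{k}) and not({pi})])"
        elif kind in ("OR", "DUMMY"):
            # Some child tagged I_k succeeds: at least one input is true.
            psi = f"child::*[T(I{k}) and {pi}]"
        else:
            raise ValueError(f"not a monotone gate: {kind}")
        phi = f"descendant-or-self::*[T(O{k}) and parent::*[{psi}]]"
    return f"/descendant-or-self::*[T(R) and {phi}]"


query_or = build_query(["OR"])
query_two = build_query(["AND", "OR"])
print(query_or)
```

Each gate wraps exactly one copy of the previous sub-query, so the query grows linearly with the number of gates, which is consistent with a reduction that needs only logarithmic working space.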

Sub-queries Meaning
– $\pi_k$ (the ancestor-or-self step): returns the nodes of the previous iteration and their tagged children, i.e. pushes results "down" by including the children.
– $\psi_k$ (the child step): returns the root iff all the inputs to gate k are true, in an AND gate; returns the root iff at least one of the inputs to gate k is true, in an OR gate. In both cases, it also returns the nodes that represent gates that were previously evaluated to true.
– $\varphi_k$ (the descendant-or-self step): includes $V_k$ iff the root was returned by the previous sub-query.
– The whole query: returns the rightmost node iff the output gate is evaluated to true (no other gate is tagged R).

The Query - Example
[Figure: the query evaluated on the document D of the simple example, with nodes $V_0$, $V_1$ to $V_3$ and $V'_1$ to $V'_3$ and input values 1 and 0.]

Discussion
It is enough to show that $V_i \in [\![\varphi_k]\!]$ iff $G_i$ evaluates to true, where $\varphi_k$ is the k-th descendant-or-self sub-query of Q.
Reason: T(R) is true for the rightmost node only. If the last gate evaluates to 1, then the result of the query consists of that node, and (Q,D) is in L-Core XPath. Otherwise, the result is empty, and (Q,D) is not in L-Core XPath.

Tagged Tree Example
[Figure: the tagged tree for the circuit C in the handout (M = 4, N = 5). The nodes $V'_1$ to $V'_5$ carry $I_1$ to $I_5$ and $O_1$ to $O_5$; $V'_6$ carries $I_2$ to $I_5$ and $O_2$ to $O_5$; $V'_7$ carries $I_3$ to $I_5$ and $O_3$ to $O_5$; $V'_8$ carries $I_4$, $I_5$, $O_4$, $O_5$; $V'_9$ carries $I_5$ and $O_5$.]

Discussion
$[\![\varphi_k]\!]$, the result of the k-th descendant-or-self sub-query, consists of the values of the k nodes in layer k of the circuit. It can also be viewed as the situation at the k-th tick of a clock in a synchronous system.
Proof: $V_i \in [\![\varphi_k]\!]$ iff $G_i$ evaluates to true.

Despite P-Completeness
Problems that are P-complete are considered inherently sequential, and are therefore believed not to benefit significantly from parallelization.
However, for real-world use, it may be very useful to find subsets of the problem and classify them into lower complexity classes (easier problems).
Does anyone recall a well-known problem that can benefit from such manipulation?
The paper continues by looking for restricted versions of the problem.

First Modification Trial
Only the axes child, parent and descendant-or-self are allowed.
The modification doesn't yield lower complexity. The same reduction works after changing ancestor-or-self::* to descendant-or-self::*/parent::*

Second Modification Trial
Let Positive Core XPath be Core XPath without the queries that use negation.
This problem is a member of LOGCFL. LOGCFL problems can be reduced in logarithmic space to a context-free language. Being context-free embodies the ability to be parallelized: segments do not depend on each other.
The reduction is very similar. It uses the problem of semi-bounded circuits.

WF and Positive WF
WF is a subset of XPath that allows Core XPath, arithmetic operations, and conditions using position(), last() and constants.
Where is WF?
Positive WF is LOGCFL-complete. The proof of hardness resembles the proof we have just seen.

The Global Picture

BACKUP

PF is NL-Complete
PF is the problem of navigating through an XML document, with no conditions allowed.
NL is the class of problems solved by a nondeterministic Turing machine that uses logarithmic space.
Proof: PF is NL-complete.
– Membership in NL (by nondeterministic guessing)
– NL-hardness