Lecture 10: Query Complexity


Lecture 10: Query Complexity Thursday, February 1, 2001

Safe-FO = Relational Algebra
Recall the 5 operators of the relational algebra: ∪, −, ×, σ, Π
Theorem. A query is expressible in safe-FO iff it is expressible in the relational algebra.
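As a rough illustration of the five operators, here is a minimal sketch that models relations as Python sets of tuples. The function names, the sample relation E, and the two-step query are all assumptions made for this example and do not come from the lecture.

```python
# Relations are modeled as Python sets of tuples (an assumption of this sketch).

def union(r, s):
    return r | s

def difference(r, s):
    return r - s

def product(r, s):
    # Cartesian product: concatenate every tuple of r with every tuple of s.
    return {a + b for a in r for b in s}

def select(r, pred):
    # sigma: keep the tuples that satisfy the predicate.
    return {t for t in r if pred(t)}

def project(r, cols):
    # Pi: keep the listed column positions (duplicates collapse in a set).
    return {tuple(t[c] for c in cols) for t in r}

# Hypothetical data: edges of a small graph.
E = {(1, 2), (2, 3)}

# The safe-FO query { (x, z) | exists y. E(x, y) and E(y, z) } written in RA:
# Pi_{0,3}( sigma_{col1 = col2}( E x E ) )
two_step = project(select(product(E, E), lambda t: t[1] == t[2]), (0, 3))
print(two_step)
```

The existential quantifier turns into a projection and the conjunction with a shared variable turns into a product followed by a selection, which is the pattern the translation from safe-FO to RA relies on.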

Proof
RA query E ⇒ safe-FO query f

Proof
Safe-FO query f ⇒ RA query E
Define: active domain formula

No need for ∀ (why?)

Examples
Vocabulary: D(x), L(x,y), B(y)
Find drinkers who like Bud:

Examples
Find drinkers who like only Bud
SQL:
select D.x from D where 'Bud' = ALL (select L.y from L where D.x = L.x)
First Order Logic to Relational Algebra: Why? Because:
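One way to read "drinkers who like only Bud" is the formula D(x) ∧ ∀y (L(x,y) → y = 'Bud'), which corresponds to the RA expression D − Π_x(σ_{y ≠ 'Bud'}(L)). Both the formula and the sample data below are assumptions for illustration, since the slide's own formula is not preserved in this transcript.

```python
# Hypothetical instance of the vocabulary D(x), L(x, y).
D = {("ann",), ("bob",), ("cat",)}
L = {("ann", "Bud"), ("bob", "Bud"), ("bob", "Coors")}

# Drinkers who like something other than Bud: Pi_x(sigma_{y != 'Bud'}(L)).
likes_other = {(x,) for (x, y) in L if y != "Bud"}

# Drinkers who like only Bud: D - Pi_x(sigma_{y != 'Bud'}(L)).
only_bud = D - likes_other
print(sorted(only_bud))
```

Note that "cat", who likes nothing at all, is part of the answer. This matches the SQL query, where = ALL over an empty subquery result is true.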

Discussion
(safe)-FO and RA: query languages
(safe)-FO: for declarative queries. RA: for query plans.
The theorem says: every (safe)-FO query can be translated to RA
In practice: need to consider the “best” RA expression
Query languages: (safe)-FO is just one instance; we will discuss smaller and larger languages
All will express only computable, generic, and domain-independent queries

Classical Logic vs. Logic on Finite Models
Recall: given a model D = (D, R1, ..., Rk) and a closed FO formula f, we have defined what D |= f means
A formula is valid if D |= f for every D
It is finitely valid if D |= f for every finite D
A formula is satisfiable if there exists D s.t. D |= f
It is finitely satisfiable if there exists a finite D s.t. D |= f
Obviously: f is valid iff not(f) is not satisfiable

Classical Logic
Notation: |= f means f is valid
Notation: |-- f means f is “provable”
Gödel’s Completeness Theorem: |= f iff |-- f
Corollary. The set of valid formulas is r.e. Idea: enumerate all proofs
Church’s Theorem: if ar(Ri) > 1 for some i, then the set of valid formulas is not decidable.
Corollary. The set of satisfiable formulas is not r.e.

Logic on Finite Models
Simple fact: the set of finitely satisfiable formulas is r.e. Idea: enumerate all pairs of a finite model D and a formula f s.t. D |= f
Trakhtenbrot’s Theorem: if ar(Ri) > 1 for some i, then the set of finitely satisfiable formulas is not decidable
Corollary: the set of finitely valid formulas is not r.e.

An Example Where Finite/Infinite Differ
A formula f that is satisfiable but not finitely satisfiable:
“< is a total order and has no maximal element”
It has an infinite model (for example, the natural numbers with the usual order), but no finite one
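The claim that this formula has no finite model can be checked by brute force for very small domains. The sketch below enumerates every binary relation on a domain of size n and tests the axioms directly. It reads "<" as a strict total order, which is an assumption, and the function name is hypothetical. Enumerating finite models in this way is also the idea behind finite satisfiability being r.e.

```python
from itertools import product

def has_model_of_size(n):
    """Search all binary relations on {0..n-1} for a strict total order
    without a maximal element."""
    dom = range(n)
    pairs = [(a, b) for a in dom for b in dom]
    for bits in product([False, True], repeat=len(pairs)):
        lt = {p for p, bit in zip(pairs, bits) if bit}
        irreflexive = all((a, a) not in lt for a in dom)
        transitive = all((a, c) in lt
                         for (a, b) in lt for (b2, c) in lt if b == b2)
        total = all(a == b or (a, b) in lt or (b, a) in lt
                    for a in dom for b in dom)
        # No maximal element: every a has some strictly larger b.
        no_max = all(any((a, b) in lt for b in dom) for a in dom)
        if irreflexive and transitive and total and no_max:
            return True
    return False

print([has_model_of_size(n) for n in range(1, 4)])
```

The search only covers sizes 1 to 3 (at most 2^9 relations each), so it illustrates the claim rather than proving it. The general argument is that in a finite strict order, repeatedly stepping to a larger element must eventually revisit an element, which contradicts irreflexivity and transitivity.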

Applications of Trakhtenbrot’s Theorem
Given an FO query f, it is undecidable if f is safe
Proof: the query is unsafe iff f is finitely satisfiable
Given two FO queries f, f’, it is undecidable if they are equivalent, i.e. f ≡ f’
Proof: the queries f and false are equivalent iff f is not finitely satisfiable
Trakhtenbrot’s theorem is to FO queries what Rice’s theorem is to programs

More of This Stuff
Definition. A query q is monotone if, for any two finite models D = (D, R1, ..., Rk) and D’ = (D’, R1’, ..., Rk’) s.t. D ⊆ D’, R1 ⊆ R1’, ..., Rk ⊆ Rk’, we have q(D) ⊆ q(D’).
Proposition. It is undecidable if a query q in FO is monotone.
Proof: why?

Complexity of Query Languages
All queries in a query language L are computable
But usually L does not express all computable queries: limited expressive power
Why do we care about such languages?
Typically queries always terminate (e.g. FO)
Typically queries have a low complexity (next)

Complexity of Query Languages
For a query language L, define:
Data complexity: fix a query q; how complex is it to evaluate q(D), for finite models D?
Expression complexity: fix a finite model D; how complex is it to evaluate q(D), for queries q in L?
Combined complexity: how complex is it to evaluate q(D), for finite models D and queries q in L?

Complexity of Query Languages
Formally:
Data complexity of L is the complexity of deciding the set: for some q in L
Combined complexity of L is the complexity of deciding the set:
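The set definitions themselves are not preserved in this transcript. One standard way to write them, given here as an assumption rather than as the slide's exact notation, treats evaluation as a decision problem over (an encoding of) a finite model D and a candidate answer tuple:

```latex
\[
\text{Data complexity (for a fixed } q \in L\text{):}\quad
\{\, (D, \bar a) \;\mid\; \bar a \in q(D) \,\}
\]
\[
\text{Combined complexity of } L\text{:}\quad
\{\, (q, D, \bar a) \;\mid\; q \in L,\ \bar a \in q(D) \,\}
\]
```

In the first set the query is part of the problem and only the model and the tuple are inputs. In the second set the query is an input as well.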

Who Cares About What
Users: care about data complexity: the query q is fixed; the database D is variable
Database systems: care about combined complexity: both the query q and the database D are variable
Database theoreticians: care about expression complexity, when they need to publish more papers

Crash Course in Complexity Classes
Fix a problem, i.e. a set S. Given a value x, how difficult is it for a Turing machine to decide whether x ∈ S?
[Figure: a Turing machine with a finite control and a tape that initially holds an encoding of x]

Four Important Complexity Classes
Let n = |x|
Definition. S is in PTIME if there exists a Turing machine for S that on every input x takes n^O(1) steps (i.e. O(n^k), for some k > 0).
Example: S = {G | G is connected}. With n = |G|, one can check if G is connected in O(n^3) steps (Warshall’s algorithm)
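The connectivity example can be sketched with a Warshall-style transitive closure. In this sketch n is the number of nodes (on the slide n is the size of the encoding of G), the graph is assumed undirected, and the function name and edge-list representation are choices made for illustration.

```python
def is_connected(n, edges):
    """Warshall-style transitive closure on an undirected graph, nodes 0..n-1."""
    # reach[i][j] is True when j is known to be reachable from i.
    reach = [[i == j for j in range(n)] for i in range(n)]
    for a, b in edges:
        reach[a][b] = reach[b][a] = True
    # Three nested loops over the nodes: O(n^3) steps.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if reach[i][k] and reach[k][j]:
                    reach[i][j] = True
    # Connected iff every node is reachable from node 0.
    return all(reach[0][j] for j in range(n))

print(is_connected(4, [(0, 1), (1, 2), (2, 3)]))
print(is_connected(4, [(0, 1), (2, 3)]))
```

A breadth-first search would be faster in practice, but the triple loop makes the polynomial bound easy to see.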

Four Important Complexity Classes
Definition. S is in PSPACE if there exists a Turing machine for S that on every input x takes n^O(1) space.
Example. S = {G | G has a Hamiltonian path}. Space: O(n)
Can run for a very long time: c^O(n)
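A small sketch of the space/time trade-off in the Hamiltonian path example: try every ordering of the nodes, keeping only the current ordering in memory. The undirected-graph assumption, the function name, and the edge-list representation are illustrative choices.

```python
from itertools import permutations

def has_hamiltonian_path(n, edges):
    adj = set(edges) | {(b, a) for a, b in edges}  # treat the graph as undirected
    # permutations() yields one ordering at a time, so beyond the graph itself
    # only the current candidate path is stored, while the loop may run n! times.
    for order in permutations(range(n)):
        if all((order[i], order[i + 1]) in adj for i in range(n - 1)):
            return True
    return False

print(has_hamiltonian_path(4, [(0, 1), (1, 2), (2, 3)]))
print(has_hamiltonian_path(4, [(0, 1), (0, 2), (0, 3)]))
```

The working memory stays small while the running time grows exponentially, which is the point of the "can run for a very long time" remark.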

Four Important Complexity Classes
Definition. S is in LOGSPACE if there exists a Turing machine for S that on every input takes O(log n) space.
OOPS! We need O(n) space just to encode the input. How can we use less space?
Use two separate tapes:
Read-only tape for the input: length = n
Read/write tape for the work area: length = O(log n)
Use the work tape as an index into the input tape

[Figure: a finite control with a read-only input tape, a read/write work tape, and possibly a write-only output tape]

Four Important Complexity Classes
Definition. S is in NLOGSPACE if there exists a nondeterministic Turing machine for S that on every input takes O(log n) space.

Example
S = {(G, x, y) | there exists a path from x to y in G}
u = x;
for i = 1,n do
    if u = y then accept;
    u = (choose one of u’s successors);
endfor;
reject;
Need space for i and u: each takes only O(log n) bits
In English: transitive closure is in NLOGSPACE
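The nondeterministic loop can be simulated deterministically by making the guesses explicit. In the sketch below, run_branch follows a single computation branch driven by a list of guesses, and accepts reports whether some branch accepts. The names, the successor-dictionary representation, and the exhaustive search over guess sequences are assumptions for illustration. The exhaustive search takes exponential time and is not how a logspace machine would work.

```python
from itertools import product

def run_branch(succ, x, y, n, guesses):
    """One computation branch: guesses[i] picks the successor at step i."""
    u = x
    for i in range(n):
        if u == y:
            return True            # accept
        options = succ.get(u, [])
        if not options:
            return False           # dead end: this branch rejects
        u = options[guesses[i] % len(options)]
    return False                   # reject

def accepts(succ, x, y, n):
    """A nondeterministic machine accepts iff at least one branch accepts."""
    max_deg = max((len(v) for v in succ.values()), default=1)
    return any(run_branch(succ, x, y, n, g)
               for g in product(range(max(max_deg, 1)), repeat=n))

# Hypothetical graph on 4 nodes, given as successor lists.
succ = {0: [1, 2], 1: [3], 2: [], 3: []}
print(accepts(succ, 0, 3, 4))
print(accepts(succ, 2, 3, 4))
```

Within one branch the only state is the counter i and the current node u, which is what keeps the nondeterministic machine within O(log n) space.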

Remarks
How long can it run? At most 2^O(log n) = n^O(1) steps.
Hence: LOGSPACE ⊆ NLOGSPACE ⊆ PTIME
Suppose T1, T2 are Turing machines using O(log n) space. Can we construct a Turing machine computing T2 ∘ T1 using O(log n) space? YES
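The running-time bound comes from counting configurations: a halting machine never repeats one. As a sketch, assume a state set Q, a work alphabet Γ, and a work tape of at most c log n cells (these symbols are introduced here for illustration, with logarithms in base 2). A configuration consists of the state, the input head position, the work head position, and the work tape contents, so:

```latex
\[
\#\text{configurations}
\;\le\; |Q| \cdot n \cdot (c \log n) \cdot |\Gamma|^{c \log n}
\;=\; |Q| \cdot n \cdot (c \log n) \cdot n^{c \log |\Gamma|}
\;=\; n^{O(1)}
\]
```

The same count applies to a nondeterministic machine: its configuration graph has polynomially many nodes, and acceptance is a reachability question on that graph, which can be decided in polynomial time.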

FO Data Complexity
Theorem. The data complexity for safe-FO is in LOGSPACE.
Proof. Compute bottom up.
Example: machines T1, T2, T3, T4, … each compute one subexpression, and each needs 2 log n space
Compose all these machines: one machine, O(log n) space
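To see why a fixed FO query needs only logarithmic work space, here is a minimal sketch that evaluates one fixed query with nested loops. It evaluates the query directly rather than composing one machine per subexpression as in the proof, and the query, the function name, and the membership-test interface are assumptions for illustration.

```python
def eval_two_step(n, R):
    """Evaluate q(x, y) = exists z. R(x, z) and R(z, y) over the domain 0..n-1.

    R is a membership test standing in for lookups on the read-only input tape.
    """
    answers = []                       # plays the role of the write-only output tape
    for x in range(n):
        for y in range(n):
            for z in range(n):         # z is the quantified variable
                if R(x, z) and R(z, y):
                    answers.append((x, y))
                    break              # one witness z is enough
    return answers

edges = {(0, 1), (1, 2)}
print(eval_two_step(3, lambda a, b: (a, b) in edges))
```

The work memory consists of the three counters x, y, z, each an index below n and therefore O(log n) bits. Because the query is fixed, the number of counters is a constant.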

Management of Variables in FO
How much time did we need? Answer: n^O(number of variables)
FOk = FO restricted to the variables x1, …, xk
Find nodes (x,y) connected by a path of length 4:
FO5, running time O(n^5)
FO3, running time O(n^3)
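The gap between five and three variables can be sketched as follows. The first function uses one loop per variable. The second reuses the same three variables by composing the edge relation step by step, storing the intermediate relations for simplicity. The function names and the set-of-pairs representation are assumptions, and the slide's own formulas are not preserved in this transcript.

```python
def paths4_five_vars(n, E):
    """Five variables x, z1, z2, z3, y: five nested loops, O(n^5) time."""
    return {(x, y)
            for x in range(n) for z1 in range(n) for z2 in range(n)
            for z3 in range(n) for y in range(n)
            if (x, z1) in E and (z1, z2) in E and (z2, z3) in E and (z3, y) in E}

def compose(n, A, B):
    """{(x, y) | exists z. A(x, z) and B(z, y)}: three variables, O(n^3) time."""
    return {(x, y) for x in range(n) for y in range(n)
            if any((x, z) in A and (z, y) in B for z in range(n))}

def paths4_three_vars(n, E):
    """Reuse the same three variables at every step: three O(n^3) compositions."""
    p2 = compose(n, E, E)       # paths of length 2
    p3 = compose(n, p2, E)      # paths of length 3
    return compose(n, p3, E)    # paths of length 4

# Hypothetical graph: a directed cycle on 5 nodes.
cycle = {(i, (i + 1) % 5) for i in range(5)}
print(paths4_five_vars(5, cycle) == paths4_three_vars(5, cycle))
```

Both functions return the same relation. Only the number of simultaneously live variables differs, and with it the exponent in the running time.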

FO Combined Complexity
Theorem. The combined (data+query) complexity of FO is in PSPACE.
Theorem. The combined (data+expression) complexity of FOk, for fixed k, is in PTIME.
Proof: assignment.