Logic as a Query Language: from Frege to XML


Logic as a Query Language: from Frege to XML Victor Vianu U.C. San Diego

Logic and Databases: a success story • First-order logic (FO) lies at the core of modern database systems • Relational query languages are based on FO: SQL, QBE • More powerful query languages (all the way to XML) are based on extensions of FO • Foundations lie in classical logic. FO: Frege; relational algebra: Tarski

Why is FO so successful as a query language? • easy-to-use syntactic variants: SQL, QBE • efficient implementation via relational algebra, which is amenable to analysis and simplification • potential for perfect scaling to large databases: very fast response can be achieved using parallel processing

A relational database:
frequents             serves
drinker  bar          bar      beer
Joe      King’s       King’s   Bass
Joe      Molly’s      King’s   Bud
Sue      Molly’s      Molly’s  Bass
...      ...          ...      ...
• logically a finite first-order structure

Find the drinkers who frequent some bar serving Bass
FO: d:drinker | ∃b:bar (frequents(d,b) ∧ serves(b, Bass))
QBE:
frequents  drinker  bar      serves  bar  beer      answer  drinker
           d        b                b    Bass              d

Negation: ¬ ("not")
Find the drinkers who frequent some bar serving Bass
FO: d:drinker | ∃b:bar (frequents(d,b) ∧ serves(b, Bass)) ¬
QBE:
frequents  drinker  bar      serves  bar  beer      answer  drinker
           d        b                b    Bass ¬            d

• Naïve implementation: nested loops
d:drinker | ∃b:bar (frequents(d,b) ∧ serves(b, Bass))
for each drinker
  for each bar
    check the pattern
Number of checks: |drinkers| × |bars|
Roughly n²: unacceptable for large databases!
• Better approach: relational algebra
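The nested-loop strategy can be sketched in a few lines of Python. This is only an illustration on the three sample rows of each relation shown in the slides (the rows hidden behind "..." are left out); the function name and the check counter are assumptions made for this example.

```python
# Toy instance from the slides; rows elided by "..." are omitted.
frequents = [("Joe", "King's"), ("Joe", "Molly's"), ("Sue", "Molly's")]
serves = [("King's", "Bass"), ("King's", "Bud"), ("Molly's", "Bass")]


def drinkers_of_bass_naive(frequents, serves):
    """Evaluate the FO query by brute force and count the pattern checks."""
    drinkers = sorted({d for d, _ in frequents})
    bars = sorted({b for _, b in frequents} | {b for b, _ in serves})
    frequents_set, serves_set = set(frequents), set(serves)
    answer, checks = set(), 0
    for d in drinkers:          # for each drinker
        for b in bars:          # for each bar
            checks += 1         # check the pattern
            if (d, b) in frequents_set and (b, "Bass") in serves_set:
                answer.add(d)
    return answer, checks


answer, checks = drinkers_of_bass_naive(frequents, serves)
print(sorted(answer), checks)
```

With 2 drinkers and 2 bars the loop performs 2 × 2 = 4 checks; the count grows as |drinkers| × |bars| regardless of how few pairs actually match.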

Relational algebra operations
• union, difference
• selection: σ_{beer = Bass}(serves) =
  bar      beer
  King’s   Bass
  Molly’s  Bass
  ...      ...
• projection: π_{bar}(serves) =
  bar
  King’s
  Molly’s
  ...

• join ⋈: frequents ⋈ serves
frequents             serves
drinker  bar          bar      beer
Joe      King’s       King’s   Bass
Joe      Molly’s      King’s   Bud
Sue      Molly’s      Molly’s  Bass
...      ...          ...      ...
frequents ⋈ serves
drinker  bar      beer
Joe      King’s   Bass
Joe      King’s   Bud
Joe      Molly’s  Bass
Sue      Molly’s  Bass
...      ...      ...

Relational algebra queries
Find the drinkers who frequent some bar serving Bass
π_drinker(σ_{beer = Bass}(frequents ⋈ serves))
frequents ⋈ serves:
  drinker  bar      beer
  Joe      King’s   Bass
  Joe      King’s   Bud
  Joe      Molly’s  Bass
  Sue      Molly’s  Bass
  ...      ...      ...
σ_{beer = Bass}(frequents ⋈ serves):
  drinker  bar      beer
  Joe      King’s   Bass
  Joe      Molly’s  Bass
  Sue      Molly’s  Bass
  ...      ...      ...
π_drinker(σ_{beer = Bass}(frequents ⋈ serves)):
  drinker
  Joe
  Sue
  ...
Theorem: Relational algebra and FO are equivalent
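To make the three operators concrete, here is a minimal Python sketch that evaluates the same query step by step. Relations are modeled as lists of dictionaries; the helper names select, project and natural_join are assumptions chosen for this illustration, not an API from the talk, and only the sample rows visible in the slides are used.

```python
def select(rel, pred):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in rel if pred(row)]


def project(rel, attrs):
    """Projection: keep only the given attributes and drop duplicate rows."""
    seen, out = set(), []
    for row in rel:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def natural_join(r, s):
    """Natural join: combine rows that agree on all shared attributes."""
    out = []
    for x in r:
        for y in s:
            common = x.keys() & y.keys()
            if all(x[a] == y[a] for a in common):
                out.append({**x, **y})
    return out


frequents = [
    {"drinker": "Joe", "bar": "King's"},
    {"drinker": "Joe", "bar": "Molly's"},
    {"drinker": "Sue", "bar": "Molly's"},
]
serves = [
    {"bar": "King's", "beer": "Bass"},
    {"bar": "King's", "beer": "Bud"},
    {"bar": "Molly's", "beer": "Bass"},
]

joined = natural_join(frequents, serves)
result = project(select(joined, lambda t: t["beer"] == "Bass"), ["drinker"])
print(result)
```

The intermediate list joined has four rows, the selection keeps three of them, and the projection leaves the two drinkers Joe and Sue, matching the tables on the slide.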

Journey of a Query
FO (SQL): ∃z(P(xz) ∧ Q(zy)) ∨ …
→ Relational Algebra: π13(P ⋈ Q) ∪ …
→ Query Rewriting: an equivalent expression over π14, P, S, Q, R
→ Query Execution Plan
→ Execution (Physical Level)
[Diagram: operator tree of the execution plan, with π14 and leaves P, S, Q, R]

• rewriting rules for algebra queries
π_drinker(σ_{beer = Bass}(frequents ⋈ serves))
  becomes
π_drinker[frequents ⋈ π_bar(σ_{beer = Bass}(serves))]
• efficient algorithms for individual operations
Indexes: special “directories” to data
cost: roughly n (log n), much better than n² for large databases!
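The effect of the rewriting can be sketched as two Python functions that compute the same answer. The first joins everything and filters afterwards; the second filters and projects serves first and then probes frequents through a sorted list with binary search, which stands in here for the index "directory" (an assumption of this sketch; real systems typically use B-trees or hash indexes). Function names are hypothetical.

```python
import bisect

frequents = [("Joe", "King's"), ("Joe", "Molly's"), ("Sue", "Molly's")]
serves = [("King's", "Bass"), ("King's", "Bud"), ("Molly's", "Bass")]


def original_plan(frequents, serves):
    """Join first, then select beer = Bass, then project on drinker."""
    joined = [(d, b, beer) for d, b in frequents
              for b2, beer in serves if b == b2]
    return {d for d, _, beer in joined if beer == "Bass"}


def rewritten_plan(frequents, serves):
    """Select and project serves first, then look up matching drinkers."""
    bass_bars = {bar for bar, beer in serves if beer == "Bass"}
    # Sorted (bar, drinker) pairs act as a simple index on frequents.bar.
    by_bar = sorted((bar, drinker) for drinker, bar in frequents)
    result = set()
    for bar in bass_bars:
        # Binary search for the first entry of this bar: O(log n) per probe.
        i = bisect.bisect_left(by_bar, (bar, ""))
        while i < len(by_bar) and by_bar[i][0] == bar:
            result.add(by_bar[i][1])
            i += 1
    return result


print(sorted(original_plan(frequents, serves)))
print(sorted(rewritten_plan(frequents, serves)))
```

Both plans return the same set of drinkers; the rewritten one never materializes join rows for beers other than Bass, and sorting plus binary-search probes stays in the n log n range instead of n².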

Most spectacular: theoretical potential for perfect scaling! • perfect scaling: given sufficient resources, performance does not degrade as the database becomes larger • key: parallel processing • cost: number of processors polynomial in the size of the database • role of algebra: operations highlight parallelism

Each algebra operation can in principle be implemented very efficiently in parallel
Example: projection π_bar(serves)
serves                 π_bar(serves)
bar      beer          bar
King’s   Bass          King’s
King’s   Bud           Molly’s
Molly’s  Bass          ...
...      ...
Constant parallel time!
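As a rough illustration of why projection parallelizes so easily: every tuple can be processed without looking at any other tuple. The sketch below hands each row of serves to its own worker using Python's standard concurrent.futures module. It only models the independence of the per-tuple work; it is not a demonstration of constant parallel time, and the final duplicate elimination with set is done sequentially here as a simplification.

```python
from concurrent.futures import ThreadPoolExecutor

serves = [("King's", "Bass"), ("King's", "Bud"), ("Molly's", "Bass")]


def project_bar(row):
    """Per-tuple work of the projection: keep the bar, drop the beer."""
    bar, _beer = row
    return bar


# One worker per tuple, mimicking "one processor per tuple".
with ThreadPoolExecutor(max_workers=len(serves)) as pool:
    bars = set(pool.map(project_bar, serves))

print(sorted(bars))
```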

Another example: join
frequents ⋈ serves
frequents             serves
drinker  bar          bar      beer
Joe      King’s       King’s   Bass
Joe      Molly’s      King’s   Bud
Sue      Molly’s      Molly’s  Bass
...      ...          ...      ...
frequents ⋈ serves
drinker  bar      beer
Joe      King’s   Bass
Joe      King’s   Bud
Joe      Molly’s  Bass
Sue      Molly’s  Bass
...      ...      ...

Every relational algebra query takes constant parallel time!
π_drinker(σ_{beer = Bass}(frequents ⋈ serves))
Operator tree, constant parallel time:
π_drinker
  σ_{beer = Bass}
    ⋈
      frequents
      serves

Summary so far: • Keys to the success of FO as a query language: -ease of use -efficient implementation via relational algebra • Constant parallel complexity: the full potential of FO as a query language remains yet to be realized!

Beyond relational databases: the Web and XML • relations replaced by trees (XML data) • structure described by schemas (e.g., DTDs) Again, logic provides the foundations: • DTDs are equivalent to tree automata (MSO on trees) • XML queries are essentially tree transducers • Can use automata and logic to understand semantics and expressiveness, perform static analysis

Most XML query languages are extensions of SQL • implementation based on same paradigm • uses extensions of relational algebra • query optimization builds upon relational techniques

XML and DTDs
<dealer>
  <UsedCars>
    <ad> <model>Honda</model> <year>96</year> </ad>
  </UsedCars>
  <NewCars>
    <ad> <model>Acura</model> </ad>
  </NewCars>
</dealer>
The same document as a tree:
dealer
  UsedCars
    ad
      model: Honda
      year: 96
  NewCars
    ad
      model: Acura

Document Type Definition (DTD)
Σ: alphabet of element names, root ∈ Σ
set of rules e → r, where e is an element name and r is a regular expression over Σ

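Documents satisfying a DTD
• the root of the tree is labeled root
• for every node labeled e with rule e → r and children labeled e1 … ek: e1 … ek ∈ r (the word of child labels belongs to the language of r)
Set of trees satisfying DTD d: T(d)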

Example
A DTD and a tree satisfying it:
root → section*;
section → intro, section*, conclusions;
root
  section
    intro
    section
      intro
      section
        intro
        conc
      section
        intro
        conc
      conc
    conc
  section
    intro
    conc
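The satisfaction condition can be checked mechanically. The following is a minimal Python sketch, assuming trees are encoded as (label, children) pairs and each rule's regular expression is written as a Python regex over labels that are each followed by a single space. That encoding, the helper names, and the convention that elements without a rule must be leaves are all assumptions of this example.

```python
import re

# Rules of the example DTD; every label in a pattern is followed by a space.
rules = {
    "root": r"(section )*",
    "section": r"intro (section )*conclusions ",
}


def check(node, rules):
    """Check one node's children against its rule, then recurse."""
    label, children = node
    pattern = rules.get(label, "")  # assumption: no rule means leaf
    word = "".join(child[0] + " " for child in children)
    if re.fullmatch(pattern, word) is None:
        return False
    return all(check(child, rules) for child in children)


def satisfies(tree, rules, root):
    """A tree satisfies the DTD if its root label is right and all rules hold."""
    return tree[0] == root and check(tree, rules)


def leaf(label):
    return (label, [])


def section(*subsections):
    return ("section", [leaf("intro"), *subsections, leaf("conclusions")])


good = ("root", [section(section()), section()])
bad = ("root", [("section", [leaf("intro")])])  # conclusions is missing

print(satisfies(good, rules, "root"), satisfies(bad, rules, "root"))
```

Each node is checked only against the labels of its direct children, which is what makes DTD validation a local, per-node test.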

Specialization
dealer
  UsedCars
    ad
      model
      year
  NewCars
    ad
      model
ad has different structure in different contexts

Specialization
dealer
  UsedCars
    ad^used
      model
      year
  NewCars
    ad^new
      model
ad has different structure in different contexts
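One way to read specialization is that rules are stated over types (such as ad^used and ad^new) and each type is mapped back to the element name seen in the document. The Python sketch below computes, bottom-up, the set of types each node can take, in the spirit of a nondeterministic bottom-up tree automaton. The concrete rules, the type names ad_used and ad_new, and the (label, children) tree encoding are assumptions made for illustration; the slides only show the tree.

```python
import re
from itertools import product

# Hypothetical specialized DTD: rules over types, each type followed by a space.
rules = {
    "dealer": r"UsedCars NewCars ",
    "UsedCars": r"(ad_used )*",
    "NewCars": r"(ad_new )*",
    "ad_used": r"model year ",
    "ad_new": r"model ",
    "model": r"",
    "year": r"",
}
# Types whose element name differs from the type name.
label_of = {"ad_used": "ad", "ad_new": "ad"}


def possible_types(node, rules, label_of):
    """Return the set of types this node can be assigned, computed bottom-up."""
    label, children = node
    child_types = [possible_types(c, rules, label_of) for c in children]
    result = set()
    for type_name, pattern in rules.items():
        if label_of.get(type_name, type_name) != label:
            continue
        # Try every way of picking one type per child.
        for choice in product(*child_types):
            word = "".join(t + " " for t in choice)
            if re.fullmatch(pattern, word):
                result.add(type_name)
                break
    return result


used_ad = ("ad", [("model", []), ("year", [])])
new_ad = ("ad", [("model", [])])
doc = ("dealer", [("UsedCars", [used_ad]), ("NewCars", [new_ad])])
swapped = ("dealer", [("UsedCars", [new_ad]), ("NewCars", [used_ad])])

print(possible_types(doc, rules, label_of))
print(possible_types(swapped, rules, label_of))
```

The document from the slide is typed as dealer, while the variant with the two kinds of ad swapped gets no type at all, even though both trees use exactly the same element names.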

• What sets of trees can be defined? Exactly the regular tree languages! --trees accepted by tree automata --trees defined by Monadic Second-Order Logic (MSO) • XML query languages are essentially tree transducers • Consequences: can use automata/logic techniques to analyze and manipulate DTDs and XML queries

Example: static analysis for robust data integration
Integrated View, Common DTD: tree automaton B
Tree transducer T
XML Source, XML Source, Source DTDs: tree automaton A
Need to check: T(A) ⊆ B
Key: T⁻¹(B) definable in MSO

Conclusion: Logic has provided the foundations of databases, from relational databases all the way to XML • FO lies at the core of relational database systems • XML and its query languages are founded upon tree automata, tree transducers, and logics on trees • Implementation uses extensions of relational algebra and builds upon relational database techniques