Managing XML and Semistructured Data


Managing XML and Semistructured Data Lecture 15: Query Analysis Prof. Dan Suciu Spring 2001

In this lecture
- Query rewriting
- Query rewriting with schema: examples
Resources
- Optimizing Regular Path Expressions Using Graph Schemas, M. Fernandez and D. Suciu, Data Engineering, 98
- Query Optimization for Structured Documents Based on Knowledge on the Document Type Definition, K. Bohm, K. Gayer, K. Aberer, T. Özsu

Query Analysis
Generic term to describe:
- Query rewriting based on schema information
- Query containment and minimization

Query Rewriting
Problem:
- Given a query Q: a regular path expression, or a more complex XQuery expression
- Given a schema S: a graph schema, DTD, or XML-Schema
- Rewrite Q to some QS such that:
  - Q is equivalent to QS over databases conforming to S
  - QS is more efficient than Q

Query Rewriting
Source: Optimizing Regular Path Expressions Using Graph Schemas, M. Fernandez and D. Suciu, Data Engineering, 98
Simplest setting:
- Queries are regular path expressions
- Schemas are graph schemas

Example of Query Rewriting
Q = //Department//Project
Naive evaluation: need to traverse the entire graph (or tree)
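A minimal sketch of the naive evaluation, assuming a hypothetical tree encoded as nested (label, children) tuples. The data, the encoding, and the name eval_naive are illustrative and do not come from the lecture. Without schema knowledge a Department or a Project could occur anywhere, so every node is visited.

```python
# Hypothetical instance: each node is (label, [children]).
TREE = ("root", [
    ("Department", [
        ("Name", []),
        ("Project", [("Member", []), ("Member", [])]),
    ]),
    ("College", [("Dean", [])]),
])


def eval_naive(tree):
    """Evaluate Q = //Department//Project by visiting every node.

    Returns (number of matches, number of nodes visited).
    """
    matches, visited = 0, 0
    # Each stack entry: (node, is there a Department among the proper ancestors?)
    stack = [(tree, False)]
    while stack:
        (label, children), under_department = stack.pop()
        visited += 1
        if under_department and label == "Project":
            matches += 1
        flag_for_children = under_department or label == "Department"
        for child in children:
            stack.append((child, flag_for_children))
    return matches, visited


if __name__ == "__main__":
    print(eval_naive(TREE))
```

On this small instance the query has one answer, but all eight nodes are touched, including the College subtree that can never contribute a result.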

Example of Query Rewriting
Graph schema S:
[Figure: graph schema with states s1, s2, s3, s4; every state has a self-loop labeled other, and the remaining edges are labeled Org, "Project", and "Member"]
Org = "Department" ∨ "College" ∨ "School"
other = ¬Org ∧ ¬"Project" ∧ ¬"Member"

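Example of Query Rewriting
Schema says: "there can be at most one Department edge; below, there can be at most one Project edge"
Q = //Department//Project
QS = (other)*/Department/(other)*/Project
other = ¬"Department" ∧ ¬"College" ∧ ¬"School" ∧ ¬"Project" ∧ ¬"Member"
QS can be evaluated more efficiently than Q. Why? QS follows only other edges around the single Department step and the single Project step, so it never descends into College, School, or Member subtrees, nor below a matched Project, whereas Q has to explore every path.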
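To make the efficiency claim concrete, here is a sketch that evaluates QS = (other)*/Department/(other)*/Project as a three-state automaton over a hypothetical tree. The tree, its (label, children) encoding, and the function name are assumptions for illustration; the tree is chosen so that it conforms to the schema as read from the slide.

```python
# Hypothetical instance: each node is (label, [children]).
TREE = ("root", [
    ("Department", [
        ("Name", []),
        ("Project", [("Member", []), ("Member", [])]),
    ]),
    ("College", [("Dean", [])]),
])

# "other" matches every label except these five.
NOT_OTHER = {"Department", "College", "School", "Project", "Member"}


def eval_rewritten(tree):
    """Evaluate QS = (other)*/Department/(other)*/Project.

    States: 0 = before Department, 1 = after Department, 2 = Project matched.
    Returns (number of matches, number of nodes visited).
    """
    matches, visited = 0, 0
    stack = [(tree, 0)]
    while stack:
        (_, children), state = stack.pop()
        visited += 1
        if state == 2:
            matches += 1
            continue  # QS has no transition out of the accepting state
        for child in children:
            child_label = child[0]
            if child_label not in NOT_OTHER:
                stack.append((child, state))   # "other" self-loop
            elif state == 0 and child_label == "Department":
                stack.append((child, 1))
            elif state == 1 and child_label == "Project":
                stack.append((child, 2))
            # any other label: no transition, so the subtree is pruned
    return matches, visited


if __name__ == "__main__":
    print(eval_rewritten(TREE))
```

On this instance QS returns the same single answer as Q while visiting only the root, Department, Name, and Project nodes; the College subtree and the Member nodes are never touched.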

Example of Query Rewriting
How to construct QS systematically from Q and S?
Step 1: build the automaton A for Q
Step 2: build the product automaton S x A
Step 3: QS = the regular expression for S x A

Example of Query Rewriting
[Figure: the automaton A for Q (states a1, a2, a3; edges labeled true, Dept, Project), the graph schema S (states s1, s2, s3, s4), and the product automaton S x A]
QS = (other)*/Department/(other)*/Project
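The construction steps can be sketched as follows. This is an illustrative reading of the slide figures, not the authors' implementation: the schema is assumed to be the chain s1 -Org-> s2 -Project-> s3 -Member-> s4 with other self-loops, the automaton A is assumed to be a1 -Department-> a2 -Project-> a3 with true self-loops on a1 and a2, and the single symbol "other" stands for every label outside the five named ones. All function names are hypothetical.

```python
from collections import deque

ALPHABET = ["Department", "College", "School", "Project", "Member", "other"]
ORG = ("Department", "College", "School")


def schema_edges():
    """Assumed graph schema S: (state, label) -> state."""
    edges = {(s, "other"): s for s in ("s1", "s2", "s3", "s4")}
    for label in ORG:
        edges[("s1", label)] = "s2"
    edges[("s2", "Project")] = "s3"
    edges[("s3", "Member")] = "s4"
    return edges


def query_edges():
    """Assumed automaton A for Q = //Department//Project: (state, label) -> states."""
    edges = {}
    for label in ALPHABET:  # "true" self-loops on a1 and a2
        edges[("a1", label)] = {"a1"}
        edges[("a2", label)] = {"a2"}
    edges[("a1", "Department")].add("a2")
    edges[("a2", "Project")].add("a3")
    return edges


def trimmed_product(s_edges, a_edges, start=("s1", "a1"), accepting="a3"):
    """Build S x A, then keep only transitions that can still lead to acceptance."""
    transitions, seen, queue = set(), {start}, deque([start])
    while queue:  # forward exploration of reachable product states
        s, a = queue.popleft()
        for label in ALPHABET:
            if (s, label) not in s_edges:
                continue
            for a_next in a_edges.get((a, label), ()):
                target = (s_edges[(s, label)], a_next)
                transitions.add(((s, a), label, target))
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
    useful = {q for q in seen if q[1] == accepting}
    changed = True
    while changed:  # backward closure: states that can reach an accepting state
        changed = False
        for src, _, dst in transitions:
            if dst in useful and src not in useful:
                useful.add(src)
                changed = True
    return sorted(t for t in transitions if t[0] in useful and t[2] in useful)


if __name__ == "__main__":
    for src, label, dst in trimmed_product(schema_edges(), query_edges()):
        print(src, label, dst)
```

Under these assumptions the trimmed product keeps four transitions: an other loop on (s1, a1), a Department edge to (s2, a2), an other loop on (s2, a2), and a Project edge to (s3, a3). Reading them off as a regular expression gives (other)*/Department/(other)*/Project.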

Query Rewriting
Correctness
Proposition: If the instance I conforms to S, then Q(I) = QS(I).
That is, Q and QS are equivalent over databases conforming to S.

Query Rewriting
Efficiency
Given query Q and instance I, define:
cost(Q, I) = |{ w(I) | w ∈ prefix(Lang(Q)) }|
Proposition: If Q and Q' are equivalent over all databases conforming to S, and if I conforms to S, then cost(QS, I) ≤ cost(Q', I).
Hence, QS is optimal (in a certain sense).

Query Rewriting
Source: Query Optimization for Structured Documents Based on Knowledge on the Document Type Definition, K. Bohm, K. Gayer, K. Aberer, T. Özsu
More complex setting:
- Schema = DTD
- Query = region algebra (think: XPath)
The problem is more complex; this work proposes a solution.

Query Rewriting
Idea: analyze the DTD and extract 3 relations.
Exclusivity: element E1 is exclusively contained in E2 if every path from the root to E1 goes through E2
XPath simplification: E1[ancestor-or-self::E2] → E1
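A small sketch of how exclusivity could be checked, assuming the DTD is abstracted to a hypothetical dictionary that maps each element type to the set of element types allowed as its children. This is an illustration, not the algorithm of the cited paper: E1 is exclusively contained in E2 exactly when E1 becomes unreachable from the root once E2 is removed.

```python
# Hypothetical DTD abstraction: element type -> allowed child element types.
DTD = {
    "University": {"Department", "Library"},
    "Department": {"Project", "Name"},
    "Project": {"Member", "Name"},
    "Library": {"Name"},
    "Member": set(),
    "Name": set(),
}


def reachable(dtd, root, blocked=None):
    """Element types reachable from root without passing through `blocked`."""
    if root == blocked:
        return set()
    seen, stack = {root}, [root]
    while stack:
        for child in dtd.get(stack.pop(), ()):
            if child != blocked and child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def exclusively_contained(dtd, root, e1, e2):
    """True if every path from the root to e1 goes through e2."""
    if e1 == e2:
        return True  # the "self" case of ancestor-or-self
    return e1 in reachable(dtd, root) and e1 not in reachable(dtd, root, blocked=e2)


if __name__ == "__main__":
    print(exclusively_contained(DTD, "University", "Project", "Department"))
    print(exclusively_contained(DTD, "University", "Name", "Department"))
```

In this DTD, Project is exclusively contained in Department, so Project[ancestor-or-self::Department] can be simplified to Project. Name is not, because it can also occur under Library.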

Query Rewriting
Obligation: E1 obligatorily contains E2 if every E1 element has a child of type E2
XPath simplification: E1[E2] → E1
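The obligation rule can be illustrated with a simplified model of a DTD in which each child type carries an occurrence indicator ("1", "?", "*", or "+"). This model is an assumption made for the sketch: it ignores choice groups and nested content models, and the names below are hypothetical. A child is treated as obligatory when its indicator is "1" or "+".

```python
# Hypothetical, simplified DTD: element -> {child: occurrence indicator}.
DTD = {
    "Department": {"Name": "1", "Project": "*"},
    "Project": {"Member": "+", "Budget": "?"},
}


def obligatorily_contains(dtd, e1, e2):
    """True if every e1 element must have at least one e2 child."""
    return dtd.get(e1, {}).get(e2) in ("1", "+")


def simplify(dtd, element, predicates):
    """Apply E1[E2] -> E1: drop child-existence predicates the DTD guarantees."""
    kept = [p for p in predicates if not obligatorily_contains(dtd, element, p)]
    return element + "".join(f"[{p}]" for p in kept)


if __name__ == "__main__":
    print(simplify(DTD, "Project", ["Member", "Budget"]))
    print(simplify(DTD, "Department", ["Name"]))
```

Here Project[Member][Budget] simplifies to Project[Budget], since every Project has at least one Member, while the optional Budget test has to stay.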

Query Rewriting
Entrance location: E is an entrance location for E1, E2 if every path from E1 to E2 goes through some E
XPath rewriting: E1[ancestor-or-self::E2] → E1[ancestor-or-self::E[ancestor-or-self::E2]]
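A sketch of an entrance-location test, again over a hypothetical dictionary from element types to allowed child types. It assumes, following the rewrite rule, that E2 is the ancestor and E1 the descendant, so the paths considered run through the DTD's containment graph from E2 down to E1. It also handles only a single candidate E rather than a set of them.

```python
# Hypothetical DTD abstraction: element type -> allowed child element types.
DTD = {
    "University": {"Department"},
    "Department": {"Group"},
    "Group": {"Project", "Member"},
    "Project": {"Member"},
    "Member": set(),
}


def reachable(dtd, start, blocked):
    """Element types reachable from start without entering `blocked`."""
    seen, stack = {start}, [start]
    while stack:
        for child in dtd.get(stack.pop(), ()):
            if child != blocked and child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def is_entrance_location(dtd, e, e1, e2):
    """True if every containment path from e2 down to e1 passes through e."""
    if e in (e1, e2):
        return True  # the endpoints lie on every such path
    return e1 not in reachable(dtd, e2, blocked=e)


if __name__ == "__main__":
    print(is_entrance_location(DTD, "Group", "Member", "Department"))
    print(is_entrance_location(DTD, "Project", "Member", "Department"))
```

Group is an entrance location for Member and Department in this DTD, so Member[ancestor-or-self::Department] can be rewritten with an intermediate Group step. Project is not, because a Member can sit directly under a Group.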

Query Rewriting
Add these rules, plus variations, to a rule-based optimizer
- HyperStorM: a structured document database
- On top of VODAK: an object-oriented database system
Open question: does this approach exploit all the information in a DTD/XML-Schema? How can we exploit what is not used?