Relational-Style XML Query. Taro L. Saito, Shinichi Morishita, University of Tokyo. June 10th, SIGMOD 2008, Vancouver, Canada. Presented by Sangkeun-Lee.

Presentation transcript:

Relational-Style XML Query
Taro L. Saito, Shinichi Morishita, University of Tokyo
June 10th, SIGMOD 2008, Vancouver, Canada
Presented by Sangkeun-Lee, Intelligent Database Systems Lab, School of Computer Science & Engineering, Seoul National University, Seoul, Korea
Reference slides:

Copyright © 2009 by CEBT, Center for E-Business Technology
If Your Manager Says…
It’s a tragedy…

Migration to XML Database
• Benefits of using XML
  – XML is a portable text-data format
  – Tree-structured XML can reduce the redundancy of relational data
Relational data:
  Company  Employee  Office
  1        e1        NY
  1        e2        NY
XML data: [Figure: the same data as an XML tree with nodes Co, Emp (e1, e2), and Office (NY)]

Problem
• Querying relational data translated into XML
• Q: Retrieve a node tuple (Co, Emp, Office) from the XML data
  – E.g. with XPath, a path expression query: /Co/Emp/Office
[Figure: the relational table (Company, Employee, Office) next to its XML tree]

Structure Variations
• The tree representation of relational data is not unique
[Figure: the relational table (Company, Employee, Office; rows (1, e1, NY) and (1, e2, NY)) next to three different XML trees over Co, Office, and Emp nodes that encode the same data]

Inconvenience of XPath Query
• The user must know the entire XML structure to write correct path queries
[Figure: three XML tree structures for the same data, each labeled with its path query]
  /Co/Office/Emp
  //Co/Emp[Office]
  //Office[Co]/Emp
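To make the inconvenience concrete, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The three XML documents are hypothetical encodings chosen to match the three path queries on the slide (the exact trees in the original figure are not recoverable), the `<db>` wrapper element is an assumption added so that relative paths can be used, and the queries are written with a leading `.` because ElementTree does not accept absolute paths on an element.

```python
import xml.etree.ElementTree as ET

# Three hypothetical encodings of the rows (1, e1, NY) and (1, e2, NY).
DOCS = {
    "co_office_emp": "<db><Co id='1'><Office name='NY'>"
                     "<Emp>e1</Emp><Emp>e2</Emp></Office></Co></db>",
    "co_emp_office": "<db><Co id='1'><Emp>e1<Office>NY</Office></Emp>"
                     "<Emp>e2<Office>NY</Office></Emp></Co></db>",
    "office_co_emp": "<db><Office name='NY'><Co>1</Co>"
                     "<Emp>e1</Emp><Emp>e2</Emp></Office></db>",
}

# The slide's three path queries, rewritten as relative ElementTree paths.
QUERIES = [".//Co/Office/Emp", ".//Co/Emp[Office]", ".//Office[Co]/Emp"]


def employees(xml_text, query):
    """Return the text of every Emp node selected by the path query."""
    root = ET.fromstring(xml_text)
    return [emp.text.strip() for emp in root.findall(query)]


for name, xml_text in DOCS.items():
    for query in QUERIES:
        print(name, query, employees(xml_text, query))
```

In this sketch each query finds both employees only in the one document whose structure it was written for and returns an empty list for the other two, which is the situation the slide describes.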

Relational-Style XML Query
• Query relations in XML with an SQL-like syntax
• SELECT Co, Emp, Office FROM (XML data)
Input XML: [Figure: three XML trees over Co, Office (NY), and Emp (e1, e2) nodes with different structures]
Result:
  Company  Employee  Office
  1        e1        NY
  1        e2        NY

To Retrieve Relations in XML

Problem Definition
• Convert an SQL query, SELECT A, B, C, into an XML structure query
  – There can be many structural variations of (A, B, C)
  – For N nodes, there exist N^(N-1) structural variations, e.g. 3^2 = 9 for N = 3
[Figure: structural variations of (A, B, C) …]
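The count N^(N-1) can be checked by brute force, assuming that a "structural variation" means a rooted tree over the N labeled nodes (that reading is an assumption; it is the one under which the formula is the standard count of rooted labeled trees). The sketch below picks a root, lets every other node choose a parent, and keeps the assignments in which every node reaches the root. The function names are illustrative only.

```python
from itertools import product


def reaches_root(node, parent, root, n):
    """Follow parent links for at most n steps; True if we arrive at root."""
    for _ in range(n):
        if node == root:
            return True
        node = parent[node]
    return node == root


def count_rooted_labeled_trees(n):
    """Count rooted trees on n labeled nodes by enumerating parent choices."""
    nodes = range(n)
    count = 0
    for root in nodes:
        others = [v for v in nodes if v != root]
        # Every non-root node picks a parent among all n nodes.
        for choice in product(nodes, repeat=len(others)):
            parent = dict(zip(others, choice))
            # Keep only acyclic assignments: every node must reach the root.
            if all(reaches_root(v, parent, root, n) for v in others):
                count += 1
    return count


for n in range(1, 5):
    print(n, count_rooted_labeled_trees(n), n ** (n - 1))
```

For N = 3 the enumeration finds three valid trees per choice of root, nine in total, matching the slide's 3^2 = 9.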

An Example
This example involves various tree structures that denote data in the same relation.

Amoeba
• A node tuple (A, B, C) is an amoeba iff one of A, B, and C is a common ancestor of the others
• Amoeba join retrieves all amoeba structures in the XML data
[Figure: amoeba structures …]
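The amoeba condition can be written down directly. The following is a minimal sketch, not the paper's algorithm: a tree is represented as a hypothetical `parent` dictionary mapping each node id to its parent's id, and a tuple is an amoeba if one of its nodes is a proper ancestor of all the others.

```python
def ancestors(node, parent):
    """Return the set of proper ancestors of node."""
    result = set()
    while node in parent:
        node = parent[node]
        result.add(node)
    return result


def is_amoeba(nodes, parent):
    """True if one node in the tuple is a common ancestor of all the others."""
    return any(
        all(candidate in ancestors(other, parent)
            for other in nodes if other != candidate)
        for candidate in nodes
    )


# Co -> Office -> Emp: Co is an ancestor of both other nodes.
chain = {"office": "co", "emp": "office"}
# Office has both Co and Emp as children: Office is the common ancestor.
office_root = {"co": "office", "emp": "office"}
# All three nodes are siblings under an unrelated root: no common ancestor in the tuple.
siblings = {"co": "root", "office": "root", "emp": "root"}

print(is_amoeba(("co", "emp", "office"), chain))        # True
print(is_amoeba(("co", "emp", "office"), office_root))  # True
print(is_amoeba(("co", "emp", "office"), siblings))     # False
```

A naive amoeba join would apply this test to every combination of candidate nodes, which is workable only for small inputs; the sketch is meant to pin down the definition rather than to be efficient.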

Relation in XML
• A key observation: a relation is simply embedded in XML
[Figure: three XML tree structures over Co, Office, and Emp nodes with their path queries /Co/Office/Emp, //Co/Emp[Office], and //Office[Co]/Emp, all embedding the same relation]
  Company  Employee  Office
  1        e1        NY
  1        e2        NY

Hidden Semantics in XML
• Some amoeba structures may not form a relation
  – Why is this structure not allowed?
• Because there are functional dependencies (FDs) implied in the XML structure
[Figure: an amoeba structure over Company, Office, and Emp nodes, and an ER diagram in which Company and Office are related 1:M and Office and Employee are related 1:N]

Functional Dependencies (FD)
• FD: X -> Y (for a given X, Y is uniquely determined)
  – Employee -> Office (each employee belongs to an office)
  – Office -> Company (each office belongs to a company)
• A relation in XML must have an amoeba structure corresponding to each FD
• Relations and FDs are sufficient to describe a schema of XML
[Figure: a Company, Office, Emp structure marked "Invalid structure!!", and the ER diagram Company 1:M Office, Office 1:N Employee]
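As a small illustration of the definition X -> Y, the sketch below checks whether an FD holds in a set of rows by verifying that each X value is paired with exactly one Y value. The rows are the two tuples used throughout the slides; the function name `holds_fd` and the dictionary layout are illustrative assumptions.

```python
def holds_fd(rows, x, y):
    """True if every value of column x determines a single value of column y."""
    seen = {}
    for row in rows:
        # setdefault stores the first y seen for this x and returns the stored one.
        if seen.setdefault(row[x], row[y]) != row[y]:
            return False
    return True


rows = [
    {"Company": 1, "Employee": "e1", "Office": "NY"},
    {"Company": 1, "Employee": "e2", "Office": "NY"},
]

print(holds_fd(rows, "Employee", "Office"))   # True: each employee has one office
print(holds_fd(rows, "Office", "Company"))    # True: each office has one company
print(holds_fd(rows, "Company", "Employee"))  # False: company 1 has e1 and e2
```

A check like this can only refute an FD on the data at hand; whether the FD is intended to hold in general is a schema-level decision.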

Examples of FDs

Detecting FDs
• The type of FD required to determine the XML structures to query is a one-to-many (or one-to-one) relationship
  – FD: Emp -> Office
  – Each employee belongs to an office
  – An office may have several employees (one-to-many)
• We can observe these relationships by counting node occurrences or directly from the ER diagram
[Figure: Company, Office, Emp structure and the ER diagram Company 1:M Office, Office 1:N Employee]

If FDs Are Ignored…
• The company has M offices, and each office has N employees
• Number of (company, office, employee) tuples when M = 100 and N = 5: 100 × (100 × 5) = 50,000
• The number of correct answers, however, is only M × N = 500
[Figure: ER diagram Company 1:M Office, Office 1:N Employee, and a Company node with several Office subtrees, each containing Emp nodes]
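The arithmetic on this slide can be reproduced with a short sketch. It assumes a single company node that is an ancestor of every office and employee, so that any (office, employee) combination forms an amoeba with the company; the node names are made up for the example.

```python
from itertools import product

M, N = 100, 5  # offices per company, employees per office

offices = [f"office{i}" for i in range(M)]
# Map each employee to the one office it actually belongs to.
office_of = {f"emp{i}_{j}": offices[i] for i in range(M) for j in range(N)}

# Ignoring FDs: every office is paired with every employee of the company.
all_tuples = sum(1 for _ in product(offices, office_of))

# Respecting Emp -> Office: an employee is paired only with its own office.
valid_tuples = sum(
    1 for office, emp in product(offices, office_of) if office_of[emp] == office
)

print(all_tuples, valid_tuples)  # 50000 500
```

Only 1% of the tuples produced without the FDs are correct answers in this setting, and the ratio worsens as M grows.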

Avoiding Invalid Relations
• However, an amoeba structure itself is a connected component of tree nodes, and thus invalid nodes may be connected, as illustrated in Figure 9.
• To avoid these irrelevant node connections while allowing various tree structures in describing XML data, we introduce a restricted class of XML structures, called a tree relation.

FD-Aware Amoeba Join
• Tree Relation >& >& >
• FDs: Emp -> Office, Office -> Company
• Bottom-up construction of query results
  – Amoeba Join (Employee, Office)
  – Amoeba Join (Office, Company)
• FD-aware amoeba join avoids invalid XML structures
[Figure: ER diagram Company 1:M Office, Office 1:N Employee, and a Company node with several Office subtrees, each containing Emp nodes]
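The bottom-up idea can be sketched as two pairwise joins followed by a join on the shared Office node. This is a simplified illustration under stated assumptions, not the algorithm from the paper: the tree is a hypothetical `parent` dictionary, node labels live in a `label` dictionary, a pair counts as connected when one node is an ancestor of the other, and the FD Office -> Co is assumed to hold so that each office maps to a single company.

```python
def ancestors(node, parent):
    """Return the set of proper ancestors of node."""
    result = set()
    while node in parent:
        node = parent[node]
        result.add(node)
    return result


def amoeba_pairs(label, parent, a, b):
    """Pairs (x, y) labeled (a, b) where one node is an ancestor of the other."""
    xs = [n for n in label if label[n] == a]
    ys = [n for n in label if label[n] == b]
    return [(x, y) for x in xs for y in ys
            if y in ancestors(x, parent) or x in ancestors(y, parent)]


def fd_aware_join(label, parent):
    """Build (Co, Emp, Office) tuples bottom-up along Emp -> Office -> Co."""
    emp_office = amoeba_pairs(label, parent, "Emp", "Office")
    # Assumes Office -> Co holds, so each office has exactly one company.
    office_co = dict(amoeba_pairs(label, parent, "Office", "Co"))
    return sorted((office_co[o], e, o) for e, o in emp_office if o in office_co)


label = {"c1": "Co", "o1": "Office", "o2": "Office",
         "e1": "Emp", "e2": "Emp", "e3": "Emp"}
parent = {"o1": "c1", "o2": "c1", "e1": "o1", "e2": "o1", "e3": "o2"}

print(fd_aware_join(label, parent))
```

On this small tree the join yields three tuples, one per employee, whereas pairing every office with every employee under the company would yield six. Because the pairwise test only asks whether one node is an ancestor of the other, the same functions apply unchanged when, for example, Office is nested under Emp instead of the other way round.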

Query Performance
• FD-aware amoeba join scales well for various sizes of XML data

Experiments – Query Set

Think in Relational-Style
• First, consider XML := relations + their annotations
• Steps
  – Detect the relational part of the XML data
  – Detect one-to-many (or one-to-one) relationships (FDs)
  – Write relational queries, e.g. SELECT Co, Emp, Office
• More in the paper
  – XML algebra
  – Data integration
  – Pushing structural constraints for better performance
  – Querying incomplete relations
  – Other experiments

Contributions & Conclusions
• Relation in XML
  – Defined using amoeba structures and FDs
  – XML algebra
• Relational-style XML query
  – Retrieves relations in XML with an SQL-like query syntax (SQL over XML)
  – Allows structural variations of XML data
• Departure from path expression queries
  – Target XML structures are determined automatically
• Applications
  – Database integration
  – Managing relational data enhanced with XML syntax
• “It’s Just SQL”
  – A large amount of XML data and many XML queries are still relational

Contributions & Conclusions
It’s not a tragedy anymore…

My thought on
Really?

Thoughts on the Paper
• Good points
  – Overall, I think it is a very good paper
  – The paper presents an interesting motivation and clearly defines the problem
  – The approach is intuitive and well explained
  – The authors performed well-designed experiments
  – The paper proposes many directions for future work; maybe one of us can work on them
• However, I doubt whether it is really useful
  – Can we use this for an important business project?
  – Although the authors insist that their algorithm scales well, some queries ran out of memory and did not complete; this is a crucial problem for real-life business use
  – Isn’t XPath good enough? People who use XML documents usually know their structure