Converting Disjunctive Data to Disjunctive Graphs. Lars Olson, Data Extraction Group. Funded by NSF.

Presentation transcript:

1 Converting Disjunctive Data to Disjunctive Graphs
Lars Olson
Data Extraction Group
Funded by NSF

2 Introduction
Disjunctive databases
– Needed to represent disjunctive data
– Queries are CoNP-complete in general [Imielinski and Vadaparty, 1989]
Transitive closure in disjunctive graphs
– CoNP-complete in general
– Polynomial time under certain circumstances [Lobo et al., 1995]
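To make the cost of the general case concrete, here is a minimal brute-force sketch. It assumes one particular reading of a disjunctive edge (a list of candidate (source, target) pairs of which exactly one holds in each possible world); that reading, and the names `reachable` and `certainly_reachable`, are illustrative assumptions and are not taken from the slides or the cited papers. The check enumerates every possible world, so its running time grows exponentially with the number of disjunctive edges.

```python
from itertools import product

def reachable(edges, src, dst):
    """Plain depth-first reachability over ordinary (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
    seen, stack = {src}, [src]
    while stack:
        u = stack.pop()
        if u == dst:
            return True
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return False

def certainly_reachable(disj_edges, src, dst):
    """disj_edges: list of candidate-pair lists; ASSUMPTION: one pair per list is true.

    Returns True only if dst is reachable from src in every possible world.
    """
    return all(reachable(world, src, dst) for world in product(*disj_edges))

# Hypothetical data: A points to B or C; both B and C point to D.
example = [[("A", "B"), ("A", "C")], [("B", "D")], [("C", "D")]]
print(certainly_reachable(example, "A", "D"))
print(certainly_reachable(example, "A", "B"))
```

The polynomial-time results cited above avoid this enumeration; the sketch only shows what the naive alternative looks like.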

3 The Problem
How do we convert the data into a disjunctive graph?
What is the complexity of the conversion?
– Time
– Space / Memory

4 Implementation
XML data repository
– Shore / Niagara (Univ. of Wisconsin)
– Xerces XML parser (Apache.org)
How do we represent a disjunctive database in storage?
– Needs to be easy to convert to a disjunctive graph
– Needs to minimize changes to the DTD, and thus to the existing data

5 XML → Graph Conversion
XML → DOM tree
[Figure: DOM tree with doc, Node, and EdgeTo elements for graph nodes A and B]
Use the primary key to distinguish doc→Node edges
Use the foreign key to perform the join (EdgeTo.ref = Node.name)
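The key-based conversion can be sketched in a few lines of Python. The element names Node and EdgeTo and the fields name and ref come from the slide; storing them as XML attributes, the sample document, and the function name `xml_to_graph` are assumptions made for illustration, and Python's standard `xml.etree.ElementTree` stands in for the Xerces DOM parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: Node elements keyed by a `name` attribute, with EdgeTo
# children whose `ref` attribute names the target Node.
SAMPLE = """
<doc>
  <Node name="A"><EdgeTo ref="B"/></Node>
  <Node name="B"/>
</doc>
"""

def xml_to_graph(xml_text):
    """Return {node name: sorted list of successor names}."""
    root = ET.fromstring(xml_text)
    # Primary key: each distinct Node.name becomes one graph vertex.
    graph = {node.get("name"): set() for node in root.iter("Node")}
    for node in root.iter("Node"):
        for edge in node.findall("EdgeTo"):
            target = edge.get("ref")
            # Foreign-key join: keep the edge only if EdgeTo.ref matches a Node.name.
            if target in graph:
                graph[node.get("name")].add(target)
    return {name: sorted(succ) for name, succ in graph.items()}

print(xml_to_graph(SAMPLE))
```

Dangling references are silently dropped here; a real converter might instead report them as integrity violations.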

6 Disjunctions in XML, 1st Case
[Figure: example with nodes A, B, C, D]
…but how do we represent a disjunctive tail?

7 Disjunctions in XML, 1st Case
…or…
[Figure: example with nodes E, F, G, H under a doc root]

8 Disjunctions in XML, 2nd Case
[Figure: example with nodes E, F, G, H]
What if the disjunction isn’t the full cross-product?
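The difference between a full and a partial cross-product can be checked mechanically. In this sketch a disjunction is given as a set of candidate (tail, head) pairs; treating E and F as tails and G and H as heads is an assumption, since the figure did not survive, and `is_full_cross_product` is a hypothetical helper name.

```python
from itertools import product

def is_full_cross_product(pairs):
    """True if the candidate pairs are exactly tails x heads."""
    pairs = set(pairs)
    tails = {t for t, _ in pairs}
    heads = {h for _, h in pairs}
    return pairs == set(product(tails, heads))

# Full cross-product: any of E, F may point to any of G, H.
full = {("E", "G"), ("E", "H"), ("F", "G"), ("F", "H")}
# Partial: only E->G or F->H are possible, so a tail set plus a head set
# would over-approximate the disjunction.
partial = {("E", "G"), ("F", "H")}

print(is_full_cross_product(full), is_full_cross_product(partial))
```

When the check fails, the candidate pairs have to be stored individually rather than as one tail set and one head set.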

9 Disjunctions in XML, 3rd Case
[Figure: example with nodes I, J, K, L]

10 Time and Space Complexity
n = # of nodes in the DOM tree
– counts edges as well
– not necessarily proportional to the # of values in the database
Ordinary XML: traverse the tree and add edges. Distinguish records with primary keys; add edges for foreign keys.
O(n) time, O(n) space.
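A small instrumented traversal illustrates the linear bound for ordinary XML. It assumes the same hypothetical Node/EdgeTo attribute layout as the slide's join condition, and it counts only XML elements (an `ElementTree` tree has no separate attribute or text nodes, unlike a DOM tree). `convert_with_counts` is an illustrative name.

```python
import xml.etree.ElementTree as ET

def convert_with_counts(root):
    """Single pass over the tree; returns (edges, number of elements visited)."""
    edges = []
    visited = 0
    stack = [root]
    while stack:
        elem = stack.pop()
        visited += 1
        if elem.tag == "Node":
            # Each EdgeTo child is inspected once here and visited once later,
            # so every element is handled a constant number of times.
            for child in elem:
                if child.tag == "EdgeTo":
                    edges.append((elem.get("name"), child.get("ref")))
        stack.extend(elem)
    return edges, visited

root = ET.fromstring(
    '<doc><Node name="A"><EdgeTo ref="B"/><EdgeTo ref="C"/></Node>'
    '<Node name="B"/><Node name="C"/></doc>'
)
edges, visited = convert_with_counts(root)
print(edges, visited)
```

The edge list and the stack never hold more than n entries, which matches the O(n) space claim.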

11 Time and Space Complexity
: same, except only one edge to all children. O(n) time, O(n) space.
with and : traverse tree, add and elements to a list, add one edge, repeat for each Tail/Head pair. O(n) time, O(n) space.
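The "collect into a list, add one edge, repeat for each Tail/Head pair" step might look like the following sketch. Because the XML element names are missing from the transcript, the input here is plain Python data rather than XML; the `DisjunctiveEdge` class and `build_edges` function are hypothetical names, not the author's implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DisjunctiveEdge:
    tails: frozenset  # candidate source nodes (one of them is the actual source)
    heads: frozenset  # candidate target nodes (one of them is the actual target)

def build_edges(pairs):
    """pairs: iterable of (tail names, head names); emits one edge per pair."""
    edges = []
    for tails, heads in pairs:
        # Collect the tail and head names, then add a single disjunctive edge.
        edges.append(DisjunctiveEdge(frozenset(tails), frozenset(heads)))
    return edges

# Hypothetical input: a simple tail with a compound head, then a compound pair.
edges = build_edges([(["A"], ["B", "C"]), (["E", "F"], ["G", "H"])])
for edge in edges:
    print(sorted(edge.tails), "->", sorted(edge.heads))
```

Each name is touched once while building its edge, so the work stays linear in the size of the input.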

12 Summary
We need to introduce new XML constructs: – –
Helper constructs and
Three cases
– simple tail, compound head
– full cross-product
– partial cross-product
Time and space requirements consistent with the transitive closure algorithm

13 Future Work
Solving path queries
Adding XML constructs for more complicated disjunctions, e.g. Tail (A or B), Head ((C and D) or E)
Determining the frequency of disjunctive data in real-world data
Developing a normal form for disjunctive XML
– Minimize redundancy
– Minimize disjunctive tails