1 Semi-structured data Patrick Lambrix Department of Computer and Information Science Linköpings universitet.

Slides:



Advertisements
Similar presentations
Three-Step Database Design
Advertisements

XML: Extensible Markup Language
Lecture 24 MAS 714 Hartmut Klauck
Context-Sensitive Interprocedural Points-to Analysis in the Presence of Function Pointers Presentation by Patrick Kaleem Justin.
1 Partial Order Reduction. 2 Basic idea P1P1 P2P2 P3P3 a1a1 a2a2 a3a3 a1a1 a1a1 a2a2 a2a2 a2a2 a2a2 a3a3 a3a3 a3a3 a3a3 a1a1 a1a1 3 independent processes.
Incremental Maintenance for Materialized Views over Semistructured Data Written By: Serge Abiteboul Jason McHuge Michael Rys Vasilis Vassalos Janet L.
Data Structure and Algorithms (BCS 1223) GRAPH. Introduction of Graph A graph G consists of two things: 1.A set V of elements called nodes(or points or.
CS171 Introduction to Computer Science II Graphs Strike Back.
Midwestern State University Department of Computer Science Dr. Ranette Halverson CMPS 2433 CHAPTER 4 - PART 2 GRAPHS 1.
Main Index Contents 11 Main Index Contents Week 6 – Binary Trees.
Dynamic Wavelength Allocation in All-optical Ring Networks Ori Gerstel and Shay Kutten Proceedings of ICC'97.
Computability and Complexity 23-1 Computability and Complexity Andrei Bulatov Search and Optimization.
INTRODUCTION COMPUTATIONAL MODELS. 2 What is Computer Science Sciences deal with building and studying models of real world objects /systems. What is.
Extracting Schema From Data The difference between schemas for semistructured data and traditional schemas is that a given semistructured data can have.
Midterm 2 Overview Fawzi Emad Chau-Wen Tseng Department of Computer Science University of Maryland, College Park.
© 2006 Pearson Addison-Wesley. All rights reserved14 A-1 Chapter 14 Graphs.
Web-site Management System Strudel Presented by: LAKHLIFI Houda Instructor: Dr. Haddouti.
From Semistructured Data to XML: Migrating The Lore Data Model and Query Language Roy Goldman, Jason McHugh, Jennifer Widom Stanford University
Firewall Policy Queries Author: Alex X. Liu, Mohamed G. Gouda Publisher: IEEE Transaction on Parallel and Distributed Systems 2009 Presenter: Chen-Yu Chang.
Managing XML and Semistructured Data Lecture : Indexes.
Query Languages Aswin Yedlapalli. XML Query data model Document is viewed as a labeled tree with nodes Successors of node may be : - an ordered sequence.
Indexing Semistructured Data J. McHugh, J. Widom, S. Abiteboul, Q. Luo, and A. Rajaraman Stanford University January 1998
Graphs & Graph Algorithms 2 Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College Park.
NP-complete and NP-hard problems
On Database Systems.
LORE Light Object Repository by Othman Chhoul CSC5370 Fall 2003.
Constructing Signature Graphs for Signature Files Dr. Yangjun Chen Dept. Applied Computer Science University of Winnipeg Canada.
OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla
1 Efficient Discovery of Conserved Patterns Using a Pattern Graph Inge Jonassen Pattern Discovery Arwa Zabian 13/07/2015.
Chapter 9: Graphs Basic Concepts
Graphs & Graph Algorithms 2 Fawzi Emad Chau-Wen Tseng Department of Computer Science University of Maryland, College Park.
ECE669 L10: Graph Applications March 2, 2004 ECE 669 Parallel Computer Architecture Lecture 10 Graph Applications.
Semi-Structured Data Models By Chris Bennett. Semi-Structured Data  What is it? Data where structure not necessarily determined in advance (often implicit.
IS432: Semi-Structured Data Dr. Azeddine Chikh. 1. Semi Structured Data Object Exchange Model.
1 Introduction to databases concepts CCIS – IS department Level 4.
Digitaalsüsteemide verifitseerimise kursus1 Formal verification: BDD BDDs applied in equivalence checking.
25th VLDB, Edinburgh, Scotland, September 7-10, 1999 Extending Practical Pre-Aggregation for On-Line Analytical Processing T. B. Pedersen 1,2, C. S. Jensen.
Computer Science 112 Fundamentals of Programming II Introduction to Graphs.
XML과 Database 홍기형 성신여자대학교 성신여자대학교 홍기형.
Regular Expressions. Notation to specify a language –Declarative –Sort of like a programming language. Fundamental in some languages like perl and applications.
Representing and Using Graphs
1 Chapter Elementary Notions and Notations.
Web Data Management Indexes. In this lecture Indexes –XSet –Region algebras –Indexes for Arbitrary Semistructured Data –Dataguides –T-indexes –Index Fabric.
Review for Final Andy Wang Data Structures, Algorithms, and Generic Programming.
Uncertainty in Trip Planning in Transportation Systems A. P. Sistla (Joint work with Booth, Wolfson and Cruz)
Lore: A Database Management System for Semistructured Data.
Semistructured Data. Semistructured data is data that has some structure, but it may be irregular and incomplete and does not necessarily conform to a.
1 Semi-structured data - exercises. 2 Represent the relations below using the OEM data model. Exercise 1.
Chapter 10: Trees A tree is a connected simple undirected graph with no simple circuits. Properties: There is a unique simple path between any 2 of its.
Semi-structured Data In many applications, data does not have a rigidly and predefined schema: –e.g., structured files, scientific data, XML. Managing.
1Computer Sciences. 2 HEAP SORT TUTORIAL 4 Objective O(n lg n) worst case like merge sort. Sorts in place like insertion sort. A heap can be stored as.
Finding Regular Simple Paths Sept. 2013Yangjun Chen ACS Finding Regular Simple Paths in Graph Databases Basic definitions Regular paths Regular simple.
Onlinedeeneislam.blogspot.com1 Design and Analysis of Algorithms Slide # 1 Download From
1 Integration of data sources Patrick Lambrix Department of Computer and Information Science Linköpings universitet.
Overview of Previous Lesson(s) Over View  A token is a pair consisting of a token name and an optional attribute value.  A pattern is a description.
Graphs. Graph Definitions A graph G is denoted by G = (V, E) where  V is the set of vertices or nodes of the graph  E is the set of edges or arcs connecting.
Page 1 Renzo Angles and Claudio Gutierrez University of Chile ACM Computing Surveys, 2008 Survey of Graph Database Models.
CHAPTER 13 GRAPH ALGORITHMS ACKNOWLEDGEMENT: THESE SLIDES ARE ADAPTED FROM SLIDES PROVIDED WITH DATA STRUCTURES AND ALGORITHMS IN C++, GOODRICH, TAMASSIA.
Suffix Tree 6 Mar MinKoo Seo. Contents  Basic Text Searching  Introduction to Suffix Tree  Suffix Trees and Exact Matching  Longest Common Substring.
Databases Salihu Ibrahim Dasuki (PhD) CSC102 INTRODUCTION TO COMPUTER SCIENCE.
©Silberschatz, Korth and Sudarshan 1.1 Database System Concepts قواعد البيانات Data Base قواعد البيانات CCS 402 Mr. Nedal hayajneh E- mail
abstract data types built on other ADTs
B-Trees 7/5/2018 4:26 AM Presentation for use with the textbook Data Structures and Algorithms in Java, 6th edition, by M. T. Goodrich, R. Tamassia, and.
Courtsey & Copyright: DESIGN AND ANALYSIS OF ALGORITHMS Courtsey & Copyright:
Heuristics Definition – a heuristic is an inexact algorithm that is based on intuitive and plausible arguments which are “likely” to lead to reasonable.
Graphs & Graph Algorithms 2
Week nine-ten: Trees Trees.
Chapter 9: Graphs Basic Concepts
Exercise Find the following products mentally. 5(20) 100 5(7) 35 5(27)
Chapter 9: Graphs Basic Concepts
Presentation transcript:

1 Semi-structured data Patrick Lambrix Department of Computer and Information Science Linköpings universitet

2 Semi-structured data Data is not just text, but is not as well- structured as data in databases Occurs often in web databanks Occurs often in integration of databanks

3 Semi-structured data - properties irregular structure implicit structure partial structure a posteriori ’data guide’ versus a priori schema large data guides

4 Semi-structured data - properties It should be possible to ignore the data guide upon querying Data guide changes fast object can change type/class difference between data guide and data is blurred

5 Semi-structured data - model network of nodes object model (oid) query: path search in the network

6 OEM (Object Exchange Model) Graph Nodes: objects oid atomic or complex - atoms: integer, string, gif, html, … - value of a complex object is a set of object references (label, oid) Edges have labels OEM is used by a number of systems (ex. Lorel)

7 OEM example Restaurant Guide

8 Lorel query language 1. Find all places to eat Vietnamese food select P from RestaurantGuide.% P where P.category grep “ietnamese” 2. Find the names and streets of all restaurants in Palo Alto select R.name, A.street from RestaurantGuide.restaurant{R}.address A where A.city = “Palo Alto”

9 3. Find all restaurants to eat with zipcode select RestaurantGuide.restaurant where RestaurantGuide.restaurant(.address)?.zipcode = Wildcards and variables ? - 0 or 1 path - object variables or more paths select P from Guide.% P * - 0 or more paths select A from #.address{A} # - any path - path variables % - 0 or more chars select Lorel query language

10 Data Guides A structural summary over a data source that is used as a dynamic schema Is used in query formulation and optimization Is often created a posteriori Properties:  concise  accurate  convenient

11 Data Guides - definitions Label path: sequence of labels L1.L2. ….Ln Data path: alternating sequence of labels and oid:s L1.o1.L2.o2. ….Ln.on Data path d is an instance of label path l if the sequences of labels are identical in l and d.

12 Data Guides - definitions A data guide for object s is an object d such that every label path of s has exact one data path instance in d, and each label path in d is a label path of s.

13 Data Guides A data source can have several data guides Minimal data guides the smallest data guides

14 Data Guides - example ABB CCC DDD C D AB (a)(c) Data modelminimal Data Guide

15 Minimal Data Guides Concise May be hard to maintain Example: child node for 10 with label E

16 Strong Data Guides Intuitively: ”label paths that reach the same set of objects in the data model = label paths that reach the same objects in the data guide”

17 Strong Data Guides - definitions An object o can be reached from s via l if there is a data path of s that is an instance of l and that has o as last oid (L1.o1.L2.o2. … Ln.o) The target set for label path l in object s is the set of objects that can be reached from s via l. Notation: T(s,l) L(s,l): set of label paths of s that have the same target set in s as l.

18 Definition: d is a strong data guide for s if for all label paths l of s it holds that L(s,l) = L(d,l) There is a 1-1-mapping between target sets in the data model and nodes in a strong data guide. Strong Data Guides - definitions

19 Data Guides - example ABB CCC DDD AB CC DD C D AB (a) (b) (c) Data model strong Data Guide minimal Data Guide

20 Strong Data Guides - algorithm Implementation: - Traverse data model depth-first. - Each time you find a new target set for label path l, create a new object in the data guide. If the target set is already represented in the data guide, do not create a new object, but link to the existing object.

21 Strong Data Guides - use  Easier to maintain  Used as path index for query optimization

22 Semi-structured data - exercises

23 Represent the relations below using the OEM data model. Exercise 1

24 Using the data model from the previous question, formulate the following queries using Lorel:  find all the restaurants that are located in Linkoping  find the address (city and street) of the “Hamlet” restaurant  list the restaurants by city (equivalent of GROUP BY) Exercise 2

25 Draw the strong Data Guide for the restaurant guide data model below. Exercise 3 RestaurantGuide