OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla



Semistructured Data (SSD)
- No explicit schema; the data is irregular and incomplete.
- The schema may be hidden or mixed with the data.
- Examples:
  - Data that arises from the integration of heterogeneous data sources, both structured and loosely structured. Information sources change or new sources are added, and there are semantic discrepancies among the sources.
  - Data from the web: the overall site structure may change often.
  - Biological data.

Characteristics of SSD
- Missing or additional attributes
- Multiple attributes
- Different types in different objects
- Heterogeneous collections
- Self-describing
- Irregular, with no a priori structure

Object Exchange Model (OEM) - Motivation
- Self-describing data model
- Information exchange and extraction
- Handles incomplete and irregular data
- Why a new data model? … it is not a new model.

LORE
- Lore: Lightweight Object Repository
- Lightweight because:
  - the object model supported is lightweight
  - there are no multiuser or heavyweight DBMS features

Lore - Motivation
- The relational data model has null values, and OO models have inheritance and complex objects. Both have difficulty with schemas that must incorporate irregular data.
- Lore is meant to manage semistructured data. In such an environment it is difficult to decide in advance on a single, correct schema, because:
  - the structure of the data may evolve rapidly,
  - data elements may change types, or
  - data that does not conform to the previous structure may be added.

Thus:
- There is a need for management of semistructured data.
- Data managed by Lore is not confined to a schema, and it may be irregular or incomplete.
- OEM is Lore's data model.
- Lorel is Lore's query language.

Object Exchange Model (OEM)
- Data in this model can be thought of as a labeled directed graph.
  - It is schema-less and self-describing.
  - Nodes are objects.
  - Edges are labeled with attribute names.
  - Leaf nodes have atomic values.
- Object nesting: vertices in the graph are objects.
  - Each object has a unique object identifier (oid), such as &5.
  - Atomic objects have no outgoing edges and are of types such as int, real, string, gif, java, etc.
  - All other objects, which have outgoing edges, are called complex objects.
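
The graph view above can be sketched in a few lines of Python. This is only an illustration, not Lore's actual storage format: the class name OemObject, its fields, and the sample oid &90 with its age value are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class OemObject:
    oid: str                      # unique object identifier, e.g. "&5"
    value: Any = None             # atomic value; stays None for complex objects
    edges: List[Tuple[str, "OemObject"]] = field(default_factory=list)

    def is_atomic(self) -> bool:
        # Atomic objects have no outgoing edges.
        return not self.edges

    def children(self, label: str) -> List["OemObject"]:
        # Follow every outgoing edge that carries the given label.
        return [child for edge_label, child in self.edges if edge_label == label]


# Hypothetical fragment: a single member with a name and an age.
name = OemObject("&7", "Clark")
age = OemObject("&90", 35)        # oid and value invented for illustration
member = OemObject("&2", edges=[("Name", name), ("Age", age)])
db_group = OemObject("&1", edges=[("Member", member)])

# Walk DBGroup -> Member -> Name and collect the atomic values.
names = [n.value for m in db_group.children("Member") for n in m.children("Name")]
```

Because edges are stored as a list of (label, child) pairs, an object can carry the same label several times, which is what makes every attribute set-valued.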

OEM (Cont.)
- Examples:
  - Object &3 is complex, and its subobjects are &8, &9, &10, and &11.
  - Object &7 is atomic and has value "Clark".
- DBGroup is a name that denotes object &1. (Names are entry points into the database.)
- Type and structure heterogeneity: members may have zero, one, or more offices; an office is sometimes a string and sometimes a complex object; a room may be a string or an integer.
- DBGroup.Member denotes all Member-labeled subobjects of DBGroup.

[Figure: An OEM Database. A labeled directed graph rooted at DBGroup (&1), with objects &1 through &20. Edge labels: Member, Project, Name, Age, Office, Room, Building. Atomic values: "Clark", "Smith", 46, "Gates 252", "Lore", "Tsimmis", "Jones", 28, "CIS", "411", "CIS", 252.]

Object Exchange Model - OEM
- Each value exchanged is given an explicit label.
- Object <temp-in-Fahrenheit, integer, 80>
  - "temp-in-Fahrenheit" is the label.
  - Each object is self-describing, with a label, type, and value.
- <set-of-temps, set, {cmpnt1, cmpnt2}>
  - cmpnt1 is <temp-in-Fahrenheit, integer, 80>
  - cmpnt2 is <temp-in-Celsius, integer, 20>

Labels
- Labels play two roles:
  - identifying an object (component)
  - identifying the meaning of an object (component)
- person-name both identifies cmpnt1 and conveys its meaning.
- <person-record, set, {cmpnt1, cmpnt2, cmpnt3}>
  - cmpnt1 is <person-name, string, "Fred">
  - cmpnt2 is <office-num-in-bldg-5, integer, 333>
  - cmpnt3 is <department, string, "toy">
- In relational data this corresponds to ….

Labels - Issues
- Labels are relative (more specific) to the source of the data object.
- Similar labels from different sources need to be resolved.
- Labels provide flexibility in representing object structure.

OEM - Specification
Each object in OEM has the following structure:
- Label: a variable-length character string describing what the object represents.
- Type: the data type of the object's value. Each type is either an atom type or the type set.
- Value: a variable-length value for the object.
- Object-ID: a unique variable-length identifier for the object, or null.
Label | Type | Value | Object-ID
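
As a rough illustration of the four-field structure, the set-of-temps object from the earlier slide can be written as a small Python record. The class name Oem, the use of a tuple for set values, and the describe helper are assumptions made for this sketch; OEM itself does not prescribe any particular representation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Oem:
    label: str                                   # what the object represents
    type: str                                    # an atom type name, or "set"
    value: Union[int, float, str, Tuple["Oem", ...]]
    oid: Optional[str] = None                    # unique identifier, or None for null


cmpnt1 = Oem("temp-in-Fahrenheit", "integer", 80)
cmpnt2 = Oem("temp-in-Celsius", "integer", 20)
temps = Oem("set-of-temps", "set", (cmpnt1, cmpnt2))


def describe(obj, indent=0):
    """Render an object and its components, one line per object."""
    pad = "  " * indent
    if obj.type == "set":
        lines = [f"{pad}<{obj.label}, set>"]
        for component in obj.value:
            lines.extend(describe(component, indent + 1))
        return lines
    return [f"{pad}<{obj.label}, {obj.type}, {obj.value!r}>"]


print("\n".join(describe(temps)))
```

Every object carries its own label and type, so a receiver can interpret the data without consulting a separate schema.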

OEM - Summary
- OEM is an information exchange model.
- It does not specify how objects are stored at the source.
- OEM does specify how objects are received at a client, but once received they can be stored in any way the client likes.
- Each source has a distinguished object with the lexical identifier "root".

Example
doc1 is
auths1 is
auth11 is
topic1 is
call-no1 is
doc2 is
auths2 is
auth21 is
auth22 is
auth23 is
topic2 is
call-no1 is
docn is
authsn is <auth, string, "Crichton">
topic1 is
call-no1 is
biblio is the root object.

OEM - QL
SELECT Fetch-expression
FROM Object
WHERE Condition
- The result of this query is itself an object, with the special label "answer":
  <answer, set, {obj1, obj2, …, objn}>
- Each returned obji is a component of the object specified in the FROM clause of the query, where the component is located by the Fetch-expression and satisfies the Condition.

Path
- The notion of a path is used both in the Fetch-expression of the SELECT clause and in the condition of the WHERE clause.
- A path describes traversals through an object using subobject structure and labels. Example: "biblio.doc.auth"
- Paths are used in the Fetch-expression to specify which components are returned in the answer object.
- Paths are used in the condition to qualify the fetched objects or other (related) components in the same object structure.

Queries - Simple
Retrieve the topic of each document for which "Ullman" is one of the authors:
SELECT biblio.doc.topic
FROM root
WHERE biblio.doc.auth-set.auth-ln = "Ullman"
Intuitively, the query's WHERE clause finds all paths through the subobject structure with the sequence of labels [biblio, doc, auth-set, auth-ln] such that the object at the end of the path has value "Ullman".
obj1 is
obj2 is
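
A minimal Python sketch of how such a query could be evaluated is shown below. It is an illustration under stated assumptions, not the TSIMMIS implementation: objects are plain (label, type, value) tuples, the helper names match_path and topics_by_author are invented, and the two sample documents and their topics are hypothetical.

```python
def match_path(obj, labels):
    """Return the objects at the end of every path from obj that spells out labels."""
    label, obj_type, value = obj
    if label != labels[0]:
        return []
    if len(labels) == 1:
        return [obj]
    if obj_type != "set":
        return []                       # atomic object: the path cannot continue
    found = []
    for child in value:
        found.extend(match_path(child, labels[1:]))
    return found


# Hypothetical data in the shape the query expects.
doc1 = ("doc", "set", [
    ("auth-set", "set", [("auth-ln", "string", "Ullman")]),
    ("topic", "string", "Databases"),
])
doc2 = ("doc", "set", [
    ("auth-set", "set", [("auth-ln", "string", "Aho"),
                         ("auth-ln", "string", "Hopcroft")]),
    ("topic", "string", "Algorithms"),
])
root = ("biblio", "set", [doc1, doc2])


def topics_by_author(root, last_name):
    """SELECT biblio.doc.topic FROM root WHERE biblio.doc.auth-set.auth-ln = last_name"""
    components = []
    for doc in match_path(root, ["biblio", "doc"]):
        # The shared prefix biblio.doc refers to the same doc in both clauses.
        authors = match_path(doc, ["doc", "auth-set", "auth-ln"])
        if any(a[2] == last_name for a in authors):
            components.extend(match_path(doc, ["doc", "topic"]))
    return ("answer", "set", components)


answer = topics_by_author(root, "Ullman")
```

The loop over biblio.doc makes the shared path prefix explicit: the condition and the fetch expression are evaluated against the same document object.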

Queries - "wild-cards"
Retrieve the topic of every document that has an internal call number:
SELECT biblio.?.topic
FROM root
WHERE biblio.?.internal-call-no
- The "?" label matches any label. For this query, the doc labels could be replaced by any other strings and the query would produce the same result.
- By convention, two occurrences of ? in the same query must match the same label unless variables are used.
obj1 is

Queries - "wild-paths"
Retrieve the topic of every document that has an internal call number:
SELECT *.topic
FROM root
WHERE *.internal-call-no
- The symbol "*" matches any path of length one or more.
- Using * followed by a single label is a convenient and common way to locate objects with a certain label in a complex structure.
- As with ?, two occurrences of * in the same query must match the same sequence of labels, unless variables are used.
obj1 is
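
The two wildcards can be illustrated with a small pattern matcher over label sequences. This Python sketch is an assumption-laden illustration: the tuple representation, the function names walk, matches, and select, and the sample data are all invented here, and the sketch matches a single pattern only; it does not enforce the convention that repeated wildcards in one query bind to the same labels.

```python
def walk(obj, trail=()):
    """Yield (label_trail, object) for obj and every descendant."""
    trail = trail + (obj[0],)
    yield trail, obj
    if obj[1] == "set":
        for child in obj[2]:
            yield from walk(child, trail)


def matches(trail, pattern):
    """'?' matches exactly one label; '*' matches a path of one or more labels."""
    if not pattern:
        return not trail
    head, rest = pattern[0], pattern[1:]
    if head == "*":
        # Let '*' consume 1..len(trail) labels, then match the remainder.
        return any(matches(trail[i:], rest) for i in range(1, len(trail) + 1))
    if not trail:
        return False
    return (head == "?" or head == trail[0]) and matches(trail[1:], rest)


def select(root, pattern):
    return [obj for trail, obj in walk(root) if matches(trail, pattern)]


# Hypothetical data: only the first document has an internal call number.
doc1 = ("doc", "set", [("topic", "string", "Databases"),
                       ("internal-call-no", "integer", 7)])
doc2 = ("doc", "set", [("topic", "string", "Algorithms")])
root = ("biblio", "set", [doc1, doc2])

all_topics = [o[2] for o in select(root, ["*", "topic"])]
call_numbers = [o[2] for o in select(root, ["biblio", "?", "internal-call-no"])]
```

Since "*" must consume at least one label, the pattern ["*", "topic"] never matches an object whose whole path is just topic.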

Queries - variables
Retrieve each document for which both "Hopcroft" and "Aho" are co-authors:
SELECT biblio.doc
FROM root
WHERE biblio.doc.auth-set.auth-ln = "Aho" and biblio.doc.auth-set.auth-ln = "Hopcroft"
Here, the query finds all the paths with structure [biblio, doc, auth-set] that have two distinct path completions with label auth, with values "Aho" and "Hopcroft".
obj1 is the complete doc2

Lorel Query Language
- A query language is needed that supports path expressions for traversing graph data and that handles "typeless" data.
- A simple path expression is a name followed by a sequence of labels.
  - Example: DBGroup.Member.Office
  - This denotes the set of objects that can be reached by starting at the DBGroup object and following edges labeled Member and then Office.

Lorel (cont.)
Example:
select DBGroup.Member.Office
where DBGroup.Member.Age < 30
Result:
Office "Gates 252"
Office
  Building "CIS"
  Room "411"

Lorel Query Rewrite
The previous query is rewritten (OQL style) to:
select O
from DBGroup.Member M, M.Office O
where exists y in M.Age : y < 30
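
The rewritten form maps almost directly onto nested iteration. The Python sketch below shows one way to read it, assuming a simplified model in which every object is a dict from label to a list of subobjects; the function name offices_of_young_members and all sample members, ages, and offices are hypothetical and do not reproduce the database in the figure.

```python
# Hypothetical data: every label maps to a list, since attributes are set-valued.
db_group = {
    "Member": [
        {"Name": ["Jones"], "Age": [28], "Office": ["Room 12"]},
        {"Name": ["Smith"], "Age": [46], "Office": ["Room 7", "Room 9"]},
        {"Name": ["Clark"]},                      # no Age and no Office at all
    ]
}


def offices_of_young_members(db, limit=30):
    """select O from DBGroup.Member M, M.Office O where exists y in M.Age : y < limit"""
    result = []
    for m in db.get("Member", []):                # from DBGroup.Member M
        for o in m.get("Office", []):             # M.Office O
            if any(y < limit for y in m.get("Age", [])):   # exists y in M.Age
                result.append(o)
    return result


offices = offices_of_young_members(db_group)
```

A member with no Age edge simply fails the existential test, so missing attributes never raise an error; they just produce no result.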

Lorel Query Features
- Coercion is handled automatically: with automatic type coercion, 0.5 < "0.9" should return true.
- The comparison on Age is transformed into an existential condition, since all properties are set-valued in OEM.
- A user can ask DBGroup.Member.Age < 30 regardless of whether Age is single-valued, set-valued, or unknown.
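
A small Python sketch of the coercion idea follows. It is not Lorel's actual coercion table: the helper names coerce and less_than are invented, only string-to-number coercion is covered, and treating values that remain incomparable as a false comparison is an assumption of this sketch.

```python
def coerce(value):
    """Turn a numeric-looking string into a float; leave everything else alone."""
    if isinstance(value, str):
        try:
            return float(value)
        except ValueError:
            return value
    return value


def less_than(a, b):
    a, b = coerce(a), coerce(b)
    if isinstance(a, str) != isinstance(b, str):
        # Assumption: a number and a non-numeric string are incomparable,
        # and the comparison is treated as false rather than as an error.
        return False
    return a < b


def exists_less_than(values, limit):
    """Existential reading of `Age < limit` over a possibly empty set of values."""
    return any(less_than(v, limit) for v in values)
```

With these helpers, less_than(0.5, "0.9") is true, and exists_less_than works the same whether a member has one age, several, or none.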

Path expression queries
- A path expression is a specification for a set of possible paths through the graph.
  - Example: * is a path expression that matches any number of labels.
- Use of data guides: a structural summary of the database.

Lorel (cont.)
General path expressions are loosely specified patterns for labels in the database ("|" is disjunction; "?" makes a label pattern optional).
Example:
select DBGroup.Member.Name
where DBGroup.Member.Office(.Room%|.Cubicle)? like "%252"
Result:
Name "Jones"
Name "Smith"

Lorel Queries - Simple Path Expression
Retrieve the offices of members with age greater than 30 years:
Query:
SELECT DBGroup.Member.Office
WHERE DBGroup.Member.Age > 30
Result:
Office "Gates 252"
Office
  Building "CIS"
  Room "411"

Queries - General Path Expression
Query:
SELECT DBGroup.Member.Name
WHERE DBGroup.Member.Office(.Room%|.Cubicle)? like "%252"
Result:
Name "Jones"
Name "Smith"
- Room% matches all labels starting with Room, such as Room68.
- "|" stands for disjunction.
- "?" indicates that the label pattern is optional.
- like "%252" specifies that the data value should end with the string "252".
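
The pieces of this query can be mimicked in Python to make the matching rules concrete. This is only a sketch under assumptions: objects are dicts from label to a list of subobjects, the helper names are invented, only a trailing % in label patterns and a leading % in like patterns are supported, and the three sample members are hypothetical.

```python
def label_matches(label, pattern):
    """A trailing '%' in a label pattern matches any suffix (Room% matches Room68)."""
    if pattern.endswith("%"):
        return label.startswith(pattern[:-1])
    return label == pattern


def like(value, pattern):
    """Supports only patterns of the form '%suffix'; values are compared as strings."""
    return str(value).endswith(pattern.lstrip("%"))


def office_values(member):
    """Objects reached by Office(.Room%|.Cubicle)? starting from one member."""
    reached = []
    for office in member.get("Office", []):
        reached.append(office)                    # optional step skipped
        if isinstance(office, dict):
            for label, children in office.items():
                if label_matches(label, "Room%") or label_matches(label, "Cubicle"):
                    reached.extend(children)      # optional step taken
    return reached


members = [
    {"Name": ["Smith"], "Office": ["Gates 252"]},
    {"Name": ["Jones"], "Office": [{"Building": ["CIS"], "Room": [252]}]},
    {"Name": ["Clark"], "Office": [{"Building": ["CIS"], "Room": ["411"]}]},
]

names = [
    n
    for m in members
    if any(not isinstance(v, dict) and like(v, "%252") for v in office_values(m))
    for n in m.get("Name", [])
]
```

The optional group is what lets one query cover both shapes of Office: a plain string such as "Gates 252" and a complex object whose room is stored under a Room-prefixed label.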

Queries - SubQueries
Retrieve Lore project members who work on other projects:
Query:
SELECT M.Name, (SELECT M.Project.Title WHERE M.Project.Title != "Lore")
FROM DBGroup.Member M
WHERE M.Project.Title = "Lore"
Result:
Member
  Name "Jones"
  Title "Tsimmis"

Lore - Summary
- Lore facilitates queries and updates on semistructured databases.
- More work has been done on optimization using data guides (VLDB '97).
- How is this related to the WWW? XML-QL and related work provide the answer.