LORE Light Object Repository by Othman Chhoul CSC5370 Fall 2003.


Outline
- Introduction
- What is Lore?
- History
- Lore’s Forensic
- Conclusion
- Questions
- Demo

Introduction
Limitations of traditional databases:
- They force all data to adhere to an explicitly specified schema
- Data elements may change
- Structures may change along the execution path of an application
- Deciding on a fixed schema for irregular or unstable data is a headache

Semistructured Data
Semistructured data is widespread:
- “Self-describing”
- “Schemaless”
Examples:
- Data from the web: the overall site structure may change often. It would be nice to be able to query a web site.
- Data integrated from multiple, heterogeneous data sources: information sources change, or new sources are added.

What is Lore?
- Lore is a DBMS designed specifically for managing semistructured information, such as XML
- It was among the pioneers in this domain

History
- Built from scratch by the DB Group at Stanford University, with research funding from DARPA, NASA and others.
- Introduced in 1995 with the first version of its query language, called Lorel, and with OEM as its data model.
- A lightweight system, because it was designed for single-user, read-only access.
- Later changed to support XML.

Lore’s Forensic
- Lore’s data model
- Lore’s query language
- Lore’s general architecture
- When XML gets into action

OEM (Object Exchange Model)
- Simple, self-describing, nested object model for semistructured data (XML???)
- Data in this model can be thought of as a labeled directed graph
- Vertices in the graph are objects:
  - Each object has a unique object identifier (oid), such as &5.
  - Atomic objects have no outgoing edges and have types such as int, real, string, gif, etc.
  - All other objects, which have outgoing edges, are called complex objects.

OEM (Summary)
An OEM object has:
- Label: a character string, object aliases
- OID: unique object identifier
- Type: atomic (int, real, string) or complex
- Value:
  - for a complex object -> a list of OIDs
  - for an atomic object -> an atomic value of type int, real, string, ...
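
The four fields listed above can be sketched as a small Python data structure. This is only an illustration, not Lore's actual storage format: the class name `OEMObject`, the `db` dictionary, and the sample objects are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class OEMObject:
    oid: str                                  # unique object identifier, e.g. "&5"
    label: str                                # character-string label
    value: Union[int, float, str, List[str]]  # atomic value, or list of child OIDs

    @property
    def is_complex(self) -> bool:
        # A complex object holds a list of OIDs; anything else is atomic.
        return isinstance(self.value, list)

    @property
    def type_name(self) -> str:
        return "complex" if self.is_complex else type(self.value).__name__


# Hypothetical sample data, keyed by OID.
db: Dict[str, OEMObject] = {o.oid: o for o in [
    OEMObject("&1", "DBGroup", ["&2"]),
    OEMObject("&2", "Member", ["&3", "&4"]),
    OEMObject("&3", "Name", "Smith"),
    OEMObject("&4", "Age", 46),
]}


def children(oid: str) -> List[OEMObject]:
    """Return the sub-objects of a complex object (empty for atomic objects)."""
    obj = db[oid]
    return [db[c] for c in obj.value] if obj.is_complex else []
```

Calling `children("&2")` returns the Name and Age objects, while `children("&3")` returns an empty list because atomic objects have no outgoing edges.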

OEM (Example)

Lorel (Lore’s Query Language)
- Lorel is an extension of OQL
- Lorel supports path expressions for traversing graph data
- A simple path expression is a name followed by a sequence of labels:
  - DBGroup.Member.Office: the set of objects that can be reached starting from the DBGroup object, following edges labeled Member and then Office.
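
Evaluating a simple path expression amounts to following labeled edges from a named starting object. The sketch below assumes a graph stored as a plain dictionary of labeled edges; `EDGES`, `NAMES`, `eval_path`, and the OIDs are hypothetical names and data, not part of Lore.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical OEM graph: oid -> list of (edge label, child oid).
EDGES: Dict[str, List[Tuple[str, str]]] = {
    "&1": [("Member", "&2"), ("Member", "&3")],
    "&2": [("Name", "&4"), ("Office", "&5")],
    "&3": [("Name", "&6"), ("Office", "&7")],
}
# Named entry points into the graph.
NAMES: Dict[str, str] = {"DBGroup": "&1"}


def eval_path(path: str) -> Set[str]:
    """Return the set of oids reached by a simple path expression."""
    name, *labels = path.split(".")
    current = {NAMES[name]}
    for label in labels:
        # Follow every edge carrying this label from every object reached so far.
        current = {child
                   for oid in current
                   for (edge_label, child) in EDGES.get(oid, [])
                   if edge_label == label}
    return current
```

With this data, `eval_path("DBGroup.Member.Office")` yields the two Office objects, and a path that does not exist in the data simply yields the empty set.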

Lorel
- Range variables can be assigned to path expressions
- Path expressions are used directly in queries in an SQL style:
    select DBGroup.Member.Office
    where DBGroup.Member.Age > 30

Lorel
Result:
    Office "Gates252"
    Office
        Building "CIS"
        Room "411"

Lorel (Behind the scenes)
- The previous query is rewritten in OQL style:
    select O
    from DBGroup.Member M, M.Office O
    where exists y in M.Age : y > 30
- The comparison on Age is transformed into an existential condition:
  - A user can ask DBGroup.Member.Age < 30 regardless of whether Age is single-valued, set-valued, or unknown.
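
The effect of the existential rewrite can be sketched in Python. The member records below are invented for illustration (one single-valued Age, one set-valued Age, one missing Age), and `offices_where_age_over` is a hypothetical helper, not a Lore function.

```python
from typing import Dict, List

# Hypothetical members: every label maps to a list of values, so a label
# may be single-valued, set-valued, or missing altogether.
members: List[Dict[str, list]] = [
    {"Name": ["Jones"], "Age": [46], "Office": ["Gates252"]},
    {"Name": ["Smith"], "Age": [28, 31], "Office": ["CIS411"]},  # set-valued Age
    {"Name": ["Clark"], "Office": ["Gates100"]},                  # Age missing
]


def offices_where_age_over(members: List[Dict[str, list]], threshold: int) -> list:
    result = []
    for m in members:                                     # from DBGroup.Member M
        # exists y in M.Age : y > threshold
        if any(y > threshold for y in m.get("Age", [])):
            result.extend(m.get("Office", []))            # M.Office O
    return result
```

Because the condition is existential, the member with two ages qualifies as soon as one of them exceeds the threshold, and the member without an Age is skipped without raising an error.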

Lorel (More examples)
Query:
    select DBGroup.Member.Name
    where DBGroup.Member.Office(.Room%)? like "%252"
Result:
    Name "Jones"
    Name "Smith"
Update:
    update P.Member += ( select DBGroup.Member where DBGroup.Member.Name = "Clark" )
    from DBGroup.Project P
    where P.Title = "Lore" or P.Title = "Tsimmis"

Lore’s General Architecture

Query and Update Processing External Data DataGuides

Query and Update Processing Queries Data Engine (A Set of OEM objects)

Query Plan Generator
    select O
    from DBGroup.Member M, M.Office O
    where exists y in M.Age : y > 30

Query Iterators
Use a recursive iterator approach:
- Execution begins at the top of the query plan.
- Each node in the plan requests one tuple at a time from its children and performs some operation on the tuple(s).
- Result tuples are passed up to the parent.
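
The pull-based iterator approach described above maps naturally onto Python generators: the consumer at the top of the plan asks for one tuple, and that request propagates down to the leaves. This is a generic sketch of the technique, not Lore's code; the operator functions, `MEMBERS`, and `build_plan` are assumptions for the example.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]

# Hypothetical input tuples.
MEMBERS: List[Row] = [
    {"name": "Jones", "age": 46, "office": "Gates252"},
    {"name": "Smith", "age": 28, "office": "CIS411"},
]


def scan(rows: Iterable[Row]) -> Iterator[Row]:
    """Leaf operator: produce one tuple at a time."""
    for row in rows:
        yield dict(row)


def select(child: Iterator[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    """Pull tuples from the child and pass up only those satisfying pred."""
    for row in child:
        if pred(row):
            yield row


def project(child: Iterator[Row], slots: List[str]) -> Iterator[Row]:
    """Pull tuples from the child and keep only the requested slots."""
    for row in child:
        yield {s: row[s] for s in slots}


def build_plan() -> Iterator[Row]:
    # Execution starts at the top (project), which pulls from select,
    # which in turn pulls from scan.
    return project(select(scan(MEMBERS), lambda r: r["age"] > 30), ["office"])
```

Nothing is computed until the top of the plan is asked for a tuple, for example with `next(build_plan())` or `list(build_plan())`.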

Tuples (Object Assignment)
- An object assignment (OA) is a data structure containing slots for range variables, with additional slots depending on the query.
- Each slot within an OA holds the oid of a vertex on a path being considered by the query engine.
- At the end of a query we should end up with complete OAs.

Query Operators
- The Scan operator returns all oids that are sub-objects of a given object following a specified path expression:
  - Scan (StartingOASlot, Path_expression, TargetOASlot)
  - For each oid in StartingOASlot, check whether the object satisfies Path_expression and place the oid into TargetOASlot.
- For each OA returned by its left child, the Join operator calls the right child exhaustively until no more OAs are returned.
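
A rough sketch of Scan and Join working on object assignments follows. It makes several simplifying assumptions: an OA is a dictionary from slot name to oid, the path expression is a single label, and the graph, slot names (`G`, `M`, `O`), and function names are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

OA = Dict[str, str]  # object assignment: slot name -> oid

# Hypothetical OEM graph: oid -> list of (edge label, child oid).
EDGES: Dict[str, List[Tuple[str, str]]] = {
    "&1": [("Member", "&2"), ("Member", "&3")],
    "&2": [("Office", "&5"), ("Age", "&4")],
    "&3": [("Office", "&7")],
}


def scan(oa: OA, start_slot: str, label: str, target_slot: str) -> Iterator[OA]:
    """For the oid in start_slot, bind target_slot to each child reached via label."""
    for edge_label, child in EDGES.get(oa[start_slot], []):
        if edge_label == label:
            yield {**oa, target_slot: child}


def join(left: Iterable[OA], right: Callable[[OA], Iterator[OA]]) -> Iterator[OA]:
    """For each OA from the left child, exhaust the right child."""
    for left_oa in left:
        yield from right(left_oa)


def run_plan() -> List[OA]:
    root_oa = {"G": "&1"}                        # G is bound to the DBGroup object
    left = scan(root_oa, "G", "Member", "M")     # DBGroup.Member M
    return list(join(left, lambda oa: scan(oa, "M", "Office", "O")))  # M.Office O
```

Each OA returned by `run_plan()` has all three slots filled, which corresponds to the complete OAs expected at the end of a query.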

Query Operators (cont)
- The aggregation operator (Aggr) adds the result of the aggregation to the target slot.
- The Join, Project and Select operators are almost identical to their corresponding relational operators.
- Other operators: CreateSet, GroupBy, ArithOp

Query Operators (Visualize the Words)

Query Optimizer
- Does only a few optimizations:
  - Push selection operators down the query tree.
  - Eliminate or combine redundant query operators.
- Explores query plans that use indexes when possible.
- Two kinds of indexes:
  - Lindex (link index): returns all parent OIDs of a given OID via a given label; implemented with hashing.
  - Vindex (value index): returns all atomic objects with a given label that satisfy a condition; implemented as B+-trees.
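
The two index types can be sketched as follows. The Lindex uses a Python dictionary as the hash table. For the Vindex, a sorted list searched with `bisect` stands in for the B+-tree, which is a deliberate substitution to keep the example short. Class names, method names, and sample data are assumptions.

```python
import bisect
from collections import defaultdict
from typing import Iterable, List, Set, Tuple


class Lindex:
    """Link index: (child oid, label) -> parent oids, backed by a hash table."""

    def __init__(self, edges: Iterable[Tuple[str, str, str]]):
        # edges are (parent oid, label, child oid) triples
        self._table = defaultdict(set)
        for parent, label, child in edges:
            self._table[(child, label)].add(parent)

    def parents(self, child: str, label: str) -> Set[str]:
        return set(self._table.get((child, label), ()))


class Vindex:
    """Value index per label; a sorted list stands in for the B+-tree."""

    def __init__(self, entries: Iterable[Tuple[str, float, str]]):
        # entries are (label, atomic value, oid) triples
        grouped = defaultdict(list)
        for label, value, oid in entries:
            grouped[label].append((value, oid))
        self._pairs = {lab: sorted(pairs) for lab, pairs in grouped.items()}
        self._keys = {lab: [v for v, _ in pairs] for lab, pairs in self._pairs.items()}

    def greater_than(self, label: str, bound: float) -> List[str]:
        """Return oids of atomic objects under label whose value exceeds bound."""
        keys = self._keys.get(label, [])
        start = bisect.bisect_right(keys, bound)
        return [oid for _, oid in self._pairs.get(label, [])[start:]]


# Hypothetical data: three Age objects and the edges leading to them.
lindex = Lindex([("&2", "Age", "&5"), ("&3", "Age", "&6"), ("&4", "Age", "&7")])
vindex = Vindex([("Age", 46, "&5"), ("Age", 28, "&6"), ("Age", 30, "&7")])
```

An index plan for a condition such as Age > 30 could first ask the Vindex for matching atomic objects and then use the Lindex to walk upward to their parents.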

Vindexes
- Because of the non-strict typing system, there are three kinds: String Vindex, Real Vindex, and String-coerced-to-real Vindex.
- Separate B+-trees of each type are constructed for each label.
- Using a Vindex for a comparison:
  - If the type is string, do a lookup in the String Vindex.
  - If the value can be converted to real, then do a lookup in the String-coerced-to-real Vindex.
  - If the type is real or int, do almost the same thing.

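Vindexes (cont)

  Arg1 \ Arg2 | String          | Real            | Int
  ------------+-----------------+-----------------+---------------
  String      | --              | String -> real  | Both -> real
  Real        | String -> real  | --              | Int -> real
  Int         | Both -> real    | Int -> real     | --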
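
The coercion table can be expressed as a small comparison helper. This is a sketch under stated assumptions: Python `str`, `float`, and `int` stand in for Lore's string, real, and int, and the choice to make a comparison false when a string cannot be converted is an assumption of this example rather than something the slides state.

```python
from typing import Optional, Tuple, Union

Atomic = Union[str, float, int]


def to_real(x: Atomic) -> Optional[float]:
    """Coerce a value to real, or return None if it cannot be converted."""
    try:
        return float(x)
    except (TypeError, ValueError):
        return None


def coerce_pair(a: Atomic, b: Atomic) -> Optional[Tuple[Atomic, Atomic]]:
    """Apply the table: same types are left alone, mixed types become real."""
    if type(a) is type(b):
        return a, b                      # the "--" entries: no coercion needed
    ra, rb = to_real(a), to_real(b)
    if ra is None or rb is None:
        return None                      # assumption: coercion failure, no match
    return ra, rb


def less_than(a: Atomic, b: Atomic) -> bool:
    pair = coerce_pair(a, b)
    return pair is not None and pair[0] < pair[1]
```

For example, `less_than("4.5", 30)` compares 4.5 with 30.0 after coercing both sides to real, while two strings are compared as strings.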

Index Query plans
- If the user’s query contains a comparison between a path expression and a value, and the appropriate Vindex and Lindex exist -> generate an index query plan
- Previous query:
    select O
    from DBGroup.Member M, M.Office O
    where exists y in M.Age : y > 30

Index Query plans (cont)

Update Query plans
    update P.Member += ( select DBGroup.Member where DBGroup.Member.Name = "Clark" )
    from DBGroup.Project P
    where P.Title = "Lore" or P.Title = "Tsimmis"

External Data
- Enables retrieval of information from other data sources, transparently to the user.
- An external object in Lore is a “placeholder” for the external data and specifies how Lore interacts with an external data source.

External Data
- During query processing, the Scan operator notifies the external data manager whenever an external object is encountered.
- The spec for an external object includes:
  - the location of a wrapper program that fetches the data and converts it to OEM
  - a timeout interval
  - a set of arguments used to limit the information fetched from the external source

DataGuides
- A DataGuide is a concise and accurate summary of the structure of an OEM database (stored as an OEM database itself, somewhat like the system catalog).
- Very helpful:
  - No explicit database schema -> difficult to formulate meaningful queries
  - The query processor may perform unnecessary work with no knowledge of the database structure.
  - What if a path expression doesn’t exist? (Wasted work.)
- Each possible path expression is encoded once.

DataGuides (cont)
- DataGuides are dynamically generated and maintained over an existing database
- Statistics can be stored in a DataGuide, for example the number of atomic objects of each type reachable by a path p.
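
One way to build such a summary is to group objects by the label path that reaches them, much like a subset construction over the data graph, so that each label path appears exactly once. The sketch below follows that idea; it is not Lore's maintenance algorithm, and `EDGES`, `build_dataguide`, and `path_exists` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical OEM graph: oid -> list of (edge label, child oid).
EDGES: Dict[str, List[Tuple[str, str]]] = {
    "&1": [("Member", "&2"), ("Member", "&3")],
    "&2": [("Name", "&4"), ("Age", "&5")],
    "&3": [("Name", "&6")],
}

Node = FrozenSet[str]  # a DataGuide node is the set of oids reached by one label path


def build_dataguide(edges, root: str):
    """Return (start node, guide), where guide maps node -> {label: next node}."""
    start: Node = frozenset([root])
    guide: Dict[Node, Dict[str, Node]] = {}
    pending = [start]
    while pending:
        targets = pending.pop()
        if targets in guide:          # already summarized (also handles cycles)
            continue
        by_label = defaultdict(set)
        for oid in targets:
            for label, child in edges.get(oid, []):
                by_label[label].add(child)
        guide[targets] = {lab: frozenset(kids) for lab, kids in by_label.items()}
        pending.extend(guide[targets].values())
    return start, guide


def path_exists(start: Node, guide, labels: List[str]) -> bool:
    """Check a label path against the summary without touching the data."""
    node = start
    for label in labels:
        node = guide[node].get(label)
        if node is None:
            return False
    return True


START, GUIDE = build_dataguide(EDGES, "&1")
```

A query processor could call `path_exists` before executing a path expression and skip the work entirely when the path does not occur in the database. Per-node statistics, such as counts of atomic objects, could be attached to each node of `GUIDE`.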

DataGuides (example)

When XML gets into Action
Little reminder:
- Lore was first proposed in 1995
- XML is the new standard for data representation and data exchange over the WWW.
- public class XML_data extends Semi_structured_data
- Lore was among the pioneers in integrating XML into their DBMS architecture

From Semistructured Data to XML
- Data Model
- Query Language
- DataGuides

Changes in The Data Model
Similar to an OEM object, an XML element in Lore is a pair (EID, VALUE):
- EID: a unique element identifier
- VALUE: either an atomic text string or a complex value containing:
  - a string-valued tag, corresponding to the XML tag
  - an ordered list of attribute-name/atomic-value pairs
  - an ordered list of crosslink subelements, reachable via IDREF or IDREFS
  - an ordered list of subelements

Changes in The Data Model (cont)
- Comments are ignored
- When an XML document is mapped into this new data model, it can be seen as a directed labeled graph

Example

Query Language
Path expressions are extended with qualifiers to distinguish between subelements and attributes:
- DBGroup.Member.>Name -> &6: the > qualifier matches subelements only
- DBGroup.Member.@Name -> “Smith”: the @ qualifier matches attributes only
- DBGroup.Member.Name -> &6, “Smith”: when no @ or > qualifier is used, both attributes and subelements are matched
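
The qualifier behavior can be sketched for a single path step, assuming `>` restricts matching to subelements and `@` restricts it to attributes, as in Lore's XML path expressions. The element layout (a dictionary with `attributes` and `subelements` lists of pairs) and the `step` function are hypothetical choices for this example.

```python
from typing import Dict, List

# Hypothetical Member element with a Name attribute and a Name subelement.
member: Dict[str, list] = {
    "attributes": [("Name", "Smith")],   # attribute-name / atomic-value pairs
    "subelements": [("Name", "&6")],     # (label, element id) pairs
}


def step(element: Dict[str, list], label: str) -> List[str]:
    """Evaluate one path step, honoring an optional > or @ qualifier."""
    if label.startswith(">"):            # subelements only
        return [eid for lab, eid in element["subelements"] if lab == label[1:]]
    if label.startswith("@"):            # attributes only
        return [val for name, val in element["attributes"] if name == label[1:]]
    # No qualifier: match both subelements and attributes.
    return step(element, ">" + label) + step(element, "@" + label)
```

Here `step(member, "Name")` returns both the subelement identifier and the attribute value, mirroring the unqualified case in the slide.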

DataGuides
- If a DTD is provided, Lore builds the corresponding DataGuide from it
- Otherwise, if no DTD is provided, a DataGuide is generated from the XML document
- Problems when updating:
  - When a DTD is provided, validity is assured
  - With no DTD, the DataGuide is updated as the XML document is updated

Conclusion
- Lore was originally developed for the OEM data model, starting in 1995; XML was integrated later, in 1999
- Lore provided a clear and robust solution for storing, querying, and updating semistructured data (XML came after)
- The Lore project was declared pretty much out of business in 2000 by the Stanford Database Group

Questions???????