Lore: A Database Management System for Semi-structured Data Jason McHugh, Serge Abiteboul, Roy Goldman, Dallan Quass, Jennifer Widom Stanford University.

Slides:



Advertisements
Similar presentations
PHP I.
Advertisements

Configuration management
Chapter 10: Designing Databases
Database System Concepts and Architecture
Lukas Blunschi Claudio Jossen Donald Kossmann Magdalini Mori Kurt Stockinger.
Evaluation of Relational Operators CS634 Lecture 11, Mar Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
EXECUTION PLANS By Nimesh Shah, Amit Bhawnani. Outline  What is execution plan  How are execution plans created  How to get an execution plan  Graphical.
Query Evaluation. An SQL query and its RA equiv. Employees (sin INT, ename VARCHAR(20), rating INT, age REAL) Maintenances (sin INT, planeId INT, day.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 13 Introduction to SQL Programming Techniques.
Xyleme A Dynamic Warehouse for XML Data of the Web.
From Semistructured Data to XML: Migrating The Lore Data Model and Query Language Roy Goldman, Jason McHugh, Jennifer Widom Stanford University
Database Systems: A Practical Approach to Design, Implementation and Management International Computer Science S. Carolyn Begg, Thomas Connolly Lecture.
Query Languages Aswin Yedlapalli. XML Query data model Document is viewed as a labeled tree with nodes Successors of node may be : - an ordered sequence.
Indexing Semistructured Data J. McHugh, J. Widom, S. Abiteboul, Q. Luo, and A. Rajaraman Stanford University January 1998
1 COS 425: Database and Information Management Systems XML and information exchange.
Query Optimization for Semistructured Data Jason McHug, Jennifer Widom Stanford University - Rajendra S. Thapa.
Physical Database Monitoring and Tuning the Operational System.
LORE Light Object Repository by Othman Chhoul CSC5370 Fall 2003.
16.5 Introduction to Cost- based plan selection Amith KC Student Id: 109.
Lore: A Database Management System for Semistructured Data.
Chapter 11 ASP.NET JavaScript, Third Edition. 2 Objectives Learn about client/server architecture Study server-side scripting Create ASP.NET applications.
OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla
CIS607, Fall 2005 Semantic Information Integration Article Name: Clio Grows Up: From Research Prototype to Industrial Tool Name: DH(Dong Hwi) kwak Date:
Chapter 7 Managing Data Sources. ASP.NET 2.0, Third Edition2.
Objectives of the Lecture :
This chapter is extracted from Sommerville’s slides. Text book chapter
2.2 A Simple Syntax-Directed Translator Syntax-Directed Translation 2.4 Parsing 2.5 A Translator for Simple Expressions 2.6 Lexical Analysis.
©Silberschatz, Korth and Sudarshan5.1Database System Concepts Chapter 5: Other Relational Languages Query-by-Example (QBE) Datalog.
Implementation Yaodong Bi. Introduction to Implementation Purposes of Implementation – Plan the system integrations required in each iteration – Distribute.
Context Tailoring the DBMS –To support particular applications Beyond alphanumerical data Beyond retrieve + process –To support particular hardware New.
 To explain the importance of software configuration management (CM)  To describe key CM activities namely CM planning, change management, version management.
XML과 Database 홍기형 성신여자대학교 성신여자대학교 홍기형.
Database Management 9. course. Execution of queries.
A Query Translation Scheme for Rapid Implementation of Wrappers Presented By Preetham Swaminathan 03/22/2007 Yannis Papakonstantinou, Ashish Gupta, Hector.
Chapter 13 Query Processing Melissa Jamili CS 157B November 11, 2004.
Lecture 10 Trees –Definiton of trees –Uses of trees –Operations on a tree.
RELATIONAL FAULT TOLERANT INTERFACE TO HETEROGENEOUS DISTRIBUTED DATABASES Prof. Osama Abulnaja Afraa Khalifah
Chapter 11 Heap. Overview ● The heap is a special type of binary tree. ● It may be used either as a priority queue or as a tool for sorting.
Distributed Information Retrieval Using a Multi-Agent System and The Role of Logic Programming.
Chapter 18 Object Database Management Systems. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Motivation for object.
 Three-Schema Architecture Three-Schema Architecture  Internal Level Internal Level  Conceptual Level Conceptual Level  External Level External Level.
10/10/2012ISC239 Isabelle Bichindaritz1 Physical Database Design.
DataBase Management System What is DBMS Purpose of DBMS Data Abstraction Data Definition Language Data Manipulation Language Data Models Data Keys Relationships.
Database Systems Design, Implementation, and Management Coronel | Morris 11e ©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or.
Query Execution Section 15.1 Shweta Athalye CS257: Database Systems ID: 118 Section 1.
Efficient RDF Storage and Retrieval in Jena2 Written by: Kevin Wilkinson, Craig Sayers, Harumi Kuno, Dave Reynolds Presented by: Umer Fareed 파리드.
Union-find Algorithm Presented by Michael Cassarino.
Lore: A Database Management System for Semistructured Data.
Methodology – Physical Database Design for Relational Databases.
Chapter 4: SQL Complex Queries Complex Queries Views Views Modification of the Database Modification of the Database Joined Relations Joined Relations.
Joseph M. Hellerstein Peter J. Haas Helen J. Wang Presented by: Calvin R Noronha ( ) Deepak Anand ( ) By:
Session 1 Module 1: Introduction to Data Integrity
1 Chapter 12 Configuration management This chapter is extracted from Sommerville’s slides. Text book chapter 29 1.
CIS 250 Advanced Computer Applications Database Management Systems.
Object storage and object interoperability
Chapter 18 Object Database Management Systems. Outline Motivation for object database management Object-oriented principles Architectures for object database.
File Processing : Query Processing 2008, Spring Pusan National University Ki-Joune Li.
1 Overview of Query Evaluation Chapter Outline  Query Optimization Overview  Algorithm for Relational Operations.
Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe.
Some of the utilities associated with the development of programs. These program development tools allow users to write and construct programs that the.
ISC321 Database Systems I Chapter 2: Overview of Database Languages and Architectures Fall 2015 Dr. Abdullah Almutairi.
LECTURE 10 Semantic Analysis. REVIEW So far, we’ve covered the following: Compilation methods: compilation vs. interpretation. The overall compilation.
Data Integrity & Indexes / Session 1/ 1 of 37 Session 1 Module 1: Introduction to Data Integrity Module 2: Introduction to Indexes.
Web Database Programming Using PHP
Querying and Transforming XML Data
Database Management System
Web Database Programming Using PHP
Chapter 12: Query Processing
Chapter 15 QUERY EXECUTION.
Presentation transcript:

Lore: A Database Management System for Semi-structured Data Jason McHugh, Serge Abiteboul, Roy Goldman, Dallan Quass, Jennifer Widom Stanford University Mustafa Emre ERKOÇ

What is Lore (Lightweight Object Repository): A DBMS designed specifically for managing semi structured data. In the paper: Storage manager Indexing Query Processing and Optimization User Interfaces is described. Needs for such a DBMS: 1)The data can be irregular and not conform to a rigid schema. 2)There can be changes in the structure of the schema such as: data elements can change types, data not conforming previous schema can be edit and other schema modifications What Lore Does: since it is designed for managing semi structured data, there is no existence of schema. Lore data model is very simple, self describing and nested object model called OEM(Object Exchange Model)

Outline of Presentation : Representing and Querying Semi Structured Model Data Model Lorel Query Language System Architecture Query and Update Processing Query Operators and Query Plan Query Indexing Lindex and Vindex Index Query Plan Update Query Plan Physical Storage Novel Features External Data Manager Data Guides Interfaces API Web System Status and Future Work

What has done so far? The Lorel query language for semi structured data. 11/96 Enabling query formulation and optimization in semi structured database. 08/97 Representing and querying changes in semi structured data. 02/97 Querying semi structured heterogeneous information.12/96 Query optimization for semi structured data. Integrating dynamically-fetched external information into a dbms for semi structured data. 05/97

Representing and querying  The object exchange model Oem is designed for semistructured data Data can be thought as labeled directed graph The vertices are objects and names are special labels serve as aliases for objects and as entry points into database Every object has a unique object identifier (oid) Example : &5 Atomic Object: object that has no outgoing edge. Contain a value from one of the basic atomic types such as integer, real,string, gif etc Example: Object &7 with value Clark Complex Objects: all other objects that may have subobjects. Example: &3 and subobjects &8 &9 &10 and &11 Any object that cannot be accessed by a path from some name is considered to be deleted. 1)Members have zero, one or more offices 2)An office is sometimes string and sometimes integer and sometimes complex object 3)Room may be a string and integer

Representing and querying semistuctured data  Lorel query language Lorel is an extension of OQL. Basic building block of lorel is the simple path expression which is a name followed by a sequence of labels. Ex : DBGruop.Member.Office Ex: select DBGruop.Member.Office Where DBGruop.Member.Age>30 Result is office “Gates 252” Office Building ”CIS” Room “411” This query will not yield a run-time error if an Age object has a string value or is complex or They are singled-value, set-value or even empty for some group members. The OQL style of this query is: Select O From DBGruop.Member M, M.Office O Where exits A in M.Age : A>30 As will be seen later the first step of query processing is to rewriting the query into OQL style

Representing and querying semistuctured data  lorel query language One can specifypattern for path pattern for labels pattern for atomic values. Example: Select DBGroup.Member.Name Where DBGroup.Member.Office(.Room%|.Cubicle)? like”%252” Result Name “Jones” Name “ Smith” The following query illustrates subqueries and constructed results. Select N.Name, (select M.Project.Title where M.Project.Title !=“lore”) From DBGroup.Member M Where M.Project.Title = “lore” Result: Name “ Jones”Title “ Tsimmis”

System Architecture Access to Lore is through a variety of applications or directly from application Program Interface (API ) There is a simple textual interface generally used by developers. The graphical interface is the primary interface for the end users. It provides powerful tools for query results, formulating simple queries and mechanism for viewing the multimedia atomic types The query compilation layer is consist of -The parser -Preprocessor - Query plan generator-Query optimizer. Query optimizer, in addition to doing some transformations on the query plan, decides whether the use of indexes is feasible. The Data Engine Layer houses - OEM object manager - Query operators -External data manager- Various utilities. the object manager functions as the translation layer between OEM and the low level file constructs. It supports : Fetching an object, comparing two objects, performing simple coercions, cache of frequently accessed objects

System Architecture

Query and Update Processing in Lore Query processing in Lorel is fairly conventional with some exceptions: Becuase of the flexibility of Lorel, the preprocessing of the parse tree to produce OQL like query is complex. Although the Lore engine is built around standard operators, some take an original flavor Ex: scan operator. A unique feature of Lore is its automatic coercion of atomic values. Coercion has an impact on the implementation of comparators. The result of a Lorel query is always a set of OEM object which become subobject of a newly created result object.

Query and Update Processing in Lore  Query processing Query execution strategy is based on familiar database operators. A recursive iterator approach is used in query processing. With the iterators execution begins at the top of the query plan,with each node in the plan requesting a tuple from its children and performing some operations on the tuples After a nodes complete its operation, it passes a resulting tuple up to its parent. The tuples on which the operation takes places is Object Assignments. OA is a simple Data structure containing slots corresponding to range variables in the query If OA1holds the oid for member “Smith” then OA2 and Oa3 can hold the oid’s For one of the Smith’s Office subobject and one of the Smith’s Age subobject respectively

Query and Update Processing in Lore  Query operators Join, project, select operators are nearly identical to their corresponding relational operators. Aggregation operator calls its child exhaustively, storing the result temporarily or computing the aggregate incrementally. When the child can produce no more valid OA’s, a new object is created whose value the final aggregation. Other operators : SetOp, ArithOp, CreateSet, and Gruopby. SetOp handles the set operations; Union, intersect and except. ArithOp handles arithmatic operators as; addition, multiplication etc. CreateSet is used to package the result of an arbitrary subquery before proceeding. Gruopby handles queries that include a groupby expression.

Example: select M.Name, count (M.Publication) from DBGroup.Member M Where M/Dept = ‘CS’ OQL like translation is select (select N from M.Name N), count (select P from M.Publication) from DB.Group.Member M where exist D in M.Dept : D = ‘CS’

Query and update processing in Lore  Query Optimization & Indexing Query processor currently implement only a few simple heuristic optimization technique E.g Selection operators are pushed down the query tree, and in some cases redundant operator are eliminated or combined Lore explored query plans that use indexes when feasible. In a relational DBMS, an index is created on an attribute in order to locate tuples with particular attribute values quickly. In Lore, such a value index, alone is not sufficient since path to an object is as important as the value of object. So there are two types of indexes: Link (Edge) Index :also called Lindex, takes an oid and a label, and returns the oid’s of All parents via specified label. It essentially provides “parent pointers” Value Index: also called Vindex, takes a label, operator and value. It returns all atomic objects having an incoming edge with specified label and a value satisfying the specified operator and value.

Indexing  Value Indexing Value indexing in Lore requires some novel features due to its non-strict typing system. Indexing system deals with coercion involving integers, reals, and strings only. But the situation is simplified by always coercing integers to reals as shown in the table. Three types of Vindexes are maintained: String vindex Real vindex String coerced to real vindex When using a Vindex for comparison there are two cases: 1- If the comparison type string then: i-do a look up in the string vindex ii- if the Value can be coerced to a real then look up fir the coerced value in the real vindex 2- If the value is real or iteger: i-do a look up in the real vindex; ii- also look up in the String coerced to real vindex

Indexing  Index query plan If the query contains a comparison between a path expression and an integer, real or string and if the appropriate indexes are exist then a query plan that uses indexes will be created Ex: Select O From DBGruop.Member M, M.Office O Where exits A in M.Age : A>30 Query plans using indexes are different in the shape from those based on Scan operators index plans traverse the database bottom-up, while scan based plans perform a top-down traversal.

Indexing  Index query plan The new operators are: Vindex: iteratively finds all atomic objects with the value bigger than 30 and an incoming edge labeled Age, placing their oids in slot OA2 Lindex: iteratively places into OA1 all parents of the object in OA2 via an Age edge. Once: the object could potentially have several partners via Age, however; since Age is existentially quantified in the query, we only want to consider each parent once, even if the it has several Age subobject; this is why we use once here. Name_Obj: in the second lindex, operator finds all parents of OA1 object via member edge and place them in OA0. The objects in AO0 should be named as DBGroup and the purpose of Name_Obj operator is to do this.

Query and Update processing in Lore  Update query plan Update is done by adding an Update operator to the query execution engine. Ex: update P.Member += (select DBGroup.Member where DBGroup.Member.Name=‘Clark’) from DBGroup.Project p where P.Title = ‘Lore’ or P.Title = ‘Tsimmis’ This update operator add an edge between the projects whose titles are Lore and Tsimmis, and members called Clark. Here we use Create_edge operation. The other valid operations are: Destroy_edge, modify_atomic

Query and Update Processing in Lore  Bulk Loading and Physical Storage Bulk Loading Data can be added to a Lore database in two ways: 1)By using update statement 2)By load file In the second case, a textual description of an OEM database is accepted by a loas utility which includes useful features such as symbolic references for shared subobject, as well as the ability to incorporate new data into an existing database. Physical Structure Lore arranges objects in a physical disk pages; each page has a number of slots with a single object in each slot. Object are variable length and so Lore places object according to a first-fit algorithm. Lore use an object forwarding mechanism to handle object that grow too large objects that may span page. In addition, Lore supports for large objects that span many pages.

Novel Features  External Data External Data Manager enables dynamic retrieval of information from other data sources based on the queries issued to Lore. During the query evaluation, externally obtained data is combined with resident data and so The distinction between two type of data is invisible to end user. During the query processing, when the execution engine discover an external object information is fetched from external source to answer query. Also the fetched information is stored in database until it becomes stale.

Novel Features  External Data Specifications for an external objects: i-The location of wrapper program that fetches the external data and translate it into OEM ii-A quantum that indicates that indicates the time interval until the fetched information become stale. iii-A set of arguments that are used to limit the information fetched in a call to the external source Arguments sent to the external source can come from 3 places: i-the query being processed (query defined) ii-values of other objects in the local database(data defined) iii-constant values tied to external object(hard-coded)

Novel Features  External Data Many calls to an external sources quickly dominate query processing time. To limit the number of calls some mechanism is used. One of them is to track the argument sets used by previous queries and determine when previously fetched information or entirely subsumes information required by the current argument set

Novel Features  Data Guides Lore does not have an explicit schema so query formulation and optimization are particularly challenging. A dataguide is a concise and accurate structure of an OEM database, stored as an OEM object. Each possible path expression of a database is encoded exactly once in the dataguides. Considering the role of missing schema, the Dataguides can guide the query processor In relational or O 2 systems the schema is explicitly created before any data is loaded, However, dataguides are dynamically generated or maintained.

Interfaces to Lore  Application Programming Interface API provides a gateway between Lore and any user interface or client application. For instance, it is used by the system’s textual interface, which passes user commands to Lore and present result of queries in a hierarchical display. API is composed of a small collection of C++ classes. At the highest level it is used : -as a client program to connect to a Lore database. -To submit queries -To process query result. LoreConnection class is to connect a specific database Submit function is to submit Lorel queries. It is also used for other Lore command such as index creation and updates. When submit is called with a query, it returns query results as a LoreOem object. LoreOem only contains oid and the actual value is fetched from database on demand.

User Interfaces to Lore  Web interface By visiting an url and choosing a database, a user can connect to web interface. The user may submit a textual Lorel query or select a sample prewritten query. Result are displayed in an HTML, hierarchical format.

System status and Future work Lore is consist of approximately lines of C++ codes. Some language features such as external predicates and functions are still under implementation. General path expression are not implemented in their full generality, although a substantial and very useful is. It is planned to explore how Lorel could be translated to SQL 3 and thus implemented on top of an object relational dbms. Performance Issues There is a little performance analysis are done for Lore. Comparing the performance of lore against the implemented lorel on top of O 2 There is a significant additional research to do in query optimization, including query writing, operation ordering, selecting the best use of indexes in query plans. Currently all expansions of path expressions in query path are done at run-time. It is planned to explore the compile-time approach and compare its performance against the Run-time approach.

System Status and Future Work It is also considering to investigate more sophisticated techniques for customizing the presentation of OEM object in a Web environment. Finally, it is planned to incorporate a special text type along with a full-text indexing System because many applications appropriate for a DMBS such as Lore, include a Significant amount of text data.