Scaling Heterogeneous Databases and Design of DISCO. Anthony Tomasic, Louiqa Raschid, Patrick Valduriez. Presented by: Nazia Khatir, Texas A&M University.



Distributed Information Search COmponent (DISCO)
- The distributed mediator architecture of DISCO
- Query processing semantics
- Data models
- The interface to underlying data sources

Introduction
Access to a large number of data sources in heterogeneous distributed databases introduces new problems.
- End users and application programmers: unavailable data sources
  To answer a query involving n databases, all n databases must be available; otherwise either no answer or only a partial answer is returned. The availability of answers in the system therefore declines as the number of databases rises.
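The decline in availability can be made concrete with a small sketch. It assumes, purely for illustration, that every data source is up independently with the same probability p; neither assumption comes from the slides, and the function name is hypothetical. Under that assumption a query that needs all n sources gets a complete answer with probability p to the power n.

```python
def full_answer_probability(p: float, n: int) -> float:
    """Probability that all n sources respond, assuming each source is
    independently available with probability p (illustrative assumption)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return p ** n


if __name__ == "__main__":
    # Show how the chance of a complete answer shrinks as sources are added.
    for n in (1, 10, 100):
        print(n, full_answer_probability(0.99, n))
```

Even with highly available individual sources, the product shrinks steadily as n grows, which is the motivation for the partial-answer semantics discussed later in the talk.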

Introduction (Cont.)
Access to a large number of data sources in heterogeneous distributed databases introduces new problems.
- Database administrators (DBAs): incorporating new sources into the model
  Schemas must be changed, catalogs must be updated, and new definitions must be added.

Introduction (Cont.)
Access to a large number of data sources in heterogeneous distributed databases introduces new problems.
- Database implementors (DBIs): translation of queries between query languages and schemas
  New code must be written.

DISCO Architecture
(Figure legend) A: Application, M: Mediator, C: Catalog, W: Wrapper, D: Data Source. Arcs represent the exchange of queries and answers.

Applications (A)
- Written by application programmers
- Access a uniform representation of the underlying sources through a uniform query language

Mediators (M)
- Permit a collection of databases to be accessed in a uniform way
- Accept queries and transform them into sub-queries
- Keep state in the form of summary information about their associated databases

Catalogs (C)
- Special mediators
- Keep track of the collection of databases, wrappers, and mediators
- Provide an overview of the entire system

Wrappers (W)
- Deal with the heterogeneous nature of databases
- Transform sub-queries: map from the general query language used by mediators to the source query language
- Reformat answers (data) as appropriate for each mediator
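The division of labour between mediators and wrappers can be sketched in a few lines. This is a minimal illustration, not DISCO's actual interface: the classes ListWrapper and Mediator, their methods, and the in-memory lists standing in for data sources are all hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


class ListWrapper:
    """Hypothetical wrapper over an in-memory list standing in for a source."""

    def __init__(self, rows: List[Row], rename: Dict[str, str]):
        self._rows = rows
        self._rename = rename  # source field name -> mediator field name

    def run(self, predicate: Predicate) -> List[Row]:
        # Reformat each source row into the mediator's representation,
        # then apply the sub-query's predicate.
        answer = []
        for row in self._rows:
            converted = {self._rename.get(k, k): v for k, v in row.items()}
            if predicate(converted):
                answer.append(converted)
        return answer


class Mediator:
    """Hypothetical mediator: sends one sub-query per wrapper and merges answers."""

    def __init__(self, wrappers: List[ListWrapper]):
        self._wrappers = wrappers

    def query(self, predicate: Predicate) -> List[Row]:
        combined: List[Row] = []
        for wrapper in self._wrappers:
            combined.extend(wrapper.run(predicate))
        return combined
```

An application only talks to the mediator; differences in field names between the sources are hidden inside each wrapper.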

Features of DISCO: for application programmers
- Provides new query processing semantics that ease dealing with unavailable data sources

Features of DISCO: for DBAs
- Models data sources as objects, which permits powerful modeling capabilities
- Supports type transformations to ease the incorporation of new data sources into a mediator

Features of DISCO: for DBIs
- Provides a flexible wrapper interface to ease the construction of wrappers

Data Model
(Figure) person (type); person (extent) over the extents person0, person1, person2; data sources r0 (Mary, 200), r1 (Sam, 150), r2.
Query:
    select x.name from x in person where x.salary > 10
- Programmer viewpoint: the answer is bag("Mary", "Sam"), of type Bag.
- DBA viewpoint: the same query would access the third data source as well.
- The model supports dissimilar structures.

Wrapper Interface
DISCO provides a flexible wrapper interface for DBIs.
- The interface to wrappers is at the level of an abstract algebraic machine (AM) of logical operators.
- The DBI implements the logical operators, plus a call in the wrapper interface that returns the grammar.
- During query processing, the mediator generates a logical expression.
- The mediator calls the interface to get the grammar and checks that the logical expression matches the grammar.
(Figure labels) Mediator; Wrapper Interface (Algebraic Machine)
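The grammar check described above can be sketched as follows. The slides do not say how the grammar is represented, so this sketch makes a simplifying assumption: the grammar is reduced to the set of logical operator names a wrapper supports, and a logical expression is a nested tuple of the form (operator, arguments...). All class and function names are hypothetical.

```python
from typing import Set


def operators_used(expr) -> Set[str]:
    """Collect every operator name that appears in a nested-tuple expression.
    Non-tuple arguments (extent names, predicate text) are treated as leaves."""
    if not isinstance(expr, tuple):
        return set()
    ops = {expr[0]}
    for arg in expr[1:]:
        ops |= operators_used(arg)
    return ops


class ScanOnlyWrapper:
    """Hypothetical wrapper for a source that can only return whole extents."""

    def grammar(self) -> Set[str]:
        return {"scan"}


class SelectProjectWrapper:
    """Hypothetical wrapper for a source that can also filter and project."""

    def grammar(self) -> Set[str]:
        return {"scan", "select", "project"}


def mediator_can_push(wrapper, expr) -> bool:
    # The mediator fetches the grammar through the wrapper interface and
    # checks that the expression uses only operators the wrapper accepts.
    return operators_used(expr) <= wrapper.grammar()
```

With a check like this, a mediator can hand a full select-project expression to a capable wrapper and fall back to a plain scan, doing the remaining work itself, for a weaker one.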

Mediator Data Model: Extensions to the ODMG standard
ODMG (Object Data Management Group) defines:
- Object Data Model
- Object Definition Language (ODL)
- Object Query Language (OQL)
- Language bindings

Mediator Data Model: Extensions to the ODMG standard
Object Data Model
- interface: defines a type signature for an object
- extent: automatically maintains the collection of objects of the interface, i.e. an extent is a named variable whose value is the collection of all objects of the associated interface. When objects are created or destroyed, the extent is updated automatically.

Mediator Data Model: Extensions to the ODMG standard
Object Definition Language (ODL)
- wrapper: models wrappers
- repository: the address of a database or some other type of repository, which may contain several data sources. Each data source in a repository is associated with an extent.

Mediator Data Model: Extensions to the ODMG standard
Defining access to a data source:
1. Create an instance of the repository type:
    r0 := Repository (host = "rodin.inria.fr", name = "db", address = " ")
2. Locate the wrapper (written by a database implementor):
    w0 := WrapperPostgres ( );
3. Define the interface (type) in the mediator that corresponds to the data source objects, e.g. the Person type corresponds to the objects in data sources r0 and r1:
    interface Person { attribute String name; attribute Short salary; }
4. Specify the extent of this mediator type, which accesses r0 through the w0 wrapper:
    extent person0 of Person wrapper w0 repository r0;
Each DISCO extent represents a collection of data in one data source.
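The four steps can be mirrored in a small sketch. This is only an analogy to the ODL declarations, not DISCO code: InMemoryWrapper is a hypothetical stand-in for WrapperPostgres that reads from a dictionary instead of a database, and the empty address string is a placeholder because the slide leaves the address blank.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Repository:
    host: str
    name: str
    address: str  # left blank on the slide; kept as a placeholder here


@dataclass(frozen=True)
class Person:
    name: str
    salary: int


class InMemoryWrapper:
    """Hypothetical stand-in for WrapperPostgres: rows are looked up in a
    dictionary keyed by (host, database name) instead of a real database."""

    def __init__(self, tables: Dict[Tuple[str, str], List[tuple]]):
        self._tables = tables

    def fetch(self, repository: Repository, interface: type) -> list:
        rows = self._tables[(repository.host, repository.name)]
        return [interface(*row) for row in rows]


@dataclass
class Extent:
    name: str
    interface: type
    wrapper: InMemoryWrapper
    repository: Repository

    def objects(self) -> list:
        # An extent is the collection of objects of one type in one source.
        return self.wrapper.fetch(self.repository, self.interface)


if __name__ == "__main__":
    r0 = Repository(host="rodin.inria.fr", name="db", address="")   # step 1
    w0 = InMemoryWrapper({("rodin.inria.fr", "db"): [("Mary", 200)]})  # step 2
    person0 = Extent("person0", Person, w0, r0)                      # steps 3-4
    print(person0.objects())
```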

Mediator Data Model: Extensions to the ODMG standard
Data access from the data source
The query
    select x.name from x in person0 where x.salary > 10
returns the answer Bag("Mary").
Adding a new extent of type Person:
    extent person1 of Person wrapper w0 repository r1;
To access objects in both data sources, the query
    select x.name from x in union (person0, person1) where x.salary > 10
returns the answer Bag("Mary", "Sam").
- Advantage: the extents are referred to explicitly.
- Disadvantage: queries are difficult to express when the extents are not explicitly specified.
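The two queries can be reproduced over in-memory data as a rough sketch. Lists of dictionaries stand in for the extents, a Counter stands in for an OQL bag, and the helper names are hypothetical; only the rows for Mary and Sam come from the slides.

```python
from collections import Counter
from itertools import chain

# Extents as plain lists; the rows are the ones shown on the Data Model slide.
person0 = [{"name": "Mary", "salary": 200}]
person1 = [{"name": "Sam", "salary": 150}]


def bag_union(*extents):
    """Bag union: concatenate extents, keeping duplicates."""
    return list(chain.from_iterable(extents))


def names_with_salary_above(extent, threshold):
    """select x.name from x in <extent> where x.salary > <threshold>"""
    return Counter(x["name"] for x in extent if x["salary"] > threshold)
```

Note that the caller has to list person0 and person1 by hand, which is exactly the disadvantage the slide points out.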

Mediator Data Model: Extensions to the ODMG standard
Solution: MetaExtent keeps the details of the extents of all the mediator types.
General format of the MetaExtent type, which is created automatically:
    interface MetaExtent (extent metaextent) { attribute String name; attribute Extent e; attribute Type interface; attribute Wrapper wrapper; attribute Repository repository; attribute Map map; }
The extent person is declared on the interface and defined by a query expression:
    interface Person (extent person) { attribute String name; attribute Short salary; }
    define person as Flatten( select x.e from x in metaextent where x.interface = Person)
Thus, a query on person dynamically accesses all the extents defined for the type Person.
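The effect of the Flatten query over metaextent can be sketched as below. The record type is a cut-down, hypothetical version of MetaExtent that keeps only three of its attributes, and extents are modeled as plain lists of objects.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MetaExtentEntry:
    """Cut-down MetaExtent record: name, interface (type name), and e (the
    objects currently in the extent). The other attributes are omitted."""
    name: str
    interface: str
    e: list


def implicit_extent(metaextent: List[MetaExtentEntry], interface: str) -> list:
    # Flatten( select x.e from x in metaextent where x.interface = <interface> )
    return [obj for x in metaextent if x.interface == interface for obj in x.e]
```

Because the lookup runs each time the extent is used, registering a new extent for Person makes it visible to existing queries without rewriting them.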

Mediator Data Model: Matching similar and dissimilar structures or substructures
The DBA defines the aggregation of data from data sources for access to multiple data sources:
- Matching similar substructures -> subtype
- Matching similar structures -> map
- Matching dissimilar structures -> view

Mediator Data Model: Matching similar substructures
Subtyping follows the ODMG standard.
Example: the Student interface is a subtype of Person, and the DBA defines two extents as follows:
    interface Student: Person { }
    extent student0 of Student wrapper w0 repository r2
    extent student1 of Student wrapper w0 repository r3
The person extent still contains only person0 and person1. It does not automatically reference the extents of its subtypes in the subtype hierarchy.
DISCO solution: the special syntax person*, which also covers the extents of the subtypes.
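The difference between person and person* can be sketched with two lookup tables. The dictionaries and the include_subtypes flag are hypothetical devices for illustration; the extent and type names are the ones from the slide.

```python
from typing import Dict, List

# Subtype hierarchy and extent registrations taken from the slide's example.
subtypes: Dict[str, List[str]] = {"Person": ["Student"], "Student": []}
extents_by_type: Dict[str, List[str]] = {
    "Person": ["person0", "person1"],
    "Student": ["student0", "student1"],
}


def extents_of(type_name: str, include_subtypes: bool = False) -> List[str]:
    """Extents of a type; with include_subtypes=True this plays the role of
    the starred form (person*), walking the subtype hierarchy recursively."""
    names = list(extents_by_type.get(type_name, []))
    if include_subtypes:
        for sub in subtypes.get(type_name, []):
            names.extend(extents_of(sub, include_subtypes=True))
    return names
```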

Mediator Data Model: Matching similar structures
Mapping example:
    interface PersonPrime { attribute String n; attribute Short s; }
    extent personprime0 of PersonPrime wrapper w0 repository r0;
Since the objects returned from r0 are of type Person, the extent personprime0 has a type conflict with the objects returned. To avoid a run-time error, DISCO allows the DBA to resolve this type conflict.

Mediator Data Model: Matching similar structures
Mapping example (Cont.): the type conflict is resolved by specifying a mapping between a mediator type and a data source type. The mapping function is called the local transformation map. The declaration
    extent personprime0 of PersonPrime wrapper w0 repository r0;
becomes
    extent personprime0 of PersonPrime wrapper w0 repository r0 map ((person0=personprime0), (name = n), (salary = s));
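At the level of a single object, a local transformation map amounts to renaming attributes. The sketch below assumes the map can be represented as a dictionary from source attribute name to mediator attribute name; the function name is hypothetical, and the extent-level pair (person0=personprime0) is left out.

```python
from typing import Dict


def apply_local_map(source_row: Dict[str, object],
                    attribute_map: Dict[str, str]) -> Dict[str, object]:
    """Rename the attributes of a source object to the mediator type's names.
    attribute_map maps source attribute -> mediator attribute."""
    missing = set(attribute_map) - set(source_row)
    if missing:
        # Surface the type conflict explicitly instead of failing later.
        raise KeyError(f"source object lacks attributes: {sorted(missing)}")
    return {target: source_row[source] for source, target in attribute_map.items()}


if __name__ == "__main__":
    # (name = n), (salary = s) from the slide's map clause.
    print(apply_local_map({"name": "Mary", "salary": 200}, {"name": "n", "salary": "s"}))
```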

Mediator Data Model: Matching dissimilar structures
Views in DISCO. Example:
    interface PersonTwo { attribute String name; attribute Short regular; attribute Short consult; }
    extent persontwo0 of PersonTwo wrapper w0 repository r5;
View definition to aggregate over the data sources:
    define personnew as bag (select struct (name : x.name, salary : x.salary) from x in person, select struct (name : x.name, salary : x.regular + x.consult) from x in persontwo0)
A view can reference other views, but views are not updatable.
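The view can be imitated over in-memory data. In this sketch the rows for Mary and Sam come from the earlier slides, while the persontwo0 row is invented purely for illustration, since the slides show no data for repository r5.

```python
from typing import Dict, List

person = [{"name": "Mary", "salary": 200}, {"name": "Sam", "salary": 150}]
# Hypothetical row: the slides give no sample data for persontwo0.
persontwo0 = [{"name": "Ann", "regular": 100, "consult": 30}]


def personnew(person_rows: List[Dict], persontwo_rows: List[Dict]) -> List[Dict]:
    """Read-only view that presents both structures as (name, salary)."""
    from_person = [{"name": x["name"], "salary": x["salary"]} for x in person_rows]
    from_persontwo = [
        {"name": x["name"], "salary": x["regular"] + x["consult"]}
        for x in persontwo_rows
    ]
    return from_person + from_persontwo
```

The view is recomputed from its sources on each call, which matches the slide's point that views can be queried but not updated.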

Mediator Query Processing

Query Processing With Unavailable Data
There are three possibilities if a data source does not respond:
1. The system waits.
2. The system assumes that the unavailable source does not exist, or treats it as having no matching tuples.
3. The system returns a partial answer.
DISCO applies partial evaluation semantics to queries: it processes as much of the query as possible from the information that is available. Thus, the answer to a query may be another query.

Query Processing With Unavailable Data (Cont.)
Assume r0 does not respond. The query
    select x.name from x in person where x.salary > 10
returns the answer
    union (select y.name from y in person0 where y.salary > 10, Bag("Sam"))
- Partial answer (query): select y.name from y in person0 where y.salary > 10
- Partial answer (data): Bag("Sam")
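The idea that the answer to a query may be another query can be sketched as follows. This is a simplified illustration of partial evaluation for this one query shape, not DISCO's algorithm: the exception type, the PartialAnswer record, and the way the residual query text is rebuilt are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


class SourceUnavailable(Exception):
    """Raised by a fetch function when its data source does not respond."""


@dataclass
class PartialAnswer:
    data: List[str]      # names obtained from sources that responded
    residual: List[str]  # extents whose sources did not respond


def evaluate(extents: Dict[str, Callable[[], list]], threshold: int) -> PartialAnswer:
    """Evaluate 'select x.name ... where x.salary > threshold' as far as possible."""
    data: List[str] = []
    residual: List[str] = []
    for name, fetch in extents.items():
        try:
            rows = fetch()
        except SourceUnavailable:
            residual.append(name)  # keep this part of the query unevaluated
            continue
        data.extend(r["name"] for r in rows if r["salary"] > threshold)
    return PartialAnswer(data, residual)


def as_query(answer: PartialAnswer, threshold: int) -> str:
    """Render the partial answer as a query: residual sub-queries plus a bag."""
    bag = "Bag(" + ", ".join(f'"{n}"' for n in answer.data) + ")"
    parts = [
        f"select y.name from y in {name} where y.salary > {threshold}"
        for name in answer.residual
    ]
    if not parts:
        return bag  # every source responded: the answer is plain data
    return "union (" + ", ".join(parts + [bag]) + ")"
```

Resubmitting the returned query later, once r0 is back, would only need to contact the sources that failed the first time.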

Conclusion
The design of DISCO provides solutions to some of the problems encountered when scaling the number of data sources in heterogeneous distributed databases:
- Partial evaluation query semantics -> application programmers
- Data modeling tools -> DBAs
- Flexible wrapper interface -> DBIs