Chapter 21.2 Modes of Information Integration ID: 219 Name: Qun Yu Class: CS257 219 Spring 2009 Instructor: Dr. T.Y.Lin.

Presentation transcript:


Federations
- The simplest architecture for integrating several DBs.
- One-to-one connections between all pairs of DBs.
- When n DBs talk to each other, n(n-1) wrappers are needed, one for each ordered pair of DBs.
- Good when communication between DBs is limited.
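As a quick check of the n(n-1) count, the following minimal sketch enumerates one wrapper per ordered pair of databases. The function name `federation_wrappers` and the database names are hypothetical, chosen only for illustration.

```python
from itertools import permutations


def federation_wrappers(databases):
    """Return one (asking DB, answering DB) pair per wrapper in a federation."""
    # Each DB needs its own wrapper for every other DB it queries,
    # so we count ordered pairs rather than unordered ones.
    return list(permutations(databases, 2))


dbs = ["A", "B", "C", "D"]  # hypothetical database names
pairs = federation_wrappers(dbs)
n = len(dbs)
print(len(pairs), n * (n - 1))
```

With four databases the enumeration yields 12 wrappers, which matches 4 * 3.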

Wrapper
Wrapper: software that translates incoming queries and outgoing answers. As a result, it allows information sources to conform to some shared schema.

Data Warehouse
- Data from the sources is translated from each local schema to a global schema and copied to a central DB.
- Transparent to the user: the user queries the data warehouse just like an ordinary DB.
- The user is not allowed to update the data warehouse.

Warehouse Diagram
[Figure: warehouse architecture. Labeled components: User (query, result), Warehouse, Combiner, Extractor, Source 1, Source 2.]

Example
Construct a data warehouse from the source DBs of two car dealers:
Dealer-1's schema: Cars(serialNo, model, color, autoTrans, cdPlayer, …)
Dealer-2's schema: Autos(serial, model, color), Options(serial, option)
Warehouse's schema: AutosWhse(serialNo, model, color, autoTrans, dealer)
Extractor: query to extract data from Dealer-1's data:
    INSERT INTO AutosWhse(serialNo, model, color, autoTrans, dealer)
    SELECT serialNo, model, color, autoTrans, 'dealer1'
    FROM Cars;

Example
Extractor: queries to extract data from Dealer-2's data.
Cars with an automatic transmission:
    INSERT INTO AutosWhse(serialNo, model, color, autoTrans, dealer)
    SELECT Autos.serial, model, color, 'yes', 'dealer2'
    FROM Autos, Options
    WHERE Autos.serial = Options.serial AND option = 'autoTrans';
Cars without an automatic transmission:
    INSERT INTO AutosWhse(serialNo, model, color, autoTrans, dealer)
    SELECT serial, model, color, 'no', 'dealer2'
    FROM Autos
    WHERE NOT EXISTS (
        SELECT * FROM Options
        WHERE serial = Autos.serial AND option = 'autoTrans');
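The extractor queries can be tried end to end with Python's standard sqlite3 module. This is only a sketch under simplifying assumptions: both dealers' tables and the warehouse live in one in-memory SQLite database (real sources would be separate systems), the elided columns of Cars are left out, and the sample rows are made up.

```python
import sqlite3

# Assumption: one in-memory database stands in for both sources and the warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Cars(serialNo TEXT, model TEXT, color TEXT, autoTrans TEXT, cdPlayer TEXT);
CREATE TABLE Autos(serial TEXT, model TEXT, color TEXT);
CREATE TABLE Options(serial TEXT, option TEXT);
CREATE TABLE AutosWhse(serialNo TEXT, model TEXT, color TEXT, autoTrans TEXT, dealer TEXT);

-- Made-up sample rows for illustration.
INSERT INTO Cars VALUES ('c1', 'Gobi', 'red', 'yes', 'no');
INSERT INTO Autos VALUES ('a1', 'Gobi', 'blue'), ('a2', 'Sahara', 'red');
INSERT INTO Options VALUES ('a1', 'autoTrans'), ('a1', 'cdPlayer');

-- Extractor for dealer 1: schemas already line up, only the dealer tag is added.
INSERT INTO AutosWhse(serialNo, model, color, autoTrans, dealer)
SELECT serialNo, model, color, autoTrans, 'dealer1' FROM Cars;

-- Extractor for dealer 2, cars that have the autoTrans option.
INSERT INTO AutosWhse(serialNo, model, color, autoTrans, dealer)
SELECT Autos.serial, model, color, 'yes', 'dealer2'
FROM Autos, Options
WHERE Autos.serial = Options.serial AND option = 'autoTrans';

-- Extractor for dealer 2, cars that lack the autoTrans option.
INSERT INTO AutosWhse(serialNo, model, color, autoTrans, dealer)
SELECT serial, model, color, 'no', 'dealer2'
FROM Autos
WHERE NOT EXISTS (
    SELECT * FROM Options
    WHERE serial = Autos.serial AND option = 'autoTrans');
""")

rows = conn.execute(
    "SELECT serialNo, autoTrans, dealer FROM AutosWhse ORDER BY serialNo"
).fetchall()
print(rows)
```

Each dealer-2 car lands in the warehouse exactly once: the two dealer-2 queries partition Autos by whether a matching autoTrans row exists in Options.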

Construct Data Warehouse
There are three main ways to construct the data in the warehouse:
1) Periodically reconstruct the warehouse from the current data in the sources, once a night or at even longer intervals.
   Advantages: simple algorithms.
   Disadvantages: a) the warehouse needs to be shut down during reconstruction; b) data can become out of date.

Construct Data Warehouse
2) Update the warehouse periodically (e.g., each night) based on the changes made to the sources since the last update.
   Advantages: involves smaller amounts of data (important when the warehouse is large and needs to be modified in a short period).
   Disadvantages: a) the process of calculating the changes to the warehouse is complex; b) data can become out of date.

Construct Data Warehouse
3) Change the warehouse immediately, in response to each change or a small set of changes at one or more of the sources.
   Advantages: data won't become out of date.
   Disadvantages: requires a lot of communication, so it is generally too expensive (practical only for warehouses whose underlying sources change slowly).
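The difference between the three approaches can be sketched with a warehouse modeled as a Python dict keyed by serialNo. Everything here is an illustrative assumption: the function names, the change-log format of (operation, row) pairs, and the sample rows are not from the slides.

```python
def full_rebuild(sources):
    """Approach 1: discard the old warehouse and recopy every source row."""
    warehouse = {}
    for rows in sources:
        for row in rows:
            warehouse[row["serialNo"]] = row
    return warehouse


def apply_changes(warehouse, changes):
    """Approaches 2 and 3: touch only the rows named in a change log.

    Called once per night on a batch of changes this models approach 2;
    called for every single change as it happens it models approach 3.
    """
    for op, row in changes:
        if op == "delete":
            warehouse.pop(row["serialNo"], None)
        else:  # "insert" or "update"
            warehouse[row["serialNo"]] = row
    return warehouse


# Made-up source contents, already translated to the warehouse schema.
dealer1 = [{"serialNo": "c1", "color": "red"}]
dealer2 = [{"serialNo": "a1", "color": "blue"}]

warehouse = full_rebuild([dealer1, dealer2])
apply_changes(warehouse, [
    ("insert", {"serialNo": "c2", "color": "green"}),
    ("delete", {"serialNo": "a1"}),
])
print(sorted(warehouse))
```

The rebuild is simple but reads every source row, while the incremental path reads only the change log; the hard part that this sketch skips is producing that change log from the sources.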

Mediators
- A mediator is a virtual warehouse: it supports a virtual view, or a collection of views, that integrates several sources.
- A mediator doesn't store any data.
- A mediator's tasks:
  1) receive the user's query,
  2) send queries to the wrappers,
  3) combine the results from the wrappers,
  4) send the final result to the user.

A Mediator Diagram
[Figure: mediator architecture. Labeled components: User (query, result), Mediator, Wrapper, Source 1, Source 2.]

Example
With the same data sources as in the data warehouse example, the mediator integrates the two dealers' sources into a view with schema:
AutoMed(serialNo, model, color, autoTrans, dealer)
Suppose the user issues the query:
    SELECT serialNo, model
    FROM AutoMed
    WHERE color = 'red';

Example
In this simple case, the mediator forwards the same query to each of the two wrappers.
Wrapper 1: Cars(serialNo, model, color, autoTrans, cdPlayer, …)
    SELECT serialNo, model
    FROM Cars
    WHERE color = 'red';
Wrapper 2: Autos(serial, model, color); Options(serial, option)
    SELECT serial, model
    FROM Autos
    WHERE color = 'red';
The mediator needs to interpret serial as serialNo, and then returns the union of these two result sets to the user.
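The forwarding-and-union step can be sketched as follows, with each dealer modeled as its own in-memory SQLite database. The helper names (`make_source`, `wrapper1`, `wrapper2`, `mediator`) and the sample rows are assumptions made for illustration. As a simplification, the renaming of serial to serialNo is done with AS inside the second wrapper's query rather than by the mediator.

```python
import sqlite3


def make_source(script):
    """Create a separate in-memory database standing in for one dealer."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(script)
    return conn


# Made-up sample data; elided Cars columns are left out.
dealer1 = make_source("""
CREATE TABLE Cars(serialNo TEXT, model TEXT, color TEXT, autoTrans TEXT);
INSERT INTO Cars VALUES ('c1', 'Gobi', 'red', 'yes'), ('c2', 'Gobi', 'blue', 'no');
""")
dealer2 = make_source("""
CREATE TABLE Autos(serial TEXT, model TEXT, color TEXT);
INSERT INTO Autos VALUES ('a1', 'Sahara', 'red');
""")


def wrapper1(color):
    """Translate the mediator's query into dealer 1's schema."""
    sql = "SELECT serialNo, model FROM Cars WHERE color = ?"
    return dealer1.execute(sql, (color,)).fetchall()


def wrapper2(color):
    """Translate the mediator's query into dealer 2's schema."""
    sql = "SELECT serial AS serialNo, model FROM Autos WHERE color = ?"
    return dealer2.execute(sql, (color,)).fetchall()


def mediator(color):
    """Forward the query to both wrappers and return the union of the answers."""
    return sorted(set(wrapper1(color)) | set(wrapper2(color)))


print(mediator("red"))
```

The mediator itself holds no rows; every call goes back to the sources, which is the key contrast with the warehouse approach.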

Example
The mediator may have several options for forwarding a user query. For example, suppose the user asks whether a car of a specific model and color (e.g., "Gobi", "blue") is available. The mediator can decide whether the second query is needed based on the result of the first: if dealer 1 has the car, the mediator doesn't have to query dealer 2.
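A minimal sketch of this query-one-source-first strategy is shown below. The names `find_car` and `make_wrapper`, the in-memory inventories, and the `calls` log are hypothetical; the log exists only to show which sources were actually contacted.

```python
def find_car(model, color, wrappers):
    """Ask each wrapper in turn and stop at the first one that has a match."""
    for name, query in wrappers:
        result = query(model, color)
        if result:
            return name, result
    return None, []


calls = []  # records which sources were actually queried


def make_wrapper(name, stock):
    """Build a wrapper over a made-up list of (serialNo, model, color) rows."""
    def query(model, color):
        calls.append(name)
        return [row for row in stock if row[1] == model and row[2] == color]
    return query


wrappers = [
    ("dealer1", make_wrapper("dealer1", [("c1", "Gobi", "blue")])),
    ("dealer2", make_wrapper("dealer2", [("a1", "Gobi", "blue")])),
]

found = find_car("Gobi", "blue", wrappers)
print(found, calls)
```

Because dealer 1 already has a blue Gobi, the second wrapper is never called; if dealer 1 had no match, the loop would fall through to dealer 2.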