Section 20.1 Modes of Information Integration
Anilkumar Panicker
CS257: Database Systems, ID: 118

Agenda
  Information Integration
  Problems of Information Integration
    1. Data type differences
    2. Value differences
    3. Semantic differences
    4. Missing values
  Modes of Information Integration
    1. Federated databases
    2. Warehousing
    3. Mediation

Information Integration
In simple terms, information integration is the process of taking data stored in two or more databases (information sources) and building from them one large database, possibly a virtual one. The sources may be conventional databases or collections of web pages.

Motivation
The main motivation behind information integration is to bring together information from disparate sources, possibly sources with differing schemas, so that the data can be queried as a single unit. The end user need not be aware of the structural and representational differences among the sources.

Example
Consider an automobile company with 1000 dealers. Each dealer maintains a database of the cars it has in stock. The company wants to build an integrated database containing information from all 1000 dealers. Such a database is useful for market analysis and for spotting trends, so that production can be adjusted.

Example (cont.)
However, the 1000 dealers do not all use the same database schema. Dealer 1 may store its information in a single relation, while Dealer 2 may use a schema in which options are stored in a second relation.

Example (cont.)
Dealer 1: Cars(sNo, model, color, aTran, cdPlr, ...)
Dealer 2: Autos(serial, model, color)
          Options(serial, option)
Not only are the schemas different, but equivalent attributes also have different names (for example, sNo versus serial).

Problems of Information Integration: Data Type Differences
A serial number might be represented by a character string at one source and by an integer at another. Even if two sources both use character strings, one may use a fixed-length string and the other a variable-length string.
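One common way to cope with this is to convert every source's serial number into a single canonical form before comparing values. The sketch below assumes serial numbers are purely numeric and picks "decimal digits, no padding, no leading zeros" as the canonical form; `normalize_serial` is a hypothetical helper, not part of any library.

```python
from typing import Union

def normalize_serial(value: Union[int, str]) -> str:
    """Map a serial number from any source to one canonical string.

    Assumes serials are purely numeric (a hypothetical convention).
    """
    if isinstance(value, bool):          # bool is a subclass of int; reject it
        raise TypeError("serial number cannot be a bool")
    if isinstance(value, int):           # source that stores integers
        return str(value)
    if isinstance(value, str):           # source that stores CHAR(n) or VARCHAR
        s = value.strip()                # drop blank padding of fixed-length strings
        if not s.isdigit():
            raise ValueError(f"non-numeric serial: {value!r}")
        return str(int(s))               # drop leading zeros
    raise TypeError(f"unsupported serial type: {type(value).__name__}")

# The same car as seen by an integer source, a CHAR(10) source and a VARCHAR source:
assert normalize_serial(12345) == normalize_serial("12345     ") == normalize_serial("0012345")
```

Choosing a string as the canonical type keeps the fixed-length and variable-length sources lossless; an integer target would work only while the numeric assumption holds.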

Problems of Information Integration: Value Differences
The same concept might be represented by different constants at different sources. For example, the color black might be represented by the code BL at one source, while BL stands for blue at another.
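Value differences are usually resolved with a per-source translation table, so that a code is never interpreted without knowing which source it came from. In this sketch the source names and code tables are made up to mirror the BL example above.

```python
# Hypothetical per-source code tables: "BL" means black at dealer1, blue at dealer2.
SOURCE_COLOR_CODES = {
    "dealer1": {"BL": "black", "BU": "blue", "RD": "red"},
    "dealer2": {"BK": "black", "BL": "blue", "RD": "red"},
}

def canonical_color(source: str, code: str) -> str:
    """Translate a source-specific color code into the integrated vocabulary."""
    try:
        return SOURCE_COLOR_CODES[source][code]
    except KeyError:
        raise ValueError(f"unknown color code {code!r} for source {source!r}") from None

assert canonical_color("dealer1", "BL") == "black"
assert canonical_color("dealer2", "BL") == "blue"
```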

Problems of Information Integration: Semantic Differences
Terms may be given different interpretations at different sources. One dealer might include trucks in its Cars relation, while another dealer includes only cars.

Problems of Information Integration: Missing Values
A source might not record a type of information that all or most of the other sources provide. For example, a dealer might not record colors at all.
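When a source lacks an attribute, the integrated record normally carries NULL (Python's `None`) for it rather than a guessed value. The `IntegratedCar` type and the colorless dealer's row layout below are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class IntegratedCar:
    serial: str
    model: str
    color: Optional[str]  # None (SQL NULL) when the source does not record color

def from_colorless_dealer(row):
    """Map a row from a hypothetical dealer that records only (serial, model)."""
    serial, model = row
    return IntegratedCar(serial=serial, model=model, color=None)

cars = [from_colorless_dealer(r) for r in [("123", "Gobi"), ("456", "Sahara")]]
# Queries over the integrated data must treat None as "unknown", not as a color.
known_color = [c for c in cars if c.color is not None]
```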

Federated Database Systems
The simplest method of information integration: a one-to-one connection between every pair of databases that need to talk to one another. These connections allow one database system, D1, to query another, D2, in terms that D2 can understand.

A federated collection of four databases

Using the schemas from slide 7, this is how Dealer 1 queries Dealer 2 for needed cars.
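As a hedged sketch of what such a federated query can look like (not the slide's own code), the following uses Python's standard sqlite3 module with an in-memory database standing in for Dealer 2. The `NeededCars` tuples and sample rows are hypothetical. The key point is that Dealer 1's boolean column `aTran` has no counterpart in Dealer 2's schema, so it must be rephrased as a join with `Options` (wanted) or a `NOT EXISTS` test (not wanted).

```python
import sqlite3

def open_dealer2():
    """Build an in-memory stand-in for Dealer 2's database (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Autos(serial TEXT PRIMARY KEY, model TEXT, color TEXT);
        CREATE TABLE Options(serial TEXT, option TEXT);
        INSERT INTO Autos VALUES ('s1', 'Gobi', 'blue'), ('s2', 'Gobi', 'blue'),
                                 ('s3', 'Sahara', 'red');
        INSERT INTO Options VALUES ('s1', 'autoTrans'), ('s3', 'cdPlayer');
    """)
    return conn

def query_dealer2(conn, model, color, auto_trans):
    """Dealer 1's request for one needed car, phrased in Dealer 2's schema."""
    if auto_trans:
        # Automatic transmission wanted: an 'autoTrans' row must exist in Options.
        sql = """SELECT a.serial FROM Autos a JOIN Options o ON a.serial = o.serial
                 WHERE o.option = 'autoTrans' AND a.model = ? AND a.color = ?"""
    else:
        # Not wanted: no 'autoTrans' row may exist for the car.
        sql = """SELECT a.serial FROM Autos a
                 WHERE a.model = ? AND a.color = ?
                   AND NOT EXISTS (SELECT 1 FROM Options o
                                   WHERE o.serial = a.serial AND o.option = 'autoTrans')"""
    return sorted(row[0] for row in conn.execute(sql, (model, color)))

# Hypothetical NeededCars tuples on Dealer 1's side: (model, color, aTran).
needed_cars = [("Gobi", "blue", True), ("Gobi", "blue", False)]
conn = open_dealer2()
answers = {car: query_dealer2(conn, *car) for car in needed_cars}
```

Code like `query_dealer2` has to be written separately for every ordered pair of dealers, which is the cost discussed next.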

Problem with Federated Database Systems
A large number of pieces of code must be written to let the databases communicate. If each of n databases needs to talk to each of the others, n(n-1) pieces of code are required: one for each direction of each pair.
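The arithmetic is worth spelling out for the running example. The comparison below assumes, as with the warehouse extractors and mediators described later, one piece of translation code per source when a central component sits in the middle; the function names are illustrative.

```python
def federated_programs(n: int) -> int:
    """Translation programs when each of n databases must query every other one:
    one per ordered pair (D1 querying D2 is different code from D2 querying D1)."""
    return n * (n - 1)

def hub_programs(n: int) -> int:
    """Programs when a warehouse or mediator sits in the middle,
    assuming one extractor (or wrapper) per source."""
    return n

for n in (4, 1000):
    print(n, federated_programs(n), hub_programs(n))
# 4 12 4
# 1000 999000 1000
```

For the four databases in the figure this is 12 pieces of code; for the 1000 dealers it is 999,000.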

Data Warehouses
Data from several sources is extracted and combined into a global schema. The integrated data is then stored at the warehouse, which looks like an ordinary database to the user. Users issue queries to the warehouse exactly as they would to an ordinary database. Updates to the warehouse are generally forbidden.

Data Warehouse
A data warehouse stores the integrated information in a separate database.

Three approaches to constructing the data in the warehouse:
  1. Periodic reconstruction
  2. Incremental updates
  3. Immediate updates: the warehouse is changed immediately in response to changes at the source

Data Warehouse: Periodic Reconstruction
The most common approach. The warehouse is periodically reconstructed from the current data at the sources.
Disadvantages:
  The warehouse must be shut down during reconstruction.
  Data in the warehouse can become seriously out of date.

Data Warehouse: Incremental Updates
The warehouse is updated periodically, based on the changes made to the sources since the last update. This involves smaller amounts of data, which makes it useful when the warehouse is large and must be updated in a short period of time.
Disadvantage: more complex than periodic reconstruction.
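A minimal sketch contrasting the two periodic approaches, assuming the warehouse is a dict keyed by serial number and that the source can supply a log of its changes since the last refresh (both are assumptions; the slides do not say how changes are captured). Both paths should end in the same state, but the incremental one touches only the logged rows.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, str]  # hypothetical (model, color) payload, keyed by serial number

def reconstruct(source: Dict[str, Row]) -> Dict[str, Row]:
    """Periodic reconstruction: discard the old warehouse and copy the source in full."""
    return dict(source)

def apply_increment(warehouse: Dict[str, Row],
                    changes: List[Tuple[str, str, Optional[Row]]]) -> None:
    """Incremental update: replay only the source changes made since the last refresh."""
    for op, serial, row in changes:
        if op in ("insert", "update"):
            assert row is not None, "insert/update must carry a row"
            warehouse[serial] = row
        elif op == "delete":
            warehouse.pop(serial, None)
        else:
            raise ValueError(f"unknown change type: {op!r}")

# Source contents at the last refresh, and the warehouse built from them.
old_source = {"s1": ("Gobi", "blue"), "s2": ("Sahara", "red"), "s3": ("Gobi", "red")}
warehouse = reconstruct(old_source)

# Changes logged at the source since then, and the source's current state.
log = [("update", "s1", ("Gobi", "black")),
       ("delete", "s2", None),
       ("insert", "s4", ("Sahara", "blue"))]
new_source = {"s1": ("Gobi", "black"), "s3": ("Gobi", "red"), "s4": ("Sahara", "blue")}

apply_increment(warehouse, log)               # touches only the three logged rows
assert warehouse == reconstruct(new_source)   # same result as a full rebuild
```

The extra complexity mentioned on the slide lives in producing a correct change log at each source, which this sketch simply assumes.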

Data Warehouse: Immediate Updates
The warehouse is changed immediately in response to each change at a source. This involves too much communication and processing and is difficult to implement for large warehouses. It is practical only when the underlying sources change very slowly.
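Immediate propagation can be sketched as a source that pushes each change to the warehouse the moment it happens. The toy `Source` and `Warehouse` classes below are hypothetical; the message counter makes the cost visible, since every single change at the source costs one message and one warehouse update.

```python
class Source:
    """A toy source that pushes every change to its subscribers as it happens."""

    def __init__(self):
        self.rows = {}
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def upsert(self, serial, row):
        self.rows[serial] = row
        for notify in self.subscribers:   # one message per change, per subscriber
            notify("upsert", serial, row)

    def delete(self, serial):
        del self.rows[serial]
        for notify in self.subscribers:
            notify("delete", serial, None)

class Warehouse:
    """Applies each incoming change immediately and counts the messages."""

    def __init__(self):
        self.rows = {}
        self.messages_received = 0

    def on_change(self, op, serial, row):
        self.messages_received += 1
        if op == "upsert":
            self.rows[serial] = row
        else:
            self.rows.pop(serial, None)

src, wh = Source(), Warehouse()
src.subscribe(wh.on_change)
src.upsert("s1", ("Gobi", "blue"))
src.upsert("s1", ("Gobi", "black"))
src.delete("s1")
```

Three changes produced three messages even though the net effect on the warehouse is nothing, which is why this approach suits only slowly changing sources.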

Extractor for Data Warehouse
Using the schemas from slide 7, this is how we can extract data from Dealer 1.
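As a hedged sketch of an extractor, the following uses Python's sqlite3 module, with an attached in-memory database standing in for Dealer 1. The global schema `AutosWhse(serialNo, model, color, autoTrans, dealer)` and the sample rows are assumptions for illustration. The extractor renames attributes into the global schema, tags each tuple with its dealer, and drops `cdPlr`, which the assumed global schema does not keep.

```python
import sqlite3

conn = sqlite3.connect(":memory:")                       # plays the warehouse
conn.execute("ATTACH DATABASE ':memory:' AS dealer1")    # plays Dealer 1's source
conn.executescript("""
    CREATE TABLE dealer1.Cars(sNo TEXT, model TEXT, color TEXT,
                              aTran INTEGER, cdPlr INTEGER);
    INSERT INTO dealer1.Cars VALUES ('s10', 'Gobi', 'blue', 1, 0),
                                    ('s11', 'Sahara', 'red', 0, 1);
    CREATE TABLE AutosWhse(serialNo TEXT, model TEXT, color TEXT,
                           autoTrans INTEGER, dealer TEXT);
""")

def extract_dealer1(conn):
    """Copy Dealer 1's cars into the (assumed) global warehouse schema."""
    conn.execute("""
        INSERT INTO AutosWhse(serialNo, model, color, autoTrans, dealer)
        SELECT sNo, model, color, aTran, 'dealer1' FROM dealer1.Cars
    """)
    conn.commit()

extract_dealer1(conn)
rows = conn.execute("SELECT * FROM AutosWhse ORDER BY serialNo").fetchall()
```

An extractor for Dealer 2 would additionally have to turn the presence of an `autoTrans` row in `Options` into a column value, for example with an `EXISTS` subquery.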

Mediators
A mediator supports a virtual view, or collection of views, that integrates several sources in much the same way as a warehouse does. The major difference is that a mediator does not store any data.

Mediators
A mediator translates queries into the terms of each source and combines the answers.

Mediators
No separate combiner is required, because the mediator itself combines the results. The mediator is also responsible for determining which sources each query should be directed to.
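A minimal sketch of a mediator over the two dealer schemas, using in-memory sqlite3 databases as stand-ins for the sources. The mediator holds no car data: each query is translated into a source's vocabulary by a per-source function (called a wrapper here; all names and sample rows are hypothetical), sent only to the sources the mediator selects, and the partial answers are combined. Routing is reduced to choosing which sources to contact.

```python
import sqlite3

def make_dealer1():
    """Stand-in for Dealer 1's source (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Cars(sNo TEXT, model TEXT, color TEXT, aTran INTEGER, cdPlr INTEGER);
        INSERT INTO Cars VALUES ('d1-1', 'Gobi', 'blue', 1, 0), ('d1-2', 'Gobi', 'red', 0, 0);
    """)
    return conn

def make_dealer2():
    """Stand-in for Dealer 2's source (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Autos(serial TEXT, model TEXT, color TEXT);
        CREATE TABLE Options(serial TEXT, option TEXT);
        INSERT INTO Autos VALUES ('d2-1', 'Gobi', 'blue'), ('d2-2', 'Sahara', 'blue');
        INSERT INTO Options VALUES ('d2-1', 'autoTrans');
    """)
    return conn

def dealer1_wrapper(conn):
    """Translate 'serials of cars with color c' into Dealer 1's schema."""
    def by_color(color):
        return [r[0] for r in conn.execute("SELECT sNo FROM Cars WHERE color = ?", (color,))]
    return by_color

def dealer2_wrapper(conn):
    """Translate the same request into Dealer 2's schema."""
    def by_color(color):
        return [r[0] for r in conn.execute("SELECT serial FROM Autos WHERE color = ?", (color,))]
    return by_color

class Mediator:
    """Answers queries over a virtual view of all dealers; stores no car data itself."""

    def __init__(self, wrappers):
        self.wrappers = wrappers  # source name -> translation function

    def serials_with_color(self, color, sources=None):
        # Decide where the query goes: every source unless restricted.
        names = list(self.wrappers) if sources is None else list(sources)
        answer = []
        for name in names:
            answer.extend(self.wrappers[name](color))  # translated query to one source
        return sorted(answer)                          # combine the partial answers

mediator = Mediator({"dealer1": dealer1_wrapper(make_dealer1()),
                     "dealer2": dealer2_wrapper(make_dealer2())})
print(mediator.serials_with_color("blue"))  # ['d1-1', 'd2-1', 'd2-2']
```

Because nothing is copied, every answer reflects the sources' current contents, at the price of contacting the sources on every query.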