Information Integration Introduction (21.1)


Information Integration Introduction (21.1) Ashish Sharma CS-257 ID:118

Why Information Integration?
Databases are created independently, even if they later need to work together. The use of databases also evolves, so we cannot design a database to support every possible future use. We illustrate information integration with the example of a university database.

University Database
Originally, the university had separate databases for different functions:
- Registrar database: data about courses and student grades, used to generate transcripts.
- Bursar database: data about tuition payments by students.
- Human resources database: records of employees, including students with teaching-assistantship jobs.
Applications were built on these databases, for example to generate payroll checks and to calculate taxes and social security payments to the government.

On their own, however, these databases were of limited use, because a change in one database was not reflected in the others and had to be propagated manually. For example, we want to make sure that the Registrar does not record grades for a student who has not paid fees at the Bursar's office. Building a whole new database for the entire system is a very expensive and time-consuming process. In addition to paying for very expensive software, the university would have to run the old and new databases side by side for a long time to verify that the new system works properly.

A solution is to build a layer of abstraction, called middleware, on top of all the legacy databases, without disturbing the original databases. We can then query this middleware layer to retrieve or update data. Often this layer is defined by a collection of classes and queried in an object-oriented language. New applications can be written to access this layer for data, while the legacy applications continue to run against the legacy databases.
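As a rough illustration of such a class-based layer, the sketch below wraps two independent databases and enforces the rule that grades are not recorded for students who have not paid. Everything in it is an assumption made for the example: the class name `UniversityMiddleware`, the table and column names, and the use of in-memory SQLite databases as stand-ins for the legacy systems.

```python
import sqlite3


class UniversityMiddleware:
    """Hypothetical middleware layer over two independent legacy databases."""

    def __init__(self, registrar: sqlite3.Connection, bursar: sqlite3.Connection):
        self.registrar = registrar
        self.bursar = bursar

    def has_paid(self, student_id: int) -> bool:
        # Read-only lookup in the bursar's database (assumed schema).
        row = self.bursar.execute(
            "SELECT paid FROM payments WHERE student_id = ?", (student_id,)
        ).fetchone()
        return row is not None and bool(row[0])

    def record_grade(self, student_id: int, course: str, grade: str) -> bool:
        # Enforce the cross-database rule before touching the registrar's data.
        if not self.has_paid(student_id):
            return False
        self.registrar.execute(
            "INSERT INTO grades (student_id, course, grade) VALUES (?, ?, ?)",
            (student_id, course, grade),
        )
        self.registrar.commit()
        return True


def make_demo_middleware() -> UniversityMiddleware:
    # In-memory stand-ins for the legacy databases; schemas are illustrative only.
    registrar = sqlite3.connect(":memory:")
    registrar.execute(
        "CREATE TABLE grades (student_id INTEGER, course TEXT, grade TEXT)"
    )
    bursar = sqlite3.connect(":memory:")
    bursar.execute("CREATE TABLE payments (student_id INTEGER, paid INTEGER)")
    bursar.executemany("INSERT INTO payments VALUES (?, ?)", [(1, 1), (2, 0)])
    bursar.commit()
    return UniversityMiddleware(registrar, bursar)
```

Neither underlying database is altered structurally; a legacy application could keep writing to either one directly while new applications go through `record_grade`.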

The Heterogeneity Problem
When we try to connect information sources that were developed independently, we invariably find that the sources differ in many ways. Such sources are called heterogeneous, and the problem of integrating them is referred to as the heterogeneity problem. Heterogeneity arises at several levels:
- Communication heterogeneity
- Query-language heterogeneity
- Schema heterogeneity
- Data type differences
- Value heterogeneity
- Semantic heterogeneity

Communication Heterogeneity
Today, it is common to allow access to your information over HTTP. However, some dealers may not make their databases available on the Web, and instead accept remote access via anonymous FTP. Suppose the Aardvark Automobile Co. has 1000 dealers, of which 900 use HTTP while the remaining 100 use FTP; an integrated system must then handle both protocols to communicate with all the dealers' databases.

Query-Language Heterogeneity
The manner in which we query or modify a dealer's database may vary. For example, some dealers might use a relational database queried in SQL, possibly in different versions, while others might use a non-relational database or keep their data in Excel spreadsheets.
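One common way to hide this difference is to give each source a small wrapper with the same method, so the caller never sees the underlying query mechanism. The sketch below is only an illustration: the wrapper classes, the `cars` table, the CSV export standing in for a spreadsheet, and the model names are all invented for the example.

```python
import csv
import io
import sqlite3


class SqlDealer:
    """Wrapper for a dealer that exposes a relational database."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def cars_by_model(self, model: str) -> list:
        rows = self.conn.execute(
            "SELECT serial, model, color FROM cars WHERE model = ? ORDER BY serial",
            (model,),
        ).fetchall()
        return [{"serial": s, "model": m, "color": c} for s, m, c in rows]


class SpreadsheetDealer:
    """Wrapper for a dealer that only provides a spreadsheet exported as CSV text."""

    def __init__(self, csv_text: str):
        self.rows = list(csv.DictReader(io.StringIO(csv_text)))

    def cars_by_model(self, model: str) -> list:
        # No query language is available here, so the rows are filtered in Python.
        return [dict(r) for r in self.rows if r["model"] == model]


def query_all(dealers, model: str) -> list:
    # The caller sees one interface regardless of how each source is queried.
    results = []
    for dealer in dealers:
        results.extend(dealer.cars_by_model(model))
    return results


def make_demo_dealers() -> list:
    # Hypothetical sample data for two dealers.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cars (serial TEXT, model TEXT, color TEXT)")
    conn.executemany(
        "INSERT INTO cars VALUES (?, ?, ?)",
        [("A100", "Gobi", "red"), ("A101", "Sahara", "blue")],
    )
    csv_text = "serial,model,color\nB200,Gobi,black\nB201,Gobi,white\n"
    return [SqlDealer(conn), SpreadsheetDealer(csv_text)]
```

Calling `query_all(make_demo_dealers(), "Gobi")` collects matching cars from both dealers through the shared `cars_by_model` method.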

Schema Heterogeneity
Even assuming that all the dealers use a relational DBMS supporting SQL as the query language, heterogeneity can remain because the schemas can differ. For example, one dealer might store cars in a single relation, while another might use a schema in which options are separated out into a second relation.
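To make the two layouts concrete, the sketch below reads both into the same shape, a tuple of serial number, model, and a set of options. The relation and attribute names (`Cars`, `Autos`, `Options`, `autoTrans`, `navi`, `opt`) and the sample rows are assumptions chosen for illustration.

```python
import sqlite3


def dealer1_cars(conn: sqlite3.Connection) -> list:
    """Dealer 1 (assumed schema): options are boolean columns of a single relation."""
    out = []
    rows = conn.execute(
        "SELECT serial, model, autoTrans, navi FROM Cars ORDER BY serial"
    ).fetchall()
    for serial, model, auto_trans, navi in rows:
        options = set()
        if auto_trans:
            options.add("autoTrans")
        if navi:
            options.add("navi")
        out.append((serial, model, options))
    return out


def dealer2_cars(conn: sqlite3.Connection) -> list:
    """Dealer 2 (assumed schema): options are stored in a second relation."""
    out = []
    autos = conn.execute("SELECT serial, model FROM Autos ORDER BY serial").fetchall()
    for serial, model in autos:
        opts = conn.execute("SELECT opt FROM Options WHERE serial = ?", (serial,))
        out.append((serial, model, {o for (o,) in opts}))
    return out


def make_demo():
    # Two in-memory databases holding the same kind of data in different schemas.
    d1 = sqlite3.connect(":memory:")
    d1.execute(
        "CREATE TABLE Cars (serial TEXT, model TEXT, autoTrans INTEGER, navi INTEGER)"
    )
    d1.execute("INSERT INTO Cars VALUES ('123', 'Gobi', 1, 0)")
    d2 = sqlite3.connect(":memory:")
    d2.execute("CREATE TABLE Autos (serial TEXT, model TEXT)")
    d2.execute("CREATE TABLE Options (serial TEXT, opt TEXT)")
    d2.execute("INSERT INTO Autos VALUES ('456', 'Gobi')")
    d2.executemany(
        "INSERT INTO Options VALUES (?, ?)", [("456", "autoTrans"), ("456", "navi")]
    )
    return d1, d2
```

Once both sources produce the same tuple shape, a query such as "all Gobi cars with navigation" can be answered without knowing which schema a dealer uses.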

Data Type Differences
Serial numbers might be represented by character strings of varying length at one source and of fixed length at another. The fixed lengths could differ, and some sources might use integers rather than character strings.
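A minimal sketch of one way to reconcile such representations is to convert every serial number to a single canonical string. The canonical form chosen here (trimmed, upper-cased, zero-padded to a fixed width) is an assumption for the example, not something the sources prescribe, and `normalize_serial` is a hypothetical helper.

```python
def normalize_serial(value, width: int = 10) -> str:
    """Convert a serial number from any source into one canonical string form.

    Assumed canonical form: no surrounding blanks, upper case, left-padded
    with zeros to `width` characters.
    """
    text = str(value).strip()  # integers and blank-padded strings both become text
    return text.upper().zfill(width)
```

With this rule, the integer `12345`, the blank-padded string `"12345     "`, and the fixed-length string `"0000012345"` all map to the same value and can be compared or joined across sources.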

Value Heterogeneity
The same concept might be represented by different constants at different sources. The color black might be represented by an integer code at one source, the string BLACK at another, and the code BL at a third.
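One simple way to handle this is a translation table per source that maps each local constant to a shared value. In the sketch below the source names, the integer codes, and the white entries are invented for illustration; only the three representations of black come from the text.

```python
# Per-source translation tables (all names and codes here are hypothetical).
COLOR_MAPS = {
    "dealer_a": {7: "black", 3: "white"},            # integer codes
    "dealer_b": {"BLACK": "black", "WHITE": "white"},  # full strings
    "dealer_c": {"BL": "black", "WH": "white"},        # two-letter codes
}


def canonical_color(source: str, raw):
    """Translate a source-specific color constant to the shared value.

    Returns None when the source has no mapping for the given constant.
    """
    return COLOR_MAPS[source].get(raw)
```

For example, `canonical_color("dealer_a", 7)` and `canonical_color("dealer_c", "BL")` both yield `"black"`, so the three sources can be compared on color.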

Semantic Heterogeneity
Terms might be given different interpretations at different sources. One dealer might include trucks in its Cars relation, while another does not. One dealer might distinguish station wagons from minivans, while another doesn't.

THANK YOU