Information Integration

Slides:



Advertisements
Similar presentations
1 Data Integration June 3 rd, What is Data Integration? uniform accessmultiple autonomousheterogeneousdistributed Provide uniform access to data.
Advertisements

CSE 636 Data Integration Data Integration Approaches.
Information Integration Using Logical Views Jeffrey D. Ullman.
Data integration Chitta Baral Arizona State University.
Wrappers in Mediator-Based Systems Chapter 21.3 Information Integration Presented By Annie Hii Toderici.
1 Global-as-View and Local-as-View for Information Integration CS652 Spring 2004 Presenter: Yihong Ding.
0 General information Rate of acceptance 37% Papers from 15 Countries and 5 Geographical Areas –North America 5 –South America 2 –Europe 20 –Asia 2 –Australia.
An Extensible System for Merging Two Models Rachel Pottinger University of Washington Supervisors: Phil Bernstein and Alon Halevy.
Chapter 21.2 Modes of Information Integration ID: 219 Name: Qun Yu Class: CS Spring 2009 Instructor: Dr. T.Y.Lin.
CS 257 Database Systems Principles Assignment 1 Instructor: Student: Dr. T. Y. Lin Rajan Vyas (119)
THE OBJECT-ORIENTED DESIGN WORKFLOW Interfaces & Subsystems.
Capability-Based Optimization in Mediators Rohit Deshmukh ID 120 CS-257 Rohit Deshmukh ID 120 CS-257.
Local-as-View Mediators Priya Gangaraju(Class Id:203)
1 CIS607, Fall 2005 Semantic Information Integration Presentation by Paea LePendu Week 8 (Nov. 16)
1 Information Integration Mediators Semistructured Data Answering Queries Using Views.
Credit: Slides are an adaptation of slides from Jeffrey D. Ullman 1.
1 Lecture 13: Database Heterogeneity Debriefing Project Phase 2.
2005Integration-intro1 Data Integration Systems overview The architecture of a data integration system:  Components and their interaction  Tasks  Concepts.
1 Where Is Database Research Headed? Jeffrey D. Ullman DASFAA March 26, 2003.
CSE 636 Data Integration Introduction. 2 Staff Instructor: Dr. Michalis Petropoulos Location: 210 Bell Hall Office Hours:
Distributed Database Management Systems. Reading Textbook: Ch. 4 Textbook: Ch. 4 FarkasCSCE Spring
1 Information Integration Mediators Warehousing Answering Queries Using Views.
1 Overview of Database Federation and IBM Garlic Project Presented by Xiaofen He.
Lowell 2003 Challenges Alon Y. Halevy University of Washington.
CSC2012 Database Technology & CSC2513 Database Systems.
Research Topics in Computing Data Modelling for Data Schema Integration 1 March 2005 David George.
Information Systems: Modelling Complexity with Categories Four lectures given by Nick Rossiter at Universidad de Las Palmas de Gran Canaria, 15th-19th.
Chapter 21.2 Modes of Information Integration ID: 219 Name: Qun Yu Class: CS Spring 2009 Instructor: Dr. T.Y.Lin.
CSE 636 Data Integration Overview Fall What is Data Integration? The problem of providing uniform (sources transparent to user) access to (query,
Navigational Plans For Data Integration Marc Friedman Alon Levy Todd Millistein Presented By Avinash Ponnala Avinash Ponnala.
© 2007 by Prentice Hall 1 Introduction to databases.
Data Warehouse Fundamentals Rabie A. Ramadan, PhD 2.
1 Lessons from the TSIMMIS Project Yannis Papakonstantinou Department of Computer Science & Engineering University of California, San Diego.
Mediators, Wrappers, etc. Based on TSIMMIS project at Stanford. Concepts used in several other related projects. Goal: integrate info. in heterogeneous.
Data Integration by Bi-Directional Schema Transformation Rules Data Integration by Bi-Directional Schema Transformation Rules By Peter McBrien and Alexandria.
Submitted by: Deepti Kundu Submitted to: Dr.T.Y.Lin
Data Access and Security in Multiple Heterogeneous Databases Afroz Deepti.
1 Information Integration. 2 Information Resides on Heterogeneous Information Sources different interfaces different data representations redundant and.
1 Information Integration Mediators Warehousing Answering Queries Using Views Slides are modified from Dr. Ullman’s notes.
Scaling Heterogeneous Databases and Design of DISCO Anthony Tomasic Louiqa Raschid Patrick Valduriez Presented by: Nazia Khatir Texas A&M University.
Presented by Jiwen Sun, Lihui Zhao 24/3/2004
1 On-Line Analytic Processing Warehousing Data Cubes.
Information Integration BIRN supports integration across complex data sources – Can process wide variety of structured & semi-structured sources (DBMS,
Information Integration By Neel Bavishi. Mediator Introduction A mediator supports a virtual view or collection of views that integrates several sources.
Data Integration Hanna Zhong Department of Computer Science University of Illinois, Urbana-Champaign 11/12/2009.
Issues in Ontology-based Information integration By Zhan Cui, Dean Jones and Paul O’Brien.
Fall CSE330/CIS550: Introduction to Database Management Systems Prof. Susan Davidson Office: 278 Moore Office hours: TTh
1 Integration of data sources Patrick Lambrix Department of Computer and Information Science Linköpings universitet.
Section 20.1 Modes of Information Integration Anilkumar Panicker CS257: Database Systems ID: 118.
1 Data Warehousing Data Warehousing. 2 Objectives Definition of terms Definition of terms Reasons for information gap between information needs and availability.
System Software Laboratory Databases and the Grid by Paul Watson University of Newcastle Grid Computing: Making the Global Infrastructure a Reality June.
Where Is Database Research Headed?
Enterprise Service Bus
Architetture della Informazione Anno accademico Carlo Batini Methodologies for planning the evolution of data architectures 1.
Foreign Keys Local and Global Constraints Triggers
Data Warehouse.
On-Line Analytic Processing
Introduction to Database Systems, CS420
CSC 480 Software Engineering
Relational Algebra 461 The slides for this text are organized into chapters. This lecture covers relational algebra, from Chapter 4. The relational calculus.
MANAGING DATA RESOURCES
Enhance BI Applications and Simplify Development
Introduction to Data Warehousing
Model Based Mediation With Domain Maps ___________________________
INFORMATION INTEGRATION
Local-as-View Mediators
Object-Oriented Databases
Distributed Database Management Systems
ONTOMERGE Ontology translations by merging ontologies Paper: Ontology Translation on the Semantic Web by Dejing Dou, Drew McDermott and Peishen Qi 2003.
CS561-Spring 2012 WPI, Mohamed eltabakh
Presentation transcript:

Information Integration Mediators Warehousing Answering Queries Using Views

Example Applications Enterprise Information Integration: making separate DB’s, all owned by one company, work together. Scientific DB’s, e.g., genome DB’s. Catalog integration: combining product information from all your suppliers.

Challenges Legacy databases : DB’s get used for many applications. You can’t change its structure for the sake of one application, because it will cause others to break. Incompatibilities : Two, supposedly similar databases, will mismatch in many ways.

Examples: Incompatibilities Lexical : addr in one DB is address in another. Value mismatches : is a “red” car the same color in each DB? Is 20 degrees Fahrenheit or Centigrade? Semantic : are “employees” in each database the same? What about consultants? Retirees? Contractors?

What Do You Do About It? Grubby, handwritten translation at each interface. Some research on automatic inference of relationships. Wrapper (aka “adapter”) translates incoming queries and outgoing answers.

Integration Architectures Federation : everybody talks directly to everyone else. Warehouse : Sources are translated from their local schema to a global schema and copied to a central DB. Mediator : Virtual warehouse --- turns a user query into a sequence of source queries.

Federations Wrapper Wrapper Wrapper Wrapper Wrapper Wrapper

Warehouse Diagram Warehouse Wrapper Wrapper Source 1 Source 2

A Mediator Mediator Wrapper Wrapper Source 1 Source 2 User query Result Mediator Query Result Wrapper Wrapper Source 1 Source 2

Two Mediation Approaches Global as View : Mediator processes queries into steps executed at sources. Local as View : Sources are defined in terms of global relations; mediator finds all ways to build query from views.

Example: Catalog Integration Suppose Dell wants to buy a bus and a disk that share the same protocol. Global schema: Buses(manf,model,protocol) Disks(manf,model,protocol) Local schemas: each bus or disk manufacturer has a (model,protocol) relation --- manf is implied.

Example: Global-as-View Mediator might start by querying each bus manufacturer for model-protocol pairs. The wrapper would turn them into triples by adding the manf component. Then, for each protocol returned, mediator queries disk manufacturers for disks with that protocol. Again, wrapper adds manf component.

Example: Local-as-View Sources’ capabilities are defined in terms of the global predicates. E.g.,Quantum’s disk database could be defined by QuantumView(M,P) = Disks(’Quantum’,M,P). Mediator discovers all combinations of a bus and disk “view,” equijoined on the protocol components.

A Harder LAV Case The mediator supports a par(c,p) relation (which doesn’t really exist, but can be queried). Sources can support views that are complex expressions of par. A logic is needed to work with queries and view definitions. Datalog is a good choice.

Example: Some Local Views Source 1 provides some parent facts. V1(c,p) <- par(c,p) Source 2, run by the “Society of Grandparents,” supports only grandparent facts. V2(c,g) <- par(c,p) AND par(p,g)

Example – (2) Query (great-grandparents): ggp(c,x) <- par(c,u) AND par(u,v) AND par(v,x) How can the sources provide solutions that provide all available answers?

Example – (3) Sol1(c,x) <- V1(c,u) AND V1(u,v) AND V1(u,x) Sol2(c,x) <- V1(c,u) AND V2(u,x) Sol3(c,x) <- V2(c,v) AND V1(v,x) No other queries involving the views can provide more ggp facts. Deep theory needed to explain.

Comparison: LAV Vs. GAV GAV is simpler to implement. Lets you control what the mediator does. LAV is more extensible. Add a new source simply by defining what it contributes as a view of the global schema. Can get some use from grandparent info., even if par(c,p) is the only mediator data.

Course Plug In the Spring 07-08, Alon Halevy (Google) is teaching CS345C Information Integration. It will cover this technology and many others.