

CSE 636 Data Integration Overview

2 Data Warehouse Architecture
[Diagram: several data sources feed ETL (Extract-Transform-Load) tools and data cleaning, which load a relational database (the warehouse); users and applications access it through OLAP / decision support and data cubes / data mining.]
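A minimal sketch of the extract-transform-load flow named on this slide, assuming two tiny in-memory sources and an in-memory SQLite database as the warehouse. The source records, field names, and helper functions are hypothetical and only illustrate the idea of cleaning data before loading it.

```python
import sqlite3

# Hypothetical source extracts; field names and formats differ per source.
SOURCE_A = [{"Title": " On the Road ", "SuggestedPrice": "$10.86"}]
SOURCE_B = [{"title": "on the road", "price": 8.86}]


def clean_title(raw):
    # Data cleaning: collapse whitespace and normalize capitalization.
    return " ".join(raw.split()).title()


def transform(record, title_key, price_key):
    # Transform: bring one source record into the warehouse layout.
    price = record[price_key]
    if isinstance(price, str):
        price = float(price.lstrip("$"))
    return (clean_title(record[title_key]), price)


def load(conn, rows):
    # Load: write the cleaned rows into the warehouse table.
    conn.execute("CREATE TABLE IF NOT EXISTS books (title TEXT, price REAL)")
    conn.executemany("INSERT INTO books VALUES (?, ?)", rows)


def run_etl():
    conn = sqlite3.connect(":memory:")
    rows = [transform(r, "Title", "SuggestedPrice") for r in SOURCE_A]
    rows += [transform(r, "title", "price") for r in SOURCE_B]
    load(conn, rows)
    return conn
```

Once loaded, decision-support queries run against the warehouse alone, for example `run_etl().execute("SELECT title, MIN(price) FROM books GROUP BY title")`.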

3 Virtual Integration Architecture
Leave the data in the sources
When a query comes in:
– Determine the sources relevant to the query
– Break the query down into sub-queries for the sources
– Get the answers from the sources, filter them if needed, and combine them appropriately
Data is fresh
Otherwise known as On-Demand Integration
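The three query-time steps on this slide can be sketched as follows. This is a simplified illustration, not a real mediator: it assumes every source already exposes its data in the global schema, and the `make_source` helper, the `SOURCES` list, and the sample rows are all hypothetical.

```python
def make_source(name, tables):
    """Build a hypothetical source: a name plus {relation: [row dicts]}."""
    return {"name": name, "tables": tables}


def answer(relation, predicate, sources):
    # 1. Determine the sources relevant to the query.
    relevant = [s for s in sources if relation in s["tables"]]
    combined, seen = [], set()
    for source in relevant:
        # 2. Send each relevant source its sub-query (here: scan one relation).
        rows = source["tables"][relation]
        # 3. Filter the answers and combine them, dropping exact duplicates.
        for row in rows:
            key = tuple(sorted(row.items()))
            if predicate(row) and key not in seen:
                seen.add(key)
                combined.append(row)
    return combined


SOURCES = [
    make_source("A", {"Books": [{"Title": "On the Road", "Price": 10.86}]}),
    make_source("B", {"Books": [{"Title": "On the Road", "Price": 10.86},
                                {"Title": "Big Sur", "Price": 9.5}]}),
    make_source("C", {"CDs": [{"Album": "Kind of Blue", "Price": 7.0}]}),
]
```

Because the data is read from the sources at query time, the answer reflects their current contents; source C is never contacted for a `Books` query.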

4 Virtual Integration Architecture
[Diagram (Design-Time): end users and applications query a global schema; schema mappings connect the global schema to the local schema of each data source.]
Sources can be:
– Relational DBs
– Excel Files
– Web Sites
– Web Services

5 Differences in:
– Names in schema
– Attribute grouping
– Coverage of databases
– Granularity and format of attributes
[Diagram: schema mappings relate Inventory Database A to Inventory Database B.]
Inventory Database A:
  BooksAndMusic(Title, Author, Publisher, ItemID, ItemType, SuggestedPrice, Categories, Keywords)
Inventory Database B:
  Books(Title, ISBN, Price, DiscountPrice, Edition)
  Authors(ISBN, FirstName, LastName)
  BookCategories(ISBN, Category)
  CDs(Album, ASIN, Price, DiscountPrice, Studio)
  Artists(ASIN, ArtistName, GroupName)
  CDCategories(ASIN, Category)

6 Issues for Schema Mappings
What formalisms to express them?
How to create them?
Can we discover them somehow?
How do we use them?
[Diagram (Design-Time): end users and applications query a global schema; schema mappings connect the global schema to the local schema of each data source.]

7 Virtual Integration Architecture
[Diagram (Run-Time): a query enters the mediator, which performs reformulation, optimization and execution over the global schema; wrappers connect the mediator to the local schema of each data source; the mediator returns the result.]

8 Issues for Query Processing: Reformulation
User queries refer to the global schema
Data is stored in the sources in a local schema
Rewriting algorithms
[Diagram: a query enters the reformulation step of the mediator over the global schema; each data source has its own local schema.]

9 Issues for Query Processing: Reformulation
Global Schema:
  Books(Title, ISBN, Price, DiscountPrice, Edition)
Local Schema A:
  BooksAndMusic(Title, Author, Publisher, ItemID, ItemType, SuggestedPrice, Categories, Keywords)
Query over the global schema:
  SELECT ISBN, Price FROM Books WHERE Title = 'on the road'
Reformulated query over Local Schema A:
  SELECT ItemID, SuggestedPrice FROM BooksAndMusic WHERE Title = 'on the road' AND ItemType = 'Books'
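The rewriting on this slide can be sketched as a lookup in a schema mapping. The `MAPPING` structure and the `reformulate` function below are assumptions made for illustration; the slide does not prescribe a mapping formalism. The attribute correspondences (ISBN to ItemID, Price to SuggestedPrice) and the added `ItemType = 'Books'` condition are taken from the slide's two queries.

```python
# Hypothetical mapping from the global relation Books to Local Schema A.
MAPPING = {
    "relation": "BooksAndMusic",
    "attributes": {"ISBN": "ItemID", "Price": "SuggestedPrice", "Title": "Title"},
    # Source A mixes books and music, so the rewriting must restrict ItemType.
    "extra_conditions": {"ItemType": "Books"},
}


def reformulate(select_attrs, where, mapping=MAPPING):
    """Rewrite a query over Books into a parameterized query over source A."""
    attrs = mapping["attributes"]
    columns = ", ".join(attrs[a] for a in select_attrs)
    conditions = {attrs[k]: v for k, v in where.items()}
    conditions.update(mapping["extra_conditions"])
    clause = " AND ".join(f"{col} = ?" for col in conditions)
    sql = f"SELECT {columns} FROM {mapping['relation']} WHERE {clause}"
    return sql, list(conditions.values())
```

Calling `reformulate(["ISBN", "Price"], {"Title": "on the road"})` yields the slide's rewritten query in parameterized form, with the values `'on the road'` and `'Books'` passed separately.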

10 Issues for Query Processing: Query Translation
Different query languages
[Diagram: the mediator (reformulation, optimization, execution) receives a query over the global schema; wrappers connect it to the local schema of each data source.]

11 Issues for Query Processing: Query Translation
Global Schema:
  Books(Title, ISBN, Price, DiscountPrice, Edition)
Query:
  SELECT ISBN, Price FROM Books WHERE Title = 'on the road'
Local Source A: [figure]

12 Issues for Query Processing: Data Translation
Different data models
[Diagram: the mediator (reformulation, optimization, execution) receives a query over the global schema; wrappers connect it to the local schema of each data source.]

13 Issues for Query Processing: Data Translation
Local Result A:
  On the Road -- by Jack Kerouac; Paperback
  Buy new: $10.86
Global Schema:
  Books(Title, ISBN, Price, DiscountPrice, Edition)
Translated result:
  Title        ISBN  Price  …  …
  On the Road  …     …
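A wrapper's data translation step can be sketched as parsing the source's text result into a tuple of the global `Books` relation. The regular expression below is an assumption fitted to the single listing shown on this slide; a real wrapper would need to handle many more layouts. Attributes the listing does not provide are left as `None` rather than guessed.

```python
import re

# Pattern fitted to the one listing on the slide (an assumption, not a general grammar).
LISTING = re.compile(
    r"^(?P<title>.+?) -- by (?P<author>.+?); (?P<binding>\w+) "
    r"Buy new : \$(?P<price>\d+\.\d{2})$"
)


def translate(text):
    """Turn one text listing into a row of Books(Title, ISBN, Price, DiscountPrice, Edition)."""
    match = LISTING.match(text.strip())
    if match is None:
        return None  # unrecognized layout: the wrapper cannot translate it
    return {
        "Title": match["title"],
        "ISBN": None,           # not present in the listing
        "Price": float(match["price"]),
        "DiscountPrice": None,  # not present in the listing
        "Edition": None,        # not present in the listing
    }
```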

14 Issues for Query Processing: Query Execution
Access as many data sources as needed
Duplicate/redundant and irrelevant data
Limited query capabilities
[Diagram: the mediator (reformulation, optimization, execution) receives a query over the global schema; wrappers connect it to the local schema of each data source.]

15 Issues for Query Processing: Limited Query Capabilities
Global Schema:
  Books(Title, ISBN, Price, DiscountPrice, Edition)
Local Schema A:
  BooksAndMusic(Title, Author, ItemID, ItemType, SuggestedPrice)
  Supported query: SELECT ItemID, SuggestedPrice FROM BooksAndMusic WHERE Title = ?
Local Schema B:
  DiscountBooks(Title, Edition, ISBN, GreatPrice)
  Supported query: SELECT GreatPrice FROM DiscountBooks WHERE ISBN = ?
Query over the global schema:
  SELECT ISBN, Price, DiscountPrice FROM Books WHERE Title = 'on the road'
Execution (steps labelled A to E on the slide):
  Query sent to source A: SELECT ItemID, SuggestedPrice FROM BooksAndMusic WHERE Title = 'on the road'
  Result from source A: (ItemID, SuggestedPrice)
  Query sent to source B: SELECT GreatPrice FROM DiscountBooks WHERE ISBN = 123
  Result from source B: GreatPrice = 8.86
  Final result: (ISBN, Price, DiscountPrice)
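The execution order on this slide follows from the sources' binding restrictions: source A can only be queried with a bound Title, and source B only with a bound ISBN, so the ISBN returned by A must be fed into B. A minimal sketch of that dependent join follows. The functions `source_a`, `source_b`, and `answer` are hypothetical stand-ins for the wrappers; ISBN 123 and GreatPrice 8.86 come from the slide, while the SuggestedPrice value is an assumption.

```python
# Illustrative data: ISBN 123 and GreatPrice 8.86 appear on the slide;
# the SuggestedPrice value is an assumption.
BOOKS_AND_MUSIC = [{"Title": "On the Road", "ItemID": 123, "SuggestedPrice": 10.86}]
DISCOUNT_BOOKS = [{"ISBN": 123, "GreatPrice": 8.86}]


def source_a(title):
    """Source A answers only when Title is bound (WHERE Title = ?)."""
    return [(r["ItemID"], r["SuggestedPrice"]) for r in BOOKS_AND_MUSIC
            if r["Title"].lower() == title.lower()]


def source_b(isbn):
    """Source B answers only when ISBN is bound (WHERE ISBN = ?)."""
    return [r["GreatPrice"] for r in DISCOUNT_BOOKS if r["ISBN"] == isbn]


def answer(title):
    """Return (ISBN, Price, DiscountPrice) rows for the global query."""
    rows = []
    # Query A first: it is the only source that accepts the bound Title.
    for isbn, price in source_a(title):
        # Each ISBN from A supplies the binding that B requires.
        for discount in source_b(isbn):
            rows.append((isbn, price, discount))
    return rows
```

This sketch drops books that have no entry in source B; whether such books should still be returned with an empty DiscountPrice is a design decision the slide leaves open.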

16 Issues for Query Processing: Query Answering
Combine the results and further process them if needed
Mainly union and merge
Inconsistencies
[Diagram: the mediator (reformulation, optimization, execution) receives a query over the global schema and returns the result; wrappers connect it to the local schema of each data source.]

17 Issues for Query Processing: Query Answering (Union)
[Tables: source results (ItemID, SuggestedPrice) and (ISBN, GreatPrice), and a combined result (ISBN, Price).]
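Union of source results can be sketched as projecting each result onto the global attributes and then concatenating without duplicates. The helper names and all sample values except ISBN 123 and price 8.86 are hypothetical.

```python
def to_global(rows, key_attr, price_attr):
    """Project a source result onto the global (ISBN, Price) pair."""
    return [(row[key_attr], row[price_attr]) for row in rows]


def union(*results):
    """Concatenate results, keeping the first occurrence of each row."""
    seen, out = set(), []
    for rows in results:
        for row in rows:
            if row not in seen:
                seen.add(row)
                out.append(row)
    return out


# Illustrative source results (values are assumptions).
RESULT_A = [{"ItemID": 123, "SuggestedPrice": 10.86}]
RESULT_B = [{"ISBN": 123, "GreatPrice": 8.86}, {"ISBN": 456, "GreatPrice": 5.0}]

COMBINED = union(
    to_global(RESULT_A, "ItemID", "SuggestedPrice"),
    to_global(RESULT_B, "ISBN", "GreatPrice"),
)
```

Note that the same ISBN can appear with two different prices after a union; the rows differ, so neither is removed as a duplicate.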

18 Issues for Query Processing: Query Answering (Merge)
Source result 1 (primary key: ItemID):
  ItemID  Title
  123     On the Road
Source result 2 (primary key: ISBN):
  ISBN  Edition  Price
  123   2nd      8.86
Merged result (primary key: ISBN):
  ISBN  Title        Edition  Price
  123   On the Road  2nd      8.86
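Merging can be sketched as combining rows that share a primary key value, so that each source contributes the attributes it knows. The dictionary layout below is an assumption; the values are those shown on the slide.

```python
def merge(left, right):
    """Merge two {primary_key: {attribute: value}} results on their keys."""
    merged = {}
    for result in (left, right):
        for key, attrs in result.items():
            # Rows with the same key value are folded into one row.
            merged.setdefault(key, {}).update(attrs)
    return merged


# Results keyed by primary key value (ItemID in one source, ISBN in the other).
RESULT_1 = {123: {"Title": "On the Road"}}
RESULT_2 = {123: {"Edition": "2nd", "Price": 8.86}}

MERGED = merge(RESULT_1, RESULT_2)
```

This version lets the later source silently overwrite an attribute both sources supply, which is only safe when the sources never disagree.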

19 Issues for Query Processing: Query Answering (Inconsistencies)
Source result 1 (primary key: ItemID):
  ItemID  Title        Edition
  123     On the Road  1st
Source result 2 (primary key: ISBN):
  ISBN  Edition  Price
  123   2nd      8.86
Merged result (primary key: ISBN):
  ISBN  Title        Edition  Price
  123   On the Road  ???      8.86
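One way to surface the inconsistency on this slide is to record conflicting values during the merge instead of picking one. The function below is a sketch under that assumption; the slide does not say how conflicts should be resolved, so the conflicting attribute is simply left as `None`, matching the slide's "???".

```python
def merge_with_conflicts(left, right):
    """Merge two rows for the same key and report attributes that disagree."""
    merged = dict(left)
    conflicts = {}
    for attr, value in right.items():
        if attr in merged and merged[attr] != value:
            # Both sources supply the attribute but with different values.
            conflicts[attr] = (merged[attr], value)
            merged[attr] = None
        else:
            merged[attr] = value
    return merged, conflicts


ROW_1 = {"ISBN": 123, "Title": "On the Road", "Edition": "1st"}
ROW_2 = {"ISBN": 123, "Edition": "2nd", "Price": 8.86}

MERGED_ROW, CONFLICTS = merge_with_conflicts(ROW_1, ROW_2)
```

Here `CONFLICTS` holds the pair of disagreeing Edition values, which a resolution policy (for example, preferring a trusted source) could then act on.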

20 Community-Based Integration
Fairly dynamic environment: new sources register over time and new application queries are formulated
Allow source owners to easily and independently register their source
Allow developers to easily build applications based on the community schema
So that each other's needs are accommodated
[Diagram: four domains. Source Domain (source owners): data sources with source schemas, exposed as web services, including a new source. Community Domain (community owner): mediator with the community schema. Application Domain (developers): applications, including a new application. Web Domain (end users): web forms and reports.]

21 Peer-Based Integration
[Diagram: Peer 1 through Peer 5, with a query posed to the network.]

22 Peer-Based Integration
No need for a central mediated schema
Peers serve as mediators for other peers
A peer can be both a server and a client
Semantic relationships are specified locally (between small sets of peers)
Queries are posed using the peer's schema
Answers come from anywhere in the system
This is not P2P file sharing.
– Data has rich semantics
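The points above can be sketched as a traversal over pairwise mappings: a query posed in one peer's schema is translated hop by hop, and any reachable peer may contribute answers. The `Peer` class, the `query` function, and the attribute-renaming mappings are hypothetical simplifications; real peer mappings are far more expressive than attribute renaming.

```python
from collections import deque


class Peer:
    def __init__(self, name, rows):
        self.name = name
        self.rows = rows      # row dicts in this peer's own schema
        self.mappings = {}    # neighbor name -> (neighbor, {my attr: neighbor attr})

    def map_to(self, other, attr_map):
        # Semantic relationships are specified locally, between two peers.
        self.mappings[other.name] = (other, attr_map)


def query(start, attr, value):
    """Pose attr = value in start's schema; collect answers from reachable peers."""
    answers = []
    seen = {start.name}
    queue = deque([(start, attr)])
    while queue:
        peer, local_attr = queue.popleft()
        answers += [(peer.name, r) for r in peer.rows if r.get(local_attr) == value]
        for other, attr_map in peer.mappings.values():
            # Forward the query only where a mapping translates the attribute.
            if other.name not in seen and local_attr in attr_map:
                seen.add(other.name)
                queue.append((other, attr_map[local_attr]))
    return answers


# Three peers with different schemas and no central mediated schema.
p1 = Peer("Peer 1", [{"Title": "On the Road"}])
p2 = Peer("Peer 2", [{"Name": "On the Road"}])
p3 = Peer("Peer 3", [{"BookTitle": "On the Road"}, {"BookTitle": "Big Sur"}])
p1.map_to(p2, {"Title": "Name"})
p2.map_to(p3, {"Name": "BookTitle"})
```

Peer 1 has no direct mapping to Peer 3, yet a query posed at Peer 1 still reaches it because Peer 2 acts as a mediator between them.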

23 References
Information Integration
– Maurizio Lenzerini
– Eighteenth International Joint Conference on Artificial Intelligence, IJCAI 2003
– Invited Tutorial
Data Integration: a Status Report
– Alon Halevy
– German Database Conference (BTW), 2003
– Invited Talk