
Presentation transcript:

CSE 636 Data Integration Overview Fall 2006

2 What is Data Integration?
The problem of providing
uniform (sources transparent to the user)
access to (queries, and eventually updates too)
multiple (even 2 is a problem!)
autonomous (must not affect the behavior of the sources)
heterogeneous (different data models, schemas)
structured (at least semistructured)
data sources (not only databases)

3 Motivation
Enterprise data integration; web-site construction.
World-wide web:
– comparison shopping (Netbot, Junglee)
– portals integrating data from multiple sources
– XML integration
Science & culture:
– Medical genetics: integrating genomic data
– Astrophysics: monitoring events in the sky
– Environment: Puget Sound Regional Synthesis Model
– Culture: uniform access to all the cultural databases produced by different countries

4 Principal Dimensions of Data Integration
Virtual vs. materialized architecture
Access: query only or query & update?
– The update problem is similar to updating through views
– Updates need distributed transactional services
Mediated schema: yes or no?
– A mediated schema requires schema integration and then query reformulation.
– Without a mediated schema, we lose some of the advantages of data integration.

5 Data Warehouse Architecture
[Diagram: several Data Sources feed ETL Tools (Extract-Transform-Load) and Data Cleaning, which load a Relational Database (Warehouse). Users and Applications work against the warehouse through OLAP / Decision Support and Data Cubes / Data Mining.]

6

7 Virtual Integration Architecture
Leave the data in the sources
When a query comes in:
– Determine the sources relevant to the query
– Break down the query into sub-queries for the sources
– Get the answers from the sources, filter them if needed and combine them appropriately
Data is fresh
Otherwise known as On Demand Integration
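The three query-time steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Source` class, the `topics` set used to decide relevance, and the sample rows are all hypothetical and do not come from the slides.

```python
from typing import Callable, Dict, List, Set

Row = Dict[str, object]


class Source:
    """Hypothetical autonomous source: it keeps its own data and answers sub-queries."""

    def __init__(self, name: str, topics: Set[str], rows: List[Row]):
        self.name = name
        self.topics = topics  # what this source has data about (assumed metadata)
        self.rows = rows

    def answer(self, predicate: Callable[[Row], bool]) -> List[Row]:
        # The source evaluates the sub-query itself; data never leaves it in bulk.
        return [dict(r) for r in self.rows if predicate(r)]


def integrate(topic: str, predicate: Callable[[Row], bool], sources: List[Source]) -> List[Row]:
    # Step 1: determine the sources relevant to the query.
    relevant = [s for s in sources if topic in s.topics]
    # Step 2: send each relevant source its sub-query and collect the answers.
    partial_answers = [s.answer(predicate) for s in relevant]
    # Step 3: combine the answers, dropping exact duplicates.
    combined: List[Row] = []
    for rows in partial_answers:
        for row in rows:
            if row not in combined:
                combined.append(row)
    return combined


# Invented sample data, for illustration only.
sources = [
    Source("A", {"books", "music"}, [{"Title": "On the Road", "Price": 10.86}]),
    Source("B", {"books"}, [{"Title": "On the Road", "Price": 8.86}]),
    Source("C", {"music"}, [{"Title": "On the Road", "Price": 7.50}]),
]
answers = integrate("books", lambda r: r["Title"] == "On the Road", sources)
```

Because nothing is copied ahead of time, every call to `integrate` sees whatever the sources hold at that moment, which is the "data is fresh" property noted on the slide.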

8 Virtual Integration Architecture
[Diagram, Design-Time: End Users and Applications access a Global Schema; Schema Mappings connect the Global Schema to the Local Schema of each Data Source.]
Sources can be:
– Relational DBs
– Excel Files
– Web Sites
– Web Services

9 Schema Mappings
Differences in:
– Names in schema
– Attribute grouping
– Coverage of databases
– Granularity and format of attributes
Inventory Database A:
BooksAndMusic(Title, Author, Publisher, ItemID, ItemType, SuggestedPrice, Categories, Keywords)
Inventory Database B:
Books(Title, ISBN, Price, DiscountPrice, Edition)
Authors(ISBN, FirstName, LastName)
BookCategories(ISBN, Category)
CDs(Album, ASIN, Price, DiscountPrice, Studio)
Artists(ASIN, ArtistName, GroupName)
CDCategories(ASIN, Category)

10 Issues for Schema Mappings
Design-Time
What formalisms to express them?
How to create them?
Can we discover them somehow?
How do we use them?
[Diagram: End Users and Applications, Global Schema, Schema Mappings, and the Local Schema of each Data Source.]

11 Virtual Integration Architecture
[Diagram, Run-Time: a Query is sent to the Mediator, which performs Reformulation, Optimization and Execution over the Global Schema and returns a Result; a Wrapper connects the Mediator to the Local Schema of each Data Source.]

12 Issues for Query Processing: Reformulation
User queries refer to the global schema
Data is stored in the sources in a local schema
Rewriting algorithms
[Diagram: Query, Mediator (Reformulation), Global Schema, and the Local Schema of each Data Source.]

13 Issues for Query Processing: Reformulation
Global Schema: Books(Title, ISBN, Price, DiscountPrice, Edition)
Local Schema A: BooksAndMusic(Title, Author, Publisher, ItemID, ItemType, SuggestedPrice, Categories, Keywords)
Query on the global schema:
SELECT ISBN, Price FROM Books WHERE Title = 'on the road'
Reformulated query on Local Schema A:
SELECT ItemID, SuggestedPrice FROM BooksAndMusic WHERE Title = 'on the road' AND ItemType = 'Books'
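One way to see that the two queries on this slide are equivalent is to define the global relation as a view over the local one and let the database unfold it. The sketch below does that with Python's built-in `sqlite3` module. It is an illustration, not the rewriting algorithm used by any particular mediator: the view covers only the Title, ISBN and Price attributes, and the inserted rows (including the 'Music' item and its price) are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Local Schema A, as listed on the slide.
conn.execute(
    """CREATE TABLE BooksAndMusic (
           Title TEXT, Author TEXT, Publisher TEXT, ItemID INTEGER,
           ItemType TEXT, SuggestedPrice REAL, Categories TEXT, Keywords TEXT)"""
)
# Invented sample rows, for illustration only.
conn.executemany(
    "INSERT INTO BooksAndMusic VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("on the road", "Jack Kerouac", None, 123, "Books", 10.86, None, None),
        ("on the road", None, None, 456, "Music", 7.50, None, None),
    ],
)

# Assumed mapping: the global relation Books is a view over the local table.
# Only the attributes needed by the example query are mapped.
conn.execute(
    """CREATE VIEW Books AS
           SELECT Title, ItemID AS ISBN, SuggestedPrice AS Price
           FROM BooksAndMusic
           WHERE ItemType = 'Books'"""
)

# Query posed on the global schema.
global_rows = conn.execute(
    "SELECT ISBN, Price FROM Books WHERE Title = 'on the road'"
).fetchall()

# Reformulated query from the slide, posed directly on the local schema.
local_rows = conn.execute(
    "SELECT ItemID, SuggestedPrice FROM BooksAndMusic "
    "WHERE Title = 'on the road' AND ItemType = 'Books'"
).fetchall()

conn.close()
```

Both queries return the same rows, because unfolding the view definition into the global query yields exactly the reformulated query, including the extra `ItemType = 'Books'` condition.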

14 Issues for Query Processing: Query Translation
Different query languages
[Diagram: Query, Mediator (Reformulation, Optimization, Execution), Wrapper, Global Schema, and the Local Schema of each Data Source.]

15 Issues for Query Processing: Query Translation
Global Schema: Books(Title, ISBN, Price, DiscountPrice, Edition)
SELECT ISBN, Price FROM Books WHERE Title = 'on the road'
[Figure: Local Source A]

16 Issues for Query Processing: Data Translation
Different data models
[Diagram: Query, Mediator (Reformulation, Optimization, Execution), Wrapper, Global Schema, and the Local Schema of each Data Source.]

17 Issues for Query Processing: Data Translation
Local Result A:
On the Road -- by Jack Kerouac; Paperback Buy new : $10.86
Global Schema: Books(Title, ISBN, Price, DiscountPrice, Edition)
Result in the global schema:
Title | ISBN | Price | …
On the Road | … | …
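A wrapper performing this kind of data translation might look like the following minimal sketch. The listing format is inferred from the single example line on the slide, so the regular expression is an assumption and would need adjusting for a real source. Attributes that the listing does not carry (ISBN, DiscountPrice, Edition) are left as `None` rather than guessed.

```python
import re
from typing import Dict, Optional

# Assumed listing layout: "<title> -- by <author>; <binding> Buy new : $<price>"
LISTING = re.compile(
    r"^(?P<title>.+?) -- by (?P<author>.+?); (?P<binding>\w+) "
    r"Buy new\s*:\s*\$(?P<price>\d+\.\d{2})$"
)


def to_global_row(text: str) -> Optional[Dict[str, object]]:
    """Translate one text listing into a row of the global Books schema."""
    match = LISTING.match(text.strip())
    if match is None:
        return None  # the listing does not follow the assumed layout
    return {
        "Title": match.group("title"),
        "ISBN": None,           # not present in the listing
        "Price": float(match.group("price")),
        "DiscountPrice": None,  # not present in the listing
        "Edition": None,        # not present in the listing
    }


row = to_global_row("On the Road -- by Jack Kerouac; Paperback Buy new : $10.86")
```

Returning `None` for an unparseable listing lets the mediator skip or log bad records instead of failing the whole query.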

18 Issues for Query Processing: Query Execution
Access as many data sources as needed
Duplicate/redundant and irrelevant data
Limited query capabilities
[Diagram: Query, Mediator (Reformulation, Optimization, Execution), Wrapper, Global Schema, and the Local Schema of each Data Source.]

19 Issues for Query Processing: Limited Query Capabilities
Global Schema: Books(Title, ISBN, Price, DiscountPrice, Edition)
Local Schema A: BooksAndMusic(Title, Author, ItemID, ItemType, SuggestedPrice)
Query supported by source A:
SELECT ItemID, SuggestedPrice FROM BooksAndMusic WHERE Title = ?
Local Schema B: DiscountBooks(Title, Edition, ISBN, GreatPrice)
Query supported by source B:
SELECT GreatPrice FROM DiscountBooks WHERE ISBN = ?
Query on the global schema:
SELECT ISBN, Price, DiscountPrice FROM Books WHERE Title = 'on the road'
Execution (steps labeled A to E on the slide; most row values are not preserved in the transcript):
– Query sent to source A: SELECT ItemID, SuggestedPrice FROM BooksAndMusic WHERE Title = 'on the road'
– Result from source A: (ItemID, SuggestedPrice)
– Query sent to source B: SELECT GreatPrice FROM DiscountBooks WHERE ISBN = 123
– Result from source B: (GreatPrice) = (8.86)
– Final result: (ISBN, Price, DiscountPrice)
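The execution order on this slide, where a value returned by source A is fed into the parameter of source B, can be sketched as follows. The functions `source_a` and `source_b` are hypothetical stand-ins for the two parameterized queries; the sample rows are assumptions, with 123 and 8.86 taken from the slide and 10.86 borrowed from the listing on slide 17.

```python
from typing import List, Optional, Tuple

# Hypothetical contents of the two sources.
BOOKS_AND_MUSIC = [  # Local Schema A
    {"Title": "on the road", "ItemID": 123, "ItemType": "Books", "SuggestedPrice": 10.86},
]
DISCOUNT_BOOKS = [  # Local Schema B
    {"Title": "on the road", "Edition": "2nd", "ISBN": 123, "GreatPrice": 8.86},
]


def source_a(title: str) -> List[Tuple[int, float]]:
    """Only access path of A: SELECT ItemID, SuggestedPrice ... WHERE Title = ?"""
    return [(r["ItemID"], r["SuggestedPrice"]) for r in BOOKS_AND_MUSIC if r["Title"] == title]


def source_b(isbn: int) -> List[float]:
    """Only access path of B: SELECT GreatPrice ... WHERE ISBN = ?"""
    return [r["GreatPrice"] for r in DISCOUNT_BOOKS if r["ISBN"] == isbn]


def answer_global_query(title: str) -> List[Tuple[int, float, Optional[float]]]:
    """Answer SELECT ISBN, Price, DiscountPrice FROM Books WHERE Title = ?"""
    results = []
    # B cannot be queried by title, so A must be called first to obtain the key.
    for item_id, suggested_price in source_a(title):
        # Bind the key returned by A into B's only parameter.
        great_prices = source_b(item_id)
        discount = great_prices[0] if great_prices else None
        results.append((item_id, suggested_price, discount))
    return results


final_result = answer_global_query("on the road")
```

The order of the calls is forced by the sources' capabilities: source B accepts only an ISBN, so the mediator has no way to start there.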

20 Issues for Query Processing: Query Answering
Combine the results and further process them if needed
Mainly union and merge
Inconsistencies
[Diagram: Query and Result, Mediator (Reformulation, Optimization, Execution), Wrapper, Global Schema, and the Local Schema of each Data Source.]

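21 Issues for Query Processing: Query Answering (Union)
Source results (ItemID, SuggestedPrice) and (ISBN, GreatPrice) are unioned into one result (ISBN, Price).
[Row values not preserved in the transcript.]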
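A union of source results first renames each source's columns to the global ones and then drops duplicates. The sketch below assumes the column correspondence suggested by the slide (ItemID and ISBN both map to ISBN; SuggestedPrice and GreatPrice both map to Price). The helper names and all row values are hypothetical, since the slide's rows are not preserved.

```python
from typing import Dict, List

Row = Dict[str, object]


def to_global(rows: List[Row], key_col: str, price_col: str) -> List[Row]:
    """Rename a source's columns to the global (ISBN, Price) layout."""
    return [{"ISBN": r[key_col], "Price": r[price_col]} for r in rows]


def union(*results: List[Row]) -> List[Row]:
    """Set union of already-renamed results, keeping first-seen order."""
    seen = set()
    combined: List[Row] = []
    for rows in results:
        for row in rows:
            identity = (row["ISBN"], row["Price"])
            if identity not in seen:
                seen.add(identity)
                combined.append(row)
    return combined


# Invented sample results from two sources.
result_a = [
    {"ItemID": 123, "SuggestedPrice": 10.86},
    {"ItemID": 456, "SuggestedPrice": 5.00},
]
result_b = [
    {"ISBN": 123, "GreatPrice": 8.86},
    {"ISBN": 456, "GreatPrice": 5.00},  # same book and price as in source A
]

unioned = union(
    to_global(result_a, "ItemID", "SuggestedPrice"),
    to_global(result_b, "ISBN", "GreatPrice"),
)
```

Note that book 123 appears twice with different prices: a union keeps both rows, and deciding between them is a separate concern from duplicate elimination.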

22 Issues for Query Processing: Query Answering (Merge)
Source result 1 (primary key: ItemID):
ItemID | Title
123 | On the Road
Source result 2 (primary key: ISBN):
ISBN | Edition | Price
123 | 2nd | 8.86
Merged result (primary key: ISBN):
ISBN | Title | Edition | Price
123 | On the Road | 2nd | 8.86
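The merge on this slide is a join of two source results on their primary keys. A minimal sketch, assuming that ItemID in the first result and ISBN in the second identify the same book (the `merge` function and its parameter names are illustrative):

```python
from typing import Dict, List

Row = Dict[str, object]


def merge(left: List[Row], right: List[Row], left_key: str, right_key: str) -> List[Row]:
    """Join two results on their primary keys into rows keyed by ISBN."""
    right_by_key = {r[right_key]: r for r in right}
    merged: List[Row] = []
    for left_row in left:
        key = left_row[left_key]
        right_row = right_by_key.get(key)
        if right_row is None:
            continue  # no matching key in the other source
        row: Row = {"ISBN": key}
        # Copy the non-key attributes from both sides.
        row.update({k: v for k, v in left_row.items() if k != left_key})
        row.update({k: v for k, v in right_row.items() if k != right_key})
        merged.append(row)
    return merged


# Rows taken from the slide.
result_1 = [{"ItemID": 123, "Title": "On the Road"}]
result_2 = [{"ISBN": 123, "Edition": "2nd", "Price": 8.86}]

merged_rows = merge(result_1, result_2, "ItemID", "ISBN")
```

This version silently lets the right-hand source win if both sides carry the same attribute, which is harmless here because the two results share no non-key attributes.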

23 Issues for Query Processing: Query Answering (Inconsistencies)
Source result 1 (primary key: ItemID):
ItemID | Title | Edition
123 | On the Road | 1st
Source result 2 (primary key: ISBN):
ISBN | Edition | Price
123 | 2nd | 8.86
Merged result (primary key: ISBN):
ISBN | Title | Edition | Price
123 | On the Road | ??? | 8.86
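When both sources supply the same attribute with different values, a merge has to notice the disagreement instead of silently picking one. The sketch below records each conflict and leaves the attribute as `None`, mirroring the "???" on the slide. This policy is just one assumed choice; the slides do not prescribe how inconsistencies should be resolved.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]
Conflict = Tuple[object, str, object, object]  # (key, attribute, left value, right value)


def merge_with_conflicts(
    left: List[Row], right: List[Row], left_key: str, right_key: str
) -> Tuple[List[Row], List[Conflict]]:
    """Join on primary keys; report attributes on which the sources disagree."""
    right_by_key = {r[right_key]: r for r in right}
    merged: List[Row] = []
    conflicts: List[Conflict] = []
    for left_row in left:
        key = left_row[left_key]
        right_row = right_by_key.get(key)
        if right_row is None:
            continue
        row: Row = {"ISBN": key}
        for source_row, key_col in ((left_row, left_key), (right_row, right_key)):
            for attr, value in source_row.items():
                if attr == key_col:
                    continue
                if attr in row and row[attr] != value:
                    # Both sources give this attribute, with different values.
                    conflicts.append((key, attr, row[attr], value))
                    row[attr] = None  # unknown until the conflict is resolved
                else:
                    row[attr] = value
        merged.append(row)
    return merged, conflicts


# Rows taken from the slide.
result_1 = [{"ItemID": 123, "Title": "On the Road", "Edition": "1st"}]
result_2 = [{"ISBN": 123, "Edition": "2nd", "Price": 8.86}]

merged_rows, found_conflicts = merge_with_conflicts(result_1, result_2, "ItemID", "ISBN")
```

Keeping the conflict list separate from the merged rows lets a later step apply whatever resolution rule fits the application, such as preferring a trusted source or asking the user.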

24 Peer-Based Integration
[Diagram: a network of five peers (Peer 1 to Peer 5); a Query is posed to the network.]

25 Peer-Based Integration
No need for a central mediated schema
Peers serve as mediators for other peers
A peer can be both a server and a client
Semantic relationships are specified locally (between small sets of peers)
Queries are posed using the peer's schema
Answers come from anywhere in the system
This is not P2P file sharing.
– Data has rich semantics
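The following toy sketch illustrates how a query posed in one peer's schema can collect answers from other peers by following locally specified mappings. Everything in it is an assumption made for illustration: the `Peer` class, the attribute-renaming form of the mappings, and the sample data. Real peer-based systems use much richer mapping languages.

```python
from typing import Dict, List, Optional, Set


class Peer:
    """Hypothetical peer: stores data in its own schema and maps to a few neighbors."""

    def __init__(self, name: str, rows: List[Dict[str, object]]):
        self.name = name
        self.rows = rows
        # neighbor -> {my attribute name: neighbor's attribute name}
        self.mappings: Dict["Peer", Dict[str, str]] = {}

    def map_to(self, other: "Peer", attribute_map: Dict[str, str]) -> None:
        self.mappings[other] = attribute_map

    def query(self, attr: str, value: object, want: str,
              visited: Optional[Set[str]] = None) -> List[object]:
        """Return `want` values where `attr == value`, from this peer and reachable peers."""
        visited = set() if visited is None else visited
        if self.name in visited:
            return []  # mappings may form cycles; visit each peer once
        visited.add(self.name)
        answers = [r[want] for r in self.rows if r.get(attr) == value and want in r]
        for neighbor, attribute_map in self.mappings.items():
            if attr in attribute_map and want in attribute_map:
                # Translate the query into the neighbor's schema and forward it.
                answers += neighbor.query(attribute_map[attr], value,
                                          attribute_map[want], visited)
        return answers


# Invented peers, each with its own attribute names.
peer1 = Peer("Peer 1", [{"Title": "On the Road", "Price": 10.86}])
peer2 = Peer("Peer 2", [{"Name": "On the Road", "Cost": 8.86}])
peer3 = Peer("Peer 3", [{"BookTitle": "On the Road", "Amount": 9.50}])

# Mappings are specified only between neighboring peers.
peer1.map_to(peer2, {"Title": "Name", "Price": "Cost"})
peer2.map_to(peer3, {"Name": "BookTitle", "Cost": "Amount"})
peer3.map_to(peer1, {"BookTitle": "Title", "Amount": "Price"})

prices = peer1.query("Title", "On the Road", "Price")
```

The query is written entirely in Peer 1's vocabulary, yet it also returns values stored at Peer 3, which Peer 1 has no direct mapping to.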
