
CSE 636 Data Integration Introduction

2 Staff
Instructor: Dr. Michalis Petropoulos
Location: 210 Bell Hall
Office Hours: Wednesday & Friday 1:00-2:00pm & By Appointment
Web Page
Newsgroup: sunyab.cse.636

3 Course Goals
Data integration applications and architectures
Issues in building such applications
  –Really big and currently active research area
Solutions to several of them
Provide foundation for
  –understanding current research problems
  –criticizing proposed solutions
  –proposing your own solution!
Acquire valuable experience by implementing the project

4 Prerequisites
An introductory database course
  –CSE 520, CSE 562 or equivalent
Data structures and algorithms
Knowledge Representation
Distributed systems
Complexity theory
Mathematical Logic
Curiosity!
  –You should ask a lot of questions
Have a lot of fun!

5 Relevant Material
Textbooks
Database Systems: The Complete Book
  –by Garcia-Molina, Ullman and Widom
Database Management Systems
  –by Ramakrishnan
Fundamentals of Database Systems
  –by Elmasri and Navathe
Foundations of Databases
  –by Abiteboul, Hull and Vianu
Data on the Web
  –by Abiteboul, Buneman and Suciu

6 Course Format
Assignments: 15%
  –Three assignments will be given, 5% each
Final: 20% (take home)
Projects: 60%
  –Detailed specs will be given
  –Can be used to satisfy the M.S. project requirement
Participation: 5%

7 What is Data Integration?
The problem of providing
uniform (sources transparent to users)
access to (query)
multiple (even 2 is a problem)
autonomous (integration must not affect the behavior of the sources)
heterogeneous (different data models, schemas)
structured (at least semistructured)
data sources (not only databases)

8 The Data Integration Problem
[Figure: MyBookstore.com exposes a mediated schema (Books, Inventory, Orders, Shipping, Reviews) over sources labeled Morgan Kaufmann, Addison Wesley, Prentice Hall, East, West, Orders, FedEx, UPS, Customer Reviews and NY Times. The sources appear as DB, Site and WS boxes reached over the intranet and the Internet.]
Uniform query capability across autonomous, heterogeneous data sources on the Internet
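
A minimal Python sketch of the mediated-schema idea on slide 8, under stated assumptions. The two publisher sources, their field layouts, the prices, and the names Book, wrap_a, wrap_b and Mediator are all hypothetical and are not taken from the course material. Each wrapper translates one source's native format into the mediated schema, and the mediator answers a query by consulting every wrapper.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


# Mediated schema: the single, uniform view that users query.
@dataclass(frozen=True)
class Book:
    isbn: str
    title: str
    price_usd: float


# Two hypothetical autonomous sources with different schemas (made-up data).
PUBLISHER_A = [{"ISBN": "111", "Title": "Data on the Web", "Price": 50.0}]
PUBLISHER_B = [("222", "Foundations of Databases", 6000)]  # price in cents


def wrap_a() -> List[Book]:
    """Wrapper for source A: dictionaries with capitalized keys."""
    return [Book(r["ISBN"], r["Title"], r["Price"]) for r in PUBLISHER_A]


def wrap_b() -> List[Book]:
    """Wrapper for source B: tuples with the price stored in cents."""
    return [Book(isbn, title, cents / 100) for isbn, title, cents in PUBLISHER_B]


class Mediator:
    """Answers queries over the mediated schema by asking every wrapper."""

    def __init__(self, wrappers: Iterable[Callable[[], List[Book]]]):
        self.wrappers = list(wrappers)

    def query(self, predicate: Callable[[Book], bool]) -> List[Book]:
        results: List[Book] = []
        for fetch in self.wrappers:
            results.extend(book for book in fetch() if predicate(book))
        return results


mediator = Mediator([wrap_a, wrap_b])
cheap = mediator.query(lambda b: b.price_usd < 55)
```

The caller writes one predicate against Book and never sees that one source stores dollars and the other stores cents. The sources themselves are left unchanged, which is the autonomy requirement from slide 7.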

9 Motivation
Enterprise data integration
  –Web site construction
WWW
  –Comparison shopping
  –Portals integrating data from multiple sources
  –B2B, electronic marketplaces
Sciences
  –Geology: integrate geological data across the US continent (text as well as spatial data)
  –Biology: integrating genomic data

10 Current Solutions
Mostly ad-hoc programming
  –Create a special solution for every case
  –Pay consultants a lot of money
Data Warehousing (Data Exchange)
  –Load all the data periodically into a warehouse
  –Separates operational DBMS from decision support DBMS (not only a solution to data integration)
  –Performance is good
  –Data may not be fresh
  –Need to clean data
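
The warehousing trade-off on slide 10 can be illustrated with a small Python sketch that uses in-memory SQLite databases. The table names, the "east" and "west" sources and the refresh function are assumptions made for this illustration only. Queries run against the local warehouse copy, so they are fast, but they reflect the sources only as of the last load.

```python
import sqlite3


def make_source(rows):
    """Create a hypothetical operational database holding an orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.commit()
    return conn


def refresh(warehouse, sources):
    """Periodic load: replace the warehouse contents with a fresh snapshot."""
    warehouse.execute("DELETE FROM all_orders")
    for name, src in sources.items():
        rows = src.execute("SELECT id, total FROM orders").fetchall()
        warehouse.executemany(
            "INSERT INTO all_orders (source, id, total) VALUES (?, ?, ?)",
            [(name, order_id, total) for order_id, total in rows],
        )
    warehouse.commit()


def total_sales(warehouse):
    """Decision-support query that touches only the warehouse copy."""
    row = warehouse.execute(
        "SELECT COALESCE(SUM(total), 0) FROM all_orders"
    ).fetchone()
    return row[0]


sources = {
    "east": make_source([(1, 10.0), (2, 15.0)]),
    "west": make_source([(1, 20.0)]),
}
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE all_orders (source TEXT, id INTEGER, total REAL)")

refresh(warehouse, sources)
before = total_sales(warehouse)

# A new order arrives at a source after the load.
sources["east"].execute("INSERT INTO orders VALUES (3, 5.0)")
sources["east"].commit()
stale = total_sales(warehouse)   # warehouse has not seen the new order yet

refresh(warehouse, sources)
fresh = total_sales(warehouse)   # visible only after the next periodic load
```

Here stale equals before even though a source has changed, which is the "data may not be fresh" point. A real warehouse would also clean and reconcile the data during the load, a step this sketch omits.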

11 Course Outline (Tentative)
Data Integration Scenarios & Architectures
  –Find out what the problems are
Data Models & Type Systems
  –XML/Semistructured Data, DTDs, XML Schema
Query & Transformation Languages
  –Datalog, XPath, XQuery, XSLT
Data Integration Approaches
  –Different approaches depending on application characteristics
Schema Integration
  –Schema Mapping/Matching
  –Semi-automate the discovery of schema mappings
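
As a rough illustration of what "semi-automate the discovery of schema mappings" can mean, the Python sketch below proposes attribute correspondences from name similarity alone and leaves the rest to a human reviewer. It is a deliberately naive stand-in, not one of the matching algorithms covered in the course. The attribute names, the 0.6 threshold and the function names are assumptions.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase an attribute name and drop common separators."""
    return name.lower().replace("_", "").replace("-", "")


def propose_matches(source_attrs, target_attrs, threshold=0.6):
    """Return (source, target, score) candidates for a human to confirm."""
    candidates = []
    for s in source_attrs:
        for t in target_attrs:
            score = SequenceMatcher(None, normalize(s), normalize(t)).ratio()
            if score >= threshold:
                candidates.append((s, t, round(score, 2)))
    # Highest-scoring candidates first.
    return sorted(candidates, key=lambda c: -c[2])


source_schema = ["book_title", "ISBN", "price"]
target_schema = ["BookTitle", "isbn", "cost"]
matches = propose_matches(source_schema, target_schema)
matched_pairs = {(s, t) for s, t, _ in matches}
```

The matcher pairs book_title with BookTitle and ISBN with isbn, but it does not connect price with cost because the names share almost no characters. Synonyms like these are why matching is semi-automatic: the tool suggests, and a person or a richer matcher that looks at data values or a thesaurus completes the mapping.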

12 Course Outline (cont)
Distributed Query Processing Algorithms
Query Rewriting Algorithms
Limited Query Capabilities
  –We don’t have full access to any database
Consistent Query Answers
Web Services
  –What can they do for data integration?
Semantic Web
  –RDF & SPARQL
Workflow Languages
  –How is this related to data integration?
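
One common form of limited query capability is a source that answers only lookups with a bound key, much like a web form that requires an ISBN. The Python sketch below assumes such a source. ReviewSource, CATALOG and the data in them are hypothetical. The query plan first collects ISBN bindings from a source that does allow scans, then calls the restricted source once per binding.

```python
class ReviewSource:
    """Hypothetical source that supports only lookups by a given ISBN."""

    def __init__(self, data):
        self._data = data  # hidden from callers: no full scan is offered

    def reviews_for(self, isbn):
        return list(self._data.get(isbn, []))


# Hypothetical catalog source that can be scanned freely (made-up prices).
CATALOG = [
    ("111", "Data on the Web", 50.0),
    ("222", "Foundations of Databases", 60.0),
]

reviews = ReviewSource({"111": ["clear"], "222": ["dense", "thorough"]})


def reviews_of_cheap_books(max_price):
    # Step 1: obtain ISBN bindings from the source that allows scans.
    isbns = [isbn for isbn, _, price in CATALOG if price <= max_price]
    # Step 2: feed each binding to the source that requires a bound ISBN.
    return {isbn: reviews.reviews_for(isbn) for isbn in isbns}


result = reviews_of_cheap_books(55)
```

The order of the two steps is forced by the access restriction: the review source cannot be asked for "all reviews", so a plan that starts there is not executable.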

13 References
Data Integration: a Status Report
  –Alon Halevy
  –German Database Conference (BTW), 2003
  –Invited Talk
Lecture Slides
  –Alon Halevy
  – ectures/ps/l12.ps