Data Integration Helena Galhardas DEI IST (based on the slides of the course: CIS 550 – Database & Information Systems, Univ. Pennsylvania, Zachary Ives)

Slides:



Advertisements
Similar presentations
Università di Modena e Reggio Emilia ;-)WINK Maurizio Vincini UniMORE Researcher Università di Modena e Reggio Emilia WINK System: Intelligent Integration.
Advertisements

Usage Statistics in Context: related standards and tools Oliver Pesch Chief Strategist, E-Resources EBSCO Information Services Usage Statistics and Publishers:
ANHAI DOAN ALON HALEVY ZACHARY IVES CHAPTER 1: INTRODUCTION TO DATA INTEGRATION PRINCIPLES OF DATA INTEGRATION.
Manipulation of Query Expressions. Outline Query unfolding Query containment and equivalence Answering queries using views.
CSE 636 Data Integration Data Integration Approaches.
Advanced Databases Temporal Databases Dr Theodoros Manavis
Microsoft Access 4 Database Creation and Management.
Welcome to Florida International University Online J.O.B.S. Link Applicant Tutorial.
UDDI, Discovery and Web Services Registries. Introduction To facilitate e-commerce, companies needed a way to locate one another and exchange information.
Chapter 2. Slide 1 CULTURAL SUBJECT GATEWAYS CULTURAL SUBJECT GATEWAYS Subject Gateways  Started as links of lists  Continued as Web directories  Culminated.
Search Engines and Information Retrieval
Course Goals Introduce Terms Skills –Modern DBMS (SQL Server 2008) –SQL querying and data access –Stored procedures including parameters –Brief introduction.
Database Management An Introduction.
Copyright © 2003 Addison-Wesley Your name here. Copyright © 2003 Addison-Wesley Overview of Information Systems What is the Internet? Why are databases.
2005Integration-intro1 Data Integration Systems overview The architecture of a data integration system:  Components and their interaction  Tasks  Concepts.
Data Integration Helena Galhardas DEI IST (based on the slides of the course: CIS 550 – Database & Information Systems, Univ. Pennsylvania, Zachary Ives)
ReQuest (Validating Semantic Searches) Norman Piedade de Noronha 16 th July, 2004.
CSE 636 Data Integration Introduction. 2 Staff Instructor: Dr. Michalis Petropoulos Location: 210 Bell Hall Office Hours:
Distributed Databases and Query Processing. Distributed DB’s vs. Parallel DB’s Many autonomous processors that may participate in database operations.
Chapter 4: Database Management. Databases Before the Use of Computers Data kept in books, ledgers, card files, folders, and file cabinets Long response.
1 Information Integration and Source Wrapping Jose Luis Ambite, USC/ISI.
Knowledge Portals and Knowledge Management Tools
Professor Michael J. Losacco CIS 1110 – Using Computers Application Software Chapter 3.
Sales Force Automation and Automated Customer Service Centers
1 Introduction to databases concepts CCIS – IS department Level 4.
FPDS- NG Reports Overview December 16, Today’s Goals Provide an overview of the FPDS-NG reporting capability Demonstrate each of the reporting tools.
Concepts of Database Management, Fifth Edition Chapter 1: Introduction to Database Management.
Search Engines and Information Retrieval Chapter 1.
Module Title? DBMS Introduction to Database Management System.
Chapter 6: Foundations of Business Intelligence - Databases and Information Management Dr. Andrew P. Ciganek, Ph.D.
Database Business Enterprise Software Training Infrastructure Job Seeker Intelligence Job Computing Services Using the internet and computing resources.
Introduction: Databases and Database Users
Concepts of Database Management Seventh Edition Chapter 4 Keys and Relationship.
CSE 636 Data Integration Overview Fall What is Data Integration? The problem of providing uniform (sources transparent to user) access to (query,
Setting Up an on-line Store Tutorial Using SmartStore.biz This Tutorial assumes you have downloaded the software from This Tutorial.
© 2001 Business & Information Systems 2/e1 Chapter 8 Personal Productivity and Problem Solving.
Lead Black Slide Powered by DeSiaMore1. 2 Chapter 8 Personal Productivity and Problem Solving.
Data Tagging Architecture for System Monitoring in Dynamic Environments Bharat Krishnamurthy, Anindya Neogi, Bikram Sengupta, Raghavendra Singh (IBM Research.
Database Design and Management CPTG /23/2015Chapter 12 of 38 Functions of a Database Store data Store data School: student records, class schedules,
EasyQuerier: A Keyword Interface in Web Database Integration System Xian Li 1, Weiyi Meng 2, Xiaofeng Meng 1 1 WAMDM Lab, RUC & 2 SUNY Binghamton.
2007. Software Engineering Laboratory, School of Computer Science S E Web-Harvest Web-Harvest: Open Source Web Data Extraction tool 이재정 Software Engineering.
Using the Right Method to Collect Information IW233 Amanda Murphy.
LRI Université Paris-Sud ORSAY Nicolas Spyratos Philippe Rigaux.
1 ICOM 5016 – Introduction to Database System Project # 1 Dr. Manuel Rodriguez-Martinez Department of Electrical and Computer Engineering University of.
The Americans with Disabilities Act makes it illegal for employers to discriminate against qualified people with disabilities. Refer to
Querying Web Data – The WebQA Approach Author: Sunny K.S.Lam and M.Tamer Özsu CSI5311 Presentation Dongmei Jiang and Zhiping Duan.
Information Integration BIRN supports integration across complex data sources – Can process wide variety of structured & semi-structured sources (DBMS,
Of 33 lecture 1: introduction. of 33 the semantic web vision today’s web (1) web content – for human consumption (no structural information) people search.
Data Integration Hanna Zhong Department of Computer Science University of Illinois, Urbana-Champaign 11/12/2009.
Scalable Hybrid Keyword Search on Distributed Database Jungkee Kim Florida State University Community Grids Laboratory, Indiana University Workshop on.
Issues in Ontology-based Information integration By Zhan Cui, Dean Jones and Paul O’Brien.
Workforce Scheduling Release 5.0 for Windows Implementation Overview OWS Development Team.
Database Systems Lecture 1. In this Lecture Course Information Databases and Database Systems Some History The Relational Model.
Chapter7 TELECOMMUNICATIONS AND NETWORKS. Content e-Business Systems – Cross-Functional Enterprise Applications – Enterprise Application Integration –
Concepts of Database Management Seventh Edition Chapter 4 Keys and Relationship.
ASET 1 Amity School of Engineering & Technology B. Tech. (CSE/IT), III Semester Database Management Systems Jitendra Rajpurohit.
Toward Entity Retrieval over Structured and Text Data Mayssam Sayyadian, Azadeh Shakery, AnHai Doan, ChengXiang Zhai Department of Computer Science University.
Cyberinfrastructure Overview of Demos Townsville, AU 28 – 31 March 2006 CREON/GLEON.
By ILTAF MEHDI (MCS, MCSE, CCNA) 1 Remember: Examination is a chance not ability. 6/12/2016.
Introduction: Databases and Database Systems Lecture # 1 June 19,2012 National University of Computer and Emerging Sciences.
Introduction.  Instructor: Cengiz Örencik   Course materials:  myweb.sabanciuniv.edu/cengizo/courses.
17 years of videogames industry
Chapter 1: Introduction
Overview of MDM Site Hub
Book: Integrated business processes with ERP systems
Book: Integrated business processes with ERP systems
System And Application Software
Entity – Relationship Model
Managing the Digital Enterprise: A 5-Year Experiment in Open Courseware Meeta Yadav and Michael Rappa North Carolina State.
Serviceability Assessment Training
Presentation transcript:

Data Integration Helena Galhardas DEI IST (based on the slides of the course: CIS 550 – Database & Information Systems, Univ. Pennsylvania, Zachary Ives)

Agenda Overview of Data Integration

A Problem Often people build databases in isolation, then want to share their data  Different systems within an enterprise  Different information brokers on the Web  Scientific collaborators  Researchers who want to publish their data for others to use Even with normalization and the same needs, different people will arrive at different schemas Goal of data integration: tie together different sources, controlled by many people, under a common schema

Motivating example (1) FullServe: company that provides internet access to homes, but also sells products to support the home computing infrastructure (ex: modems, wireless routers, etc) FullServe is a predominantly American company and decided to acquire EuroCard, an European company that is mainly a credit card provider, but has recently started leveraging its customer base to enter the internet market

Motivating Example (2) FullServe databases: Employee Database FullTimeEmp(ssn, empId, firstName, middleName, lastName) Hire(empId, hireDate, recruiter) TempEmployees(ssn, hireStart, hireEnd, name, hourlyRate) Training Database Courses(courseID, name, instructor) Enrollments(courseID, empID, date) Services Database Services(packName, textDescription) Customers(name, id, zipCode, streetAdr, phone) Contracts(custID, packName, startDate) Sales Database Products(prodName, prodId) Sales(prodName, customerName, address) Resume Database Interview(interviewDate, name, recruiter, hireDecision, hireDate) CV(name, resume) HelpLine Database Calls(date, agent, custId, text, action)

Motivating Example (3) EuroCard databases: Employee Database Emp(ID, firstNameMiddleInitial, lastName) Hire(ID, hireDate, recruiter) CreditCard Database Customer(CustID, cardNum, expiration, currentBalance) CustDetail(CustID, name, address) Resume Database Interview(ID, date, location, recruiter) CV(name, resume) HelpLine Database Calls(date, agent, custId, description, followup)

Observations Why data resides in multiple DBs in a company, rather than in a single well-organized DB? 1. When companies go hrough internal restructuring, they do not always align their DBs 2. Most DBs are created by a group within the company with a specific need 1. Not all the information needs in the future are anticipated

Motivating Example (4) Some queries employees or managers in FullServe may want to pose: The Human Resources Department may want to be able to query for all of its employees whether in the US or in Europe  Require access to 2 databases in the American side and 1 in the European side There is a single customer support hot-line, where customers can call about any service or product they obtain from the company.When a representative is on the phone with a customer, it´s important to see the entire set of services the customer is getting from FullServe (internet service, credit card or products purchased). Furthermore, it is useful to know that the customer is a big spender on its credit card.  Require access to 2 databases in the US side and 1 in the European side.

Another example: searching for a new job (1)

Another example: searching for a new job (2) Each form (site) asks for a slighly different set of attributes (ex: keywords describing job, location and job category or employer and job type) Ideally, would like to have a single web site to pose our queries and have that site integrating data from all relevant sites in the Web,

Goal of data integration Offer uniform access to a set of data autonomous and heterogeneous data sources:  Querying disparate sources  Large number of sources  Heterogeneous data sources (different systems, diff. Schemas, some structured others unstructured)  Autonomous data sources: we may not have full access to the data or source may not be available all the time.

Why is it hard? Systems reasons: even with the same HW and all relational sources, the SQL supported is not always the same Logical reasons: different schemas (e.g. FullServe and EuroCard temporary employees), diff attributes (e.g., ID), diff attribute names for the same knowledge (e.g., text and action), diff. Representations of data (e.g. First-name, last name) also known as semantic heterogeneity Social and administrative reasons

Practical goals of data integration Ideally: data integration system access a set of data sources and automatically configures itself to correctly and efficiently answer queries over multiple sources Actually:  Build tools to reduce the effort required to integrate a set of data sources  Improve the ability of the system to answer queries in uncertain environments Trade-off:  user effort vs accuracy

Referências Draft of the book on “Principles of Data Integration” by AnHai Doan, Alon Halevy, Zachary Ives (in preparation). Slides of the course: CIS 550 – Database & Information Systems, Univ. Pennsylvania, Zachary Ives) T. Landers and R. Rosenberg. An overview of multibase. In Proceedings of the Second International Symoposium on Distributed Databases, pages 153–183. North Holland, Amsterdam, Gio Wiederhold. Mediators in the architecture of future information systems. IEEE Computer, pages 38–49, March Daniela Florescu, Alon Levy, and Alberto Mendelzon. Database techniques for the world-wide web: A survey. SIGMOD Record, 27(3):59–74,September Alon Y. Halevy, Naveen Ashish, Dina Bitton, Michael J. Carey, Denise Draper, Jeff Pollock, Arnon Rosenthal, and Vishal Sikka. Enterprise information integration: successes, challenges and controversies. In SIGMOD Conference, pages 778–787, 2005.