Muse: A System for Understanding and Designing Mappings Bogdan Alexe Laura Chiticariu Renée J. Miller Daniel Pepper Wang-Chiew Tan UC Santa Cruz U. of.

Presentation transcript:

Muse: A System for Understanding and Designing Mappings
Bogdan Alexe, Laura Chiticariu, Renée J. Miller, Daniel Pepper, Wang-Chiew Tan
UC Santa Cruz; U. of Toronto

Poster sections: Motivation; Extensions; Choosing Desired Mapping Interpretation with Muse-D; Designing Nesting Semantics with Muse-G; Muse Overview

Motivation:
A schema mapping is a relationship between a source database schema and a target database schema.
Designing a schema mapping is a fundamental problem in information integration.
Specifying a semantically correct schema mapping is usually a complex task.
Automatic tools can suggest potential mappings.
Ensuring mapping correctness still requires intricate manual work.
Few tools are available for helping a designer understand and design alternative mappings.

Source schema:
  CompDB: Rcd
    Companies: Set of Company: Rcd
      cid
      cname
      location
    Projects: Set of Project: Rcd
      pid
      pname
      cid
      manager
    Employees: Set of Employee: Rcd
      eid
      ename
      contact

Target schema:
  OrgDB: Rcd
    Orgs: Set of Org: Rcd
      oname
      Projects: Set of Project: Rcd
        pname
        manager
    Employees: Set of Employee: Rcd
      eid
      ename

(The original figure also carries the labels f1 and f2.)

Mappings:
  m1: for c in CompDB.Companies
      exists o in OrgDB.Orgs
      where c.cname=o.oname and o.Projects = SKProjs(c.cid,c.cname,c.location)

  m2: for c in CompDB.Companies, p in CompDB.Projects, e in CompDB.Employees
      satisfy p.cid=c.cid and e.eid=p.manager
      exists o in OrgDB.Orgs, p1 in o.Projects, e1 in OrgDB.Employees
      satisfy p1.manager=e1.eid
      where c.cname=o.oname and e.eid=e1.eid and e.ename=e1.ename and p.pname=p1.pname
        and o.Projects = SKProjs( )

  m3: for e in CompDB.Employees
      exists e1 in OrgDB.Employees
      where e.eid=e1.eid and e.ename=e1.ename

Source schema (ambiguous mapping example):
  CompDB: Rcd
    Projects: Set of Project: Rcd
      pid
      pname
      manager
      tech-lead
    Employees: Set of Employee: Rcd
      eid
      ename
      contact

Target schema (ambiguous mapping example):
  OrgDB: Rcd
    Projects: Set of Project: Rcd
      pname
      supervisor

Mapping:
  ma: for p in CompDB.Projects, e1 in CompDB.Employees, e2 in CompDB.Employees
      satisfy e1.eid=p.manager and e2.eid=p.tech-lead
      exists p1 in OrgDB.Projects
      where p.pname=p1.pname and (e1.ename=p1.supervisor or e2.ename=p1.supervisor)
        and (e1.contact=p1. or e2.contact=p1. )

Example source:
  Projects:
    P1  DB  e4  e5
  Employees:
    e4  John
    e5  Anna

Choice values for supervisor and (the designer makes one selection for each attribute)

Nesting semantics are expressed through grouping functions, which are defined for each nested set in the target schema.
A grouping function is a form of Skolem function, with atomic attributes as parameters.
Example grouping function from mapping m2: SKProjs( ). Target Project records are grouped according to the values of all attributes of the Company, Project and Employee source records.

Example: Designing the grouping function for the target Projects set
Suppose the set of possible arguments is S = {cid, cname, location}.
Muse-G probes every attribute in S.
At each probe, a small, carefully chosen source instance is considered, from which two differentiating target instances are obtained: one includes the probed attribute in the grouping function (Scenario 1 below), and the other omits it (Scenario 2 below).
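The effect of including or omitting one attribute in a grouping function can be sketched in a few lines of Python. This is only an illustration, not Muse's implementation: the helper name nest_projects and the dict-based record layout are assumptions, the join with Employees from m2 is left out, and the remaining arguments are fixed to (cname, location) for simplicity. The instance has two companies that differ only in cid.

```python
from collections import defaultdict

# Tiny source instance: two companies that agree on cname and location
# but differ on cid. The dict layout is an assumption for this sketch.
companies = [
    {"cid": 11, "cname": "IBM", "location": "NY"},
    {"cid": 12, "cname": "IBM", "location": "NY"},
]
projects = [
    {"pid": "P1", "pname": "DB", "cid": 11, "manager": "e4"},
    {"pid": "P2", "pname": "Web", "cid": 12, "manager": "e5"},
]


def nest_projects(companies, projects, group_by):
    """Group target Project records by the Skolem term built from group_by.

    Hypothetical helper: the key of each nested set plays the role of
    SKProjs(<values of the attributes listed in group_by>).
    """
    by_cid = {c["cid"]: c for c in companies}
    nested = defaultdict(set)
    for p in projects:
        company = by_cid[p["cid"]]
        skolem_args = tuple(company[a] for a in group_by)
        nested[skolem_args].add((p["pname"], p["manager"]))
    return dict(nested)


# Probe on cid: one grouping includes it, the other omits it.
with_cid = nest_projects(companies, projects, ["cid", "cname", "location"])
without_cid = nest_projects(companies, projects, ["cname", "location"])

print(len(with_cid), "project sets when cid is a grouping argument")
print(len(without_cid), "project set when cid is omitted")
```

With cid as an argument, each company gets its own project set; without it, both projects fall into a single set keyed by (IBM, NY). That difference is what the designer is asked to judge at a probe.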
Step 1: Probing on the cid attribute

Example source:
  Companies (cid, cname, location):
    11  IBM  NY
    12  IBM  NY
  Projects (pid, pname, cid, manager):
    P1  DB   11  e4
    P2  Web  12  e5
  Employees (eid, ename, contact):
    e4  John  x234
    e5  Anna  x888

Target instances:
  Scenario 1: OrgDB
    Orgs:
      IBM  Projects: SK(11,y)
             DB   e4
      IBM  Projects: SK(12,y)
             Web  e5
    Employees:
      e4  John
      e5  Anna
  Scenario 2: OrgDB
    Orgs:
      IBM  Projects: SK(y)
             DB   e4
             Web  e5
    Employees:
      e4  John
      e5  Anna
  y subset of {IBM,NY}

The designer chooses Scenario 2 (excludes cid from the grouping function).

Step 2: Probing on the cname attribute

Example source:
  Companies (cid, cname, location):
    11  IBM  NY
    14  SBC  NY
  Projects (pid, pname, cid, manager):
    P1  DB    11  e4
    P4  WiFi  14  e6
  Employees (eid, ename, contact):
    e4  John  x234
    e6  Kat   x331

Target instances:
  Scenario 1: OrgDB
    Orgs:
      IBM  Projects: SK(IBM,y)
             DB    e4
      SBC  Projects: SK(SBC,y)
             WiFi  e6
    Employees:
      e4  John
      e6  Kat
  Scenario 2: OrgDB
    Orgs:
      IBM  Projects: SK(y)
             DB    e4
             WiFi  e6
      SBC  Projects: SK(y)
             DB    e4
             WiFi  e6
    Employees:
      e4  John
      e6  Kat
  y subset of {NY}

The designer chooses Scenario 1 (includes cname in the grouping function).

Step 3: Probing on the location attribute

Example source:
  Companies (cid, cname, location):
    11  IBM  NY
    13  IBM  SF
  Projects (pid, pname, cid, manager):
    P1  DB   11  e4
    P2  Web  13  e5
  Employees (eid, ename, contact):
    e4  John  x234
    e5  Anna  x888

Target instances:
  Scenario 1: OrgDB
    Orgs:
      IBM  Projects: SK(IBM,NY)
             DB   e4
      IBM  Projects: SK(IBM,SF)
             Web  e5
    Employees:
      e4  John
      e5  Anna
  Scenario 2: OrgDB
    Orgs:
      IBM  Projects: SK(IBM)
             DB   e4
             Web  e5
    Employees:
      e4  John
      e5  Anna

The designer chooses Scenario 2 (excludes location from the grouping function).

Conclusion: the desired grouping function for Projects is SK(cname).

Ambiguous mapping:
The mapping ma is ambiguous: it can be interpreted in several ways, e.g. the project supervisor can be either the manager or the tech-lead.
In total, there are four alternative interpretations.
Key idea of Muse-D: provide an example source instance to illustrate the four interpretations in a compact way.

Target instance:
  Orgs: Projects: DB  John  Anna

Muse-G can take advantage of constraints on the source schema (such as keys, and more generally, functional dependencies).
The designer can refine the desired nesting semantics incrementally.

Muse is a mapping design wizard that uses data examples to help designers understand, design and refine schema mappings.
In Muse, the designer works with data examples rather than with complex specifications to understand the semantics of a mapping.
Muse uses real data examples whenever possible; otherwise it constructs synthetic examples.
Muse consists of two components: Muse-G (design of desired nesting semantics for mappings) and Muse-D (choosing the desired interpretation of ambiguous mappings).
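The four interpretations of the ambiguous mapping ma come from two independent two-way choices, one per ambiguous target attribute. The Python sketch below enumerates them under stated assumptions: the name contact_attr is a placeholder, because the transcript does not preserve the name of the second target attribute, and the contact values x234 and x888 are borrowed from the nesting example, since the Muse-D example source does not list them. It illustrates the counting argument only, not how Muse-D itself works.

```python
from itertools import product

# Source tuples from the ambiguous-mapping example. Contact values are
# assumed (taken from the nesting example), not given for this instance.
project = {"pid": "P1", "pname": "DB", "manager": "e4", "tech-lead": "e5"}
employees = {
    "e4": {"ename": "John", "contact": "x234"},
    "e5": {"ename": "Anna", "contact": "x888"},
}

# Each ambiguous target attribute can be filled from either role.
roles = ["manager", "tech-lead"]
# target attribute -> source Employee attribute; "contact_attr" is a placeholder name.
ambiguous = {"supervisor": "ename", "contact_attr": "contact"}


def interpretations(project, employees):
    """Return one target Project record per combination of role choices."""
    results = []
    for picks in product(roles, repeat=len(ambiguous)):
        target = {"pname": project["pname"]}
        for (target_attr, source_attr), role in zip(ambiguous.items(), picks):
            target[target_attr] = employees[project[role]][source_attr]
        results.append((picks, target))
    return results


for picks, target in interpretations(project, employees):
    print(picks, target)
```

Because the designer makes one selection per attribute, a single source tuple with distinct manager and tech-lead values is enough to tell all four combinations apart.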