Entity Modelling Tips.

Quick Intro
Working with Oracle since 1986
Oracle DBA: OCP for Oracle7, 8, 9 and 10
Oracle DBA of the Year, 2002
Oracle ACE Director
ANSI/ISO Standards Committee (SQL)
Regular presenter at Oracle conferences
Consultant and trainer
Technical editor for a number of Oracle texts
UK Oracle User Group director
Member of IOUC
Day job: Tradba Ltd

What is an Entity?
Oxford English Dictionary: Entity (noun), sense 1: Thing
Bloom's Taxonomy: Thing: whatever is or may be an object of thought
James Martin's definitions of IT terms: Object: a real-world entity

Relationships and the Time Dimension
Building society scenario: Manager and Branch
Each and every manager manages one and only one branch.
Each and every branch is managed by one and only one manager.
This could (should?) be implemented as one entity (table).
But what about the time dimension?
How could the relationship 'marriage' be modelled?
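One possible way to bring in the time dimension, shown here as a minimal sketch that assumes SQLite through Python's sqlite3 module: the tables manager, branch and branch_manager, the start_date/end_date columns and all rows are hypothetical. At any instant the 1:1 rule holds, but once history is kept the relationship becomes many-to-many and needs a link entity keyed partly on date.

```python
import sqlite3

# Sketch only: table names, columns and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE manager (manager_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE branch  (branch_id  INTEGER PRIMARY KEY, town TEXT NOT NULL);
-- One row per period during which a manager ran a branch.
CREATE TABLE branch_manager (
  branch_id  INTEGER NOT NULL REFERENCES branch(branch_id),
  manager_id INTEGER NOT NULL REFERENCES manager(manager_id),
  start_date TEXT NOT NULL,
  end_date   TEXT,                 -- NULL means the appointment is current
  PRIMARY KEY (branch_id, start_date)
);
INSERT INTO manager VALUES (1, 'Ames'), (2, 'Baker');
INSERT INTO branch  VALUES (10, 'Leeds'), (20, 'York');
-- Ames moves from Leeds to York; Baker takes over Leeds.
INSERT INTO branch_manager VALUES
  (10, 1, '2001-01-01', '2004-12-31'),
  (10, 2, '2005-01-01', NULL),
  (20, 1, '2005-01-01', NULL);
""")

# Today each branch still has exactly one manager...
current = conn.execute(
    "SELECT branch_id, manager_id FROM branch_manager "
    "WHERE end_date IS NULL ORDER BY branch_id").fetchall()
# ...but over time Leeds has had two managers and Ames has run two branches.
leeds_history = conn.execute(
    "SELECT COUNT(*) FROM branch_manager WHERE branch_id = 10").fetchone()[0]
ames_branches = conn.execute(
    "SELECT COUNT(DISTINCT branch_id) FROM branch_manager "
    "WHERE manager_id = 1").fetchone()[0]
print(current, leeds_history, ames_branches)
```

The key alone does not stop overlapping periods for the same branch; that rule would still need extra checking. 'Marriage' fits the same pattern: 1:1 at any moment, but a person may have several over a lifetime.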

Why Normalise?
Normalising usually implies splitting data apart into separate tables, which means the data often has to be joined back together. Why not keep the data in one table?
Attempt 1: store department details in line with employee information, in the table emp_dept.

emp_dept
EMPNO ENAME  JOB        MGR HIREDATE   SAL COMM DEPTNO DNAME      LOC
----- ------ --------- ---- --------- ---- ---- ------ ---------- --------
 7782 CLARK  MANAGER   7839 09-JUN-81 2450          10 ACCOUNTING NEW YORK
 7839 KING   PRESIDENT      17-NOV-81 5000          10 ACCOUNTING NEW YORK
 7934 MILLER CLERK     7782 23-JAN-82 1300          10 ACCOUNTING NEW YORK
 7566 JONES  MANAGER   7839 02-APR-81 2975          20 RESEARCH   DALLAS
 7902 FORD   ANALYST   7566 03-DEC-81 3000          20 RESEARCH   DALLAS
 7876 ADAMS  CLERK     7788 12-JAN-83 1100          20 RESEARCH   DALLAS
 7369 SMITH  CLERK     7902 17-DEC-80  800          20 RESEARCH   DALLAS
 7788 SCOTT  ANALYST   7566 09-DEC-82 3000          20 RESEARCH   DALLAS
 7521 WARD   SALESMAN  7698 22-FEB-81 1250  500     30 SALES      CHICAGO
 7844 TURNER SALESMAN  7698 08-SEP-81 1500    0     30 SALES      CHICAGO
 7499 ALLEN  SALESMAN  7698 20-FEB-81 1600  300     30 SALES      CHICAGO
 7900 JAMES  CLERK     7698 03-DEC-81  950          30 SALES      CHICAGO
 7698 BLAKE  MANAGER   7839 01-MAY-81 2850          30 SALES      CHICAGO
 7654 MARTIN SALESMAN  7698 28-SEP-81 1250 1400     30 SALES      CHICAGO

The emp_dept table
The primary key is empno.
1. There is no way to add a new department without an employee (an insertion anomaly).
2. Changing a department name or location requires multiple rows to be updated, with a risk of inconsistencies (an update anomaly).
3. Deleting the last employee in a department removes the department information (a deletion anomaly).
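The update and deletion anomalies can be reproduced directly. This is a minimal sketch assuming SQLite via Python's sqlite3; it uses a cut-down emp_dept with only the ACCOUNTING rows from the table above, and the new location 'BOSTON' is a hypothetical value chosen for the demonstration.

```python
import sqlite3

# Sketch only: a cut-down emp_dept; 'BOSTON' is an illustrative new location.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp_dept (
  empno  INTEGER PRIMARY KEY,
  ename  TEXT,
  deptno INTEGER,
  dname  TEXT,
  loc    TEXT
);
INSERT INTO emp_dept VALUES
  (7782, 'CLARK',  10, 'ACCOUNTING', 'NEW YORK'),
  (7839, 'KING',   10, 'ACCOUNTING', 'NEW YORK'),
  (7934, 'MILLER', 10, 'ACCOUNTING', 'NEW YORK');
""")

# Update anomaly: moving one department touches one row per employee.
moved = conn.execute(
    "UPDATE emp_dept SET loc = 'BOSTON' WHERE deptno = 10").rowcount

# Deletion anomaly: removing the employees also loses the department.
conn.execute("DELETE FROM emp_dept WHERE deptno = 10")
dept_10_known = conn.execute(
    "SELECT COUNT(*) FROM emp_dept WHERE deptno = 10").fetchone()[0] > 0

# Normalised alternative: department facts are stored once, in dept.
conn.executescript("""
CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT, loc TEXT);
CREATE TABLE emp  (empno INTEGER PRIMARY KEY, ename TEXT,
                   deptno INTEGER REFERENCES dept(deptno));
INSERT INTO dept VALUES (10, 'ACCOUNTING', 'NEW YORK');  -- no employee needed
""")
dept_rows_touched = conn.execute(
    "UPDATE dept SET loc = 'BOSTON' WHERE deptno = 10").rowcount
print(moved, dept_10_known, dept_rows_touched)
```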

Attempt 2: the dept_emp table
Store employee details in line with departmental information.
This requires an indeterminate number of columns: seven columns (EMPNO to COMM) for each employee in the department.
The key is deptno.
A new employee cannot be added without a department.
Duplicate employees could easily occur.
You would need to know that TURNER is employee number 2 in department 30.
Suppose one department were much bigger than the rest: this leads to lots of columns, with most rows having many null values.

DEPTNO DNAME      LOC      EMPNO1 ENAME1 JOB1      MGR1 HIREDATE1 SAL1 COMM1 EMPNO2 ...
------ ---------- -------- ------ ------ --------- ---- --------- ---- ----- ------
    10 ACCOUNTING NEW YORK   7782 CLARK  MANAGER   7839 09-JUN-81 2450         7839 ...
    20 RESEARCH   DALLAS     7566 JONES  MANAGER   7839 02-APR-81 2975         7902 ...
    30 SALES      CHICAGO    7521 WARD   SALESMAN  7698 22-FEB-81 1250   500   7844 ...

Many to Many Relationships: EMP and PROJECT
A many-to-many relationship cannot be directly implemented as two tables, because the number of foreign key columns is indeterminate (similar to the problems shown on the previous slide).
Linking employees to projects by inserting foreign empno columns into the project table:
If a project has 3000 employees, it would require 3000 empno columns.
One project may have many more employees than others, resulting in many null values for most rows.
Linking projects to employees by inserting foreign proj_id columns into the emp table:
If an employee has 50 projects, it would require 50 proj_id columns.
One employee may be on many more projects than others.

Many to Many Relationships (continued)
Many columns, possibly many NULLs:

PROJ_ID P_DESC EMPNO1 EMPNO2 EMPNO3 EMPNO4 EMPNO5 EMPNO6 EMPNO7 EMPNO8 EMPNO9 ...
------- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ---
      1 ARTS     7566   7900   7942   7369   7654   7902   7876   7482   7321 ...
      2 DESIGN   7600   7942
      3 IT       7988   7600   7788
      : :

An employee cannot be added to a project if all the columns are full, and a change of EMPNO is difficult to handle.

EMPNO ENAME  PROJ_ID1 PROJ_ID2 PROJ_ID3 PROJ_ID4 PROJ_ID5 PROJ_ID6 PROJ_ID7 ...
----- ------ -------- -------- -------- -------- -------- -------- -------- ---
 7399 COX           3        5        9        8        2       12       14 ...
 7942 MILLS         1
 7001 CARVER        4        3
    : :

A project cannot be added to an employee if all the columns are full, and a change of PROJ_ID is difficult to handle.

Many to Many Relationships (continued): EMP, PROJ_EMP and PROJECT

EMP
EMPNO ENAME
----- ------
 7399 COX
 7942 MILLS
 7001 CARVER
    : :

PROJ_EMP
PROJ_ID EMPNO
------- -----
      1  7942
      1  7566
      1  7900
      1  7369
      1  7654
      1  7902
      1  7876
      1  7482
      1  7321
      2  7600
      2  7942
      3  7988
      3  7600
      3  7788
      :  :

PROJECT
PROJ_ID P_DESC
------- ------
      1 ARTS
      2 DESIGN
      3 IT
      : :

The link entity, PROJ_EMP, has a primary key of (PROJ_ID, EMPNO) and foreign keys PROJ_ID (referencing PROJECT) and EMPNO (referencing EMP). Changes to the data are easy to make.
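The link entity can be declared directly. This is a minimal sketch assuming SQLite via Python's sqlite3 (foreign keys must be switched on per connection there); the rows come from the tables above, except that putting CARVER on project 3 and the project number 99 are hypothetical actions for the demonstration.

```python
import sqlite3

# Sketch only: assumes SQLite; the CARVER assignment and project 99 are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE emp     (empno INTEGER PRIMARY KEY, ename TEXT NOT NULL);
CREATE TABLE project (proj_id INTEGER PRIMARY KEY, p_desc TEXT NOT NULL);
-- The link entity: one row per (project, employee) pairing.
CREATE TABLE proj_emp (
  proj_id INTEGER NOT NULL REFERENCES project(proj_id),
  empno   INTEGER NOT NULL REFERENCES emp(empno),
  PRIMARY KEY (proj_id, empno)
);
INSERT INTO emp VALUES (7399, 'COX'), (7942, 'MILLS'), (7001, 'CARVER');
INSERT INTO project VALUES (1, 'ARTS'), (2, 'DESIGN'), (3, 'IT');
INSERT INTO proj_emp VALUES (1, 7942), (2, 7942);
""")

# Adding an employee to a project is a single INSERT; no new columns needed.
conn.execute("INSERT INTO proj_emp VALUES (3, 7001)")

# The composite primary key rejects the same pairing twice.
try:
    conn.execute("INSERT INTO proj_emp VALUES (1, 7942)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The foreign keys reject a link to a project that does not exist.
try:
    conn.execute("INSERT INTO proj_emp VALUES (99, 7399)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

mills_projects = [r[0] for r in conn.execute(
    "SELECT p.p_desc FROM project p JOIN proj_emp pe ON pe.proj_id = p.proj_id "
    "WHERE pe.empno = 7942 ORDER BY p.proj_id")]
print(mills_projects, duplicate_rejected, orphan_rejected)
```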

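Fan Traps
Fan traps arise from M:1 and 1:M relationships through one entity.
Diagram: Subsidiary Company, Department, Employee (Department and Employee are each related to Subsidiary Company).
The diagram gives rise to three tables: Department, Company and Employee.
Department (D#, C#): D1 C1 D2 C2 D3
Company (C#): C1 C2
Employee (E#, C#): E1 C1 E2 E3 E4 C2 E5 E6 E7 E8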

Data Patterns in Fan Traps
Green is employee number E7. In which department does Green work?
The possibilities fan out from C2 into D1, D2 and D3.
(Diagram: employees E1 to E8 linked to companies C1 and C2, which are linked to departments D1 to D3.)

Significant and Non-Significant Fan Traps
Fixing the model is easy: the hierarchy is changed to Subsidiary Company, Department, Employee, so that each company has departments and each department has employees.
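The trap and its fix can be compared side by side. This is a minimal sketch assuming SQLite via Python's sqlite3; the table names employee and employee2 and the row assignments (D2 and D3 in C2, Green in D3) are hypothetical, chosen only to reproduce the fan-out described above.

```python
import sqlite3

# Sketch only: table names and row assignments are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Fan trap: Department and Employee both hang off Company.
CREATE TABLE company    (c_no TEXT PRIMARY KEY);
CREATE TABLE department (d_no TEXT PRIMARY KEY, c_no TEXT REFERENCES company);
CREATE TABLE employee   (e_no TEXT PRIMARY KEY, name TEXT,
                         c_no TEXT REFERENCES company);
INSERT INTO company VALUES ('C1'), ('C2');
INSERT INTO department VALUES ('D1', 'C1'), ('D2', 'C2'), ('D3', 'C2');
INSERT INTO employee VALUES ('E7', 'Green', 'C2');
""")
# Joining through the company fans out: Green's department cannot be known.
candidates = [r[0] for r in conn.execute("""
  SELECT d.d_no FROM employee e JOIN department d ON d.c_no = e.c_no
  WHERE e.e_no = 'E7' ORDER BY d.d_no""")]

# Fix: change the hierarchy to Company -> Department -> Employee.
conn.executescript("""
CREATE TABLE employee2 (e_no TEXT PRIMARY KEY, name TEXT,
                        d_no TEXT REFERENCES department);
INSERT INTO employee2 VALUES ('E7', 'Green', 'D3');
""")
# One department, and the company is still reachable through it.
fixed = conn.execute("""
  SELECT d.d_no, d.c_no FROM employee2 e JOIN department d ON d.d_no = e.d_no
  WHERE e.e_no = 'E7'""").fetchall()
print(candidates, fixed)
```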

Non-Significant Fan Traps
A very similar construct, but one that gives no problem: Warehouse, Stock, Sales Office (Stock and Sales Office are each related to Warehouse).
There is no specific relationship between Stock and Sales Office.
The business rule is that a Sales Office can receive ANY of the Stock from its Warehouse.
The fan trap is therefore non-significant.
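Here the fan-out through Warehouse is exactly the answer the business rule asks for. A minimal sketch assuming SQLite via Python's sqlite3, with hypothetical warehouse, stock and sales office rows:

```python
import sqlite3

# Sketch only: all identifiers and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE warehouse    (w_no TEXT PRIMARY KEY);
CREATE TABLE stock        (item TEXT, w_no TEXT REFERENCES warehouse,
                           PRIMARY KEY (item, w_no));
CREATE TABLE sales_office (s_no TEXT PRIMARY KEY,
                           w_no TEXT REFERENCES warehouse);
INSERT INTO warehouse VALUES ('W1'), ('W2');
INSERT INTO stock VALUES ('bolts', 'W1'), ('nuts', 'W1'), ('rope', 'W2');
INSERT INTO sales_office VALUES ('S1', 'W1'), ('S2', 'W1');
""")
# The join fans out, but that is intended: an office may receive any
# stock held by its own warehouse, and nothing from other warehouses.
available = [r[0] for r in conn.execute("""
  SELECT st.item FROM sales_office so JOIN stock st ON st.w_no = so.w_no
  WHERE so.s_no = 'S1' ORDER BY st.item""")]
print(available)
```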

Resolving Relationships
Key requirement: which jockey was riding which horse in which race?
Entities: Horse, Jockey, Race

Resolving Relationships (continued)
Entity tables: Horse (H1, H2, H3), Jockey (J1, J2, J3), Race (R1, R2, R3, R4)

Race/Horse   Horse/Jockey   Race/Jockey
Race Horse   Jockey Horse   Race Jockey
---- -----   ------ -----   ---- ------
R1   H1      J1     H1      R1   J1
R1   H2      J1     H2      R1   J3
R3   H1      J3     H1      R3   J1
R2   H3      J3     H3      R2   J3
R4   H3                     R4   J3
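Rejoining the three binary link tables shows why they cannot answer the key requirement. This is a minimal sketch assuming SQLite via Python's sqlite3, loaded with the data above. The ride table is one common resolution (a single three-way link entity), swapped in here for illustration; its name, its constraints and its two rows are assumptions, not taken from the slides.

```python
import sqlite3

# Sketch only: the ride table and its rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE race_horse   (race TEXT, horse TEXT, PRIMARY KEY (race, horse));
CREATE TABLE jockey_horse (jockey TEXT, horse TEXT, PRIMARY KEY (jockey, horse));
CREATE TABLE race_jockey  (race TEXT, jockey TEXT, PRIMARY KEY (race, jockey));
INSERT INTO race_horse VALUES
  ('R1','H1'), ('R1','H2'), ('R3','H1'), ('R2','H3'), ('R4','H3');
INSERT INTO jockey_horse VALUES
  ('J1','H1'), ('J1','H2'), ('J3','H1'), ('J3','H3');
INSERT INTO race_jockey VALUES
  ('R1','J1'), ('R1','J3'), ('R3','J1'), ('R2','J3'), ('R4','J3');
""")
# Reassemble the three binary links for race R1.
r1 = conn.execute("""
  SELECT rh.horse, rj.jockey
  FROM race_horse rh
  JOIN race_jockey rj  ON rj.race = rh.race
  JOIN jockey_horse jh ON jh.jockey = rj.jockey AND jh.horse = rh.horse
  WHERE rh.race = 'R1' ORDER BY rh.horse, rj.jockey""").fetchall()
# Two horses ran, but three pairings come back: H1's rider is ambiguous.

# A three-way link entity records each ride directly.
conn.executescript("""
CREATE TABLE ride (race TEXT, horse TEXT, jockey TEXT,
                   PRIMARY KEY (race, horse),   -- one rider per horse per race
                   UNIQUE (race, jockey));      -- one mount per jockey per race
INSERT INTO ride VALUES ('R1', 'H1', 'J3'), ('R1', 'H2', 'J1');
""")
rider = conn.execute(
    "SELECT jockey FROM ride WHERE race = 'R1' AND horse = 'H1'").fetchone()[0]
print(r1, rider)
```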

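Chasm Traps
Hospital example: the existing observed data pattern. The Leeds hospital has Eye, Nose and Throat clinics, staffed by the doctors Black, Orange, Yellow, Brown, Green, Blue, Red and Pink.
This gives rise to the following relationships: Hospital, Clinic, Doctor (each hospital has clinics, and each clinic has doctors).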

Chasm Traps (continued)
Hospital (H#, name): 1 Leeds, 2 Halifax
Clinic (C#, H#): 10 1 20 30 40 2
Doctor (D#, C#): 100 20 200 10 300 400
Doctors can be tracked through to their hospital via their clinic.

Chasm Traps (continued)
But another doctor, White (doctor number 500), is discovered. He works at Leeds but is not attached to a specific clinic.
Hospital (H#, name): 1 Leeds, 2 Halifax
Clinic (C#, H#): 10 1 20 30 40 2
Doctor (D#, C#): 100 20 200 10 300 400 500 ??
This gives rise to a chasm trap: there is no route from doctor to hospital via clinic.
The situation is now better represented by a Hospital, Clinic, Doctor diagram that shows the potential chasm trap.

Chasm Traps (continued)
The model is fixed by adding a relationship between Doctor and Hospital (diagram: Hospital, Clinic, Doctor).
This results in an extra foreign key column (H#) in the doctor table.
Hospital (H#, name): 1 Leeds, 2 Halifax
Clinic (C#, H#): 10 1 20 2 30 40
Doctor (D#, C#, H#): 100 20 2 200 10 1 300 400 500 ??
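The chasm and its fix show up as two different join routes. This is a minimal sketch assuming SQLite via Python's sqlite3; the clinic and doctor rows follow the fixed tables above (clinics 10 and 20, doctors 100, 200 and White as 500 at Leeds with no clinic), while the column names h_no, c_no and d_no are assumptions standing in for H#, C# and D#.

```python
import sqlite3

# Sketch only: column names are assumptions; clinics 30/40 and doctors 300/400 omitted.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hospital (h_no INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE clinic   (c_no INTEGER PRIMARY KEY,
                       h_no INTEGER REFERENCES hospital);
-- Fixed model: doctor carries its own h_no; c_no is optional.
CREATE TABLE doctor   (d_no INTEGER PRIMARY KEY,
                       c_no INTEGER REFERENCES clinic,
                       h_no INTEGER REFERENCES hospital);
INSERT INTO hospital VALUES (1, 'Leeds'), (2, 'Halifax');
INSERT INTO clinic   VALUES (10, 1), (20, 2);
INSERT INTO doctor   VALUES (100, 20, 2), (200, 10, 1), (500, NULL, 1);
""")
# Route via clinic only: doctor 500 (White) falls into the chasm.
via_clinic = conn.execute("""
  SELECT h.name FROM doctor d JOIN clinic c ON c.c_no = d.c_no
  JOIN hospital h ON h.h_no = c.h_no WHERE d.d_no = 500""").fetchall()
# Route via the new direct relationship: White is found at Leeds.
direct = conn.execute("""
  SELECT h.name FROM doctor d JOIN hospital h ON h.h_no = d.h_no
  WHERE d.d_no = 500""").fetchall()
print(via_clinic, direct)
```

Note that nothing here stops a doctor's h_no disagreeing with the hospital of his clinic; that is the kind of redundancy cyclic relationships introduce.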

Recursive Relationships
An employee may manage one or more other employees.
An employee may be managed by one and only one manager.

Emp
EMPNO ENAME  JOB        MGR HIREDATE   SAL COMM DEPTNO
----- ------ --------- ---- --------- ---- ---- ------
 7782 CLARK  MANAGER   7839 09-JUN-81 2450          10
 7839 KING   PRESIDENT      17-NOV-81 5000          10
 7934 MILLER CLERK     7782 23-JAN-82 1300          10
 7566 JONES  MANAGER   7839 02-APR-81 2975          20
 7902 FORD   ANALYST   7566 03-DEC-81 3000          20
 7876 ADAMS  CLERK     7788 12-JAN-83 1100          20
 7369 SMITH  CLERK     7902 17-DEC-80  800          20
 7788 SCOTT  ANALYST   7566 09-DEC-82 3000          20
 7521 WARD   SALESMAN  7698 22-FEB-81 1250  500     30
 7844 TURNER SALESMAN  7698 08-SEP-81 1500    0     30
 7499 ALLEN  SALESMAN  7698 20-FEB-81 1600  300     30
 7900 JAMES  CLERK     7698 03-DEC-81  950          30
 7698 BLAKE  MANAGER   7839 01-MAY-81 2850          30
 7654 MARTIN SALESMAN  7698 28-SEP-81 1250 1400     30
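The recursive relationship is a foreign key from MGR back to EMPNO in the same table, and the hierarchy can be walked with a recursive query. This is a minimal sketch assuming SQLite via Python's sqlite3, using the EMPNO, ENAME and MGR columns from the table above. Oracle offers CONNECT BY for the same job, and also recursive WITH clauses from 11g Release 2; SQLite's WITH RECURSIVE is used here only so the example runs stand-alone.

```python
import sqlite3

# Sketch only: assumes SQLite; only EMPNO, ENAME and MGR are modelled.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE emp (
  empno INTEGER PRIMARY KEY,
  ename TEXT NOT NULL,
  mgr   INTEGER REFERENCES emp(empno))""")   # self-referencing foreign key
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", [
    (7839, 'KING', None), (7566, 'JONES', 7839), (7698, 'BLAKE', 7839),
    (7782, 'CLARK', 7839), (7902, 'FORD', 7566), (7788, 'SCOTT', 7566),
    (7369, 'SMITH', 7902), (7876, 'ADAMS', 7788), (7934, 'MILLER', 7782),
    (7521, 'WARD', 7698), (7844, 'TURNER', 7698), (7499, 'ALLEN', 7698),
    (7900, 'JAMES', 7698), (7654, 'MARTIN', 7698)])

# Everyone below JONES (7566), at any depth.
team = [r[0] for r in conn.execute("""
  WITH RECURSIVE sub(empno, ename) AS (
    SELECT empno, ename FROM emp WHERE mgr = 7566
    UNION ALL
    SELECT e.empno, e.ename FROM emp e JOIN sub s ON e.mgr = s.empno)
  SELECT ename FROM sub ORDER BY ename""")]
print(team)
```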

Arcs
Diagram: entity A related to B, C and D through an arc.
Each instance of A is related to one instance of either B or C or D.
It cannot take part in more than one of the relationships.
This allows different physical designs.
Known as an EXCLUSIVE-OR relationship.

Arcs: Alternative Table Designs
Table A1 has a column for each foreign key (Bid, Cid and Did), only one of which is filled in per row.
This leads to lots of null values.
Complex integrity checking is needed to stop more than one foreign key column having a non-null value in any row.
New foreign key columns are needed if extra relationships are added to the arc.
Table A2 has only one column of foreign key values (id) plus a column (rel) identifying which parent table (B, C or D) the row is related to.
No null values, so integrity checking is easier.
More complex joins (they need to equate two columns, rel and id).
Greater flexibility if new relationships are added: no need to add columns to A2.
The keys in B, C and D must be of the same datatype.
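Both designs can be sketched in a few lines. This assumes SQLite via Python's sqlite3, and the key values are hypothetical. For A1 the "exactly one foreign key" rule is written as a CHECK constraint built from CASE expressions, which is one way of doing the integrity checking the slide mentions. For A2, note that no foreign key can be declared on id, because its parent table depends on rel.

```python
import sqlite3

# Sketch only: table contents are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE b (id INTEGER PRIMARY KEY);
CREATE TABLE c (id INTEGER PRIMARY KEY);
CREATE TABLE d (id INTEGER PRIMARY KEY);
INSERT INTO b VALUES (1); INSERT INTO c VALUES (2); INSERT INTO d VALUES (3);

-- Design A1: one nullable foreign key per parent, exactly one non-null.
CREATE TABLE a1 (
  a_id INTEGER PRIMARY KEY,
  bid  INTEGER REFERENCES b(id),
  cid  INTEGER REFERENCES c(id),
  did  INTEGER REFERENCES d(id),
  CHECK (CASE WHEN bid IS NULL THEN 0 ELSE 1 END
       + CASE WHEN cid IS NULL THEN 0 ELSE 1 END
       + CASE WHEN did IS NULL THEN 0 ELSE 1 END = 1)
);

-- Design A2: a discriminator (rel) plus a single id column.
CREATE TABLE a2 (
  a_id INTEGER PRIMARY KEY,
  rel  TEXT NOT NULL CHECK (rel IN ('B', 'C', 'D')),
  id   INTEGER NOT NULL
);
INSERT INTO a1 VALUES (1, 1, NULL, NULL);
INSERT INTO a2 VALUES (1, 'B', 1), (2, 'D', 3);
""")
try:
    conn.execute("INSERT INTO a1 VALUES (2, 1, 2, NULL)")  # two parents at once
    a1_rejects_two = False
except sqlite3.IntegrityError:
    a1_rejects_two = True

# A2 joins must equate both columns: rel to pick the table, id to match.
a2_b_rows = conn.execute(
    "SELECT a.a_id FROM a2 a JOIN b ON a.rel = 'B' AND a.id = b.id").fetchall()
print(a1_rejects_two, a2_b_rows)
```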

Arcs: More Alternative Table Designs
A could be separated out into three tables, A1, A2 and A3, one for each relationship (A1 to B, A2 to C, A3 to D).
Advantages: there is no need to scan the full A table when looking for rows in A of a particular type.
Disadvantages: there is no easy way of viewing all rows of type A; you need to union A1, A2 and A3.
Effectively this is a form of subtyping (of A).
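Viewing all rows of A again means a union over the three tables. A minimal sketch assuming SQLite via Python's sqlite3; the column names bid, cid, did and parent_id, the view name a_all and all rows are hypothetical.

```python
import sqlite3

# Sketch only: column names, view name and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- A split by relationship: each table has one plain, mandatory foreign key.
CREATE TABLE a1 (a_id INTEGER PRIMARY KEY, bid INTEGER NOT NULL);  -- to B
CREATE TABLE a2 (a_id INTEGER PRIMARY KEY, cid INTEGER NOT NULL);  -- to C
CREATE TABLE a3 (a_id INTEGER PRIMARY KEY, did INTEGER NOT NULL);  -- to D
INSERT INTO a1 VALUES (1, 10);
INSERT INTO a2 VALUES (2, 20), (3, 21);
INSERT INTO a3 VALUES (4, 30);

-- Seeing every row of A needs a union of the three tables.
CREATE VIEW a_all AS
  SELECT a_id, 'B' AS rel, bid AS parent_id FROM a1
  UNION ALL SELECT a_id, 'C', cid FROM a2
  UNION ALL SELECT a_id, 'D', did FROM a3;
""")
rows = conn.execute("SELECT * FROM a_all ORDER BY a_id").fetchall()
print(rows)
```

The separate primary keys do not stop the same a_id appearing in two of the tables; keeping it unique across all three needs an extra mechanism, such as a shared number generator.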

Arcs: Even More Alternative Table Designs
B, C and D could be regarded as similar entities if their data is similar, and combined into one table (Z).
This effectively collapses the subtypes into their supertype (diagram: A related to Z, which contains B, C and D).
Advantages: no need for separate foreign key columns in the A table; simple joins.
Disadvantages: harder to distinguish between rows in A of type B, C and D; bigger data sets to process; B, C and D must have keys of the same datatype.
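With B, C and D collapsed into Z, A needs only one ordinary foreign key, but telling A's rows apart by parent type requires a join to a type column. A minimal sketch assuming SQLite via Python's sqlite3; the z_type column and all rows are hypothetical.

```python
import sqlite3

# Sketch only: the z_type column and all rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- B, C and D collapsed into one supertype table Z with a type column.
CREATE TABLE z (z_id INTEGER PRIMARY KEY,
                z_type TEXT NOT NULL CHECK (z_type IN ('B', 'C', 'D')),
                name TEXT);
-- A now needs just one ordinary foreign key.
CREATE TABLE a (a_id INTEGER PRIMARY KEY,
                z_id INTEGER NOT NULL REFERENCES z(z_id));
INSERT INTO z VALUES (1, 'B', 'first b'), (2, 'C', 'a c'), (3, 'D', 'a d');
INSERT INTO a VALUES (100, 1), (101, 3), (102, 3);
""")
# Distinguishing A rows by parent type means reading z_type through a join.
per_type = conn.execute("""
  SELECT z.z_type, COUNT(*) FROM a JOIN z ON z.z_id = a.z_id
  GROUP BY z.z_type ORDER BY z.z_type""").fetchall()
print(per_type)
```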

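Cyclic Relationships
Diagram: State, District, City and Street, where Street is related both to City and directly to State.
State key: Sta
District key: Sta, D
City key: Sta, D, C
Street key: Sta, D, C, Str
The cycle leads to redundant data.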

Cyclic Relationships (continued)
The primary key of the street table (Sta#, D#, C#, Str#) includes the foreign key inherited via state, district and city.
An additional Sta# column is the foreign key from the direct relationship with state.
Inconsistencies can arise because of the redundant relationship.
(Table: Street, with columns Sta#, D#, C#, Str# and the additional Sta#.)

Cyclic Relationships (continued)
The inconsistency is easy to spot: the two Sta# columns in Street are not equal.
But what if surrogate keys are used?
(Tables: State, District, City and Street, with sample Sta#, D#, C# and Str# values.)

Surrogate Keys
It is not easy to see that key value Sur_C5 has an inconsistency with Sta4 in the Sta# column: the surrogate key Sur_C5 represents C1, D1, Sta3.
(Tables: Street (Sur_C#, Str#, Sta#) and City (Sur_C#, Sta#, D#, C#), where Sur_C# is a surrogate key, unique for all cities.)
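With a surrogate key the mismatch is invisible in the street table alone, but a join back to city exposes it. A minimal sketch assuming SQLite via Python's sqlite3: it mirrors the Sur_C5 example above (Sur_C5 stands for Sta3, D1, C1 while the street row carries Sta4), and the column names, the street name Str3 and the consistent Sur_C1 row are assumptions.

```python
import sqlite3

# Sketch only: column names and most rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE state (sta TEXT PRIMARY KEY);
-- City has a surrogate key; its natural key (sta, d, c) is kept UNIQUE.
CREATE TABLE city  (sur_c TEXT PRIMARY KEY, sta TEXT REFERENCES state,
                    d TEXT, c TEXT, UNIQUE (sta, d, c));
-- Street references city by surrogate AND state directly (the cycle).
CREATE TABLE street (sur_c TEXT REFERENCES city, str TEXT,
                     sta TEXT REFERENCES state, PRIMARY KEY (sur_c, str));
INSERT INTO state VALUES ('Sta1'), ('Sta2'), ('Sta3'), ('Sta4');
INSERT INTO city VALUES ('Sur_C1', 'Sta2', 'D1', 'C2'),
                        ('Sur_C5', 'Sta3', 'D1', 'C1');
INSERT INTO street VALUES ('Sur_C1', 'Str2', 'Sta2'),
                          ('Sur_C5', 'Str3', 'Sta4');
""")
# Compare the street's direct Sta# with the Sta# behind its city key.
bad = conn.execute("""
  SELECT s.sur_c, s.str, s.sta, c.sta FROM street s
  JOIN city c ON c.sur_c = s.sur_c WHERE s.sta <> c.sta""").fetchall()
print(bad)
```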

Sub-Typing Methods
Consider the following diagram: a supertype, EMP, with two subtypes, SAL and HOURLY (salaried employees and hourly-paid employees).
How could this be physically implemented?

Alternative 1: One table
Either the salary field or the hourly field is filled in, but not both.
Lots of NULLs (either Sal_code or Hourly_rate must be NULL).

EMP
Empno Last_name First_name Sal_code Hourly_rate
----- --------- ---------- -------- -----------
    1 SMITH     TONY       AX
    2 WILSON    DOUG                      12.00
    3 WARD      AMY                       15.00
    4 ATKINS    PAUL       B2
    5 MILLS     GEORGE     B5

Advantage: easy design.
Disadvantages: wasted space; triggers need to be written to preserve integrity.
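For this particular rule, a declarative CHECK constraint can stand in for a trigger. A minimal sketch assuming SQLite via Python's sqlite3, with the first two rows of the table above; the rejected row (empno 9) is hypothetical. The constraint is plain SQL and should be portable to other databases.

```python
import sqlite3

# Sketch only: assumes SQLite; empno 9 is an illustrative bad row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (
  empno       INTEGER PRIMARY KEY,
  last_name   TEXT NOT NULL,
  first_name  TEXT NOT NULL,
  sal_code    TEXT,
  hourly_rate NUMERIC,
  -- exactly one of the two subtype columns must be filled in
  CHECK ((sal_code IS NOT NULL AND hourly_rate IS NULL)
      OR (sal_code IS NULL AND hourly_rate IS NOT NULL))
);
INSERT INTO emp VALUES (1, 'SMITH', 'TONY', 'AX', NULL),
                       (2, 'WILSON', 'DOUG', NULL, 12.00);
""")
try:  # both subtype columns filled in: should be refused
    conn.execute("INSERT INTO emp VALUES (9, 'TEST', 'ROW', 'B2', 9.50)")
    both_rejected = False
except sqlite3.IntegrityError:
    both_rejected = True
row_count = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]
print(both_rejected, row_count)
```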

Alternative 2: Two tables
One for hourly-paid employees, one for salaried employees.

EMP_SAL
Empno Last_name First_name Sal_code
----- --------- ---------- --------
    1 SMITH     TONY       AX
    4 ATKINS    PAUL       B2
    5 MILLS     GEORGE     B5

EMP_HOUR
Empno Last_name First_name Hourly_rate
----- --------- ---------- -----------
    2 WILSON    DOUG             12.00
    3 WARD      AMY              15.00

Advantage: easy design (no NULLs).
Disadvantage: no easy way to work with all employees.
Best used when applications do not use both tables.

Alternative 3: Three tables
One for the common attributes (supertype) and two for the subtypes.

EMP, PK(Empno)
Empno Last_name First_name
----- --------- ----------
    1 SMITH     TONY
    2 WILSON    DOUG
    3 WARD      AMY
    4 ATKINS    PAUL
    5 MILLS     GEORGE

EMP_HOUR
Empno Hourly_rate
----- -----------
    2       12.00
    3       15.00

EMP_SAL
Empno Sal_code
----- --------
    1 AX
    4 B2
    5 B5

Advantage: common attributes are collected together.
Disadvantage: no easy way to know whether a named employee is salaried or hourly paid.
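To find out how a named employee is paid in this design, both subtype tables have to be probed. A minimal sketch assuming SQLite via Python's sqlite3, loaded with the data above; the 'hourly'/'salaried' labels are assumptions used only in the query output.

```python
import sqlite3

# Sketch only: assumes SQLite; output labels are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp      (empno INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT);
CREATE TABLE emp_hour (empno INTEGER PRIMARY KEY REFERENCES emp, hourly_rate NUMERIC);
CREATE TABLE emp_sal  (empno INTEGER PRIMARY KEY REFERENCES emp, sal_code TEXT);
INSERT INTO emp VALUES (1,'SMITH','TONY'), (2,'WILSON','DOUG'), (3,'WARD','AMY'),
                       (4,'ATKINS','PAUL'), (5,'MILLS','GEORGE');
INSERT INTO emp_hour VALUES (2, 12.00), (3, 15.00);
INSERT INTO emp_sal  VALUES (1, 'AX'), (4, 'B2'), (5, 'B5');
""")
# Outer-join both subtype tables and see which one has a matching row.
how_paid = dict(conn.execute("""
  SELECT e.last_name,
         CASE WHEN h.empno IS NOT NULL THEN 'hourly'
              WHEN s.empno IS NOT NULL THEN 'salaried' END
  FROM emp e
  LEFT JOIN emp_hour h ON h.empno = e.empno
  LEFT JOIN emp_sal  s ON s.empno = e.empno"""))
print(how_paid["WARD"], how_paid["ATKINS"])
```

Nothing in this schema stops the same empno appearing in both subtype tables; that would need an extra check.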

Alternative 4: Four tables
Same as alternative 3, but with an EMP_TYPE table to allow more generic constructs.

EMP
Empno E_code Last_name First_name
----- ------ --------- ----------
    1 S      SMITH     TONY
    2 H      WILSON    DOUG
    3 H      WARD      AMY
    4 S      ATKINS    PAUL
    5 S      MILLS     GEORGE

EMP_TYPE
E_code E_Type_Desc
------ ---------------
H      Hourly paid
S      Salaried
C      Commission only

EMP_HOUR
Empno Hourly_rate
----- -----------
    2       12.00
    3       15.00

EMP_SAL
Empno Sal_code
----- --------
    1 AX
    4 B2
    5 B5

Advantage: easy to write applications against.
Disadvantage: E_code is redundant data in EMP.

Alternative 4 (continued)
Additional tables are needed if extra types are introduced. Here a commission-only employee is added; EMP_TYPE, EMP_HOUR and EMP_SAL are unchanged.

EMP
Empno E_code Last_name First_name
----- ------ --------- ----------
    1 S      SMITH     TONY
    2 H      WILSON    DOUG
    3 H      WARD      AMY
    4 S      ATKINS    PAUL
    5 S      MILLS     GEORGE
    6 C      LORD      PETER

EMP_COMM
Empno Commission
----- ----------
    6    4450.00

Alternative 5: Five tables (a generic solution)
Tables: EMP, EMP_TYPE, EMP_ATTR, EMP_TYPE_ATTR and ATTRIBUTE.
Advantages: the structure is easy to modify (employees could be classified by different criteria, e.g. job type); users can add types and change their definitions without DBA intervention.
Disadvantages: beyond the comprehension of mortals; slow.

Structure of the Five Tables
EMP_TYPE: E_code PK; E_Type_Desc NN
EMP: Empno PK; E_code FK(EMP_TYPE); Last_name NN; First_name NN (still put all common attributes here)
ATTRIBUTE: A_Code PK; A_Desc NN
EMP_TYPE_ATTR: E_code PK, FK(EMP_TYPE); A_Code PK, FK(ATTRIBUTE) (this declares which attributes are valid for each type)
EMP_ATTR: Empno PK, FK(EMP); A_Code PK, FK(ATTRIBUTE); Value NN; E_Type, part of a foreign key to EMP and of a foreign key to EMP_TYPE_ATTR (this makes sure that each EMP has the correct attributes for its type)
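The "correct attributes for its type" rule can be enforced with two composite foreign keys on EMP_ATTR: (Empno, E_Type) to EMP, and (E_Type, A_Code) to EMP_TYPE_ATTR. This is a minimal sketch of that reading, assuming SQLite via Python's sqlite3. The attribute codes RATE and SAL, the helper rejected() and the rows are hypothetical. EMP needs a UNIQUE (empno, e_code) constraint so it can be the target of the first composite key.

```python
import sqlite3

# Sketch only: attribute codes, the rejected() helper and rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE emp_type  (e_code TEXT PRIMARY KEY, e_type_desc TEXT NOT NULL);
CREATE TABLE attribute (a_code TEXT PRIMARY KEY, a_desc TEXT NOT NULL);
-- Which attributes are valid for which employee type.
CREATE TABLE emp_type_attr (
  e_code TEXT REFERENCES emp_type,
  a_code TEXT REFERENCES attribute,
  PRIMARY KEY (e_code, a_code));
CREATE TABLE emp (
  empno INTEGER PRIMARY KEY,
  e_code TEXT NOT NULL REFERENCES emp_type,
  last_name TEXT NOT NULL, first_name TEXT NOT NULL,
  UNIQUE (empno, e_code));           -- target for the composite FK below
-- One row per attribute value; the composite FKs tie the attribute
-- to the employee's own type.
CREATE TABLE emp_attr (
  empno  INTEGER,
  a_code TEXT,
  e_type TEXT NOT NULL,
  value  TEXT NOT NULL,
  PRIMARY KEY (empno, a_code),
  FOREIGN KEY (empno, e_type) REFERENCES emp (empno, e_code),
  FOREIGN KEY (e_type, a_code) REFERENCES emp_type_attr (e_code, a_code));
INSERT INTO emp_type VALUES ('H', 'Hourly paid'), ('S', 'Salaried');
INSERT INTO attribute VALUES ('RATE', 'Hourly rate'), ('SAL', 'Salary code');
INSERT INTO emp_type_attr VALUES ('H', 'RATE'), ('S', 'SAL');
INSERT INTO emp VALUES (1, 'S', 'SMITH', 'TONY'), (2, 'H', 'WILSON', 'DOUG');
INSERT INTO emp_attr VALUES (1, 'SAL', 'S', 'AX'), (2, 'RATE', 'H', '12.00');
""")

def rejected(row):
    """Hypothetical helper: True if inserting row into emp_attr fails."""
    try:
        conn.execute("INSERT INTO emp_attr VALUES (?, ?, ?, ?)", row)
        return False
    except sqlite3.IntegrityError:
        return True

wrong_attr = rejected((2, 'SAL', 'H', 'B2'))   # hourly employee, salary code
faked_type = rejected((2, 'SAL', 'S', 'B2'))   # lying about the type
print(wrong_attr, faked_type)
```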

Weak Entities
What is a weak entity?
Any occurrence (row in the table) must have a parent; for example, a clinic MUST be in a hospital.
Hospital (0,n) has (1,1) Clinic
Hospital (H#, name): 1 Leeds, 2 Halifax, 3 York
Clinic (H#, C#): C# H# 10 1 2 20 30
The key of Hospital (H#) becomes part of the key for Clinic: the unique key is (H#, C#).
C# alone cannot identify a row.
New clinics cannot be added without a value for H#.

Example of a Non-Weak Entity
A clinic can be identified by C# alone: H# is not part of the primary key, merely a foreign key.
It is possible to add a clinic that is not linked to a hospital.
Hospital (0,n) has (0,1) Clinic
Hospital (H#, name): 1 Leeds, 2 Halifax, 3 York
Clinic (C# unique key, H#): 100 1 101 2 102 103 104 106
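The difference between the weak and the non-weak clinic lies in where H# sits in the key. A minimal sketch assuming SQLite via Python's sqlite3; the table names weak_clinic and clinic and the inserted rows are hypothetical. In SQLite, columns of a composite primary key may hold NULL unless declared NOT NULL, so that is spelled out.

```python
import sqlite3

# Sketch only: table names and inserted rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE hospital (h_no INTEGER PRIMARY KEY, name TEXT NOT NULL);
INSERT INTO hospital VALUES (1, 'Leeds'), (2, 'Halifax'), (3, 'York');

-- Weak entity: the parent's key is part of the clinic's key.
CREATE TABLE weak_clinic (
  h_no INTEGER NOT NULL REFERENCES hospital,
  c_no INTEGER NOT NULL,
  PRIMARY KEY (h_no, c_no));

-- Non-weak entity: c_no alone is the key; h_no is an optional foreign key.
CREATE TABLE clinic (
  c_no INTEGER PRIMARY KEY,
  h_no INTEGER REFERENCES hospital);
""")
conn.execute("INSERT INTO weak_clinic VALUES (1, 10)")
conn.execute("INSERT INTO weak_clinic VALUES (2, 10)")  # same C#, other hospital
conn.execute("INSERT INTO clinic VALUES (104, NULL)")   # no hospital: allowed
try:  # a weak clinic cannot exist without its hospital
    conn.execute("INSERT INTO weak_clinic VALUES (NULL, 20)")
    weak_needs_parent = False
except sqlite3.IntegrityError:
    weak_needs_parent = True
print(weak_needs_parent)
```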

Transitive Dependencies
Suppliers have 'city' addresses, and each city has a status (relative importance).

Supplier id Sup Name City   Status
----------- -------- ------ ------
          1 Smith    London     10
          2 Brown    York       30
          3 Green    Paris      50
          4 Cox
          5 White

Status is transitively dependent on Supplier id: Supplier id -> City -> Status.
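Removing the transitive dependency means moving Status into a table keyed by City. A minimal sketch assuming SQLite via Python's sqlite3, loaded with the supplier data above; the table and column names are assumptions, and the change of London's status to 20 is a hypothetical update used only to show that the fact now lives in one row.

```python
import sqlite3

# Sketch only: names are assumptions; the status change to 20 is illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Status depends on city, not on the supplier, so it gets its own table.
CREATE TABLE city     (city TEXT PRIMARY KEY, status INTEGER NOT NULL);
CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, sup_name TEXT NOT NULL,
                       city TEXT REFERENCES city);
INSERT INTO city VALUES ('London', 10), ('York', 30), ('Paris', 50);
INSERT INTO supplier VALUES (1, 'Smith', 'London'), (2, 'Brown', 'York'),
                            (3, 'Green', 'Paris'), (4, 'Cox', NULL),
                            (5, 'White', NULL);
""")
# A city's status is stored once, so changing it touches a single row.
changed = conn.execute(
    "UPDATE city SET status = 20 WHERE city = 'London'").rowcount
smith = conn.execute("""
  SELECT s.sup_name, c.status FROM supplier s JOIN city c ON c.city = s.city
  WHERE s.supplier_id = 1""").fetchone()
print(changed, smith)
```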

Nutwood Hospital: Entity Modelling Exercise
The hospital runs clinic sessions during which patients can book appointments to see their doctors.
Only one doctor is present at each appointment.
Each clinic session is held in a single clinic.
Patients may also undergo operations.
Only one doctor performs each operation.
A series of operations is performed during one theatre session.
Each theatre session is held entirely within one operating theatre.
Doctors may be authorized to work in one or more theatres.