Entity Modelling Tips
Quick Intro Working with Oracle since 1986 Oracle DBA - OCP Oracle7, 8, 9, 10 Oracle DBA of the Year – 2002 Oracle ACE Director ANSI/ISO Standards Committee - SQL Regular Presenter at Oracle Conferences Consultant and Trainer Technical Editor for a number of Oracle texts UK Oracle User Group Director Member of IOUC Day job – Tradba Ltd
What is an Entity Oxford English Dictionary Entity : Noun.1 Thing Bloom’s Taxonomy Thing : Whatever is or maybe an object of thought James Martin’s definitions of IT terms Object : A real-world entity
Relationships and the Time Dimension Building Society Scenario Manager Branch Each and every manager manages one and only one branch Each and every branch is managed by one and only one manager Could (should?) be implemented as one entity (table) But what about the time dimension? How could the relationship ‘marriage’ be modelled?
Why Normalise? Normalising usually implies splitting data apart into separate tables This means data often has to be joined back together Why not keep the data in one table? Attempt 1 Store department details in line with employee information in emp_dept emp_dept EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO DNAME LOC ----- ------ --------- ---- --------- ---- ---- ------ ---------- -------- 7782 CLARK MANAGER 7839 09-JUN-81 2450 10 ACCOUNTING NEW YORK 7839 KING PRESIDENT 17-NOV-81 5000 10 ACCOUNTING NEW YORK 7934 MILLER CLERK 7782 23-JAN-82 1300 10 ACCOUNTING NEW YORK 7566 JONES MANAGER 7839 02-APR-81 2975 20 RESEARCH DALLAS 7902 FORD ANALYST 7566 03-DEC-81 3000 20 RESEARCH DALLAS 7876 ADAMS CLERK 7788 12-JAN-83 1100 20 RESEARCH DALLAS 7369 SMITH CLERK 7902 17-DEC-80 800 20 RESEARCH DALLAS 7788 SCOTT ANALYST 7566 09-DEC-82 3000 20 RESEARCH DALLAS 7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30 SALES CHICAGO 7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30 SALES CHICAGO 7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30 SALES CHICAGO 7900 JAMES CLERK 7698 03-DEC-81 950 30 SALES CHICAGO 7698 BLAKE MANAGER 7839 01-MAY-81 2850 30 SALES CHICAGO 7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30 SALES CHICAGO
The emp_dept table Primary key is empno 1. No way to add a new department without an employee 2. Changing a department name or location would require multiple rows to be updated Risk of inconsistencies 3. Deleting the last employee in a department would remove the department information
Attempt 2 – The dept_emp table Store employee details in line with departmental information Requires an indeterminate number of columns Eight columns for each employee in the department Key is deptno Cannot add a new employee without a department Could easily have duplicate employees Would need to know that TURNER is employee number 2 in department 30 Suppose one department was much bigger than the rest Leads to lots of columns with most rows having many null values DEPTNO DNAME LOC EMPNO1 ENAME1 JOB1 MGR1 HIREDATE1 SAL1 COMM1 EMPNO2 ... ------ ---------- -------- ------ --------- --------- ---- --------- ----- ----- ------ 10 ACCOUNTING NEW YORK 7782 CLARK MANAGER 7839 09-JUN-81 2450 7839 ... 20 RESEARCH DALLAS 7566 JONES MANAGER 7939 02-APR-81 2975 7902 ... 30 SALES CHICAGO 7521 WARD SALESMAN 7698 22-FEB-81 1250 500 7844 ...
Many to Many Relationships EMP PROJECT Cannot be directly implemented as two tables Number of foreign key columns is indeterminate Similar to problems shown on previous slide Link employees to projects by inserting foreign empno columns into project table If a project has 3000 employees it would require 3000 empno columns One project may have many more employees than others Results in many null values for most rows Link projects to employees by inserting foreign proj_id columns into emp table If an employee has 50 projects it would require 50 empno columns One employee may be on many more projects than others
Many to Many Relationships (continued) Many columns – possibly many NULLs PROJ_ID P_DESC EMPNO1 EMPNO2 EMPNO3 EMPNO4 EMPNO5 EMPNO6 EMPNO7 EMPNO8 EMPNO8 ... ------- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ 1 ARTS 7566 7900 7942 7369 7654 7902 7876 7482 7321 ... 2 DESIGN 7600 7942 3 IT 7988 7600 7788 : : : : : Cannot add an employee to a project if all columns are full Difficult to handle change of EMPNO EMPNO ENAME PROJ_ID1 PROJ_ID2 PROJ_ID3 PROJ_ID4 PROJ_ID5 PROJ_ID6 PROJ_ID7 ... ----- ------ -------- -------- -------- -------- -------- -------- -------- ------- 7399 COX 3 5 9 8 2 12 14 ... 7942 MILLS 1 7001 CARVER 4 3 : : : : Cannot add a project to an employee if all columns are full Difficult to handle change of PROJ_ID
Many to Many Relationships (continued) EMP PROJ_EMP PROJECT EMPNO ENAME ----- ------ 7399 COX 7942 MILLS 7001 CARVER : : PROJ_ID EMPNO ------- ------ 1 7942 1 7566 1 7900 1 7369 1 7654 1 7902 1 7876 1 7482 1 7321 2 7600 2 7942 3 7988 3 7600 3 7788 : : PROJ_ID P_DESC ------- ------ 1 ARTS 2 DESIGN 3 IT : : The link entity , PROJ_EMP, has a primary key of PROJ_ID,EMPNO and foreign keys PROJ_ID (referencing PROJECT) and EMPNO (referencing EMP) Easy to make changes to data
Fan Traps Arise from M:1 and 1:M relationships through one entity Subsidiary Company Department Employee Diagram give rise to three tables : Department Company Employee D# C# D1 C1 D2 C2 D3 C# C1 C2 E# C# E1 C1 E2 E3 E4 C2 E5 E6 E7 E8
Data Patterns in Fan Traps Green is employee number E7 In which department does Green work? The possibilities fan out into D1, D2 and D3 from C2 E1 E2 E3 E4 E5 E6 E7 E8 D1 D2 D3 C1 C2
Significant and non-Significant Fan Traps Fixing the model is easy The hierarchy is changed Subsidiary Company Department Employee
Non-Significant Fan Traps A very similar construct, but gives no problem Warehouse Stock Sales Office There is no specific relationship between Stock and Sales Office Business rules are that a Sales Office can receive ANY of the Stock from its Warehouse The fan trap is non-significant
Resolving Relationships Key requirement Which jockey was riding which horse in which race? Horse Jockey Race
Resolving Relationships (continued) Race Horse R1 H1 R1 H2 R3 H1 R2 H3 R4 H3 Jockey Horse J1 H1 J1 H2 J3 H1 J3 H3 Horse Horse/Jockey Horse H1 H2 H3 Race/Horse Race Jockey Race Jockey R1 J1 R1 J3 R3 J1 R2 J3 R4 J3 Jockey J1 J2 J3 Race R1 R2 R3 R4 Race/Jockey
Chasm Traps Hospital Example – existing observed data pattern Hospital Clinic Doctor Leeds Eye Nose Throat Black Orange Yellow Brown Green Blue Red Pink Gives rise to the following relationships Hospital Clinic Doctor
Chasm Traps (continued) 1 Leeds 2 Halifax C# H# 10 1 20 30 40 2 D# C# 100 20 200 10 300 400 Doctors can be tracked through to their hospital
Chasm Traps (continued) But another doctor called White (doctor number 500) is discovered He works at Leeds but is not attached to a specific clinic H# 1 Leeds 2 Halifax C# H# 10 1 20 30 40 2 D# C# 100 20 200 10 300 400 500 ?? Gives rise to a chasm trap – there is no route through from doctor to hospital via clinic The situation is now better represented as : This shows a potential chasm trap Hospital Clinic Doctor
Chasm Traps (continued) Model fixed by adding a relationship between doctor and hospital Hospital Clinic Doctor Results in an extra foreign key column in the doctor table H# 1 Leeds 2 Halifax C# H# 10 1 20 2 30 40 D# C# H# 100 20 2 200 10 1 300 400 500 ??
Recursive Relationships An employee may manage one or more other employees An employee may be managed by one and only one manager Emp EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO ----- ------ --------- ---- --------- ---- ---- ------ 7782 CLARK MANAGER 7839 09-JUN-81 2450 10 7839 KING PRESIDENT 17-NOV-81 5000 10 7934 MILLER CLERK 7782 23-JAN-82 1300 10 7566 JONES MANAGER 7839 02-APR-81 2975 20 7902 FORD ANALYST 7566 03-DEC-81 3000 20 7876 ADAMS CLERK 7788 12-JAN-83 1100 20 7369 SMITH CLERK 7902 17-DEC-80 800 20 7788 SCOTT ANALYST 7566 09-DEC-82 3000 20 7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30 7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30 7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30 7900 JAMES CLERK 7698 03-DEC-81 950 30 7698 BLAKE MANAGER 7839 01-MAY-81 2850 30 7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30
Arcs B A C D Each instance of A is related to one instance of either B or C or D Cannot take part in more than one of the relationships Allows different physical designs Known as EXCLUSIVE-OR relationship
Arcs – Alternative Table Designs Bid Cid Did x Rel id B X D C x or Table A1 has a column for each foreign key Leads to lots of Null values Need complex integrity checking to avoid more than one foreign key column having a non-null value in each row New foreign key columns needed if extra relationships are added to the arc Table A2 has only one column of foreign key values plus a column to identify which parent table is related to the row No null values – easier integrity checking More complex joins (need to equate two columns (rel and id)) Greater flexibility if new relationships added No need to add columns to A2 Keys in B,C and D must be of the same datatype
Arcs – More Alternative Table Designs A1 could be separated out into three tables One for each relationship Advantages No need to scan the full A table when looking for rows in A of a particular type Disadvantages No easy way of viewing all rows of type A Need to union A1, A2 and A3 Effectively this is a form of subtyping (of A) A1 B A2 C A3 D
Arcs – Even More Alternative Table Designs B,C and D could be regarded as similar entities if their data is similar Combine them into one table (Z) Effectively collapsing subtypes into their supertype Advantages No need for separate foreign key columns in the A table Simple joins Disadvantages Harder to distinguish between rows in A of type B, C and D Bigger data sets to process B, C and D must have keys of same datatype Z B A C D
Cyclic Relationships Key : Sta Key : Sta,D State District Street City Key : Sta,D,C,Str Key : Sta,D,C Leads to redundant data
Cyclic Relationships (continued) Primary key of street table Sta# D# C# Str# Sta1 D3 C2 Str2 D2 C1 Str4 Sta2 C4 Sta3 D1 Str1 Sta4 Str3 Additional foreign key from direct relationship with state Inconsistencies can arise due to redundant relationship Foreign key via state, district, and city
Cyclic Relationships (continued) Inconsistency easy to spot The two Sta# columns in Street are not equal But what if surrogate keys are used? Sta# D# Sta1 D1 D2 D3 Sta2 Sta3 District Sta# Sta1 Sta2 Sta3 Sta4 State Street City Sta# D# C# Str# Sta1 D3 C2 Str2 D2 C1 Str4 Sta2 C4 Sta3 D1 Str1 Sta4 Str3 Sta# D# C# Sta1 D3 C2 D2 C1 Sta2 C4 C3 Sta3 D1
Surrogate Keys Not easy to see that key value Sur_C5 has an inconsistency with Sta4 in the Sta# column The surrogate key Sur_C5 represents C1, D1, Sta3 Street City Sur_C# Str# Sta# Sur_C1 Str2 Sta2 Sur_C2 Str4 Sta4 Sur_C3 Sur_C5 Str3 Sta3 Sur_C# Sta# D# C# Sur_C1 Sta2 D1 C2 Sur_C2 Sta4 D2 C1 Sur_C3 D3 C4 Sur_C4 Sta3 C3 Sur_C5 Surrogate key – unique for all cities
Sub-Typing Methods Consider the following diagram Has a supertype consisting of two subtypes Hourly-paid employees and salaried employees EMP SAL HOURLY How could this be physically implemented?
Alternative 1 One table Either fill in salary field or hourly field – but not both Lots of NULLs (either Sal_code or Hourly_rate must be NULL) EMP Empno Last_name First_name Sal_code Hourly_rate 1 SMITH TONY AX 2 WILSON DOUG 12.00 3 WARD AMY 15.00 4 ATKINS PAUL B2 5 MILLS GEORGE B5 Advantage Easy design Disadvantage Wasted space Need to write triggers to preserve integrity
Alternative 2 Two tables One for hourly paid employees, one for salaried employees Empno Last_name First_name Sal_code 1 SMITH TONY AX 4 ATKINS PAUL B2 5 MILLS GEORGE B5 EMP_SAL Empno Last_name First_name Hourly_rate 2 WILSON DOUG 12.00 3 WARD AMY 15.00 EMP_HOUR Advantage Easy design (no NULLs) Disadvantage No easy way to work with all employees Best used when applications do not use both tables
Alternative 3 Three tables One for common attributes (supertype) and two subtypes Empno Last_name First_name 1 SMITH TONY 2 WILSON DOUG 3 WARD AMY 4 ATKINS PAUL 5 MILLS GEORGE PK(empno) EMP Empno Hourly_rate 2 12.00 3 15.00 Empno Sal_code 1 AX 4 B2 5 B5 EMP_HOUR EMP_SAL Advantage Collection of common attributes Disadvantage No easy way to know if named employee is salaried or hourly paid
Alternative 4 Four tables Same as alternative 3 but with EMP_TYPE to allow more generic constructs Empno E_code Last_name First_name 1 S SMITH TONY 2 H WILSON DOUG 3 WARD AMY 4 ATKINS PAUL 5 MILLS GEORGE EMP_TYPE E_code Sal_code H Hourly paid S Salaried C Commission only EMP EMP_HOUR Empno Hourly_rate 2 12.00 3 15.00 EMP_SAL Empno Sal_code 1 AX 4 B2 5 B5 Advantage Easy to write applications against Disadvantage E_code is redundant data in EMP
Alternative 4 Additional tables needed if extra types introduced EMP Empno E_code Last_name First_name 1 S SMITH TONY 2 H WILSON DOUG 3 WARD AMY 4 ATKINS PAUL 5 MILLS GEORGE 6 C LORD PETER E_code Sal_code H Hourly paid S Salaried C Commission only EMP EMP_HOUR Empno Hourly_rate 2 12.00 3 15.00 EMP_SAL Empno Sal_code 1 AX 4 B2 5 B5 Empno Commission 6 4450.00 EMP_COMM
Alternative 5 Five Tables – Generic solution Advantage Easy to modify structure (could classify by different criteria e.g. job type) Users can add types and change their definition without DBA intervention Disadvantage Beyond the comprehension of mortals Slow EMP EMP_TYPE EMP_ATTR EMP_TYPE_ATTR ATTRIBUTE
Structure of the Five Tables EMP_TYPE E_code PK E_Type_Desc NN EMP Empno PK E_code FK(EMP_TYPE) Last_name NN First_name NN Still put all common attributes here ATTRIBUTES A_Code PK A_Desc NN EMP_ATTR Empno PK, FK(EMP) A_Code PK, FK(ATTRIBUTE) Value NN E_Type FK(EMP,FK(EMP_TYPE_ATTR) This makes sure that EMP has correct attributes for type EMP_TYPE_ATTR Empno PK, FK(EMP) This declares what attributes A_Code PK, FK(ATTRIBUTE) are valid for each type
Weak Entities What is a weak entity Unique key (H#,C#) Any occurrence (row in the table) must have a parent For example a clinic MUST be in a hospital Unique key (H#,C#) Hospital (0,n) H# 1 Leeds 2 Halifax 3 York C# H# 10 1 2 20 30 has (1,1) The key of hospital becomes part of the key for clinic (H#) C# alone cannot identify a row New clinics cannot be added without a value for H# Clinic
Example of Non-Weak Entity Clinic can be identified by C# H# is not part of the primary key H# is merely a foreign key Possible to add a clinic not linked to hospital Hospital (0,n) H# 1 Leeds 2 Halifax 3 York C# H# 100 1 101 2 102 103 104 has (0,1) 106 Clinic Unique key
Transitive Dependencies Suppliers have ‘city’ addresses Each city has a status (relevant importance) Supplier id Sup Name City Status 1 Smith London 10 2 Brown York 30 3 Green Paris 50 4 Cox 5 White Status is transitively dependent on Supplier Id Supplier Id City Status
Nutwood Hospital – Entity Modelling Exercise The hospital runs clinic sessions during which patients can book appointments to see their doctors Only one doctor is present at each appointment Each clinic session is held in a single clinic Patients may also undergo operations Only one doctor will perform each operation A series of operations are performed during one theatre session Each theatre session is held entirely within one operating theatre. Doctors may be authorized to work in one or more theatres