
1 Entity Modelling Tips

2 Quick Intro
Working with Oracle since 1986
Oracle DBA - OCP Oracle7, 8, 9, 10
Oracle DBA of the Year – 2002
Oracle ACE Director
ANSI/ISO Standards Committee - SQL
Regular Presenter at Oracle Conferences
Consultant and Trainer
Technical Editor for a number of Oracle texts
UK Oracle User Group Director
Member of IOUC
Day job – Tradba Ltd

3 What is an Entity
Oxford English Dictionary
  Entity : Noun. 1 Thing
Bloom’s Taxonomy
  Thing : Whatever is or may be an object of thought
James Martin’s definitions of IT terms
  Object : A real-world entity

4 Relationships and the Time Dimension
Building Society Scenario
Entities: Manager, Branch
Each and every manager manages one and only one branch
Each and every branch is managed by one and only one manager
Could (should?) be implemented as one entity (table)
But what about the time dimension?
How could the relationship ‘marriage’ be modelled?

5 Why Normalise?
Normalising usually implies splitting data apart into separate tables
This means data often has to be joined back together
Why not keep the data in one table?
Attempt 1
Store department details in line with employee information in emp_dept

emp_dept
EMPNO  ENAME   JOB        MGR  HIREDATE  SAL  COMM  DEPTNO  DNAME       LOC
7782   CLARK   MANAGER         JUN                          ACCOUNTING  NEW YORK
7839   KING    PRESIDENT       NOV                          ACCOUNTING  NEW YORK
7934   MILLER  CLERK           JAN                          ACCOUNTING  NEW YORK
7566   JONES   MANAGER         APR                          RESEARCH    DALLAS
7902   FORD    ANALYST         DEC                          RESEARCH    DALLAS
7876   ADAMS   CLERK           JAN                          RESEARCH    DALLAS
7369   SMITH   CLERK           DEC                          RESEARCH    DALLAS
7788   SCOTT   ANALYST         DEC                          RESEARCH    DALLAS
7521   WARD    SALESMAN        FEB                          SALES       CHICAGO
7844   TURNER  SALESMAN        SEP                          SALES       CHICAGO
7499   ALLEN   SALESMAN        FEB                          SALES       CHICAGO
7900   JAMES   CLERK           DEC                          SALES       CHICAGO
7698   BLAKE   MANAGER         MAY                          SALES       CHICAGO
7654   MARTIN  SALESMAN        SEP                          SALES       CHICAGO

6 The emp_dept table
Primary key is empno
1. No way to add a new department without an employee
2. Changing a department name or location would require multiple rows to be updated
   Risk of inconsistencies
3. Deleting the last employee in a department would remove the department information
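
Problems 2 and 3 can be reproduced in a few lines. The following is a minimal sketch using Python's built-in sqlite3 module rather than Oracle; the cut-down column list and the new location 'BOSTON' are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attempt 1: department details are repeated on every employee row.
conn.execute(
    "CREATE TABLE emp_dept (empno INTEGER PRIMARY KEY, ename TEXT, dname TEXT, loc TEXT)"
)
conn.executemany("INSERT INTO emp_dept VALUES (?, ?, ?, ?)", [
    (7782, "CLARK", "ACCOUNTING", "NEW YORK"),
    (7839, "KING", "ACCOUNTING", "NEW YORK"),
    (7934, "MILLER", "ACCOUNTING", "NEW YORK"),
    (7566, "JONES", "RESEARCH", "DALLAS"),
])

# Update anomaly: changing the location on only one row leaves
# ACCOUNTING with two different locations ('BOSTON' is hypothetical).
conn.execute("UPDATE emp_dept SET loc = 'BOSTON' WHERE empno = 7782")
accounting_locs = sorted(r[0] for r in conn.execute(
    "SELECT DISTINCT loc FROM emp_dept WHERE dname = 'ACCOUNTING'"))

# Delete anomaly: removing the last RESEARCH employee removes
# every trace of the RESEARCH department.
conn.execute("DELETE FROM emp_dept WHERE empno = 7566")
research_rows = conn.execute(
    "SELECT COUNT(*) FROM emp_dept WHERE dname = 'RESEARCH'").fetchone()[0]

print(accounting_locs, research_rows)
```

With separate emp and dept tables the location would be stored once, and the department row would survive the loss of its last employee.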

7 Attempt 2 – The dept_emp table
Store employee details in line with departmental information
Requires an indeterminate number of columns
  Eight columns for each employee in the department
Key is deptno
Cannot add a new employee without a department
Could easily have duplicate employees
Would need to know that TURNER is employee number 2 in department 30
Suppose one department was much bigger than the rest
  Leads to lots of columns with most rows having many null values

DEPTNO  DNAME       LOC       EMPNO1  ENAME1  JOB1     MGR1  HIREDATE1  SAL1  COMM1  EMPNO2  ...
10      ACCOUNTING  NEW YORK          CLARK   MANAGER        JUN
20      RESEARCH    DALLAS            JONES   MANAGER        APR
30      SALES       CHICAGO           WARD    SALESMAN       FEB

8 Many to Many Relationships
Entities: EMP, PROJECT
Cannot be directly implemented as two tables
  Number of foreign key columns is indeterminate
  Similar to problems shown on previous slide
Link employees to projects by inserting foreign empno columns into project table
  If a project has 3000 employees it would require 3000 empno columns
  One project may have many more employees than others
  Results in many null values for most rows
Link projects to employees by inserting foreign proj_id columns into emp table
  If an employee has 50 projects it would require 50 proj_id columns
  One employee may be on many more projects than others

9 Many to Many Relationships (continued)
Many columns – possibly many NULLs

PROJ_ID  P_DESC  EMPNO1  EMPNO2  EMPNO3  EMPNO4  EMPNO5  EMPNO6  EMPNO7  EMPNO8  EMPNO9  ...
1        ARTS
2        DESIGN
3        IT
:        :

Cannot add an employee to a project if all columns are full
Difficult to handle change of EMPNO

EMPNO  ENAME   PROJ_ID1  PROJ_ID2  PROJ_ID3  PROJ_ID4  PROJ_ID5  PROJ_ID6  PROJ_ID7  ...
7399   COX
7942   MILLS
7001   CARVER
:      :

Cannot add a project to an employee if all columns are full
Difficult to handle change of PROJ_ID

10 Many to Many Relationships (continued)
Entities: EMP, PROJ_EMP, PROJECT

EMP
EMPNO  ENAME
7399   COX
7942   MILLS
7001   CARVER
:      :

PROJ_EMP
PROJ_ID  EMPNO
1        7942
1        7566
1        7900
1        7369
1        7654
1        7902
1        7876
1        7482
1        7321
2        7600
2        7942
3        7988
3        7600
3        7788
:        :

PROJECT
PROJ_ID  P_DESC
1        ARTS
2        DESIGN
3        IT
:        :

The link entity, PROJ_EMP, has a primary key of PROJ_ID,EMPNO and foreign keys PROJ_ID (referencing PROJECT) and EMPNO (referencing EMP)
Easy to make changes to data
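
The link entity can be sketched as follows, assuming SQLite (via Python's sqlite3 module) stands in for Oracle. The table and column names follow the slide; the helper try_insert, the employee number 9999 and the extra link of MILLS to project 3 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE emp     (empno   INTEGER PRIMARY KEY, ename  TEXT NOT NULL);
CREATE TABLE project (proj_id INTEGER PRIMARY KEY, p_desc TEXT NOT NULL);
CREATE TABLE proj_emp (
    proj_id INTEGER NOT NULL REFERENCES project(proj_id),
    empno   INTEGER NOT NULL REFERENCES emp(empno),
    PRIMARY KEY (proj_id, empno)
);
INSERT INTO emp VALUES (7399, 'COX'), (7942, 'MILLS'), (7001, 'CARVER');
INSERT INTO project VALUES (1, 'ARTS'), (2, 'DESIGN'), (3, 'IT');
INSERT INTO proj_emp VALUES (1, 7942), (2, 7942);
""")

def try_insert(proj_id, empno):
    """Return True if the link row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO proj_emp VALUES (?, ?)", (proj_id, empno))
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_ok = try_insert(1, 7942)  # same pair again: primary key violation
orphan_ok = try_insert(1, 9999)     # unknown employee: foreign key violation
new_link_ok = try_insert(3, 7942)   # a new assignment is one row, not a new column

mills_projects = [r[0] for r in conn.execute("""
    SELECT p.p_desc
    FROM proj_emp pe JOIN project p ON p.proj_id = pe.proj_id
    WHERE pe.empno = 7942
    ORDER BY p.proj_id""")]
print(duplicate_ok, orphan_ok, new_link_ok, mills_projects)
```

Adding or removing an assignment touches one row of PROJ_EMP and never changes the structure of EMP or PROJECT.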

11 Fan Traps
Arise from M:1 and 1:M relationships through one entity
Entities: Subsidiary Company, Department, Employee
The diagram gives rise to three tables:

Department
D#  C#
D1  C1
D2  C2
D3

Company
C#
C1
C2

Employee
E#  C#
E1  C1
E2
E3
E4  C2
E5
E6
E7
E8

12 Data Patterns in Fan Traps
Green is employee number E7
In which department does Green work?
The possibilities fan out into D1, D2 and D3 from C2
[Diagram: employees E1 to E8 and departments D1 to D3 linked to companies C1 and C2]
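
The fan-out can be shown with a small query. This is a sketch only, using Python's sqlite3 module; it assumes that D1, D2 and D3 all belong to company C2, as the slide text states, and the assignment of Green to D2 in the corrected model is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company    (c_no TEXT PRIMARY KEY);
CREATE TABLE department (d_no TEXT PRIMARY KEY, c_no TEXT REFERENCES company(c_no));
CREATE TABLE employee   (e_no TEXT PRIMARY KEY, ename TEXT,
                         c_no TEXT REFERENCES company(c_no));
INSERT INTO company VALUES ('C1'), ('C2');
INSERT INTO department VALUES ('D1', 'C2'), ('D2', 'C2'), ('D3', 'C2');
INSERT INTO employee VALUES ('E7', 'Green', 'C2');
""")

# Fan trap: the only path from employee to department runs through
# company, so every department of C2 is a candidate for Green.
candidates = [r[0] for r in conn.execute("""
    SELECT d.d_no
    FROM employee e JOIN department d ON d.c_no = e.c_no
    WHERE e.e_no = 'E7'
    ORDER BY d.d_no""")]

# Corrected hierarchy: the employee references a department directly
# (Green in D2 is an assumption made for this example).
conn.executescript("""
CREATE TABLE employee_fixed (e_no TEXT PRIMARY KEY, ename TEXT,
                             d_no TEXT REFERENCES department(d_no));
INSERT INTO employee_fixed VALUES ('E7', 'Green', 'D2');
""")
resolved = conn.execute("""
    SELECT d.d_no, d.c_no
    FROM employee_fixed e JOIN department d ON d.d_no = e.d_no
    WHERE e.e_no = 'E7'""").fetchall()
print(candidates, resolved)
```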

13 Significant and non-Significant Fan Traps
Fixing the model is easy
The hierarchy is changed to Subsidiary Company, then Department, then Employee

14 Non-Significant Fan Traps
A very similar construct, but gives no problem
Entities: Warehouse, Stock, Sales Office
There is no specific relationship between Stock and Sales Office
Business rules are that a Sales Office can receive ANY of the Stock from its Warehouse
The fan trap is non-significant

15 Resolving Relationships
Key requirement
Which jockey was riding which horse in which race?
Entities: Horse, Jockey, Race

16 Resolving Relationships (continued)

Horse  Jockey  Race
H1     J1      R1
H2     J2      R2
H3     J3      R3
               R4

Race/Horse     Horse/Jockey     Race/Jockey
Race  Horse    Jockey  Horse    Race  Jockey
R1    H1       J1      H1       R1    J1
R1    H2       J1      H2       R1    J3
R3    H1       J3      H1       R3    J1
R2    H3       J3      H3       R2    J3
R4    H3                        R4    J3
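
Joining the three pairwise tables does not answer the key requirement. The sketch below, using Python's sqlite3 module, loads the rows shown above and lists the horse and jockey pairings that are consistent with race R1. The three-way table ride and its two rows are assumptions added to illustrate one possible resolution; they are not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE race_horse   (race TEXT, horse TEXT, PRIMARY KEY (race, horse));
CREATE TABLE horse_jockey (jockey TEXT, horse TEXT, PRIMARY KEY (jockey, horse));
CREATE TABLE race_jockey  (race TEXT, jockey TEXT, PRIMARY KEY (race, jockey));
INSERT INTO race_horse VALUES
    ('R1','H1'), ('R1','H2'), ('R3','H1'), ('R2','H3'), ('R4','H3');
INSERT INTO horse_jockey VALUES
    ('J1','H1'), ('J1','H2'), ('J3','H1'), ('J3','H3');
INSERT INTO race_jockey VALUES
    ('R1','J1'), ('R1','J3'), ('R3','J1'), ('R2','J3'), ('R4','J3');
""")

# Every (horse, jockey) pairing for R1 that all three pairwise tables allow.
r1_candidates = conn.execute("""
    SELECT rh.horse, rj.jockey
    FROM race_horse rh
    JOIN race_jockey rj  ON rj.race = rh.race
    JOIN horse_jockey hj ON hj.horse = rh.horse AND hj.jockey = rj.jockey
    WHERE rh.race = 'R1'
    ORDER BY rh.horse, rj.jockey""").fetchall()

# One way to resolve it: a single three-way table (hypothetical rows).
conn.executescript("""
CREATE TABLE ride (race TEXT, horse TEXT, jockey TEXT,
                   PRIMARY KEY (race, horse), UNIQUE (race, jockey));
INSERT INTO ride VALUES ('R1', 'H1', 'J3'), ('R1', 'H2', 'J1');
""")
rider_of_h1 = conn.execute(
    "SELECT jockey FROM ride WHERE race = 'R1' AND horse = 'H1'").fetchone()[0]
print(r1_candidates, rider_of_h1)
```

Two horses ran in R1 but three pairings are possible, so the pairwise tables alone cannot say who rode H1.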

17 Chasm Traps
Hospital Example – existing observed data pattern
Hospital: Leeds
Clinic: Eye, Nose, Throat
Doctor: Black, Orange, Yellow, Brown, Green, Blue, Red, Pink
Gives rise to the following relationships: Hospital to Clinic, Clinic to Doctor

18 Chasm Traps (continued)

Hospital
H#
1   Leeds
2   Halifax

Clinic
C#  H#
10  1
20
30
40  2

Doctor
D#   C#
100  20
200  10
300
400

Doctors can be tracked through to their hospital

19 Chasm Traps (continued)
But another doctor called White (doctor number 500) is discovered
He works at Leeds but is not attached to a specific clinic

Hospital
H#
1   Leeds
2   Halifax

Clinic
C#  H#
10  1
20
30
40  2

Doctor
D#   C#
100  20
200  10
300
400
500  ??

Gives rise to a chasm trap – there is no route through from doctor to hospital via clinic
The situation is now better represented as: [diagram of Hospital, Clinic, Doctor]
This shows a potential chasm trap

20 Chasm Traps (continued)
Model fixed by adding a relationship between doctor and hospital
[Diagram: Hospital, Clinic, Doctor]
Results in an extra foreign key column in the doctor table

Hospital
H#
1   Leeds
2   Halifax

Clinic
C#  H#
10  1
20  2
30
40

Doctor
D#   C#  H#
100  20  2
200  10  1
300
400
500  ??
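
The effect of the extra foreign key can be sketched as follows, using Python's sqlite3 module. Only the rows whose values are legible on the slides are loaded, and H# = 1 for doctor 500 is taken from the statement that White works at Leeds; everything else about the schema (column names, types) is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hospital (h_no INTEGER PRIMARY KEY, hname TEXT);
CREATE TABLE clinic   (c_no INTEGER PRIMARY KEY,
                       h_no INTEGER NOT NULL REFERENCES hospital(h_no));
CREATE TABLE doctor   (d_no INTEGER PRIMARY KEY,
                       c_no INTEGER REFERENCES clinic(c_no),      -- optional
                       h_no INTEGER REFERENCES hospital(h_no));   -- the added key
INSERT INTO hospital VALUES (1, 'Leeds'), (2, 'Halifax');
INSERT INTO clinic VALUES (10, 1), (20, 2);
INSERT INTO doctor VALUES (100, 20, 2), (200, 10, 1), (500, NULL, 1);
""")

# Chasm trap: doctor 500 has no clinic, so the route via clinic loses him.
via_clinic = conn.execute("""
    SELECT d.d_no, h.hname
    FROM doctor d
    JOIN clinic c   ON c.c_no = d.c_no
    JOIN hospital h ON h.h_no = c.h_no
    ORDER BY d.d_no""").fetchall()

# Fixed model: the direct doctor-to-hospital key reaches every doctor.
direct = conn.execute("""
    SELECT d.d_no, h.hname
    FROM doctor d JOIN hospital h ON h.h_no = d.h_no
    ORDER BY d.d_no""").fetchall()
print(via_clinic)
print(direct)
```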

21 Recursive Relationships
An employee may manage one or more other employees
An employee may be managed by one and only one manager

Emp
EMPNO  ENAME   JOB        MGR  HIREDATE  SAL  COMM  DEPTNO
7782   CLARK   MANAGER         JUN
7839   KING    PRESIDENT       NOV
7934   MILLER  CLERK           JAN
7566   JONES   MANAGER         APR
7902   FORD    ANALYST         DEC
7876   ADAMS   CLERK           JAN
7369   SMITH   CLERK           DEC
7788   SCOTT   ANALYST         DEC
7521   WARD    SALESMAN        FEB
7844   TURNER  SALESMAN        SEP
7499   ALLEN   SALESMAN        FEB
7900   JAMES   CLERK           DEC
7698   BLAKE   MANAGER         MAY
7654   MARTIN  SALESMAN        SEP
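
A recursive relationship is normally implemented as a foreign key from a table to itself. The sketch below uses Python's sqlite3 module and a subset of the employees. The MGR values do not survive in this transcript, so the ones used here are an assumption taken from the standard Oracle SCOTT sample schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE emp (
        empno INTEGER PRIMARY KEY,
        ename TEXT NOT NULL,
        mgr   INTEGER REFERENCES emp(empno)   -- NULL for the top of the tree
    )""")
# MGR values assumed from the SCOTT sample schema; managers are inserted first.
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", [
    (7839, "KING", None),
    (7782, "CLARK", 7839),
    (7566, "JONES", 7839),
    (7934, "MILLER", 7782),
    (7902, "FORD", 7566),
    (7369, "SMITH", 7902),
])

# Self-join: who reports directly to KING?
king_reports = [r[0] for r in conn.execute("""
    SELECT e.ename
    FROM emp e JOIN emp m ON m.empno = e.mgr
    WHERE m.ename = 'KING'
    ORDER BY e.ename""")]

# Recursive query: walk from SMITH up the management chain.
smith_chain = [r[0] for r in conn.execute("""
    WITH RECURSIVE chain(empno, ename, mgr, depth) AS (
        SELECT empno, ename, mgr, 0 FROM emp WHERE ename = 'SMITH'
        UNION ALL
        SELECT e.empno, e.ename, e.mgr, c.depth + 1
        FROM emp e JOIN chain c ON e.empno = c.mgr
    )
    SELECT ename FROM chain ORDER BY depth""")]
print(king_reports, smith_chain)
```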

22 Arcs
Entities: A, B, C, D
Each instance of A is related to one instance of either B or C or D
Cannot take part in more than one of the relationships
Allows different physical designs
Known as EXCLUSIVE-OR relationship

23 Arcs – Alternative Table Designs

A1                     A2
Bid  Cid  Did          Rel  id
x                      B    x
          x      or    D    x
     x                 C    x

Table A1 has a column for each foreign key
  Leads to lots of Null values
  Need complex integrity checking to avoid more than one foreign key column having a non-null value in each row
  New foreign key columns needed if extra relationships are added to the arc
Table A2 has only one column of foreign key values plus a column to identify which parent table is related to the row
  No null values – easier integrity checking
  More complex joins (need to equate two columns (rel and id))
  Greater flexibility if new relationships added
  No need to add columns to A2
  Keys in B, C and D must be of the same datatype
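
For the A1 design, the integrity check can be sketched with a CHECK constraint. This example uses Python's sqlite3 module and relies on SQLite evaluating comparisons to 0 or 1; other databases may need a CASE expression instead. The column a_id and the helper accepts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE b (id INTEGER PRIMARY KEY);
CREATE TABLE c (id INTEGER PRIMARY KEY);
CREATE TABLE d (id INTEGER PRIMARY KEY);
CREATE TABLE a1 (
    a_id INTEGER PRIMARY KEY,
    bid  INTEGER REFERENCES b(id),
    cid  INTEGER REFERENCES c(id),
    did  INTEGER REFERENCES d(id),
    -- exclusive-or: exactly one of the three foreign keys is populated
    CHECK ((bid IS NOT NULL) + (cid IS NOT NULL) + (did IS NOT NULL) = 1)
);
INSERT INTO b VALUES (1);
INSERT INTO c VALUES (1);
INSERT INTO d VALUES (1);
""")

def accepts(bid, cid, did):
    """Return True if a row with these foreign keys can be inserted into a1."""
    try:
        conn.execute("INSERT INTO a1 (bid, cid, did) VALUES (?, ?, ?)",
                     (bid, cid, did))
        return True
    except sqlite3.IntegrityError:
        return False

one_parent = accepts(1, None, None)    # valid: related to B only
two_parents = accepts(1, 1, None)      # rejected: B and C at once
no_parent = accepts(None, None, None)  # rejected: no relationship at all
print(one_parent, two_parents, no_parent)
```

Adding a fourth relationship to the arc would still mean a new column and a rewritten constraint, which is the cost of this design noted above.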

24 Arcs – More Alternative Table Designs
A1 could be separated out into three tables
  One for each relationship
Advantages
  No need to scan the full A table when looking for rows in A of a particular type
Disadvantages
  No easy way of viewing all rows of type A
  Need to union A1, A2 and A3
Effectively this is a form of subtyping (of A)
[Diagram: A1 related to B, A2 related to C, A3 related to D]

25 Arcs – Even More Alternative Table Designs
B, C and D could be regarded as similar entities if their data is similar
Combine them into one table (Z)
Effectively collapsing subtypes into their supertype
Advantages
  No need for separate foreign key columns in the A table
  Simple joins
Disadvantages
  Harder to distinguish between rows in A of type B, C and D
  Bigger data sets to process
  B, C and D must have keys of same datatype
[Diagram: A related to Z, where Z contains B, C and D]

26 Cyclic Relationships
State     Key : Sta
District  Key : Sta,D
City      Key : Sta,D,C
Street    Key : Sta,D,C,Str
Leads to redundant data

27 Cyclic Relationships (continued)
Primary key of street table: Sta#, D#, C#, Str#
Foreign key via state, district, and city
Additional foreign key from direct relationship with state
Inconsistencies can arise due to redundant relationship
Street table: Sta# D# C# Str# Sta1 D3 C2 Str2 D2 C1 Str4 Sta2 C4 Sta3 D1 Str1 Sta4 Str3

28 Cyclic Relationships (continued)
Inconsistency easy to spot
  The two Sta# columns in Street are not equal
But what if surrogate keys are used?

State
Sta#
Sta1
Sta2
Sta3
Sta4

District table: Sta# D# Sta1 D1 D2 D3 Sta2 Sta3
City table: Sta# D# C# Sta1 D3 C2 D2 C1 Sta2 C4 C3 Sta3 D1
Street table: Sta# D# C# Str# Sta1 D3 C2 Str2 D2 C1 Str4 Sta2 C4 Sta3 D1 Str1 Sta4 Str3

29 Surrogate Keys
Not easy to see that key value Sur_C5 has an inconsistency with Sta4 in the Sta# column
The surrogate key Sur_C5 represents C1, D1, Sta3
Surrogate key – unique for all cities
Street table: Sur_C# Str# Sta# Sur_C1 Str2 Sta2 Sur_C2 Str4 Sta4 Sur_C3 Sur_C5 Str3 Sta3
City table: Sur_C# Sta# D# C# Sur_C1 Sta2 D1 C2 Sur_C2 Sta4 D2 C1 Sur_C3 D3 C4 Sur_C4 Sta3 C3 Sur_C5
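
With surrogate keys the inconsistency only shows up when the street row is joined back to its city. The following sketch, using Python's sqlite3 module, loads two cities and two streets; the Sur_C5 values follow the slide text (city Sur_C5 is C1, D1, Sta3, while the street row carries Sta4), and the column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city (
    sur_c TEXT PRIMARY KEY,            -- surrogate key, unique for all cities
    sta   TEXT NOT NULL,
    d     TEXT NOT NULL,
    c     TEXT NOT NULL,
    UNIQUE (sta, d, c)                 -- the natural key it replaces
);
CREATE TABLE street (
    sur_c TEXT NOT NULL REFERENCES city(sur_c),
    str   TEXT NOT NULL,
    sta   TEXT NOT NULL,               -- redundant direct link to state
    PRIMARY KEY (sur_c, str)
);
INSERT INTO city VALUES ('Sur_C1', 'Sta2', 'D1', 'C2'),
                        ('Sur_C5', 'Sta3', 'D1', 'C1');
INSERT INTO street VALUES ('Sur_C1', 'Str2', 'Sta2'),
                          ('Sur_C5', 'Str3', 'Sta4');
""")

# Streets whose direct state disagrees with the state reached via the city.
mismatches = conn.execute("""
    SELECT s.sur_c, s.str, s.sta AS direct_sta, c.sta AS via_city_sta
    FROM street s JOIN city c ON c.sur_c = s.sur_c
    WHERE s.sta <> c.sta
    ORDER BY s.sur_c, s.str""").fetchall()
print(mismatches)
```

Dropping the redundant sta column from street would remove the possibility of a mismatch altogether.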

30 Sub-Typing Methods
Consider the following diagram
  [Diagram: EMP containing SAL and HOURLY]
Has a supertype consisting of two subtypes
  Hourly-paid employees and salaried employees
How could this be physically implemented?

31 Alternative 1
One table
Either fill in salary field or hourly field – but not both
Lots of NULLs (either Sal_code or Hourly_rate must be NULL)

EMP
Empno  Last_name  First_name  Sal_code  Hourly_rate
1      SMITH      TONY        AX
2      WILSON     DOUG                  12.00
3      WARD       AMY                   15.00
4      ATKINS     PAUL        B2
5      MILLS      GEORGE      B5

Advantage
  Easy design
Disadvantage
  Wasted space
  Need to write triggers to preserve integrity
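
A trigger of the kind mentioned above might look like the following sketch, written for SQLite through Python's sqlite3 module (Oracle trigger syntax differs). The trigger name and the helper accepts are hypothetical, and only the INSERT case is covered; a matching BEFORE UPDATE trigger would also be needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (
    empno       INTEGER PRIMARY KEY,
    last_name   TEXT NOT NULL,
    first_name  TEXT NOT NULL,
    sal_code    TEXT,
    hourly_rate REAL
);
-- Reject rows where both pay columns are set or both are NULL.
CREATE TRIGGER emp_pay_xor_ins BEFORE INSERT ON emp
FOR EACH ROW
WHEN (NEW.sal_code IS NULL) = (NEW.hourly_rate IS NULL)
BEGIN
    SELECT RAISE(ABORT, 'exactly one of sal_code and hourly_rate must be set');
END;
""")

def accepts(row):
    """Return True if the row can be inserted, False if the trigger rejects it."""
    try:
        conn.execute("INSERT INTO emp VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.DatabaseError:
        return False

salaried_ok = accepts((1, "SMITH", "TONY", "AX", None))
hourly_ok = accepts((2, "WILSON", "DOUG", None, 12.00))
both_ok = accepts((3, "WARD", "AMY", "AX", 15.00))      # both filled in
neither_ok = accepts((4, "ATKINS", "PAUL", None, None))  # neither filled in
print(salaried_ok, hourly_ok, both_ok, neither_ok)
```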

32 Alternative 2
Two tables
One for hourly paid employees, one for salaried employees

EMP_SAL
Empno  Last_name  First_name  Sal_code
1      SMITH      TONY        AX
4      ATKINS     PAUL        B2
5      MILLS      GEORGE      B5

EMP_HOUR
Empno  Last_name  First_name  Hourly_rate
2      WILSON     DOUG        12.00
3      WARD       AMY         15.00

Advantage
  Easy design (no NULLs)
Disadvantage
  No easy way to work with all employees
Best used when applications do not use both tables

33 Alternative 3
Three tables
One for common attributes (supertype) and two subtypes

EMP   PK(empno)
Empno  Last_name  First_name
1      SMITH      TONY
2      WILSON     DOUG
3      WARD       AMY
4      ATKINS     PAUL
5      MILLS      GEORGE

EMP_HOUR
Empno  Hourly_rate
2      12.00
3      15.00

EMP_SAL
Empno  Sal_code
1      AX
4      B2
5      B5

Advantage
  Collection of common attributes
Disadvantage
  No easy way to know if named employee is salaried or hourly paid
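
The disadvantage can be seen in the query needed to classify an employee: every subtype table has to be probed. A minimal sketch with Python's sqlite3 module, using the rows above (the labels 'salaried' and 'hourly' are chosen for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (empno INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT);
CREATE TABLE emp_hour (empno INTEGER PRIMARY KEY REFERENCES emp(empno),
                       hourly_rate REAL NOT NULL);
CREATE TABLE emp_sal  (empno INTEGER PRIMARY KEY REFERENCES emp(empno),
                       sal_code TEXT NOT NULL);
INSERT INTO emp VALUES (1, 'SMITH', 'TONY'), (2, 'WILSON', 'DOUG'),
                       (3, 'WARD', 'AMY'), (4, 'ATKINS', 'PAUL'),
                       (5, 'MILLS', 'GEORGE');
INSERT INTO emp_hour VALUES (2, 12.00), (3, 15.00);
INSERT INTO emp_sal  VALUES (1, 'AX'), (4, 'B2'), (5, 'B5');
""")

# The type is implied only by which subtype table holds a matching row,
# so both subtype tables must be outer-joined to find it.
emp_types = conn.execute("""
    SELECT e.last_name,
           CASE WHEN s.empno IS NOT NULL THEN 'salaried'
                WHEN h.empno IS NOT NULL THEN 'hourly'
           END AS pay_type
    FROM emp e
    LEFT JOIN emp_sal  s ON s.empno = e.empno
    LEFT JOIN emp_hour h ON h.empno = e.empno
    ORDER BY e.empno""").fetchall()
print(emp_types)
```

Each additional subtype adds another outer join and another CASE branch to this query.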

34 Alternative 4
Four tables
Same as alternative 3 but with EMP_TYPE to allow more generic constructs

EMP
Empno  E_code  Last_name  First_name
1      S       SMITH      TONY
2      H       WILSON     DOUG
3      H       WARD       AMY
4      S       ATKINS     PAUL
5      S       MILLS      GEORGE

EMP_TYPE
E_code  E_Type_Desc
H       Hourly paid
S       Salaried
C       Commission only

EMP_HOUR
Empno  Hourly_rate
2      12.00
3      15.00

EMP_SAL
Empno  Sal_code
1      AX
4      B2
5      B5

Advantage
  Easy to write applications against
Disadvantage
  E_code is redundant data in EMP

35 Alternative 4
Additional tables needed if extra types introduced

EMP
Empno  E_code  Last_name  First_name
1      S       SMITH      TONY
2      H       WILSON     DOUG
3      H       WARD       AMY
4      S       ATKINS     PAUL
5      S       MILLS      GEORGE
6      C       LORD       PETER

EMP_TYPE
E_code  E_Type_Desc
H       Hourly paid
S       Salaried
C       Commission only

EMP_HOUR
Empno  Hourly_rate
2      12.00
3      15.00

EMP_SAL
Empno  Sal_code
1      AX
4      B2
5      B5

EMP_COMM
Empno  Commission
6

36 Alternative 5
Five Tables – Generic solution
Tables: EMP, EMP_TYPE, EMP_ATTR, EMP_TYPE_ATTR, ATTRIBUTE
Advantage
  Easy to modify structure (could classify by different criteria e.g. job type)
  Users can add types and change their definition without DBA intervention
Disadvantage
  Beyond the comprehension of mortals
  Slow

37 Structure of the Five Tables

EMP_TYPE
  E_code       PK
  E_Type_Desc  NN

EMP
  Empno       PK
  E_code      FK(EMP_TYPE)
  Last_name   NN
  First_name  NN
  Still put all common attributes here

ATTRIBUTE
  A_Code  PK
  A_Desc  NN

EMP_ATTR
  Empno   PK, FK(EMP)
  A_Code  PK, FK(ATTRIBUTE)
  Value   NN
  E_Type  FK(EMP), FK(EMP_TYPE_ATTR)
  This makes sure that EMP has correct attributes for type

EMP_TYPE_ATTR
  Empno   PK, FK(EMP)
  A_Code  PK, FK(ATTRIBUTE)
  This declares what attributes are valid for each type

38 Weak Entities
What is a weak entity?
Any occurrence (row in the table) must have a parent
For example a clinic MUST be in a hospital
Hospital (0,n) has (1,1) Clinic

Hospital
H#
1   Leeds
2   Halifax
3   York

Clinic   Unique key (H#,C#)
Clinic table: C# H# 10 1 2 20 30

The key of hospital becomes part of the key for clinic (H#)
C# alone cannot identify a row
New clinics cannot be added without a value for H#
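
A weak entity maps naturally onto a composite primary key that includes the parent's key. The sketch below uses Python's sqlite3 module; the column names h_no and c_no, the helper accepts and the non-existent hospital 9 are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE hospital (h_no INTEGER PRIMARY KEY, hname TEXT NOT NULL);
CREATE TABLE clinic (
    h_no INTEGER NOT NULL REFERENCES hospital(h_no),
    c_no INTEGER NOT NULL,
    PRIMARY KEY (h_no, c_no)     -- the parent key is part of the clinic key
);
INSERT INTO hospital VALUES (1, 'Leeds'), (2, 'Halifax'), (3, 'York');
""")

def accepts(h_no, c_no):
    """Return True if the clinic row can be inserted, False if it is rejected."""
    try:
        conn.execute("INSERT INTO clinic VALUES (?, ?)", (h_no, c_no))
        return True
    except sqlite3.IntegrityError:
        return False

leeds_10 = accepts(1, 10)
halifax_10 = accepts(2, 10)     # same C# in another hospital is a different clinic
no_hospital = accepts(None, 20) # rejected: a clinic must have a hospital
bad_hospital = accepts(9, 20)   # rejected: hospital 9 does not exist
print(leeds_10, halifax_10, no_hospital, bad_hospital)
```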

39 Example of Non-Weak Entity
Clinic can be identified by C#
H# is not part of the primary key
H# is merely a foreign key
Possible to add a clinic not linked to hospital
Hospital (0,n) has (0,1) Clinic

Hospital
H#
1   Leeds
2   Halifax
3   York

Clinic   Unique key (C#)
C#   H#
100  1
101  2
102
103
104
106

40 Transitive Dependencies
Suppliers have ‘city’ addresses
Each city has a status (relative importance)

Supplier id  Sup Name  City    Status
1            Smith     London  10
2            Brown     York    30
3            Green     Paris   50
4            Cox
5            White

Status is transitively dependent on Supplier Id
Supplier Id → City → Status
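
Removing the transitive dependency means moving Status into a table keyed by City. A minimal sketch with Python's sqlite3 module follows; the city for supplier Cox is not legible in the transcript, so London is assumed here, and the new status value 20 is also hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Status depends on city alone, so it is stored once per city.
CREATE TABLE city (city TEXT PRIMARY KEY, status INTEGER NOT NULL);
CREATE TABLE supplier (
    supplier_id INTEGER PRIMARY KEY,
    sup_name    TEXT NOT NULL,
    city        TEXT NOT NULL REFERENCES city(city)
);
INSERT INTO city VALUES ('London', 10), ('York', 30), ('Paris', 50);
INSERT INTO supplier VALUES (1, 'Smith', 'London'), (2, 'Brown', 'York'),
                            (3, 'Green', 'Paris'),
                            (4, 'Cox', 'London');   -- city assumed
""")

# One update changes the status seen by every supplier in that city.
conn.execute("UPDATE city SET status = 20 WHERE city = 'London'")
london_suppliers = conn.execute("""
    SELECT s.sup_name, c.status
    FROM supplier s JOIN city c ON c.city = s.city
    WHERE s.city = 'London'
    ORDER BY s.supplier_id""").fetchall()
print(london_suppliers)
```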

41 Nutwood Hospital – Entity Modelling Exercise
The hospital runs clinic sessions during which patients can book appointments to see their doctors
Only one doctor is present at each appointment
Each clinic session is held in a single clinic
Patients may also undergo operations
Only one doctor will perform each operation
A series of operations are performed during one theatre session
Each theatre session is held entirely within one operating theatre
Doctors may be authorized to work in one or more theatres

