Normalization, Roberts’s Rules and Introduction to Data Modeling CSCI 6442

Presentation transcript:

1 Normalization, Roberts’s Rules and Introduction to Data Modeling CSCI 6442

2 Agenda: Roberts’s Rules; Normalization; Roberts’s Rules and Normalization

3 Why Are We Talking About This? To design a database, we choose a set of entities that models a problem. We will store data in tables corresponding to our entity choices. The names of the entity types, and what’s in which table, become embedded in our programs. Changing them later is complex, so we want a stable model of the problem.

4 Midterm Question The first question on the midterm will deal with normal forms. It will deal with the relationship between normal forms and Roberts’s Rules. This one question will count more than any other question on the exam. The homework assignment for next week looks a lot like Question 1 on the midterm.

5 Syntax and Semantics Syntax deals with the structure and form of a statement or language. Semantics deals with the meaning that is conveyed by a statement or language.

6 Question Is normalization a syntactic or a semantic construct? That is, does it deal with the form of information, or is it involved with meaning?

7 Intensional vs. Extensional Data Extensional data: the data that is actually present. Intensional data: all the data that is allowed to be present. Question: does normalization deal with intensional or extensional data?

8 Entity and Entity Type An entity is something that we record information about in the database. An entity type is a set of similar things that we store information about. An entity instance is one example of some entity type. Usually we don’t say entity instance and entity type when context makes the meaning clear; we just say entity.

9 Relations We use a relation to model a single entity type. The relation is a set of tuples. Each tuple is an ordered collection of values of attributes of the entity type. Each tuple of the relation corresponds to a single instance of the entity type.

10 Summary of Terminology
Real World       Theory     Database
Entity Type      Relation   Table
Entity Instance  Tuple      Row
Attribute        Fact       Column

11 Facts A value of an attribute in a row conveys one fact about an entity instance. An attribute is a fact stating that “This entity instance has the value <value>”. Consider emp(empno, ename, job, deptno). Each value of ename in a row states that “This person’s name is <ename>”. Each row of this table can be viewed as a collection of four facts.

12 Example of Facts
EMPNO  ENAME  JOB        DEPTNO
10     Wu     President  1
20     Liu    VP         2
30     Chen   VP         2

13 Data Modeling The entire relational database, which is a set of relations, models something in the real world. The job of constructing that set of relations is called data modeling. In general, in data modeling we are designing a collection of relations that models a part of the real world. The formality of normalization is about how to construct a data model that behaves the way we want it to.

14 What We’ll Do Now First, we’ll talk about Roberts’s Rules, a collection of rules in plain English about how to design a database. We’ll be careful to fully understand Roberts’s Rules. Then we’ll talk about the basic normal forms: 1NF, 2NF, 3NF, BCNF and 4NF. We’ll take time to understand the normal forms: what does each actually do? Finally, we’ll look at the correspondence of the normal forms with Roberts’s Rules. You will continue this exploration in your homework.

15 Roberts’s Rules Roberts’s Rules are a set of plain English rules that, if followed during database design, result in a highly normalized database design. We will explore the relationship of Roberts’s Rules to normalization, and vice versa.

16 Roberts’s Rules

17 Rule 1 Each relation describes exactly one entity type. A relation models a distinct entity type, and each tuple of the relation models an instance of that entity. The relation models an entity by storing its attributes. The attributes that identify it are called candidate keys; the other attributes are non-key.

18 Do these follow Rule 1?
DESK(SER#, HEIGHT, WIDTH, COST, CUSTODIANSALARY)
EMP-CAR(EMP#, ENAME, DEPTNO, CARVIN#, CARMAKE, CARYEAR)
EMP(EMP#, ENAME, JOB, DEPTNO, DEPTCITY)

19 Rule 2 Each fact is represented only once in the database. A tuple (aka row) is a collection of facts about an entity instance, one fact per column. Each fact can appear only once, in one row of one table.

20 Duplicate Representation?
EMPNO  ENAME  JOB  SAL  DEPTNO
34LiuPres ChenVP CoxSales759

DEPTNO  DNAME     LOC
5       HQ        NYC
9       Sales     DC
20      Research  SF

21 Rule 3 Each tuple can reside in only one relation. A relation is a model of an entity type, not a station on a factory assembly line. Instead of moving a tuple from relation to relation, add an attribute that characterizes status.

22 Rule 3 Example As a person is being interviewed and hired, they change status:
1. Resume received
2. Resume being evaluated
3. Selected for interview
4. Selected for hire
5. Hired
As status changes, we could move the person’s row from one table to another. Should we?
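One way to follow Rule 3 in the hiring example is to keep every applicant in a single relation and record the stage as an attribute. The sketch below uses Python's built-in sqlite3 module; the table name `applicant`, its columns, and the sample row are assumptions made for illustration, not part of the course material.

```python
import sqlite3

# Hypothetical schema: one relation for the entity type "applicant",
# with the hiring stage stored as an attribute rather than implied by
# which table the row lives in.
STAGES = (
    "Resume received",
    "Resume being evaluated",
    "Selected for interview",
    "Selected for hire",
    "Hired",
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE applicant ("
    " app_id INTEGER PRIMARY KEY,"
    " name   TEXT NOT NULL,"
    " status TEXT NOT NULL)"
)
conn.execute("INSERT INTO applicant VALUES (1, 'Liu', ?)", (STAGES[0],))


def advance(conn, app_id):
    """Move an applicant to the next stage by updating one fact in place."""
    (current,) = conn.execute(
        "SELECT status FROM applicant WHERE app_id = ?", (app_id,)
    ).fetchone()
    nxt = STAGES[min(STAGES.index(current) + 1, len(STAGES) - 1)]
    conn.execute(
        "UPDATE applicant SET status = ? WHERE app_id = ?", (nxt, app_id)
    )
    return nxt


advance(conn, 1)
advance(conn, 1)
row_count = conn.execute("SELECT COUNT(*) FROM applicant").fetchone()[0]
status = conn.execute(
    "SELECT status FROM applicant WHERE app_id = 1"
).fetchone()[0]
```

The tuple never leaves its relation; only the value of `status` changes, so programs that query `applicant` keep working as the person moves through the stages.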

23 Rule 4 If the cardinality of an attribute is greater than one, then database design must be insensitive to cardinality. It’s easy—and very risky—to presume that the cardinality of various entity types and subtypes will remain the same.

24 Rule 4 Examples Company car College degree Telephone number Home address Business address address

25 Example of Roberts’s Rules
EMP(EMPNO, ENAME, DEPTNO, DNAME)
DEPT(DEPTNO, DNAME, DLOC)
This schema violates the following Roberts’s Rules:
Rule 1. The EMP table describes the employee as well as the department.
Rule 2. In the EMP table, if the same DEPTNO appears in multiple rows, DNAME will be represented multiple times.

26 Another Example EMP(ENAME, DEGREE1, DEGREE2, DEGREE3) This schema violates the following Roberts’s Rule: Rule 4. The design assumes every employee has at most three degrees. If an employee has four degrees, then the database needs to be restructured by adding DEGREE4 to the EMP table. Rule 4 deals with an aspect of data independence. It can be stated informally as: "Grow down, not across".
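"Grow down, not across" can be sketched with one row per (employee, degree) pair, so that an extra degree becomes an extra row rather than an extra column. The table name `emp_degree` and the fourth degree are assumptions made for illustration; the example uses Python's built-in sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical replacement for EMP(ENAME, DEGREE1, DEGREE2, DEGREE3):
# one row per (employee, degree) pair, so the number of degrees is not
# fixed by the schema.
conn.execute(
    "CREATE TABLE emp_degree ("
    " ename  TEXT NOT NULL,"
    " degree TEXT NOT NULL,"
    " PRIMARY KEY (ename, degree))"
)
degrees = [("Jones", "BS EE"), ("Jones", "MS EE"), ("Jones", "PhD Comp Sci")]
conn.executemany("INSERT INTO emp_degree VALUES (?, ?)", degrees)

# A fourth degree is just another row: the table grows down, not across.
conn.execute("INSERT INTO emp_degree VALUES ('Jones', 'MBA')")

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(emp_degree)")]
degree_count = conn.execute(
    "SELECT COUNT(*) FROM emp_degree WHERE ename = 'Jones'"
).fetchone()[0]
```

The column list is the same before and after the fourth degree is added, which is the insensitivity to cardinality that Rule 4 asks for.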

27 A Question Are Rule 1 and Rule 2 equivalent? They are equivalent if the set of relations that satisfy Rule 1 is the same as the set of relations that satisfy Rule 2. This is a homework problem.

28 Normalization Preliminaries

29 Normalization A set of formal rules intended to define a properly structured database. Each normal form identifies and removes certain anomalous behavior from the relations that satisfy it.

30 Examples of Anomalies
Insert anomalies: if we want to enter information about a new entity in the database, we need to enter information about some other entity first.
Delete anomalies: in order to delete information about an entity, we must delete information about another entity.
Update anomalies: in order to change the value of a single fact, we may have to change many stored values in the database.
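The update and delete anomalies can be made concrete with the EMP(EMPNO, ENAME, DEPTNO, DNAME) relation from the Roberts's Rules example. This is a minimal sketch in plain Python; the rows and the helper `dept_names` are made up for illustration.

```python
# EMP(EMPNO, ENAME, DEPTNO, DNAME) with made-up rows. DNAME is repeated
# for every employee in a department.
emp = [
    {"EMPNO": 10, "ENAME": "Wu",   "DEPTNO": 1, "DNAME": "HQ"},
    {"EMPNO": 20, "ENAME": "Liu",  "DEPTNO": 2, "DNAME": "Sales"},
    {"EMPNO": 30, "ENAME": "Chen", "DEPTNO": 2, "DNAME": "Sales"},
]


def dept_names(rows):
    """Map each DEPTNO to the set of DNAME values stored for it."""
    names = {}
    for r in rows:
        names.setdefault(r["DEPTNO"], set()).add(r["DNAME"])
    return names


# Update anomaly: renaming department 2 in only one row leaves two
# conflicting versions of the same fact.
emp[1]["DNAME"] = "Marketing"
inconsistent = {d for d, n in dept_names(emp).items() if len(n) > 1}

# Delete anomaly: removing the only employee of department 1 also
# removes the only record of that department's name.
emp_after_delete = [r for r in emp if r["EMPNO"] != 10]
lost_departments = set(dept_names(emp)) - set(dept_names(emp_after_delete))
```

An insert anomaly shows up in the same relation: a department with no employees cannot be recorded without inventing an employee row to carry it.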

31 Basic Concepts Entity Type: a class of an object that we record information about. Aka relation, table Attribute: a characteristic of an entity. Aka column. Entity Instance: a single occurrence of an entity type. Aka tuple, row

32 Candidate Keys Candidate key: a set of attributes $A_i, A_j, \ldots, A_k$ that is a candidate key has two (time-invariant) properties:
1. Uniqueness: no two tuples have the same value for the candidate key.
2. Minimality: if any $A_i$ is discarded from the candidate key, then the uniqueness property is lost. It is a minimal set of attributes that identifies a row.
How many candidate keys can a table have?
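The two properties can be checked mechanically against a set of rows. The sketch below searches the EMP rows from the "Example of Facts" slide; the function names are hypothetical. Because it only sees the data that happens to be present, it can rule a key out but cannot prove the time-invariant property.

```python
from itertools import combinations


def is_unique(rows, attrs):
    """True if no two rows share the same values for attrs."""
    seen = {tuple(r[a] for a in attrs) for r in rows}
    return len(seen) == len(rows)


def candidate_keys(rows, attributes):
    """Return minimal unique attribute sets for this instance, smallest first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # Minimality: skip any superset of a key already found.
            if any(set(k) <= set(attrs) for k in keys):
                continue
            if is_unique(rows, attrs):
                keys.append(attrs)
    return keys


# Rows from the "Example of Facts" slide.
emp = [
    {"EMPNO": 10, "ENAME": "Wu",   "JOB": "President", "DEPTNO": 1},
    {"EMPNO": 20, "ENAME": "Liu",  "JOB": "VP",        "DEPTNO": 2},
    {"EMPNO": 30, "ENAME": "Chen", "JOB": "VP",        "DEPTNO": 2},
]
keys = candidate_keys(emp, ["EMPNO", "ENAME", "JOB", "DEPTNO"])
```

In these three rows ENAME happens to be unique as well as EMPNO, but a second employee named Liu could be hired tomorrow. Whether ENAME is really a candidate key is a question about what data is allowed, not about what data is present.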

33 Primary Key One of the candidate keys is selected to be the primary identifier of rows. It is called the primary key. The selection is usually made based on the usefulness of the attribute that is the primary key.

34 Functional Dependence R.X→R.Y or R.X FD R.Y Given a relation R, attribute Y of R is functionally dependent on attribute X of R iff each X-value in R has associated with it precisely one Y-value in R (at any one time) In other words, for each value of X in table R, there is one and only one value of Y. A given X value must always occur with the same Y value.

35 Functional Dependence Examples
X  Y
1  A
2  C
3  B
1  A
2  C
4  A
3  B
6  B
Does X→Y? Does Y→X?
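The definition of functional dependence translates directly into a check over rows: remember the first Y-value seen for each X-value and fail if a different one turns up. This is a minimal sketch with a hypothetical helper `fd_holds`; it tests only the rows given, whereas a true FD must hold for every allowed state of the relation.

```python
def fd_holds(rows, x, y):
    """True if, in these rows, each x-value occurs with exactly one y-value."""
    seen = {}
    for r in rows:
        # setdefault stores the first y seen for this x and returns it.
        if seen.setdefault(r[x], r[y]) != r[y]:
            return False
    return True


# The eight (X, Y) pairs from the slide.
pairs = [(1, "A"), (2, "C"), (3, "B"), (1, "A"),
         (2, "C"), (4, "A"), (3, "B"), (6, "B")]
rows = [{"X": x, "Y": y} for x, y in pairs]

x_determines_y = fd_holds(rows, "X", "Y")
y_determines_x = fd_holds(rows, "Y", "X")
```

Printing the two results answers the slide's questions for this particular set of rows.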

36 Anomalies Update anomalies: If one copy of repeated data is updated, inconsistency is created unless all copies are similarly updated. Insert anomalies: It may not be possible to store some information unless some other information is stored as well. Delete anomalies: It may not be possible to delete some information without losing some other information as well.

37 Full Functional Dependence Y is fully functionally dependent on X iff X→Y and no proper subset of X determines Y. That is, X is a minimal collection of columns that determines Y.
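Full dependence adds a subset test on top of the ordinary FD check. The sketch below assumes a hypothetical shipment relation with a composite determinant (S#, P#); the rows and function names are made up for illustration, and the check again covers only the rows supplied.

```python
from itertools import combinations


def fd_holds(rows, xs, y):
    """True if each combination of xs-values occurs with one y-value."""
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in xs)
        if seen.setdefault(key, r[y]) != r[y]:
            return False
    return True


def fully_dependent(rows, xs, y):
    """True if xs -> y holds and no proper (non-empty) subset of xs determines y."""
    if not fd_holds(rows, xs, y):
        return False
    for size in range(1, len(xs)):
        for subset in combinations(xs, size):
            if fd_holds(rows, subset, y):
                return False
    return True


# Hypothetical shipment rows: supplier S#, part P#, quantity, supplier name.
ship = [
    {"S#": 1, "P#": 34, "QTY": 10, "SNAME": "Acme"},
    {"S#": 1, "P#": 65, "QTY": 20, "SNAME": "Acme"},
    {"S#": 2, "P#": 34, "QTY": 30, "SNAME": "Chen"},
]
qty_full = fully_dependent(ship, ("S#", "P#"), "QTY")
sname_full = fully_dependent(ship, ("S#", "P#"), "SNAME")
```

Here QTY needs both S# and P#, while SNAME is already determined by S# alone, so SNAME is only partially dependent on the composite determinant.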

38 “Aboutness” FD is about “aboutness”. If A is FD on X, then A is “about” X. Suppose X is employee ID, EID; then EID determines salary, SAL. In other words, SAL is “about” the employee identified by EID.

39 Normalization

40 First Normal Form

41 First Normal Form A relation is said to be in first normal form iff every attribute of every tuple is atomic.

42 1NF Example
Empno  Ename  Job             Educ                        Deptno
33     Jones  Pres            BS EE, MS EE, PhD Comp Sci  3
324    Chu    VP              BS EE, MBA                  3
88     Kumar  Sales           BS EE, MA Comm              4
65     Yu     Quality Contr.  BS CS, MS CS, PhD CS        5
Question: Is this relation in 1NF? Question: Does this relation show any anomalies?
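If the Educ column is read as a list of degrees, one way to make every value atomic is to move the degrees into their own relation with one degree per row. This is a sketch under that reading; the relation name EMP_EDUC and its Degree column are assumptions.

```python
# Rows from the 1NF example slide; Educ holds a comma-separated list.
emp = [
    (33,  "Jones", "Pres",           "BS EE, MS EE, PhD Comp Sci", 3),
    (324, "Chu",   "VP",             "BS EE, MBA",                 3),
    (88,  "Kumar", "Sales",          "BS EE, MA Comm",             4),
    (65,  "Yu",    "Quality Contr.", "BS CS, MS CS, PhD CS",       5),
]

# EMP keeps the single-valued attributes ...
emp_1nf = [(empno, ename, job, deptno) for empno, ename, job, _, deptno in emp]

# ... and a hypothetical EMP_EDUC(Empno, Degree) holds one degree per row.
emp_educ = [
    (empno, degree.strip())
    for empno, _, _, educ, _ in emp
    for degree in educ.split(",")
]
```

After the split, a question such as "who holds an MBA?" becomes an equality test on Degree instead of a substring search inside a list.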

43 What’s not allowed by 1NF? 1NF doesn’t allow a relation to contain: lists; other relations; multiple values.

44 Second Normal Form

45 Second Normal Form A relation is said to be in second normal form iff it is in first normal form and every non-key attribute is fully functionally dependent on the primary key.

46 2NF Example
SID  SNAME  City  Status
4    Smith  NYC   45
6    Liu    DC    65
7    Chen   NYC   45
9    Jones  LA    22
[Dependency diagram over SID, SNAME, City, Status]
Does this relation follow Roberts’s Rules? Do you see any anomalies?

47 2NF and RR What is the relationship between 2NF and Roberts’s Rules? If Rule 1 is met, is the relation in 2NF? What about Rule 2?

48 What does 2NF not permit? 2NF doesn’t allow a relation to have information about more than one entity type.

49 Third Normal Form

50 Third Normal Form A relation is said to be in third normal form iff it is in second normal form and there are no transitive dependencies.

51 How Do We Convert To 3NF?
SID  SNAME  City  Status
4    Smith  NYC   45
6    Liu    DC    65
7    Chen   NYC   45
9    Jones  LA    22

SID  SNAME  City
4    Smith  NYC
6    Liu    DC
7    Chen   NYC
9    Jones  LA

City  Status
DC    65
NYC   45
LA    22
[Dependency diagrams: SID, SNAME, City; City, Status]
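The conversion is a pair of projections, and joining them back on City should reproduce the original rows. A minimal sketch in plain Python using the data from the slide; the variable names are arbitrary.

```python
# Supplier rows from the slide: (SID, SNAME, City, Status).
supplier = [
    (4, "Smith", "NYC", 45),
    (6, "Liu",   "DC",  65),
    (7, "Chen",  "NYC", 45),
    (9, "Jones", "LA",  22),
]

# Project onto (SID, SNAME, City) and (City, Status); sets drop duplicates.
sup = {(sid, sname, city) for sid, sname, city, _ in supplier}
city_status = {(city, status) for _, _, city, status in supplier}

# Natural join on City rebuilds the original relation.
rejoined = {
    (sid, sname, city, status)
    for sid, sname, city in sup
    for c, status in city_status
    if c == city
}
lossless = rejoined == set(supplier)
```

The (City, Status) projection has three rows for four suppliers: the fact that NYC has status 45 is now stored once instead of once per NYC supplier.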

52 3NF and RR If a relation is in 3NF, what about rules 1 and 2?

53 What is not permitted by 3NF? 3NF refines the notion of “aboutness” beyond the restrictions of 2NF.

54 Fourth Normal Form

55 Multi-Valued Dependency R.X is said to multi-value determine R.Y if there is a set of values for Y that must appear in any relation where R.X appears. For example, if a course has two textbooks, then there will be an MVD between the course number and the names of the books.

56 Fourth Normal Form A relation is said to be in fourth normal form iff it is in third normal form and it does not have more than one multi-valued dependency.

57 Example of 4NF Is this relation in 4NF?
SID  Sport   Instrument
87   Soccer  Saxophone
87   Tennis  Violin
87   Soccer  Violin
87   Tennis  Saxophone
[Diagram: SID, Sport, Instrument; MVD SPORT-INSTRUMENT]

58 Converting to 4NF
SID  Sport
87   Soccer
87   Tennis

SID  Instrument
87   Saxophone
87   Violin
[Diagram: SID, Sport, Instrument; MVD]
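A practical test for this kind of split is whether the original relation equals the join of its two projections on SID, meaning sports and instruments vary independently for each student. A minimal sketch using the rows from the 4NF example; the variable names are arbitrary.

```python
# Rows from the 4NF example: (SID, Sport, Instrument).
rows = {
    (87, "Soccer", "Saxophone"),
    (87, "Tennis", "Violin"),
    (87, "Soccer", "Violin"),
    (87, "Tennis", "Saxophone"),
}

# The two projections used in the conversion.
sid_sport = {(sid, sport) for sid, sport, _ in rows}
sid_instrument = {(sid, instr) for sid, _, instr in rows}

# Join the projections on SID: every sport paired with every instrument.
rejoined = {
    (sid, sport, instr)
    for sid, sport in sid_sport
    for sid2, instr in sid_instrument
    if sid2 == sid
}
independent = rejoined == rows

# Dropping one combination from the original breaks the equality,
# because the join would bring the missing row back.
incomplete = rows - {(87, "Tennis", "Saxophone")}
still_equal = rejoined == incomplete
```

In the combined relation, adding a third sport for student 87 takes one new row per instrument; in the decomposed form it takes a single row.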

59 What does 4NF not permit? 4NF does not permit multiple MVDs in a single relation.

60 Boyce-Codd Normal Form

61 Boyce-Codd Normal Form A relation is said to be in Boyce-Codd normal form iff every determinant is a candidate key. BCNF deals with problems that can be caused by overlapping candidate keys.

62 Example of BCNF
S#  SNAME  P#  QTY
1Acme Chen3476 3Jones6534
[Dependency diagram: S#, SNAME, P#, QTY]
How does this relation comply with Rule 1 and Rule 2? Are there any anomalies?

63 Converting to BCNF
S#  P#  QTY

S#  SNAME
1   Acme
2   Chen
3   Jones
[Dependency diagrams: S#, P#, QTY; S#, SNAME]
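The rule "every determinant is a key" can be tested from a list of functional dependencies by computing attribute closures. The sketch below assumes three dependencies for the supplier and part example (S# and SNAME determine each other, and S# with P# determines QTY); these dependencies and the function names are assumptions, since the slides show them only as a diagram. Restricting the dependencies to those whose attributes all lie in a relation is a simplification of FD projection that is adequate for this small example.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds (a list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the FDs whose determinant is not a superkey of the relation."""
    # Simplification: keep only FDs that mention attributes of this relation.
    applicable = [(l, r) for l, r in fds if (l | r) <= attributes]
    return [
        (lhs, rhs)
        for lhs, rhs in applicable
        if not rhs <= lhs and closure(lhs, applicable) != attributes
    ]


# Assumed dependencies for the supplier/part example.
fds = [
    (frozenset({"S#"}), frozenset({"SNAME"})),
    (frozenset({"SNAME"}), frozenset({"S#"})),
    (frozenset({"S#", "P#"}), frozenset({"QTY"})),
]

original = bcnf_violations(frozenset({"S#", "SNAME", "P#", "QTY"}), fds)
shipments = bcnf_violations(frozenset({"S#", "P#", "QTY"}), fds)
suppliers = bcnf_violations(frozenset({"S#", "SNAME"}), fds)
```

Under these assumptions the four-column relation has two offending determinants (S# and SNAME on their own), and both relations produced by the conversion have none.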

64 What does BCNF not allow? BCNF brings the restrictions on “aboutness” to candidate and composite keys.

65 Roberts’s Rules and Normal Forms

66 Rule 1: One Entity Type Per Table Each row must be about a single entity type. It can’t have information about two entity types. Think of FD: RR1 requires FD and does not allow transitive FD.

67 Rule 2: Each Fact Represented Once What must happen for a single fact to be represented more than once? Most likely, there is a transitive dependency. So RR2 seems to disallow transitive dependency.

68 What About 4NF? Lack of 4NF causes duplicate representation of facts. Not permitted by RR2.

69 You will have the opportunity for more exploration of this relationship with your homework for next week.

70 Data Modeling When we design a relational database, we search for a set of entity types that will model the problem of interest. If we choose a robust data model, it will last a long time without major changes, even though the programs that use it may change. Now that we have some idea what a good data model is, we will talk about how to design one.