Functional Dependencies


Functional Dependencies Database Functional Dependencies

Designing Good Schemas
We know how to create schemas, but how do we create good schemas? What does "good" mean?
Schema quality measurements:
- semantics of the attributes
- minimal redundancy
- minimal frequency of null values

Functional Dependencies
A column Y of a relational table R is functionally dependent upon a column X of R if and only if each value of X in R is associated with exactly one value of Y at any given time.

Functional Dependencies
Definition: A1, ..., Am → B1, ..., Bn holds in R if:
∀ t, t′ ∈ R: (t.A1 = t′.A1 ∧ ... ∧ t.Am = t′.Am) ⇒ (t.B1 = t′.B1 ∧ ... ∧ t.Bn = t′.Bn)
That is, if two tuples t and t′ of R agree on A1, ..., Am, then they also agree on B1, ..., Bn.

Examples
EmpID → Name, Phone, Position
Position → Phone
but not Phone → Position
EmpID  Name   Phone  Position
?      Smith  1234   Clerk
E1847  John   9876   Salesrep
E1111  Smith  9876   Salesrep
E9999  Mary   1234   Lawyer

Example Data
name     addr        beersLiked  manf    favBeer
Janeway  Voyager     Bud         A.B.    WickedAle
Janeway  Voyager     WickedAle   Pete's  WickedAle
Spock    Enterprise  Bud         A.B.    Bud
Some values are redundant: the repeated addr (because name → addr), the repeated manf (because beersLiked → manf), and the repeated favBeer (because name → favBeer).

Example
Drinkers(name, addr, beersLiked, manf, favoriteBeer)
Reasonable FD's to assert:
1. name → addr
2. name → favoriteBeer
3. beersLiked → manf
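The asserted FD's can be tested against the sample Drinkers rows. The sketch below is one possible check, not part of the original slides; the helper name `fd_holds` and the dictionary layout are assumptions made for illustration. A relation instance can refute an FD, but it cannot prove that the FD holds for every legal instance.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Two tuples that agree on lhs must also agree on rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Sample rows from the Example Data slide.
drinkers = [
    {"name": "Janeway", "addr": "Voyager", "beersLiked": "Bud",
     "manf": "A.B.", "favBeer": "WickedAle"},
    {"name": "Janeway", "addr": "Voyager", "beersLiked": "WickedAle",
     "manf": "Pete's", "favBeer": "WickedAle"},
    {"name": "Spock", "addr": "Enterprise", "beersLiked": "Bud",
     "manf": "A.B.", "favBeer": "Bud"},
]

print(fd_holds(drinkers, ["name"], ["addr"]))
print(fd_holds(drinkers, ["name"], ["favBeer"]))
print(fd_holds(drinkers, ["beersLiked"], ["manf"]))
print(fd_holds(drinkers, ["name"], ["beersLiked"]))
```

The first three checks succeed on this instance, while name → beersLiked fails because Janeway appears with two different beers.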

Functional Dependencies
"Y is functionally dependent upon X" means the same as "values of X identify values of Y".
X → Y means that Y depends on X, or that X identifies Y.
If X → Y, then XZ → YZ (augmentation).
If X → Y and Y → Z, then X → Z (transitivity).

Examples
S# → Ename
{S#, P#} → Hours
If for each value of S# there is exactly one corresponding value of Sname, State, and City, then:
S# → Sname, State, City

Example
If {S#, P#} → Qty, then in a table with columns S#, P#, and Qty, each (S#, P#) pair has exactly one Qty value.

Redundancy Example Where’s the redundancy?

Redundancy Example

Example FDs
[Figure: example relation annotated with proper FDs, transitive FDs, and partial key FDs]

Example
R = (A, B, C, G, H, I)
F = { A → B, A → C, CG → H, CG → I, B → H }
Some members of F+:
- A → H, by transitivity from A → B and B → H
- AG → I, by augmenting A → C with G to get AG → CG, and then transitivity with CG → I
- CG → HI, by augmenting CG → I with CG to infer CG → CGI, augmenting CG → H with I to infer CGI → HI, and then transitivity
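Membership in F+ can also be tested mechanically with the attribute-closure algorithm: X → Y is in F+ exactly when Y is contained in the closure of X. The following is a minimal sketch under that approach; the function names `closure` and `implies` are illustrative and do not come from the slides.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire the FD when its left side is already covered.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs follows from fds."""
    return set(rhs) <= closure(lhs, fds)


# F from the example; each attribute is a single letter.
F = [
    (set("A"), set("B")),
    (set("A"), set("C")),
    (set("CG"), set("H")),
    (set("CG"), set("I")),
    (set("B"), set("H")),
]

print(implies(F, "A", "H"))    # the transitivity example
print(implies(F, "AG", "I"))   # the augmentation example
print(implies(F, "CG", "HI"))  # the union-style example
print(implies(F, "A", "I"))    # not derivable: A alone never reaches G
```

The closure of AG is the full attribute set, which also shows that AG is a key of R.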

Formal definition of a key
A key is a set of attributes A1, ..., An such that for any other attribute B, A1, ..., An → B.
A minimal key is a set of attributes which is a key and for which no proper subset is a key.
Note: the book calls them superkey and key.

Where Do Keys Come From?
We could simply assert a key K. Then the only FD's are K → A for all attributes A, and K turns out to be the only key obtainable from the FD's.
We could assert FD's and deduce the keys by systematic exploration.
E/R gives us FD's from entity-set keys and many-one relationships.

Examples of Keys
Product(name, price, category, color)
  name, category → price
  category → color
  Keys are: {name, category} and all supersets
Enrollment(student, address, course, room, time)
  student → address
  room, time → course
  student, course → room, time
  Keys are: {student, room, time}, {student, course} and all supersets
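The "systematic exploration" of keys can be sketched as a brute-force search: try attribute subsets from smallest to largest and keep those whose closure is the whole schema. This is an illustrative sketch only (exponential in the number of attributes); the names `closure` and `minimal_keys` are assumptions, not from the slides.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_keys(attributes, fds):
    """All minimal keys, found by testing subsets in order of size."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


product_keys = minimal_keys(
    {"name", "price", "category", "color"},
    [({"name", "category"}, {"price"}), ({"category"}, {"color"})],
)

enrollment_keys = minimal_keys(
    {"student", "address", "course", "room", "time"},
    [
        ({"student"}, {"address"}),
        ({"room", "time"}, {"course"}),
        ({"student", "course"}, {"room", "time"}),
    ],
)

print(product_keys)
print(enrollment_keys)
```

Every superset of a returned set is a key in the slides' terminology (a superkey in the book's).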

Example 2
Relation with columns Lastname, Firstname, Student ID, Major.
Keys are {Lastname, Firstname} (a key with 2 attributes) and {StudentID}.
Any superset of a key is a superkey.
Note: there are alternate keys.

Finding the Keys of a Relation
Given a relation constructed from an E/R diagram, what is its key?
Rules:
1. If the relation comes from an entity set, the key of the relation is the set of attributes which is the key of the entity set.
Example: the entity set Person with attributes address, name, ssn gives Person(address, name, ssn).

Finding the Keys
Rules:
2. If the relation comes from a many-many relationship, the key of the relation is the set of all attribute keys in the relations corresponding to the entity sets.
Example: the relationship buys (with attribute date) between Person (name, ssn) and Product (name, price) gives buys(name, ssn, date).

Finding the Keys
Except: if there is an arrow from the relationship to E, then we don't need the key of E as part of the relation key.
Example: the relationship Purchase among Product (name), Store (sname), Person (ssn), and CreditCard (card-no) gives Purchase(name, sname, ssn, card-no).

Expressing Dependencies
Say: "the CreditCard determines the Person".
In the E/R diagram (Purchase among Product, Store, CreditCard, and Person), the arrow notation is incomplete (what does it say?).
On the relation, the constraint can be stated as an FD:
Purchase(name, sname, ssn, card-no)
card-no → ssn

Enrollment(student, major, course, room, time)
course → time
What else can we infer?

Relational Schema Design (or Logical Design)
Main idea:
- Start with some relational schema
- Find out its FD's (it is also important to look at inferred FD's)
- Use them to design a better relational schema

Relational Schema Design
Recall set attributes (persons with several phones):
Name  SSN          PhoneNumber   City
Fred  123-45-6789  206-555-1234  Seattle
Fred  123-45-6789  206-555-6543  Seattle
Joe   987-65-4321  908-555-2121  Westfield
Joe   987-65-4321  908-555-1234  Westfield
SSN → Name, City, but not SSN → PhoneNumber
Anomalies:
- Redundancy = repeated data
- Update anomalies = Fred moves to "Bellevue"
- Deletion anomalies = Fred drops all phone numbers: what is his city?

Relation Decomposition
Break the relation into two:
Name  SSN          City
Fred  123-45-6789  Seattle
Joe   987-65-4321  Westfield
and
SSN          PhoneNumber
123-45-6789  206-555-1234
123-45-6789  206-555-6543
987-65-4321  908-555-2121
987-65-4321  908-555-1234

Relational Schema Design
Conceptual Model: E/R diagram (Person buys Product, with attributes name, price, ssn)
Relational Model: relations plus FD's
Normalization: eliminates anomalies

Decompositions in General
Given R(A1, ..., An), create two relations R1(B1, ..., Bm) and R2(C1, ..., Cp) such that:
{B1, ..., Bm} ∪ {C1, ..., Cp} = {A1, ..., An}
and:
R1 = projection of R on B1, ..., Bm
R2 = projection of R on C1, ..., Cp

Incorrect Decomposition
Sometimes it is incorrect:
Name         Price  Category
Gizmo        19.99  Gadget
OneClick     24.99  Camera
DoubleClick  29.99  Camera
Decompose on: (Name, Category) and (Price, Category)

Incorrect Decomposition
Name         Category
Gizmo        Gadget
OneClick     Camera
DoubleClick  Camera
and
Price  Category
19.99  Gadget
24.99  Camera
29.99  Camera
When we put it back:
Name         Price  Category
Gizmo        19.99  Gadget
OneClick     24.99  Camera
OneClick     29.99  Camera
DoubleClick  24.99  Camera
DoubleClick  29.99  Camera
Cannot recover information
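The loss of information can be reproduced by projecting the relation and joining the projections back. The sketch below assumes that DoubleClick belongs to the Camera category, which the transcript leaves blank; the helper names `project`, `natural_join`, and `same_rows` are hypothetical. The second decomposition, on Name, is added here only as a contrast and is not in the slides.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; rows are dicts."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


def natural_join(left, right):
    """Join rows that agree on all attributes they have in common."""
    joined = []
    for l in left:
        for r in right:
            common = set(l) & set(r)
            if all(l[a] == r[a] for a in common):
                merged = {**l, **r}
                if merged not in joined:
                    joined.append(merged)
    return joined


def same_rows(a, b):
    """Compare two relations as sets of tuples, ignoring row order."""
    return ({frozenset(r.items()) for r in a}
            == {frozenset(r.items()) for r in b})


products = [
    {"Name": "Gizmo", "Price": 19.99, "Category": "Gadget"},
    {"Name": "OneClick", "Price": 24.99, "Category": "Camera"},
    # Assumption: the category of DoubleClick is Camera.
    {"Name": "DoubleClick", "Price": 29.99, "Category": "Camera"},
]

# Decomposition from the slide: join on Category only.
lossy = natural_join(project(products, ["Name", "Category"]),
                     project(products, ["Price", "Category"]))

# Contrast: join on Name, which is unique in this instance.
lossless = natural_join(project(products, ["Name", "Price"]),
                        project(products, ["Name", "Category"]))

print(len(lossy), same_rows(lossy, products))
print(len(lossless), same_rows(lossless, products))
```

The first join produces spurious rows pairing each Camera product with every Camera price, so the original relation cannot be recovered from it.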

Normal Forms
Each normal form is a set of conditions on a schema that guarantees certain properties (relating to redundancy and update anomalies).
The two commonly used normal forms are third normal form (3NF) and Boyce-Codd normal form (BCNF).

Normalization
0NF → 1NF: remove multi-valued attributes
1NF → 2NF: remove partial dependencies
2NF → 3NF: remove transitive dependencies
3NF → BCNF: remove remaining FD anomalies
BCNF → 4NF: remove multivalued dependencies
4NF → 5NF: remove remaining anomalies

Goals of Normalization
Let R be a relation scheme with a set F of functional dependencies.
Decide whether a relation scheme R is in "good" form.
In the case that a relation scheme R is not in "good" form, decompose it into a set of relation schemes {R1, R2, ..., Rn} such that:
- each relation scheme is in good form
- the decomposition is a lossless-join decomposition
- preferably, the decomposition should be dependency preserving

1NF
First normal form means:
- no multi-valued attributes
- no composite attributes
- no nested relations
To remove a violation, we create a new table or a new field (e.g., for telephone, visiting).

1NF Normalization
Proper translation of multi-valued attributes from the E/R model will achieve 1NF.
This is still not a good solution, since we have redundancy in Dnumber and Dmgr_ssn. (This will be handled by 2NF.)

2NF
A relation violates second normal form if its primary key has multiple attributes and some non-key attribute depends on only part of the primary key.
Example columns: S#, P#, Hours, Cname, Pname, Loc

2NF Normalization
Move the partial key and its dependent attributes to a new relation.

Transitive Dependencies
X → Y is a transitive dependency (TD) if there exists Z ⊈ any key such that X → Z → Y.
TDs can cause redundancy: if there are multiple values of X that determine the same value of Z, the value of Y for that value of Z is stored multiple times.
3NF normalization: move (Z, Y) to a new relation in which Z is the primary key.

3NF
A relation is in 3NF if it is in 2NF and every non-key attribute is non-transitively dependent on the primary key.

3NF Normalization
Create a new relation to hold the attributes in the transitive FD.
The LHS of the transitive FD becomes the PK of the new relation.

Transitive Dependency Example
Relation columns: DEPT, COURSE, SECTION, ROOM, INSTR, I_OFFICE
I_OFFICE (instructor's office) is determined by the non-PK attribute INSTR.
Sample values (row alignment not preserved): DEPT: COMP; COURSE and SECTION: 51, 1, 2, 163, 53; ROOM: WPC122, WPC219, WPC130; INSTR: DOHERTY, CLIBURN, BOWRING, CARMAN; I_OFFICE: CSB109, CSB107, CSB108, CSB104

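NF Decomposition: Foreign Keys
Original relation: (DEPT, COURSE, SECTION, ROOM, INSTR, I_OFFICE)
Decomposition: (DEPT, COURSE, SECTION, ROOM, INSTR) and (INSTR, I_OFFICE), where INSTR in the first relation is a foreign key referencing the second.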

3NF Example
Relation dept_advisor: dept_advisor(s_ID, i_ID, dept_name)
F = { s_ID, dept_name → i_ID;  i_ID → dept_name }
Two candidate keys: {s_ID, dept_name} and {i_ID, s_ID}
R is in 3NF:
- s_ID, dept_name → i_ID: {s_ID, dept_name} is a superkey
- i_ID → dept_name: dept_name is contained in a candidate key
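The reasoning for dept_advisor can be replayed in code: compute the candidate keys, collect the prime attributes, and classify each FD. This is a rough sketch that inspects only the FD's listed in F, not all of F+; the function names are illustrative assumptions.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


def check_3nf(attributes, fds):
    """Say, for each FD, which 3NF condition it meets (if any)."""
    attributes = set(attributes)
    prime = set().union(*candidate_keys(attributes, fds))
    reasons = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            reasons.append("trivial")
        elif closure(lhs, fds) == attributes:
            reasons.append("left side is a superkey")
        elif rhs - lhs <= prime:
            reasons.append("right side is prime")
        else:
            reasons.append("violates 3NF")
    return reasons


R = {"s_ID", "i_ID", "dept_name"}
F = [({"s_ID", "dept_name"}, {"i_ID"}), ({"i_ID"}, {"dept_name"})]

print(candidate_keys(R, F))
print(check_3nf(R, F))
```

The second FD passes only through the prime-attribute condition, which is why dept_advisor is in 3NF but not in BCNF.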

Redundancy in 3NF
There is some redundancy in this schema.
Example of problems due to redundancy in 3NF:
R = (J, K, L)
F = { JK → L, L → K }
J     L   K
j1    l1  k1
j2    l1  k1
j3    l1  k1
null  l2  k2
- Repetition of information (e.g., the relationship l1, k1); in dept_advisor, the pair (i_ID, dept_name).
- Need to use null values (e.g., to represent the relationship l2, k2 where there is no corresponding value for J); in dept_advisor, the pair (i_ID, dept_name) if there is no separate relation mapping instructors to departments.

3NF Decomposition: An Example
Relation schema: cust_banker_branch = (customer_id, employee_id, branch_name, type)
The functional dependencies for this relation schema are:
1. customer_id, employee_id → branch_name, type
2. employee_id → branch_name
3. customer_id, branch_name → employee_id
We first compute a canonical cover:
- branch_name is extraneous in the r.h.s. of the 1st dependency
- No other attribute is extraneous, so we get FC =
  customer_id, employee_id → type
  employee_id → branch_name
  customer_id, branch_name → employee_id
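The extraneous-attribute test on a right-hand side can be sketched with attribute closure: A is extraneous in α → β if A is still in the closure of α after A is dropped from β. The code below covers only right-hand sides (left-hand-side attributes need a separate test); `rhs_extraneous` is a hypothetical helper name.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def rhs_extraneous(fds, index, attr):
    """True if attr is extraneous on the right side of fds[index]."""
    lhs, rhs = fds[index]
    # Replace the FD with a copy whose right side no longer has attr.
    reduced = [fd for i, fd in enumerate(fds) if i != index]
    reduced.append((lhs, rhs - {attr}))
    return attr in closure(lhs, reduced)


F = [
    ({"customer_id", "employee_id"}, {"branch_name", "type"}),
    ({"employee_id"}, {"branch_name"}),
    ({"customer_id", "branch_name"}, {"employee_id"}),
]

print(rhs_extraneous(F, 0, "branch_name"))
print(rhs_extraneous(F, 0, "type"))
print(rhs_extraneous(F, 1, "branch_name"))
print(rhs_extraneous(F, 2, "employee_id"))
```

Only branch_name in the first dependency is reported as extraneous, because employee_id → branch_name already supplies it.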

Normalization
Goal = BCNF = Boyce-Codd Normal Form = all FD's follow from the fact "key → everything."
Formally, R is in BCNF if for every nontrivial FD for R, say X → A, X is a superkey.
"Nontrivial" = right-side attribute not in left side.
Why?
1. Guarantees no redundancy due to FD's.
2. Guarantees no update anomalies = one occurrence of a fact is updated, not all.
3. Guarantees no deletion anomalies = valid fact is lost when tuple is deleted.

Boyce-Codd Normal Form
A relation schema R is in BCNF with respect to a set F of functional dependencies if for all functional dependencies in F+ of the form α → β, where α ⊆ R and β ⊆ R, at least one of the following holds:
- α → β is trivial (i.e., β ⊆ α)
- α is a superkey for R
Example schema not in BCNF:
instr_dept(ID, name, salary, dept_name, building, budget)
because dept_name → building, budget holds on instr_dept, but dept_name is not a superkey.
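A BCNF test for instr_dept can be sketched by checking each given FD for a non-superkey left side. The slide states only dept_name → building, budget; the FD ID → name, salary, dept_name below is an assumption added so that the schema has a key. The sketch inspects the FD's in F rather than all of F+.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """FD's that are nontrivial and whose left side is not a superkey."""
    attributes = set(attributes)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != attributes
    ]


instr_dept = {"ID", "name", "salary", "dept_name", "building", "budget"}
F = [
    # Assumption: ID determines the instructor's own attributes.
    ({"ID"}, {"name", "salary", "dept_name"}),
    # Stated on the slide.
    ({"dept_name"}, {"building", "budget"}),
]

violations = bcnf_violations(instr_dept, F)
print(violations)
```

Under these assumptions the only violation reported is the dept_name dependency, matching the slide.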

Third Normal Form
A relation schema R is in third normal form (3NF) if for all α → β in F+ at least one of the following holds:
- α → β is trivial (i.e., β ⊆ α)
- α is a superkey for R
- Each attribute A in β – α is contained in a candidate key for R (NOTE: each attribute may be in a different candidate key)
If a relation is in BCNF it is in 3NF (since in BCNF one of the first two conditions above must hold).
The third condition is a minimal relaxation of BCNF to ensure dependency preservation (will see why later).

Boyce-Codd Normal Form
Sample data for the Course Section table, with columns Department, Prefix, Num, SecNum, CourseName, Instructor.
Because Prefix → Department, we know that (Prefix, Num, SecNum) could also be a primary key for this table.
Sample values (row alignment not preserved): Mathematics, Math, 101, 1, Algebra I, Al Jeebra, 2, 201, Calculus I, Kal Kuelus; Philosophy, Phil, Greek Thought, Arie Stottle, 202, Euro Thought, Mike Angelo; Marketing, Mktg, 410, Marketing Strategy, Marc Ekking, SpMkg, 401, Advanced Sports Marketing, Hulk Hogan

Example
Students(name, addr, phones, CarsLiked)
A student's phones are independent of the cars they like. Thus, each of a student's phones appears with each of the cars they like in all combinations.
This repetition is unlike redundancy due to FD's, of which name → addr is the only one.

Example
Students(name, addr, CarsLiked, manf, favCar)
FD's: name → addr, favCar;  CarsLiked → manf
Only key is {name, CarsLiked}.
In each FD, the left side is not a superkey.
Any one of these FD's shows Students is not in BCNF.

Boyce-Codd Normal Form
We say a relation R is in BCNF if whenever X → A is a nontrivial FD that holds in R, X is a superkey.
Remember: nontrivial means A is not a member of set X.
Remember: a superkey is any superset of a key (not necessarily a proper superset).

Example
Students(name, addr, CarsLiked, manf, favCar)
F = { name → addr, name → favCar, CarsLiked → manf }
Pick BCNF violation name → addr.
Close the left side: {name}+ = {name, addr, favCar}.
Decomposed relations:
Students1(name, addr, favCar)
Students2(name, CarsLiked, manf)
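One step of this decomposition can be written down directly: close the left side of the violating FD, keep the closure as one relation, and keep the left side plus everything outside the closure as the other. The helper name `decompose_on` is hypothetical, and the second call is an extra step not shown on the slide.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def decompose_on(attributes, fds, violating_lhs):
    """Split a relation on a BCNF violation with the given left side."""
    attributes = set(attributes)
    closed = closure(violating_lhs, fds)
    first = closed & attributes
    second = set(violating_lhs) | (attributes - closed)
    return first, second


students = {"name", "addr", "CarsLiked", "manf", "favCar"}
F = [
    ({"name"}, {"addr"}),
    ({"name"}, {"favCar"}),
    ({"CarsLiked"}, {"manf"}),
]

students1, students2 = decompose_on(students, F, {"name"})
print(sorted(students1))
print(sorted(students2))

# Students2 still has the violation CarsLiked -> manf, so split again.
students3, students4 = decompose_on(students2, F, {"CarsLiked"})
print(sorted(students3))
print(sorted(students4))
```

The first call reproduces Students1 and Students2. The second call suggests that Students2 would be split once more, into (CarsLiked, manf) and (name, CarsLiked), before every relation is in BCNF.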

3NF and BCNF
3rd Normal Form (3NF) modifies the BCNF condition so we do not have to decompose in this problem situation.
An attribute is prime if it is a member of some key.
X → A violates 3NF if and only if X is not a superkey, and also A is not prime.

Exercises
The following relation schema is not in third normal form (3NF). Is this an example of a transitive dependency or a partial key dependency? Give an equivalent schema that is in 3NF.
SID FROM_CITY TO_CITY DISTANCE SHIPMENT WEIGHT

Exercises
This relation has been proposed to track Pacific alumni:
Alumni(SID, LastName, FirstName, Degree, YearAwarded, Phone)
Pacific allows students to receive multiple degrees, possibly in different years.
Identify all FDs. Give a new schema that is in third normal form.

Exercises
Consider the following relation schema:
Movie(title, genre, length, actor, sag_id, studio, studio_addr)
Every movie has a unique title. A movie may have multiple actors. Each actor has a unique sag_id. An actor may appear in multiple movies. A movie has exactly one studio, but a studio may produce more than one movie. Each studio has exactly one address.
Identify all functional dependencies. Normalize the schema to 3NF.

Index
An index is used to speed up the retrieval of records in response to certain search conditions.
Any field of the file can be used to create an index.

Index
Multiple indexes on different fields can be constructed on the same file.
An index is specified on the ordered key field of the file (single index) or built as a B+ tree (multiple indexes).

Primary index
Each entry has 2 fields:
- the primary key of the data file
- a pointer to a disk block (address)
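The two-field index entry can be illustrated with a small in-memory model. This sketch assumes a sparse index with one entry per block, holding the first primary key in that block; the slide itself names only the two fields. The record data, block layout, and `lookup` function are all made up for illustration.

```python
from bisect import bisect_right

# Hypothetical data file: blocks of records sorted by primary key.
blocks = [
    [(101, "Ann"), (105, "Bob")],
    [(110, "Cid"), (117, "Dee")],
    [(123, "Eve"), (130, "Fay")],
]

# One index entry per block: (first primary key in block, block number).
index = [(block[0][0], number) for number, block in enumerate(blocks)]
anchor_keys = [key for key, _ in index]


def lookup(key):
    """Binary-search the index, then scan the single block it points to."""
    position = bisect_right(anchor_keys, key) - 1
    if position < 0:
        return None  # key is smaller than every key in the file
    _, block_number = index[position]
    for record_key, payload in blocks[block_number]:
        if record_key == key:
            return payload
    return None


print(lookup(117))
print(lookup(130))
print(lookup(111))
```

Because the index is much smaller than the data file, the binary search touches few entries before a single block is read.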

Index problem
The main problem with a primary index is insertion and deletion of records.
To insert a record in its correct position, other records must be shifted to make space for the new one.

Clustering index
It is based on a non-key field of the file whose value can be repeated across records, so the records cluster into groups with the same value.
Record insertion and deletion still cause a problem.

Clustering index
The primary index requires a distinct value for each record.
In a clustering index, there is one entry for each distinct value.

Secondary index
It is based on some non-ordering field of the data file.
There can be many secondary indexes for the same file.

Example
Create a database for managing class enrollments in a single semester. You should keep track of all students (their names, IDs, and addresses) and professors (name, ID, department). Do not record the addresses of professors, but keep track of their ages. Also maintain records of courses: what classroom is assigned to a course, what the current enrollment is, and which department offers it. At most one professor teaches each course. Each student evaluates the professor teaching the course. Note that all course offerings in the semester are unique, i.e. course names and numbers do not overlap. A course can have ≥ 0 pre-requisites, excluding itself. A student enrolled in a course must have enrolled in all its pre-requisites. Each student receives a grade in each course. The departments are also unique, and can have at most one chairperson (or dept. head). A chairperson is not allowed to head two or more departments.
