Lecture 21 CS 157 B Revision of Midterm3 Prof. Sin-Min Lee.



Purpose of Normalization
To reduce the chance that anomalies occur in a database. Normalization guards against the corruption of databases that stems from what are called "insertion anomalies," "deletion anomalies," and "update anomalies."

Normal Forms
Each normal form is a set of conditions on a schema that guarantees certain properties (relating to redundancy and update anomalies).
First normal form (1NF) is the same as the definition of the relational model (relations = sets of tuples; each tuple = sequence of atomic values).
Second normal form (2NF): a research lab accident; has no practical or theoretical value; won't discuss.
The two commonly used normal forms are third normal form (3NF) and Boyce-Codd normal form (BCNF).

BCNF
Definition: A relation schema R is in BCNF if for every FD X -> Y associated with R, either
Y ⊆ X (i.e., the FD is trivial), or
X is a superkey of R.
Example: Person1(SSN, Name, Address)
The only FD is SSN -> Name, Address.
Since SSN is a key, Person1 is in BCNF.

(non) BCNF Examples
Person(SSN, Name, Address, Hobby)
The FD SSN -> Name, Address does not satisfy the requirements of BCNF, since the key is (SSN, Hobby).
HasAccount(AcctNum, ClientId, OfficeId)
The FD AcctNum -> OfficeId does not satisfy the BCNF requirements, since the keys are (ClientId, OfficeId) and (AcctNum, ClientId), not AcctNum.
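The BCNF test above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the names `closure` and `bcnf_violations` and the representation of an FD as a pair of Python sets are assumptions made for the example. It relies on the standard fact that, for the original schema, it is enough to inspect the FDs given in F.

```python
def closure(attrs, fds):
    """Return the attribute closure of `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """List the FDs that are nontrivial and whose left side is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= set(schema)
    ]


person1 = {"SSN", "Name", "Address"}
person = {"SSN", "Name", "Address", "Hobby"}
person_fds = [({"SSN"}, {"Name", "Address"})]

has_account = {"AcctNum", "ClientId", "OfficeId"}
account_fds = [
    ({"ClientId", "OfficeId"}, {"AcctNum"}),
    ({"AcctNum"}, {"OfficeId"}),
]

print(bcnf_violations(person1, person_fds))        # no violations: Person1 is in BCNF
print(len(bcnf_violations(person, person_fds)))    # SSN -> Name, Address violates BCNF
print(len(bcnf_violations(has_account, account_fds)))  # AcctNum -> OfficeId violates BCNF
```

An empty result means every listed FD is either trivial or has a superkey on its left side.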

Third Normal Form
A relation schema R is in 3NF if for every FD X -> Y associated with R, either:
Y ⊆ X (i.e., the FD is trivial); or
X is a superkey of R; or
every A ∈ Y is part of some key of R.
The first two conditions are the BCNF conditions. Informally, there is no FD X -> Y between non-prime attributes X and Y.
3NF is weaker than BCNF (every schema that is in BCNF is also in 3NF).

3NF Example
HasAccount(AcctNum, ClientId, OfficeId)
ClientId, OfficeId -> AcctNum: OK since the LHS contains a key.
AcctNum -> OfficeId: OK since the RHS is part of a key.
HasAccount is in 3NF, but it might still contain redundant information due to AcctNum -> OfficeId (which is not allowed by BCNF).
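As a rough sketch of how the 3NF conditions can be checked mechanically, the Python below finds candidate keys by brute force and then tests each FD. The function names and the set-based FD representation are assumptions for illustration; the brute-force key search is exponential and only meant for small classroom schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    attrs = sorted(schema)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            cand = set(combo)
            if any(k <= cand for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(cand, fds) >= set(schema):
                keys.append(cand)
    return keys


def is_3nf(schema, fds):
    """Check each FD: trivial part ignored, LHS a superkey, or RHS all prime."""
    prime = set().union(*candidate_keys(schema, fds))
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(schema):
            continue  # LHS is a superkey
        if not (rhs - lhs) <= prime:
            return False  # some non-prime attribute depends on a non-superkey
    return True


has_account = {"AcctNum", "ClientId", "OfficeId"}
account_fds = [
    ({"ClientId", "OfficeId"}, {"AcctNum"}),
    ({"AcctNum"}, {"OfficeId"}),
]
print(candidate_keys(has_account, account_fds))
print(is_3nf(has_account, account_fds))  # True: every attribute is prime
```

For HasAccount the search finds the two keys named on the slide, (AcctNum, ClientId) and (ClientId, OfficeId), which is why AcctNum -> OfficeId passes the third condition.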

Example
R1(A1, A2, A3, A5)
R2(A1, A3, A4)
R3(A4, A5)
FD1: A1 -> A3 A5
FD2: A5 -> A1 A4
FD3: A3 A4 -> A2

Example (con’t): the initial table
      A1       A2       A3       A4       A5
R1    a(1)     a(2)     a(3)     b(1,4)   a(5)
R2    a(1)     b(2,2)   a(3)     a(4)     b(2,5)
R3    b(3,1)   b(3,2)   b(3,3)   a(4)     a(5)

Example (con’t)
By FD1: A1 -> A3 A5 (rows R1 and R2 agree on A1, so their A3 and A5 entries are made equal) we have a new result table
      A1       A2       A3       A4       A5
R1    a(1)     a(2)     a(3)     b(1,4)   a(5)
R2    a(1)     b(2,2)   a(3)     a(4)     a(5)
R3    b(3,1)   b(3,2)   b(3,3)   a(4)     a(5)

Example (con’t)
By FD2: A5 -> A1 A4 (all three rows now agree on A5, so their A1 and A4 entries are made equal) we have a new result table
      A1       A2       A3       A4       A5
R1    a(1)     a(2)     a(3)     a(4)     a(5)
R2    a(1)     b(2,2)   a(3)     a(4)     a(5)
R3    a(1)     b(3,2)   b(3,3)   a(4)     a(5)
Row R1 now contains only a's, so the decomposition is lossless.

FD1: A -> C, FD2: B -> C, FD3: C -> D, FD4: D,E -> C, FD5: C,E -> A

R(A,B,C,D,E): the initial table
             A     B     C     D     E
R1(A,D)      a1    b12   b13   a4    b15
R2(A,B)      a1    a2    b23   b24   b25
R3(B,E)      b31   a2    b33   b34   a5
R4(C,D,E)    b41   b42   a3    a4    a5
R5(A,E)      a1    b52   b53   b54   a5

FD1: A->C (rows R1, R2 and R5 agree on A)
             A     B     C     D     E
R1(A,D)      a1    b12   b13   a4    b15
R2(A,B)      a1    a2    b13   b24   b25
R3(B,E)      b31   a2    b33   b34   a5
R4(C,D,E)    b41   b42   a3    a4    a5
R5(A,E)      a1    b52   b13   b54   a5

FD2: B->C (rows R2 and R3 agree on B)
             A     B     C     D     E
R1(A,D)      a1    b12   b13   a4    b15
R2(A,B)      a1    a2    b13   b24   b25
R3(B,E)      b31   a2    b13   b34   a5
R4(C,D,E)    b41   b42   a3    a4    a5
R5(A,E)      a1    b52   b13   b54   a5

FD3: C->D (rows R1, R2, R3 and R5 agree on C)
             A     B     C     D     E
R1(A,D)      a1    b12   b13   a4    b15
R2(A,B)      a1    a2    b13   a4    b25
R3(B,E)      b31   a2    b13   a4    a5
R4(C,D,E)    b41   b42   a3    a4    a5
R5(A,E)      a1    b52   b13   a4    a5

FD4: D,E->C (rows R3, R4 and R5 agree on D and E)
             A     B     C     D     E
R1(A,D)      a1    b12   b13   a4    b15
R2(A,B)      a1    a2    b13   a4    b25
R3(B,E)      b31   a2    a3    a4    a5
R4(C,D,E)    b41   b42   a3    a4    a5
R5(A,E)      a1    b52   a3    a4    a5

FD5: C,E->A (rows R3, R4 and R5 agree on C and E)
             A     B     C     D     E
R1(A,D)      a1    b12   b13   a4    b15
R2(A,B)      a1    a2    b13   a4    b25
R3(B,E)      a1    a2    a3    a4    a5
R4(C,D,E)    a1    b42   a3    a4    a5
R5(A,E)      a1    b52   a3    a4    a5

It is Lossless: row R3(B,E) now contains only a's.
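The table-filling procedure used in the two examples above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function name `lossless_join` and the tuple encoding of the a and b symbols are invented for the example, and the code replaces a symbol everywhere in its column (a common variant of the procedure), so an intermediate table may differ in a few cells from a hand-worked one even though the verdict is the same.

```python
def lossless_join(schema, decomposition, fds):
    """Tableau (chase) test: True if the decomposition has a lossless join."""
    # Row i holds the distinguished symbol ("a", attr) when attr is in
    # relation i, and the row-specific symbol ("b", i, attr) otherwise.
    rows = [
        {attr: ("a", attr) if attr in rel else ("b", i, attr) for attr in schema}
        for i, rel in enumerate(decomposition)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Group the rows that agree on every attribute of the left side.
            groups = {}
            for row in rows:
                key = tuple(row[x] for x in sorted(lhs))
                groups.setdefault(key, []).append(row)
            for group in groups.values():
                for attr in rhs:
                    symbols = {row[attr] for row in group}
                    if len(symbols) > 1:
                        # ("a", ...) sorts before ("b", ...), so an "a" wins.
                        target = min(symbols)
                        for row in rows:
                            if row[attr] in symbols:
                                row[attr] = target
                        changed = True
    return any(all(row[attr] == ("a", attr) for attr in schema) for row in rows)


schema = ["A", "B", "C", "D", "E"]
decomposition = [{"A", "D"}, {"A", "B"}, {"B", "E"}, {"C", "D", "E"}, {"A", "E"}]
fds = [
    ({"A"}, {"C"}),
    ({"B"}, {"C"}),
    ({"C"}, {"D"}),
    ({"D", "E"}, {"C"}),
    ({"C", "E"}, {"A"}),
]
print(lossless_join(schema, decomposition, fds))  # True
```

The loop stops because every change merges two symbols, and there are only finitely many symbols to merge.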

Multivalued Dependencies (cont)
The multivalued dependency α ->> β holds on R if, for every pair of tuples t1 and t2 with t1[α] = t2[α], there exist tuples t3 and t4 such that
t1[α] = t2[α] = t3[α] = t4[α]
t3[β] = t1[β]
t3[R - β] = t2[R - β]
t4[β] = t2[β]
t4[R - β] = t1[R - β]
The multivalued dependency α ->> β says that the relationship between α and β is independent of the relationship between α and R - β.

Multivalued Dependencies (cont)
If the multivalued dependency α ->> β is satisfied by all relations on schema R, then α ->> β is a trivial multivalued dependency on schema R. Thus, α ->> β is trivial if β ⊆ α or β ∪ α = R.
Tabular representation of α ->> β
       α          β            R - α - β
t1     a1…ai      ai+1…aj      aj+1…an
t2     a1…ai      bi+1…bj      bj+1…bn
t3     a1…ai      ai+1…aj      bj+1…bn
t4     a1…ai      bi+1…bj      aj+1…an

Multivalued Dependencies (cont)
To illustrate the difference between functional and multivalued dependencies, we consider again the BC-schema.
Graph 1
loan-number   customer-name   customer-street   customer-city
L-23          Smith           North             Rye
L-23          Smith           Main              Manchester
L-93          Curry           Lake              Horseneck

Multivalued Dependencies (cont) On graph 1, we must repeat the loan number once for each address a customer has, and we must repeat the address for each loan a customer has. This repetition is unnecessary, since the relationship between that customer and his address is independent of the relationship between that customer and a loan. If a customer (say, Smith) has a loan (say, loan number L-23), we want that loan to be associated with all Smith’s addresses.

Multivalued Dependencies (cont)
The relation in graph 2 is illegal. To make it legal, we need to add the tuples (L-23, Smith, Main, Manchester) and (L-27, Smith, North, Rye) to the bc relation of graph 2.
Graph 2 (an illegal bc relation)
loan-number   customer-name   customer-street   customer-city
L-23          Smith           North             Rye
L-27          Smith           Main              Manchester

Multivalued Dependencies (cont)
Comparing the preceding example with our definition of multivalued dependency, we see that we want the multivalued dependency
customer-name ->> customer-street customer-city
to hold.
As was the case for functional dependencies, we shall use multivalued dependencies in two ways:
1. To test relations to determine whether they are legal under a given set of functional and multivalued dependencies.
2. To specify constraints on the set of legal relations; we shall thus concern ourselves with only those relations that satisfy a given set of functional and multivalued dependencies.

3NF
One FD structure causes problems:
If you decompose, you can’t check all the FD’s only in the decomposed relations.
If you don’t decompose, you violate BCNF.
Abstractly: AB -> C and C -> B.
Example 1: title city -> theatre and theatre -> city.
Example 2: street city -> zip, zip -> city.
Keys: {A, B} and {A, C}, but C -> B has a left side that is not a superkey.
This suggests decomposition into BC and AC, but you can’t check the FD AB -> C in only these relations.

Multivalued Dependencies
The multivalued dependency X ->> Y holds in a relation R if, whenever we have two tuples of R that agree in all the attributes of X, we can swap their Y components and get two new tuples that are also in R.
(Figure: a relation with the column groups X, Y, and others.)

Example
Drinkers(name, addr, phones, beersLiked) with MVD name ->> phones. If Drinkers has the two tuples:
name   addr   phones   beersLiked
sue    a      p1       b1
sue    a      p2       b2
it must also have the same tuples with phones components swapped:
sue    a      p2       b1
sue    a      p1       b2
Note: we must check this condition for all pairs of tuples that agree on name, not just one pair.
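The swap condition can be checked directly on a small relation. The following Python sketch is illustrative only: the function name `satisfies_mvd` and the tuple-based relation encoding are assumptions, and the check is quadratic in the number of tuples because it examines every ordered pair that agrees on X.

```python
def satisfies_mvd(rows, attrs, x, y):
    """Check the MVD X ->> Y on a relation given as tuples over `attrs`."""
    idx = {a: i for i, a in enumerate(attrs)}
    relation = set(rows)
    for t in relation:
        for u in relation:
            if all(t[idx[a]] == u[idx[a]] for a in x):
                # Take the Y components from u and everything else from t.
                swapped = tuple(
                    u[idx[a]] if a in y else t[idx[a]] for a in attrs
                )
                if swapped not in relation:
                    return False
    return True


attrs = ("name", "addr", "phones", "beersLiked")
drinkers = [
    ("sue", "a", "p1", "b1"),
    ("sue", "a", "p2", "b2"),
]
print(satisfies_mvd(drinkers, attrs, {"name"}, {"phones"}))  # False

# Adding the two tuples with the phones components swapped repairs it.
drinkers += [("sue", "a", "p2", "b1"), ("sue", "a", "p1", "b2")]
print(satisfies_mvd(drinkers, attrs, {"name"}, {"phones"}))  # True
```

Because both orders of each pair are visited, a single constructed tuple per ordered pair covers both swapped tuples required by the definition.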

Dependency Preservation
Let F’ = F1 ∪ F2 ∪ … ∪ Fn. F’ is a set of functional dependencies on schema R, but, in general, F’ ≠ F.

Dependency Preservation A decomposition having the property F’+ = F+ is a dependency-preserving decomposition.
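One way to test this property without computing F’ explicitly is the well-known polynomial-time check sketched below in Python: for each FD X -> Y in F, grow X using only closures restricted to the individual relations, then see whether Y was reached. The function names are assumptions for illustration, and the sample schema is the AB -> C, C -> B pattern split into BC and AC.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves(fd, decomposition, fds):
    """True if the FD follows from the FDs projected onto the relations."""
    lhs, rhs = fd
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for rel in decomposition:
            # What can be derived while staying inside this one relation.
            gained = closure(z & rel, fds) & rel
            if not gained <= z:
                z |= gained
                changed = True
    return rhs <= z


def dependency_preserving(decomposition, fds):
    """True if every FD in `fds` is preserved by the decomposition."""
    return all(preserves(fd, decomposition, fds) for fd in fds)


fds = [({"A", "B"}, {"C"}), ({"C"}, {"B"})]
print(dependency_preserving([{"B", "C"}, {"A", "C"}], fds))  # False: AB -> C is lost
print(dependency_preserving([{"A", "B", "C"}], fds))         # True: nothing was split
```

Here C -> B survives inside BC, but no single relation contains A, B and C together, so AB -> C cannot be recovered.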

(1) Normal forms are
(a) classifications of relations based on the types of modification anomalies to which they are vulnerable.
(b) techniques for preventing anomalies.
(c) both (a) and (b).
(d) none of the above.
Answer:

(2) Given a relation schema and associated functional dependencies, it is always possible to
a) find a dependency preserving decomposition of the relation into BCNF
b) find a lossless join decomposition of the relation into BCNF
c) both (a) and (b)
d) none of the above
Answer:

(3) Given relation schema R(A,B,C,D) with FDs F = {ABC; BCD; AB}, which of the following statements is true?
a) BC is a member of F+
b) ABCD is a member of F+
c) CDCD is a member of F+
d) Both (b) and (c)

(4) Given the relation schema R(A,B,C) and functional dependencies F = {AB -> C, B -> A, C -> B}, which attribute(s) are prime, i.e. part of a candidate key?
a) only A
b) only B
c) A and B
d) B and C

(5) Given the relation schema R(A,B,C) and functional dependencies F = {AB, BC, ACB}. What is the result of using the relational database design algorithm for producing a database schema which is dependency preserving and has the lossless join property for relations in 3rd normal form?
a) R1(A,B), R2(B,C) and R3(A,C,B)
b) R1(A,B) and R2(A,C)
c) R1(A,B) and R2(B,C)
d) none of the above

(2) Which of the following are informal design guidelines for relational schema?
(a) Reduce the redundant values in tuples
(b) Reduce the null values in tuples
(c) Disallow the potential for generating spurious tuples
(d) All of the above
Answer:

(3) Given the relation schema DeptSales(DeptNo, Dname, Month, Year, Sales) and the set of functional dependencies F = {DeptNo -> Dname; DeptNo, Month, Year -> Sales}, which of the following functional dependencies is a valid inference?
(a) DeptNo -> Sales
(b) DeptNo, Month, Year -> Dname
(c) Dname -> Sales
(d) None of the above
Answer:

(5) Given relation schema R(A,B,C,D) with FDs F = {ABC; BCD; AB}, which of the following statements is true?
(a) BC is a member of F+
(b) ABCD is a member of F+
(c) CDCD is a member of F+
(d) Both (b) and (c)
Answer:

Q2. (1 mark) Imagine that we have the relation R(ABCDE) with FD1. ABC, FD2. ABD, FD3. C,D -> A,B.
(a) Decide whether this relation is in 3NF. Answer: yes
(b) Is it in BCNF? Answer: no

Q3. (1 mark) What is the highest normal form of this table? Draw the functional dependency graph.
Prime attributes: P, F, B
Thus R is in 3NF.

Q4. (1 mark) Given a transaction table D, find the support and the confidence for the association rule B,D -> E.
Answer: Support = confidence =
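The transaction table D for this question did not survive, so the Python sketch below uses a small made-up table purely to show how the two measures are computed. The table contents and the function names are assumptions, not the exam data. Support of a rule is the fraction of transactions containing all of its items; confidence is the fraction of transactions containing the left side that also contain the right side.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, lhs, rhs):
    """Among transactions containing `lhs`, the fraction also containing `rhs`."""
    lhs, rhs = set(lhs), set(rhs)
    with_lhs = sum(lhs <= t for t in transactions)
    with_both = sum((lhs | rhs) <= t for t in transactions)
    return with_both / with_lhs  # undefined (ZeroDivisionError) if lhs never occurs


# Hypothetical transaction table, invented for illustration only.
D = [
    {"A", "B", "D", "E"},
    {"B", "C", "E"},
    {"A", "B", "D", "E"},
    {"A", "B", "C", "E"},
    {"B", "D"},
]

print(support(D, {"B", "D", "E"}))      # 2 of 5 transactions
print(confidence(D, {"B", "D"}, {"E"}))  # 2 of the 3 transactions with B and D
```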

Q5. (1 mark) Use Prim’s algorithm to find the minimum spanning tree step by step. Start from s.

Q2. (1 mark) Suppose Prim's minimum-spanning-tree algorithm is applied to this graph, starting with node C. List the edges that are chosen for the MST, in the order they are chosen.
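The graphs for these two questions are not reproduced here, so the following Python sketch runs Prim's algorithm on a small made-up graph to show how the edges are picked in order. The graph, its weights, and the function name `prim` are assumptions for illustration only.

```python
import heapq


def prim(graph, start):
    """Return the MST edges (u, v, weight) in the order Prim's algorithm picks them.

    `graph` maps each node to a dict {neighbor: weight} and is undirected.
    """
    visited = {start}
    heap = [(w, start, v) for v, w in graph[start].items()]
    heapq.heapify(heap)
    chosen = []
    while heap and len(visited) < len(graph):
        w, u, v = heapq.heappop(heap)
        if v in visited:
            continue  # this edge would close a cycle
        visited.add(v)
        chosen.append((u, v, w))
        for nxt, w2 in graph[v].items():
            if nxt not in visited:
                heapq.heappush(heap, (w2, v, nxt))
    return chosen


# Hypothetical undirected graph with distinct weights.
graph = {
    "s": {"a": 2, "b": 6},
    "a": {"s": 2, "b": 3, "c": 5},
    "b": {"s": 6, "a": 3, "c": 1},
    "c": {"a": 5, "b": 1},
}

for u, v, w in prim(graph, "s"):
    print(u, v, w)
# s a 2
# a b 3
# b c 1
```

At each step the cheapest edge leaving the set of visited nodes is taken, which is why the edge b-c of weight 1 is chosen last: it only becomes reachable once b has joined the tree.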

Q6. (1 mark) Give a relation schema R(A,B,C,D) and functional dependencies F such that the table is in 3NF but not in BCNF. Answer:

Q5. (1 mark) Consider a relation R(A,B,C,D) which contains the following four tuples:
A  B  C  D
**********
1  2  3  1
1  2  4  1
1  2  3  3
5  1  6  2
If we know MVD: AC>D, how many tuples (including the above 4 tuples) must R have at least?

Give an example of a table which is 3NF but not BCNF
Example 1: F1. ABCD; F2. AB -> CD. Wrong: it is BCNF.
Example 2: F1. AB -> C; F2. AB -> D; F3. C -> D. Wrong.
Example 3: F1. ABC -> D; F2. B -> C. Wrong.
Example 4: F1. A -> C; F2. B -> D.
Example 5: F1. ABC -> D; F2. D -> B; F3. B -> D. Wrong.
Example 6: F1. A -> BC.

Q8. (2 marks) Use the Apriori algorithm to find the frequent item-sets. Show all your steps.

Q4. (1 mark) Given the following data set, find the support and confidence of 3 -> 2. Solution:

Q8. (2 marks) Use the Apriori algorithm to find the frequent item-sets for the following transaction table with min-supp = 40%.
Solution: minimum support count = 5 x 40% = 2

Q8. Item-sets
L1: a1 a2 a3 a4 a5; support: 2 3 1
a1 a2, a1 a3, a1 a5, a2 a3
Frequent item-sets = {a1, a2, a3, a5, a1a3, a2a3, a2a5, a3a5, a2a3a5}
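The level-by-level search that Apriori performs can be sketched in Python as below. The original transaction table is not available, so the five transactions here are hypothetical, chosen only so that a minimum support count of 2 yields the same frequent item-sets as the answer above; the function name `apriori` is likewise an assumption.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {item set: count} for every item set with count >= min_count."""

    def count(itemset):
        return sum(itemset <= t for t in transactions)

    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if count(frozenset([i])) >= min_count]
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = count(s)
        k += 1
        # Join step: unions of frequent (k-1)-sets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        prev = set(level)
        level = [
            c
            for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
            and count(c) >= min_count
        ]
    return frequent


# Hypothetical transaction table (5 transactions, min-supp 40% -> count 2).
transactions = [
    {"a1", "a3", "a4"},
    {"a2", "a3", "a5"},
    {"a1", "a2", "a3", "a5"},
    {"a2", "a5"},
    {"a5"},
]

result = apriori(transactions, min_count=2)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), result[itemset])
```

The prune step is what distinguishes Apriori from plain enumeration: a candidate such as a1a2a3 is discarded without counting because its subset a1a2 is not frequent.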