Lecture #3 Functional Dependencies Normalization Relational Algebra Thursday, October 12, 2000.


1 Lecture #3 Functional Dependencies Normalization Relational Algebra Thursday, October 12, 2000

2 Administration Homework #1 due today. Project descriptions & groups due today. Homework #2 available today. Exam date is looking like December 7th. Complaints? Projects: tell us if you need to use the lab.

3 Functional Dependencies Definition: If two tuples agree on the attributes A1, A2, …, An then they must also agree on the attributes B1, B2, …, Bm. Formally: A1, A2, …, An -> B1, B2, …, Bm. Motivating example for the study of functional dependencies: Name, Social Security Number, Phone Number

4 Examples EmpID -> Name, Phone, Position. Position -> Phone, but Phone does not determine Position.

5 In General To check A -> B, erase all other columns and check whether the remaining relation is many-one (called functional in mathematics).
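
The check described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the function name `satisfies_fd` and the sample employee rows are hypothetical, chosen so that Position -> Phone holds while Phone -> Position does not.

```python
def satisfies_fd(rows, lhs, rhs):
    """Return True if the relation instance `rows` satisfies lhs -> rhs."""
    seen = {}
    for row in rows:
        a = tuple(row[c] for c in lhs)  # "erase" everything except the A columns
        b = tuple(row[c] for c in rhs)  # ... and the B columns
        # Many-one: each A value may be paired with only one B value.
        if seen.setdefault(a, b) != b:
            return False
    return True


# Hypothetical sample data (not taken from the slides).
employees = [
    {"EmpID": "E1", "Name": "Smith", "Phone": "1234", "Position": "Clerk"},
    {"EmpID": "E2", "Name": "Mary", "Phone": "9876", "Position": "Salesrep"},
    {"EmpID": "E3", "Name": "John", "Phone": "9876", "Position": "Salesrep"},
    {"EmpID": "E4", "Name": "Lisa", "Phone": "1234", "Position": "Lawyer"},
]

print(satisfies_fd(employees, ["Position"], ["Phone"]))  # True
print(satisfies_fd(employees, ["Phone"], ["Position"]))  # False
```

Note that a check like this only tells us whether one particular instance violates a dependency; an FD is a statement about every legal instance of the relation.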

6 Example

7 More Examples Product: name -> price, manufacturer. Person: ssn -> name, age. Company: name -> stock price, president. Key of a relation is a set of attributes that: - functionally determines all the attributes of the relation - none of its proper subsets determines all the attributes. Superkey: a set of attributes that contains a key.

8 Finding the Keys of a Relation Given a relation constructed from an E/R diagram, what is its key? Rules: 1. If the relation comes from an entity set, the key of the relation is the set of attributes which is the key of the entity set. [Diagram: entity set Person with attributes address, name, ssn]

9 Rules for Binary Relationships Several cases are possible for a binary relationship E1 - E2: 1. Many-many: the key includes the key of E1 together with the key of E2. What happens for: 2. Many-one: 3. One-one: [Diagram: Person (name, ssn) buys Product (name, price)]

10 Keys in Multiway Relationships If there is an arrow from the relationship to E, then we don’t need the key of E as part of the relation key. [Diagram: relationship Purchase connecting Product, Person, Store, and Payment Method]

11 Rules in FD’s Splitting/Combining Rule: A1, A2, …, An -> B1, B2, …, Bm is equivalent to: A1, A2, …, An -> B1; A1, A2, …, An -> B2; …; A1, A2, …, An -> Bm.

12 Rules in FD’s (continued) Trivial Dependency: A1, A2, …, An -> Ai always holds. Why?

13 Rules in FD’s (continued) Transitive Closure Rule: If A1, A2, …, An -> B1, B2, …, Bm and B1, B2, …, Bm -> C1, C2, …, Cp then A1, A2, …, An -> C1, C2, …, Cp. Why?

14 Closure of a set of Attributes Given a set of attributes {A1, …, An} and a set of dependencies S. Problem: find all attributes B such that: any relation which satisfies S also satisfies: A1, …, An -> B. The closure of {A1, …, An}, denoted {A1, …, An}+, is the set of all such attributes B.

15 Closure Algorithm Start with X = {A1, …, An}. Repeat until X doesn’t change: if B1, B2, …, Bn -> C is in S, and B1, B2, …, Bn are all in X, and C is not in X, then add C to X.

16 Example FDs: A, B -> C; A, D -> E; B -> D; A, F -> B. Closure of {A, B}: X = {A, B, }. Closure of {A, F}: X = {A, F, }
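
The closure algorithm can be sketched directly in Python. This is an illustrative version, assuming the example's dependencies read A, B -> C; A, D -> E; B -> D; A, F -> B. The function name `closure` and the encoding of each FD as a (left side, right attribute) pair of strings are choices made here, not notation from the lecture.

```python
def closure(attrs, fds):
    """Compute the closure of `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs: lhs is a string of one-letter
    attribute names, rhs is a single attribute.
    """
    x = set(attrs)
    changed = True
    while changed:  # repeat until X doesn't change
        changed = False
        for lhs, rhs in fds:
            # All of B1..Bn are in X and C is not: add C to X.
            if set(lhs) <= x and rhs not in x:
                x.add(rhs)
                changed = True
    return x


fds = [("AB", "C"), ("AD", "E"), ("B", "D"), ("AF", "B")]

print(sorted(closure("AB", fds)))  # ['A', 'B', 'C', 'D', 'E']
print(sorted(closure("AF", fds)))  # ['A', 'B', 'C', 'D', 'E', 'F']
```

Since the closure of {A, F} is every attribute, {A, F} is a superkey under these dependencies.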

17 Why Is the Algorithm Correct? Show the following by induction: –For every B in X: A1, …, An -> B. Initially X = {A1, …, An} -- holds. Induction step: B1, …, Bm in X –Implies A1, …, An -> B1, …, Bm –We also have B1, …, Bm -> C –By transitivity we have A1, …, An -> C. This shows that the algorithm is sound; need to show it is complete

18 Relational Schema Design Main idea: Start with some relational schema Find out its FD’s Use them to design a better relational schema

19 Relational Schema Design Recall set attributes (persons with several phones):
Name   SSN          Phone Number
Fred   123-321-99   (201) 555-1234
Fred   123-321-99   (206) 572-4312
Joe    909-438-44   (908) 464-0028
Joe    909-438-44   (212) 555-4000
Note: SSN is NOT a key here. Problems: - redundancy - update anomalies - deletion anomalies

20 Relation Decomposition Break the relation into two:
SSN          Name
123-321-99   Fred
909-438-44   Joe

SSN          Phone Number
123-321-99   (201) 555-1234
123-321-99   (206) 572-4312
909-438-44   (908) 464-0028
909-438-44   (212) 555-4000

21 Decompositions in General Let R be a relation with attributes A1, A2, …, An. Create two relations R1 and R2 with attributes B1, B2, …, Bm and C1, C2, …, Cl such that: {B1, B2, …, Bm} U {C1, C2, …, Cl} = {A1, A2, …, An}. And -- R1 is the projection of R on B1, B2, …, Bm -- R2 is the projection of R on C1, C2, …, Cl

22 Incorrect Decomposition Sometimes it is incorrect:
Name          Price   Category
Gizmo         19.99   Gadget
OneClick      24.99   Camera
DoubleClick   29.99   Camera
Decompose on: Name, Category and Price, Category

23 Incorrect Decomposition
Name          Category
Gizmo         Gadget
OneClick      Camera
DoubleClick   Camera

Price   Category
19.99   Gadget
24.99   Camera
29.99   Camera

When we put it back:
Name          Price   Category
Gizmo         19.99   Gadget
OneClick      24.99   Camera
OneClick      29.99   Camera
DoubleClick   24.99   Camera
DoubleClick   29.99   Camera
Cannot recover information
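
The loss of information can be reproduced with a short Python sketch. It uses the three product tuples from this example; representing relations as sets of tuples and the variable names are choices made here for illustration.

```python
# Original relation: (Name, Price, Category)
products = {
    ("Gizmo", 19.99, "Gadget"),
    ("OneClick", 24.99, "Camera"),
    ("DoubleClick", 29.99, "Camera"),
}

# Project onto (Name, Category) and (Price, Category).
name_category = {(name, cat) for name, _price, cat in products}
price_category = {(price, cat) for _name, price, cat in products}

# Natural join of the two projections on Category.
rejoined = {
    (name, price, cat)
    for name, cat in name_category
    for price, cat2 in price_category
    if cat == cat2
}

# Tuples that appear after the join but were never in the original.
bogus = rejoined - products
print(len(products), len(rejoined))  # 3 5
print(sorted(bogus))
```

The two bogus tuples pair each camera with the other camera's price, so the original prices can no longer be told apart from the invented ones.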

24 Boyce-Codd Normal Form A simple condition for removing anomalies from relations: A relation R is in BCNF if and only if: Whenever there is a nontrivial dependency A1, A2, …, An -> B for R, it is the case that {A1, A2, …, An} is a super-key for R. In English (though a bit vague): Whenever a set of attributes of R is determining another attribute, it should determine all the attributes of R.

25 BCNF Decomposition Find a dependency that violates the BCNF condition: A1, A2, …, An -> B1, B2, …, Bm. Heuristic: choose B1, B2, …, Bm “as large as possible”. Decompose: R1 holds the A’s and the B’s; R2 holds the A’s and the other attributes. Continue until there are no BCNF violations left. Find a 2-attribute relation that is not in BCNF.

26 Example Decomposition Person: Name, SSN, Age, EyeColor, PhoneNumber. Functional dependencies: SSN -> Name, Age, EyeColor. BCNF: Person1(SSN, Name, Age, EyeColor), Person2(SSN, PhoneNumber). What if we also had an attribute Draft-worthy, and the FD: Age -> Draft-worthy?

27 Other Example R(A,B,C,D) with A -> B, B -> C. Key: A, D. Violations of BCNF: A -> B, A -> C, A -> BC. Pick A -> BC: split into R1(A,B,C) and R2(A,D). What happens if we pick A -> B first?
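
One way to experiment with this example is the following Python sketch of BCNF decomposition. It is a simplified illustration under stated assumptions, not the lecture's own procedure: the names `closure` and `bcnf_decompose` are hypothetical, violations are found by brute force over attribute subsets (exponential, fine for tiny schemas), and the right-hand side is always taken "as large as possible" by using the full closure.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) attribute strings."""
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= x and not set(rhs) <= x:
                x |= set(rhs)
                changed = True
    return x


def bcnf_decompose(attrs, fds):
    """Split a relation until no subset of attributes violates BCNF."""
    attrs = frozenset(attrs)
    for size in range(1, len(attrs)):
        for lhs in combinations(sorted(attrs), size):
            determined = closure(lhs, fds) & attrs
            # Violation: lhs determines something new but is not a superkey.
            if determined != set(lhs) and determined != attrs:
                r1 = determined                       # the A's and the B's
                r2 = set(lhs) | (attrs - determined)  # the A's and the others
                return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)
    return [attrs]


result = bcnf_decompose("ABCD", [("A", "B"), ("B", "C")])
print(sorted(sorted(r) for r in result))
# [['A', 'B'], ['A', 'D'], ['B', 'C']]
```

The first split is the one on the slide, R1(A,B,C) and R2(A,D). The sketch then keeps going, because B -> C still violates BCNF inside R1 (B is not a superkey of R1), which yields (B,C) and (A,B).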

28 Correct Decompositions A decomposition is lossless if we can recover the original relation: decompose R(A,B,C) into R1(A,B) and R2(A,C), then join them to recover R’(A,B,C); we need R’(A,B,C) = R(A,B,C). R’ is in general larger than R. Must ensure R’ = R

29 Decomposition Based on BCNF is Necessarily Lossless Attributes A, B, C. FD: A -> C. Relations R1(A,B) R2(A,C). Tuple in R: (a,b,c). Tuples in R1: (a,b), (a,b’). Tuples in R2: (a,c), (a,c’). Tuples in the join of R1 and R2: (a,b,c), (a,b,c’), (a,b’,c), (a,b’,c’). Can (a,b,c’) be a bogus tuple? What about (a,b’,c’)?

30 Example
Name   SSN          Phone Number
Fred   123-321-99   (201) 555-1234
Fred   123-321-99   (206) 572-4312
Joe    909-438-44   (908) 464-0028
Joe    909-438-44   (212) 555-4000
What are the dependencies? What are the keys? Is it in BCNF?

31 And Now?
SSN          Name
123-321-99   Fred
909-438-44   Joe

SSN          Phone Number
123-321-99   (201) 555-1234
123-321-99   (206) 572-4312
909-438-44   (908) 464-0028
909-438-44   (212) 555-4000

32 3NF: A Problem with BCNF R(Unit, Company, Product). FD’s: Unit -> Company; Company, Product -> Unit. So, there is a BCNF violation, and we decompose into R1(Unit, Company) with FD Unit -> Company, and R2(Unit, Product) with no FDs.

33 So What’s the Problem?
Unit       Company
Galaga99   UW
Bingo      UW

Unit       Product
Galaga99   databases
Bingo      databases

No problem so far. All local FD’s are satisfied. Let’s put all the data back into a single table again:
Unit       Company   Product
Galaga99   UW        databases
Bingo      UW        databases
Violates the dependency: company, product -> unit!
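
The problem can be checked mechanically. The Python sketch below uses the four tuples from this example; the helper `holds` (which tests an FD on a set of tuples by column position) is a hypothetical name introduced here.

```python
def holds(rows, lhs, rhs):
    """True if the columns at positions `lhs` determine those at `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        val = tuple(row[i] for i in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


unit_company = {("Galaga99", "UW"), ("Bingo", "UW")}
unit_product = {("Galaga99", "databases"), ("Bingo", "databases")}

# The only local FD, Unit -> Company, is satisfied.
print(holds(unit_company, [0], [1]))  # True

# Join the two tables back on Unit: (Unit, Company, Product).
joined = {
    (unit, company, product)
    for unit, company in unit_company
    for unit2, product in unit_product
    if unit == unit2
}

# Company, Product -> Unit fails on the joined table.
print(holds(joined, [1, 2], [0]))  # False
```

Neither decomposed table contains Company and Product together, so the dependency cannot be enforced by looking at either table alone.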

34 Solution: 3rd Normal Form (3NF) A simple condition for removing anomalies from relations: A relation R is in 3rd normal form if and only if: Whenever there is a nontrivial dependency A1, A2, …, An -> B for R, it is the case that {A1, A2, …, An} is a super-key for R, or B is part of a key. What happened to first and second normal forms? Will we have more normal forms?

35 Multi-valued Dependencies
SSN          Phone Number     Course
123-321-99   (206) 572-4312   CSE-444
123-321-99   (206) 572-4312   CSE-341
123-321-99   (206) 432-8954   CSE-444
123-321-99   (206) 432-8954   CSE-341
The multi-valued dependencies are: SSN ->> Phone Number; SSN ->> Course

36 Definition of Multi-valued Dependency Given R(A1,…,An,B1,…,Bm,C1,…,Cp) the MVD A1,…,An ->> B1,…,Bm holds if: for any values of A1,…,An the “set of values” of B1,…,Bm is “independent” of those of C1,…,Cp

37 Definition of MVDs Continued Equivalently: the decomposition into R1(A1,…,An,B1,…,Bm), R2(A1,…,An,C1,…,Cp) is lossless. Note: an MVD A1,…,An ->> B1,…,Bm implicitly talks about “the other” attributes C1,…,Cp
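
The lossless-decomposition view of an MVD suggests a direct test on a relation instance, sketched below in Python. The function name `mvd_holds` and the use of column positions are assumptions made for this illustration; the data is the SSN, Phone Number, Course table from the multi-valued dependencies example.

```python
def mvd_holds(rows, a, b, c):
    """Test A ->> B on `rows`, where a, b, c are lists of column positions."""
    def pick(row, cols):
        return tuple(row[i] for i in cols)

    r1 = {(pick(r, a), pick(r, b)) for r in rows}  # projection on A, B
    r2 = {(pick(r, a), pick(r, c)) for r in rows}  # projection on A, C
    # Join the projections on A and compare with the original.
    rejoined = {(ka, vb, vc) for ka, vb in r1 for ka2, vc in r2 if ka == ka2}
    original = {(pick(r, a), pick(r, b), pick(r, c)) for r in rows}
    return rejoined == original


rows = {
    ("123-321-99", "(206) 572-4312", "CSE-444"),
    ("123-321-99", "(206) 572-4312", "CSE-341"),
    ("123-321-99", "(206) 432-8954", "CSE-444"),
    ("123-321-99", "(206) 432-8954", "CSE-341"),
}

print(mvd_holds(rows, [0], [1], [2]))  # True

# Drop one phone/course combination: phones and courses are no longer independent.
incomplete = rows - {("123-321-99", "(206) 432-8954", "CSE-341")}
print(mvd_holds(incomplete, [0], [1], [2]))  # False
```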

38 Rules for MVDs If A1,…,An -> B1,…,Bm then A1,…,An ->> B1,…,Bm. Other rules in the book

39 4th Normal Form (4NF) R is in 4NF if whenever: A1,…,An ->> B1,…,Bm is a nontrivial MVD, then A1,…,An is a superkey. Same as BCNF with FDs replaced by MVDs

40 Confused by Normal Forms? 3NF ⊇ BCNF ⊇ 4NF (every 4NF relation is in BCNF, and every BCNF relation is in 3NF). In practice: (1) 3NF is enough, (2) don’t overdo it!

41 Querying the Database How do we specify what we want from our database? Find all the employees who earn more than $50,000 and pay taxes in New Jersey. We design high-level query languages: –SQL (used everywhere) –Datalog (used by database theoreticians, their students, friends and family) Relational algebra: a basic set of operations on relations that provide the basic principles.

42 Relational Algebra at a Glance Operators: relations as input, new relation as output. Five basic RA operators: –Basic Set Operators: union, difference (no intersection, no complement) –Selection: σ –Projection: π –Cartesian Product: X Derived operators: –Intersection, complement –Joins (natural, equi-join, theta join, semi-join) When our relations have attribute names: –Renaming: ρ

43 Set Operations Binary operations Union: all tuples in R1 or R2 –R1 U R2 –Example: ActiveEmployees U RetiredEmployees Difference: all tuples in R1 and not in R2 –R1 – R2 –Example AllEmployees - RetiredEmployees
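
Union and difference map directly onto Python's built-in set type, as this small sketch shows. The employee names are made up for illustration; each tuple stands for one row of a single-column relation.

```python
# Hypothetical single-column relations.
active_employees = {("Alice",), ("Bob",)}
retired_employees = {("Carol",), ("Dave",)}

# Union: all tuples in R1 or R2.
all_employees = active_employees | retired_employees
print(sorted(all_employees))

# Difference: all tuples in R1 and not in R2.
print(sorted(all_employees - retired_employees))  # [('Alice',), ('Bob',)]
```

Both operations only make sense when the two relations have the same schema, which is why the examples combine employee relations with each other.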

44 Selection Unary operation: returns a subset of the tuples which satisfy some condition. Notation: σ_c(R). c is a condition: –=, <, >, and, or, not. Find all employees with salary more than $40,000: –σ_{Salary > 40000}(Employee)
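
Selection can be sketched as a filter over rows. In this minimal Python illustration the function name `select` and the employee rows are hypothetical; the condition c is passed as an ordinary Python function.

```python
def select(rows, condition):
    """Selection: keep the rows for which `condition` is true."""
    return [row for row in rows if condition(row)]


# Hypothetical Employee relation.
employee = [
    {"SSN": "111", "Name": "John", "Salary": 20000},
    {"SSN": "222", "Name": "Tony", "Salary": 60000},
    {"SSN": "333", "Name": "Alice", "Salary": 45000},
]

well_paid = select(employee, lambda row: row["Salary"] > 40000)
print([row["Name"] for row in well_paid])  # ['Tony', 'Alice']
```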

45 Find all employees with salary more than $40,000.

46 Projection Unary operation: returns certain columns. Eliminates duplicate tuples! Notation: π_{A1,…,An}(R). Example: project social-security number and names: –π_{SSN, Name}(Employee)
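
A projection that eliminates duplicates can be sketched as follows. The function name `project` and the sample rows are assumptions made for this illustration.

```python
def project(rows, attrs):
    """Projection: keep only `attrs`, dropping duplicate result tuples."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:  # set semantics: each tuple appears once
            seen.add(key)
            result.append(dict(zip(attrs, key)))
    return result


# Hypothetical Employee relation.
employee = [
    {"SSN": "111", "Name": "John", "Dept": "Toys"},
    {"SSN": "222", "Name": "Tony", "Dept": "Toys"},
    {"SSN": "333", "Name": "Alice", "Dept": "Books"},
]

print(project(employee, ["SSN", "Name"]))
print(project(employee, ["Dept"]))  # [{'Dept': 'Toys'}, {'Dept': 'Books'}]
```

Projecting on Dept returns two rows rather than three, because the two Toys rows collapse into one.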

47

48 Cartesian Product Binary Operation Result is tuples combining any element of R1 with any element of R2, for R1 X R2 Schema is union of Schema(R1) & Schema(R2) Notation: R1 x R2 Example: Employee x Dependents Very rare in practice; but joins are very common.

49

50 Join (Natural) Most important, expensive and exciting. Combines two relations, selecting only related tuples Equivalent to a cross product followed by selection Resulting schema has all attributes of the two relations, but one copy of join condition attributes
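
The description above, a cross product followed by a selection on the shared attributes, can be sketched with a nested loop. The function name `natural_join` and the Employee and Dependents rows are hypothetical; rows are Python dictionaries keyed by attribute name.

```python
def natural_join(r, s):
    """Natural join: pair up rows that agree on all shared attributes."""
    result = []
    for x in r:
        for y in s:  # cross product ...
            common = x.keys() & y.keys()
            if all(x[a] == y[a] for a in common):  # ... then selection
                # Merging the dicts keeps one copy of the join attributes.
                result.append({**x, **y})
    return result


# Hypothetical relations sharing the attribute SSN.
employee = [
    {"Name": "John", "SSN": "111"},
    {"Name": "Tony", "SSN": "222"},
]
dependents = [
    {"SSN": "111", "Dname": "Emily"},
    {"SSN": "222", "Dname": "Joe"},
    {"SSN": "222", "Dname": "Anna"},
]

for row in natural_join(employee, dependents):
    print(row)
```

This nested loop compares every pair of rows, which is one reason joins are expensive; real systems use indexes, hashing, or sorting to avoid it.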

51

52 Other Joins and Renaming Theta join: the join involves a predicate θ –R ⋈_θ S Semi-join: the attributes of one relation are included in the other. Renaming:

53 Complex Queries Product ( pname, price, category, maker) Purchase (buyer, seller, store, product) Company (cname, stock price, country) Person( per-name, phone number, city) Find phone numbers of people who bought gizmos from Fred. Find telephony products that somebody bought
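
The first query can be read as a selection on Purchase, a join with Person, and a projection on the phone number. The Python sketch below follows that plan over made-up data. It assumes that "gizmos" refers to the product name `gizmo` and that buyer matches per-name; both are interpretations, not something the schema states.

```python
# Hypothetical instances of Purchase and Person.
purchase = [
    {"buyer": "Joe", "seller": "Fred", "store": "Wiz", "product": "gizmo"},
    {"buyer": "Ann", "seller": "Fred", "store": "Wiz", "product": "camera"},
    {"buyer": "Sue", "seller": "Bob", "store": "Ritz", "product": "gizmo"},
]
person = [
    {"per_name": "Joe", "phone": "555-1234", "city": "Seattle"},
    {"per_name": "Ann", "phone": "555-9876", "city": "Tacoma"},
    {"per_name": "Sue", "phone": "555-4000", "city": "Seattle"},
]

# Selection: purchases of gizmos where the seller is Fred.
from_fred = [
    p for p in purchase
    if p["product"] == "gizmo" and p["seller"] == "Fred"
]

# Join on buyer = per_name, then project on phone (a set drops duplicates).
phones = {
    q["phone"]
    for p in from_fred
    for q in person
    if p["buyer"] == q["per_name"]
}
print(phones)  # {'555-1234'}
```

Applying the selection before the join keeps the intermediate result small, which is the kind of choice a query optimizer makes when it reorders relational algebra expressions.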

54 Exercises Product ( pname, price, category, maker) Purchase (buyer, seller, store, product) Company (cname, stock price, country) Person( per-name, phone number, city) Ex #1: Find people who bought telephony products. Ex #2: Find names of people who bought American products. Ex #3: Find names of people who bought American products and did not buy French products. Ex #4: Find names of people who bought American products and live in Seattle. Ex #5: Find people who bought stuff from Joe or bought products from a company whose stock price is more than $50.

55 Operations on Bags (and why we care) Basic operations: Projection Selection Union Intersection Set difference Cartesian product Join (natural join, theta join)

