Lecture #3 Functional Dependencies Normalization Relational Algebra Thursday, October 12, 2000.


Administration Homework #1 due today. Project descriptions & groups due today. Homework #2 available today. Exam date is looking like December 7th. Complaints? Projects: tell us if you need to use the lab.

Functional Dependencies Definition: If two tuples agree on the attributes A1, A2, …, An then they must also agree on the attributes B1, B2, …, Bm. Formally: A1, A2, …, An -> B1, B2, …, Bm. Motivating example for the study of functional dependencies: Name, Social Security Number, Phone Number

Examples EmpID -> Name, Phone, Position. Position -> Phone, but not Phone -> Position.

In General To check A -> B, erase all other columns and check if the remaining relation is many-one (called functional in mathematics).
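The check described above can be sketched in Python. This is a minimal illustration, not part of the lecture: the function name fd_holds and the employee rows are made up, and a relation is assumed to be a list of dicts.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts (one dict per tuple); lhs and rhs are lists of
    attribute names. Only the lhs and rhs columns are looked at, which is
    the "erase all other columns" step.
    """
    seen = {}  # maps each combination of lhs values to the rhs values seen
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores value on first sight and returns the stored one
        if seen.setdefault(key, value) != value:
            return False  # same lhs values, different rhs values
    return True


# Hypothetical employee data, invented for this sketch.
employees = [
    {"EmpID": 1, "Name": "Ann", "Phone": "1234", "Position": "Clerk"},
    {"EmpID": 2, "Name": "Bob", "Phone": "9876", "Position": "Salesrep"},
    {"EmpID": 3, "Name": "Cat", "Phone": "9876", "Position": "Salesrep"},
    {"EmpID": 4, "Name": "Dan", "Phone": "1234", "Position": "Lawyer"},
]

print(fd_holds(employees, ["EmpID"], ["Name", "Phone", "Position"]))
print(fd_holds(employees, ["Position"], ["Phone"]))
print(fd_holds(employees, ["Phone"], ["Position"]))
```

A dependency that holds on one instance is only evidence, not proof: an FD is a statement about every legal instance of the schema.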

More Examples Product: name -> price, manufacturer. Person: ssn -> name, age. Company: name -> stock price, president. A key of a relation is a set of attributes that: - functionally determines all the attributes of the relation - has no proper subset that determines all the attributes. Superkey: a set of attributes that contains a key.

Finding the Keys of a Relation Given a relation constructed from an E/R diagram, what is its key? Rules: 1. If the relation comes from an entity set, the key of the relation is the set of attributes which is the key of the entity set. Example: entity set Person with attributes address, name, ssn.

Rules for Binary Relationships Several cases are possible for a binary relationship E1 - E2: 1. Many-many: the key includes the key of E1 together with the key of E2. What happens for: 2. Many-one: 3. One-one: Example: Person (name, ssn) buys Product (name, price).

Keys in Multiway Relationships If there is an arrow from the relationship to E, then we don’t need the key of E as part of the relation key. Example: relationship Purchase among Product, Person, Store, and Payment Method.

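Rules in FD’s Splitting/Combining Rule: A1, A2, …, An -> B1, B2, …, Bm is equivalent to the m dependencies A1, A2, …, An -> B1; A1, A2, …, An -> B2; …; A1, A2, …, An -> Bm. Read left to right this is the splitting rule; read right to left it is the combining rule.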

Rules in FD’s (continued) Trivial Dependency: A1, A2, …, An -> Ai always holds. Why?

Rules in FD’s (continued) Transitive Closure Rule: If A1, A2, …, An -> B1, B2, …, Bm and B1, B2, …, Bm -> C1, C2, …, Cp then A1, A2, …, An -> C1, C2, …, Cp. Why?

Closure of a Set of Attributes Given a set of attributes {A1, …, An} and a set of dependencies S. Problem: find all attributes B such that any relation which satisfies S also satisfies A1, …, An -> B. The closure of {A1, …, An}, denoted {A1, …, An}+, is the set of all such attributes B.

Closure Algorithm Start with X = {A1, …, An}. Repeat until X doesn’t change: if B1, B2, …, Bn -> C is in S, and B1, B2, …, Bn are all in X, and C is not in X, then add C to X.

Example A, B -> C; A, D -> E; B -> D; A, F -> B. Closure of {A, B}: X = {A, B, }. Closure of {A, F}: X = {A, F, }
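The closure algorithm can be sketched in a few lines of Python and run on this example. Assumptions of the sketch: the example's dependencies are read as AB -> C, AD -> E, B -> D, AF -> B, each FD is a pair of attribute sets, and the name closure is chosen here for illustration.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds.

    fds is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    x = set(attrs)
    changed = True
    while changed:  # repeat until X doesn't change
        changed = False
        for lhs, rhs in fds:
            # lhs is entirely in X, but some rhs attribute is still missing
            if lhs <= x and not rhs <= x:
                x |= rhs
                changed = True
    return x


# The example's dependencies, read as AB -> C, AD -> E, B -> D, AF -> B.
fds = [
    ({"A", "B"}, {"C"}),
    ({"A", "D"}, {"E"}),
    ({"B"}, {"D"}),
    ({"A", "F"}, {"B"}),
]

print(sorted(closure({"A", "B"}, fds)))
print(sorted(closure({"A", "F"}, fds)))
```

Running it fills in the two blanks; a set of attributes is a superkey exactly when its closure is the full attribute set.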

Why Is the Algorithm Correct? Show the following by induction: –For every B in X: A1, …, An -> B. Initially X = {A1, …, An} -- holds. Induction step: B1, …, Bm in X –Implies A1, …, An -> B1, …, Bm –We also have B1, …, Bm -> C –By transitivity we have A1, …, An -> C. This shows that the algorithm is sound; we still need to show it is complete.

Relational Schema Design Main idea: Start with some relational schema Find out its FD’s Use them to design a better relational schema

Relational Schema Design Recall set attributes (persons with several phones):
Name, SSN, Phone Number
Fred (201)
Fred (206)
Joe (908)
Joe (212)
Note: SSN is NOT a key here. Problems: - redundancy - update anomalies - deletion anomalies

Relation Decomposition Break the relation into two:
SSN, Name: Fred; Joe
SSN, Phone Number: (201); (206); (908); (212)

Decompositions in General Let R be a relation with attributes A1, A2, …, An. Create two relations R1 and R2 with attributes B1, B2, …, Bm and C1, C2, …, Cl such that: {B1, B2, …, Bm} ∪ {C1, C2, …, Cl} = {A1, A2, …, An}, and -- R1 is the projection of R on B1, B2, …, Bm -- R2 is the projection of R on C1, C2, …, Cl

Incorrect Decomposition Sometimes it is incorrect:
Name         Price  Category
Gizmo        19.99  Gadget
OneClick     24.99  Camera
DoubleClick  29.99  Camera
Decompose on: Name, Category and Price, Category

Incorrect Decomposition The two projections:
Name         Category
Gizmo        Gadget
OneClick     Camera
DoubleClick  Camera
and
Price  Category
19.99  Gadget
24.99  Camera
29.99  Camera
When we put it back:
Name         Price  Category
Gizmo        19.99  Gadget
OneClick     24.99  Camera
OneClick     29.99  Camera
DoubleClick  24.99  Camera
DoubleClick  29.99  Camera
Cannot recover information
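The loss of information can be reproduced with a short Python sketch. The helper names project and natural_join are chosen here for illustration, and tuples are stored as sorted (attribute, value) pairs so that they can live in sets.

```python
def project(rows, attrs):
    """Project rows (a list of dicts) onto attrs, eliminating duplicates."""
    return {tuple((a, row[a]) for a in sorted(attrs)) for row in rows}


def natural_join(r1, r2):
    """Natural join of two relations given as sets of (attr, value) tuples."""
    result = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            # two tuples join when they agree on every shared attribute
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                merged = {**d1, **d2}
                result.add(tuple(sorted(merged.items())))
    return result


product = [
    {"Name": "Gizmo", "Price": 19.99, "Category": "Gadget"},
    {"Name": "OneClick", "Price": 24.99, "Category": "Camera"},
    {"Name": "DoubleClick", "Price": 29.99, "Category": "Camera"},
]

r1 = project(product, ["Name", "Category"])
r2 = project(product, ["Price", "Category"])
rejoined = natural_join(r1, r2)
original = project(product, ["Name", "Price", "Category"])

print(len(original), len(rejoined))
print(original < rejoined)  # proper subset test: did the join add bogus tuples?
```

The join contains every original tuple plus extra ones, so the decomposition is lossy: nothing is missing, but we can no longer tell which camera has which price.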

Boyce-Codd Normal Form A simple condition for removing anomalies from relations: A relation R is in BCNF if and only if: whenever there is a nontrivial dependency A1, A2, …, An -> B for R, it is the case that {A1, A2, …, An} is a super-key for R. In English (though a bit vague): whenever a set of attributes of R determines another attribute, it should determine all the attributes of R.

BCNF Decomposition Find a dependency that violates the BCNF condition: A1, A2, …, An -> B1, B2, …, Bm. Heuristic: choose B1, B2, …, Bm “as large as possible”. Decompose: R1 holds the A’s and the B’s; R2 holds the A’s and the other attributes. Continue until there are no BCNF violations left. Find a 2-attribute relation that is not in BCNF.

Example Decomposition Person: Name, SSN, Age, EyeColor, PhoneNumber. Functional dependencies: SSN -> Name, Age, EyeColor. BCNF: Person1(SSN, Name, Age, EyeColor), Person2(SSN, PhoneNumber). What if we also had an attribute Draft-worthy, and the FD: Age -> Draft-worthy?

Other Example R(A,B,C,D), A -> B, B -> C. Key: A, D. Violations of BCNF: A -> B, A -> C, A -> BC. Pick A -> BC: split into R1(A,B,C), R2(A,D). What happens if we pick A -> B first?
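The decomposition procedure can be sketched in Python and tried on R(A,B,C,D). This is one possible reading, not the lecture's own code: violations are found by brute force over attribute subsets (fine for a handful of attributes only), the closure of the left-hand side plays the role of "B's as large as possible", and all function names are made up.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= x and not rhs <= x:
                x |= rhs
                changed = True
    return x


def find_violation(rel, fds):
    """Return (X, X+ restricted to rel) for some BCNF violation, else None."""
    attrs = sorted(rel)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            x = set(combo)
            x_plus = closure(x, fds) & rel
            # X determines something new, yet not all of rel: not a superkey
            if x < x_plus < rel:
                return x, x_plus
    return None


def bcnf_decompose(rel, fds):
    """Split rel (a set of attributes) until no BCNF violation is left."""
    violation = find_violation(rel, fds)
    if violation is None:
        return [rel]
    x, x_plus = violation
    r1 = x_plus              # the A's together with everything they determine
    r2 = x | (rel - x_plus)  # the A's together with the other attributes
    return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)


fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
for r in bcnf_decompose({"A", "B", "C", "D"}, fds):
    print(sorted(r))
```

In this sketch the first split is the one on the slide, (A,B,C) and (A,D); (A,B,C) is then split again because B -> C still violates BCNF there.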

Correct Decompositions A decomposition is lossless if we can recover the original relation: R(A,B,C) is decomposed into R1(A,B) and R2(A,C), which are joined back into R’(A,B,C), and we need R’(A,B,C) = R(A,B,C). R’ is in general larger than R. Must ensure R’ = R.

Decomposition Based on BCNF is Necessarily Lossless Attributes A, B, C. FD: A -> C. Relations R1(A,B), R2(A,C). Tuple in R: (a,b,c). Tuples in R1: (a,b), (a,b’). Tuples in R2: (a,c), (a,c’). Tuples in the join of R1 and R2: (a,b,c), (a,b,c’), (a,b’,c), (a,b’,c’). Can (a,b,c’) be a bogus tuple? What about (a,b’,c’)?

Example
Name, SSN, Phone Number
Fred (201)
Fred (206)
Joe (908)
Joe (212)
What are the dependencies? What are the keys? Is it in BCNF?

And Now?
SSN, Name: Fred; Joe
SSN, Phone Number: (201); (206); (908); (212)

3NF: A Problem with BCNF Relation: (Unit, Company, Product). FD’s: Unit -> Company; Company, Product -> Unit. So, there is a BCNF violation, and we decompose into (Unit, Company), with FD Unit -> Company, and (Unit, Product), with no FDs.

So What’s the Problem?
Unit      Company
Galaga99  UW
Bingo     UW
and
Unit      Product
Galaga99  databases
Bingo     databases
No problem so far. All local FD’s are satisfied. Let’s put all the data back into a single table again:
Unit      Company  Product
Galaga99  UW       databases
Bingo     UW       databases
Violates the dependency: Company, Product -> Unit!

Solution: 3rd Normal Form (3NF) A simple condition for removing anomalies from relations: A relation R is in 3rd normal form if and only if: whenever there is a nontrivial dependency A1, A2, …, An -> B for R, it is the case that {A1, A2, …, An} is a super-key for R, or B is part of a key. What happened to first and second normal forms? Will we have more normal forms?

Multi-valued Dependencies
SSN, Phone Number, Course
(206) CSE
(206) CSE
(206) CSE
(206) CSE-341
The multi-valued dependencies are: SSN ->> Phone Number; SSN ->> Course

Definition of Multi-valued Dependency Given R(A1,…,An,B1,…,Bm,C1,…,Cp), the MVD A1,…,An ->> B1,…,Bm holds if: for any values of A1,…,An the “set of values” of B1,…,Bm is “independent” of those of C1,…,Cp

Definition of MVDs Continued Equivalently: the decomposition into R1(A1,…,An,B1,…,Bm), R2(A1,…,An,C1,…,Cp) is lossless. Note: an MVD A1,…,An ->> B1,…,Bm implicitly talks about “the other” attributes C1,…,Cp
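The "independence" in the definition can be tested on a concrete instance: for each value of the A's, every B value seen must appear with every C value seen. A minimal Python sketch, with a made-up function name and placeholder data (the SSN and phone values below are invented):

```python
from itertools import product


def mvd_holds(rows, a_attrs, b_attrs, c_attrs):
    """Check the MVD a_attrs ->> b_attrs on rows (a list of dicts).

    c_attrs lists "the other" attributes. The MVD holds when, for each
    value of a_attrs, the (B, C) pairs present are exactly the cross
    product of the B values and the C values present.
    """
    groups = {}
    for row in rows:
        a = tuple(row[x] for x in a_attrs)
        b = tuple(row[x] for x in b_attrs)
        c = tuple(row[x] for x in c_attrs)
        groups.setdefault(a, set()).add((b, c))
    for pairs in groups.values():
        b_values = {b for b, _ in pairs}
        c_values = {c for _, c in pairs}
        if pairs != set(product(b_values, c_values)):
            return False
    return True


# Placeholder instance: one person, two phones, two courses.
rows = [
    {"SSN": "s1", "Phone": "p1", "Course": "CSE-444"},
    {"SSN": "s1", "Phone": "p1", "Course": "CSE-341"},
    {"SSN": "s1", "Phone": "p2", "Course": "CSE-444"},
    {"SSN": "s1", "Phone": "p2", "Course": "CSE-341"},
]

print(mvd_holds(rows, ["SSN"], ["Phone"], ["Course"]))
print(mvd_holds(rows[:3], ["SSN"], ["Phone"], ["Course"]))
```

Dropping any one of the four rows breaks the MVD, which is the redundancy that 4NF is meant to remove.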

Rules for MVDs If A1,…,An -> B1,…,Bm then A1,…,An ->> B1,…,Bm. Other rules in the book

4th Normal Form (4NF) R is in 4NF if whenever A1,…,An ->> B1,…,Bm is a nontrivial MVD, then A1,…,An is a superkey. Same as BCNF with FDs replaced by MVDs

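Confused by Normal Forms? 3NF, BCNF, 4NF: each form is stricter than the one before it, so every 4NF relation is in BCNF and every BCNF relation is in 3NF. In practice: (1) 3NF is enough, (2) don’t overdo it!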

Querying the Database How do we specify what we want from our database? Find all the employees who earn more than $50,000 and pay taxes in New Jersey. We design high-level query languages: –SQL (used everywhere) –Datalog (used by database theoreticians, their students, friends and family) Relational algebra: a basic set of operations on relations that provide the basic principles.

Relational Algebra at a Glance Operators: relations as input, new relation as output. Five basic RA operators: –Basic Set Operators: union, difference (no intersection, no complement) –Selection: σ –Projection: π –Cartesian Product: × Derived operators: –Intersection, complement –Joins (natural, equi-join, theta join, semi-join) When our relations have attribute names: –Renaming: ρ

Set Operations Binary operations. Union: all tuples in R1 or R2 –R1 ∪ R2 –Example: ActiveEmployees ∪ RetiredEmployees. Difference: all tuples in R1 and not in R2 –R1 – R2 –Example: AllEmployees – RetiredEmployees
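With relations modelled as Python sets of tuples, the two basic set operators map directly onto built-in set operators. The employee names are made up for the sketch, and both relations are assumed to share one schema.

```python
# Relations as sets of tuples over the same one-attribute schema (name,).
active_employees = {("Ann",), ("Bob",)}
retired_employees = {("Cat",), ("Dan",)}

all_employees = active_employees | retired_employees  # union
still_working = all_employees - retired_employees     # difference


def intersection(r1, r2):
    """Intersection as a derived operator: R1 - (R1 - R2)."""
    return r1 - (r1 - r2)


print(sorted(all_employees))
print(sorted(still_working))
print(sorted(intersection(all_employees, retired_employees)))
```

The helper shows why intersection is not counted among the basic operators: difference alone is enough to express it.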

Selection Unary operation: returns a subset of the tuples which satisfy some condition. Notation: σ_c(R). c is a condition: =, <, >, and, or, not. Find all employees with salary more than $40,000: –σ_{Salary > 40000}(Employee)

Find all employees with salary more than $40,000.
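As a rough Python sketch of selection, with a relation as a list of dicts and the condition as a function on one tuple. The function name select and the salary figures are invented for illustration.

```python
def select(rows, condition):
    """Selection: keep exactly the rows for which condition(row) is true."""
    return [row for row in rows if condition(row)]


# Hypothetical Employee relation.
employee = [
    {"SSN": 1, "Name": "Ann", "Salary": 35000},
    {"SSN": 2, "Name": "Bob", "Salary": 52000},
    {"SSN": 3, "Name": "Cat", "Salary": 40000},
]

high_paid = select(employee, lambda row: row["Salary"] > 40000)
print([row["Name"] for row in high_paid])
```

The result has the same schema as the input; only the number of tuples can shrink.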

Projection Unary operation: returns certain columns. Eliminates duplicate tuples! Notation: π_{A1,…,An}(R). Example: project social-security number and names: –π_{SSN, Name}(Employee)
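A small Python sketch of projection that makes the duplicate elimination visible. The function name project and the rows are assumptions of the sketch.

```python
def project(rows, attrs):
    """Projection: keep only attrs and drop duplicate tuples."""
    seen = set()
    result = []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:  # set semantics: each tuple appears once
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result


# Hypothetical Employee relation with two people named Ann.
employee = [
    {"SSN": 1, "Name": "Ann", "Salary": 35000},
    {"SSN": 2, "Name": "Bob", "Salary": 52000},
    {"SSN": 3, "Name": "Ann", "Salary": 40000},
]

print(project(employee, ["SSN", "Name"]))
print(project(employee, ["Name"]))
```

Projecting on SSN and Name keeps three tuples, while projecting on Name alone keeps two, because the two Ann tuples collapse into one.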

Cartesian Product Binary operation. The result of R1 × R2 is all tuples combining any element of R1 with any element of R2. Schema is the union of Schema(R1) and Schema(R2). Notation: R1 × R2. Example: Employee × Dependents. Very rare in practice; but joins are very common.
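A Python sketch of the Cartesian product. It assumes the two relations have disjoint attribute names (otherwise one side would need renaming first); the relation contents and attribute names are made up.

```python
from itertools import product


def cartesian_product(r1, r2):
    """Pair every tuple of r1 with every tuple of r2.

    Assumes r1 and r2 share no attribute names, so merging two dicts
    never overwrites a value.
    """
    return [{**t1, **t2} for t1, t2 in product(r1, r2)]


employee = [
    {"SSN": 1, "Name": "Ann"},
    {"SSN": 2, "Name": "Bob"},
]
dependents = [
    {"EmpSSN": 1, "DepName": "Zed"},
    {"EmpSSN": 1, "DepName": "Yan"},
    {"EmpSSN": 2, "DepName": "Xia"},
]

pairs = cartesian_product(employee, dependents)
print(len(pairs))
```

The result size is the product of the input sizes, which is why a bare product is rarely what one wants and is usually followed by a selection.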

Join (Natural) Most important, expensive and exciting. Combines two relations, selecting only related tuples Equivalent to a cross product followed by selection Resulting schema has all attributes of the two relations, but one copy of join condition attributes
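A nested-loop sketch of the natural join in Python: a cross product, a selection on the shared attributes, and one copy of each shared attribute in the output. The function name and data are invented, and a real system would use a far more efficient join algorithm.

```python
def natural_join(r1, r2):
    """Natural join of two relations given as lists of dicts."""
    result = []
    for t1 in r1:
        for t2 in r2:
            shared = t1.keys() & t2.keys()
            # keep the pair only if it agrees on all shared attributes
            if all(t1[a] == t2[a] for a in shared):
                result.append({**t1, **t2})  # shared attributes appear once
    return result


employee = [
    {"SSN": 1, "Name": "Ann"},
    {"SSN": 2, "Name": "Bob"},
]
dependents = [
    {"SSN": 1, "DName": "Zed"},
    {"SSN": 1, "DName": "Yan"},
    {"SSN": 3, "DName": "Xia"},
]

for row in natural_join(employee, dependents):
    print(row)
```

Bob has no dependents and the dependent with SSN 3 has no employee, so neither appears in the result.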

Other Joins and Renaming Theta join: the join involves a predicate –R ⋈_θ S. Semi-join: the attributes of one relation are included in the other. Renaming:

Complex Queries Product(pname, price, category, maker), Purchase(buyer, seller, store, product), Company(cname, stock price, country), Person(per-name, phone number, city). Find phone numbers of people who bought gizmos from Fred. Find telephony products that somebody bought.
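The first query can be composed from a selection, a join and a projection. The Python sketch below is one reading of it: it assumes "gizmo" is a product name and that Purchase.buyer matches Person.per-name, and every row of data is made up.

```python
# Hypothetical instances of Purchase and Person.
purchase = [
    {"buyer": "Joe", "seller": "Fred", "store": "Wiz", "product": "gizmo"},
    {"buyer": "Sue", "seller": "Fred", "store": "Wiz", "product": "camera"},
    {"buyer": "Ann", "seller": "Bob", "store": "Ritz", "product": "gizmo"},
]
person = [
    {"per_name": "Joe", "phone": "555-0100", "city": "Seattle"},
    {"per_name": "Sue", "phone": "555-0101", "city": "Seattle"},
    {"per_name": "Ann", "phone": "555-0102", "city": "Portland"},
]

# Selection on Purchase: seller = Fred and product = gizmo.
bought = [p for p in purchase
          if p["seller"] == "Fred" and p["product"] == "gizmo"]

# Theta join with Person on buyer = per_name.
joined = [{**p, **q} for p in bought for q in person
          if p["buyer"] == q["per_name"]]

# Projection on phone; a set gives the duplicate elimination.
phones = {row["phone"] for row in joined}
print(phones)
```

Applying the selection before the join keeps the intermediate result small, which is the kind of choice a query optimizer makes automatically.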

Exercises Product(pname, price, category, maker), Purchase(buyer, seller, store, product), Company(cname, stock price, country), Person(per-name, phone number, city). Ex #1: Find people who bought telephony products. Ex #2: Find names of people who bought American products. Ex #3: Find names of people who bought American products and did not buy French products. Ex #4: Find names of people who bought American products and live in Seattle. Ex #5: Find people who bought stuff from Joe or bought products from a company whose stock price is more than $50.

Operations on Bags (and why we care) Basic operations: Projection, Selection, Union, Intersection, Set difference, Cartesian product, Join (natural join, theta join)
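A bag keeps a count for every tuple. As a sketch, Python's collections.Counter can stand in for a bag, assuming the usual bag semantics that the slide does not spell out: union adds the counts, intersection takes the minimum, and difference subtracts without going below zero. The tuples are made up.

```python
from collections import Counter

# Two bags over a one-attribute schema; the values are the multiplicities.
r = Counter({("a",): 2, ("b",): 1})
s = Counter({("a",): 1, ("c",): 3})

bag_union = r + s         # counts add up
bag_intersection = r & s  # minimum of the two counts
bag_difference = r - s    # counts subtract, never below zero

print(bag_union)
print(bag_intersection)
print(bag_difference)
```

One reason to care: SQL works on bags unless DISTINCT is requested, so duplicates survive a projection there, unlike in the set-based algebra above.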