Cse 344 May 18th – Loss and views.

Slides:



Advertisements
Similar presentations
Lecture 21 CS 157 B Revision of Midterm3 Prof. Sin-Min Lee.
Advertisements

Logical Database Design (3 of 3) John Ortiz. Lecture 7Logical Database Design (2)2 Normalization  If a relation is not in BCNF or 3NF, we refine it by.
1 Introduction to Database Systems CSE 444 Lecture 05: Views, Constraints April 9, 2008.
Functional Dependencies Definition: If two tuples agree on the attributes A, A, … A 12n then they must also agree on the attributes B, B, … B 12m Formally:
1 Introduction to Database Systems CSE 444 Lectures 8 & 9 Database Design April 16 & 18, 2008.
1 Normalization Chapter What it’s all about Given a relation, R, and a set of functional dependencies, F, on R. Assume that R is not in a desirable.
Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee.
Relation Decomposition A, A, … A 12n Given a relation R with attributes Create two relations R1 and R2 with attributes B, B, … B 12m C, C, … C 12l Such.
Functional Dependencies and Relational Schema Design.
CS143 Review: Normalization Theory Q: Is it a good table design? We can start with an ER diagram or with a large relation that contain a sample of the.
Lecture 09: Functional Dependencies. Outline Functional dependencies (3.4) Rules about FDs (3.5) Design of a Relational schema (3.6)
Functional Dependencies. Outline Functional dependencies (3.4) Rules about FDs (3.5) Design of a Relational schema (3.6)
1 Lecture 7: Normal Forms, Relational Algebra Monday, 10/15/2001.
Copyright, Harris Corporation & Ophir Frieder, The Process of Normalization.
Functional Dependencies and Relational Schema Design.
1 Lecture 9: Database Design Wednesday, January 25, 2006.
Formal definition of a key A key is a set of attributes A 1,..., A n such that for any other attribute B: A 1,..., A n  B A minimal key is a set of attributes.
Lecture 11: Functional Dependencies
Functional Dependency and Normalization
Advanced Normalization
Schedule Today: Next After that Normal Forms. Section 3.6.
COP4710 Database Systems Relational Design.
Schema Refinement and Normal Forms
CS411 Database Systems 08: Midterm Review Kazuhiro Minami 1.
Normalization First Normal Form (1NF) Boyce-Codd Normal Form (BCNF)
Relational Database Design by Dr. S. Sridhar, Ph. D
Schedule Today: Jan. 23 (wed) Week of Jan 28
CS 480: Database Systems Lecture 22 March 6, 2013.
3.1 Functional Dependencies
Handout 4 Functional Dependencies
Problems in Designing Schema
Functional Dependencies and Normalization
Schema Refinement and Normalization
Schema Refinement and Normalization
Lecture 6: Design Theory
Module 5: Overview of Normalization
Normalization Murali Mani.
Functional Dependencies and Normalization
Cse 344 May 11th – Entities.
Cse 344 May 16th – Normalization.
Functional Dependencies and Relational Schema Design
Introduction to Database Systems CSE 444 Lectures 8 & 9 Database Design October 12 & 15, 2007.
Lecture 09: Functional Dependencies, Database Design
Normalization Part II cs3431.
Lecture 05 Views, Constraints
Schema Refinement and Normalization
Lecture 8: Database Design
Lectures 12: Design Theory I
Lecture 19a The Chase Test for Lossless Join
Lecture 07: E/R Diagrams and Functional Dependencies
Normalization cs3431.
CS 405G: Introduction to Database Systems
Instructor: Mohamed Eltabakh
CSE544 Data Modeling, Conceptual Design
Lecture 06: SQL Monday, October 11, 2004.
Chapter 19 (part 1) Functional Dependencies
Designing Relational Databases
Terminology Product Attribute names Name Price Category Manufacturer
Multivalued Dependencies
Lecture 08: E/R Diagrams and Functional Dependencies
Lecture 6: Functional Dependencies
Lecture 11: Functional Dependencies
Chapter 3: Design theory for relational Databases
Chapter 7a: Overview of Database Design -- Normalization
Functional Dependencies and Normalization
Functional Dependencies and Normalization
Lecture 09: Functional Dependencies
CS4222 Principles of Database System
Presentation transcript:

Cse 344 May 18th – Loss and views

Administrivia HW7 Due Wednesday, May 23rd 11:30 OQ6 Due Wednesday, May 23rd 11:00 HW8 Out Wednesday, May 23rd Due Friday, June 1st

Database Design Process company makes product name price address Conceptual Model: Relational Model: Tables + constraints And also functional dep. Normalization: Eliminates anomalies We have covered physical schema earlier. Now let’s move on to the upper layers in the design process Conceptual Schema Physical storage details Physical Schema

Functional Dependencies (FDs) Definition A1, ..., Am  B1, ..., Bn holds in R if: ∀t, t’ ∈ R, (t.A1 = t’.A1 ∧...∧ t.Am = t’.Am  t.B1 = t’.B1∧ ... ∧ t.Bn = t’.Bn ) R A1 ... Am B1 Bn If we can be sure that every instance of R will be one in which a given FD is true, then we say that R satisfies the FD. If we say that R satisfies an FD F, we are stating a constraint on R. t if t, t’ agree here then t, t’ agree here t’

Closure of a set of Attributes Given a set of attributes A1, …, An The closure is the set of attributes B, notated {A1, …, An}+, s.t. A1, …, An  B Example: 1. name  color 2. category  department 3. color, category  price Closure of attribute set A under FD set F is attribute set B such that every relation that satisfies all of F also satisfies A -> B Closures: name+ = {name, color} {name, category}+ = {name, category, color, department, price} color+ = {color}

Keys A superkey is a set of attributes A1, ..., An s.t. for any other attribute B, we have A1, ..., An  B A key is a minimal superkey A superkey and for which no subset is a superkey

Eliminating Anomalies Main idea: X  A is OK if X is a (super)key X  A is not OK otherwise Need to decompose the table, but how? Stopped here fall 16 Boyce-Codd Normal Form

Boyce-Codd Normal Form There are no “bad” FDs: Definition. A relation R is in BCNF if: Whenever X B is a non-trivial dependency, then X is a superkey. Equivalently: Definition. A relation R is in BCNF if: " X, either X+ = X or X+ = [all attributes]

Example Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234 Seattle 206-555-6543 Joe 987-65-4321 908-555-2121 Westfield 908-555-1234 SSN  Name, City Phone- Number Name, City SSN The only key is: {SSN, PhoneNumber} Hence SSN  Name, City is a “bad” dependency SSN+ In other words: SSN+ = SSN, Name, City and is neither SSN nor All Attributes

Decompositions in General R(A1, ..., An, B1, ..., Bm, C1, ..., Cp) S1(A1, ..., An, B1, ..., Bm) S2(A1, ..., An, C1, ..., Cp) S1 = projection of R on A1, ..., An, B1, ..., Bm S2 = projection of R on A1, ..., An, C1, ..., Cp

Lossless Decomposition Name Price Category Gizmo 19.99 Gadget OneClick 24.99 Camera Name Price Gizmo 19.99 OneClick 24.99 Name Category Gizmo Gadget OneClick Camera

Lossy Decomposition What is lossy here? Name Price Category Gizmo 19.99 Gadget OneClick 24.99 Camera Name Category Gizmo Gadget OneClick Camera Price Category 19.99 Gadget 24.99 Camera

Lossy Decomposition Name Price Category Gizmo 19.99 Gadget OneClick 24.99 Camera Name Category Gizmo Gadget OneClick Camera Price Category 19.99 Gadget 24.99 Camera

Decomposition in General R(A1, ..., An, B1, ..., Bm, C1, ..., Cp) S1(A1, ..., An, B1, ..., Bm) S2(A1, ..., An, C1, ..., Cp) S1 = projection of R on A1, ..., An, B1, ..., Bm S2 = projection of R on A1, ..., An, C1, ..., Cp Let: The decomposition is called lossless if R = S1 ⋈ S2 Fact: If A1, ..., An  B1, ..., Bm then the decomposition is lossless It follows that every BCNF decomposition is lossless

Is this lossless? If we decompose R into ΠS1(R), ΠS2(R), ΠS3(R), … Is it true that S1 ⋈ S2 ⋈ S3 ⋈ … = R ? That is true if we can show that: R ⊆ S1 ⋈ S2 ⋈ S3 ⋈ … always holds (why?) R ⊇ S1 ⋈ S2 ⋈ S3 ⋈ … neet to check

The Chase Test for Lossless Join Example from textbook Ch. 3.4.2 The Chase Test for Lossless Join R(A,B,C,D) = S1(A,D) ⋈ S2(A,C) ⋈ S3(B,C,D) R satisfies: AB, BC, CDA S1 = ΠAD(R), S2 = ΠAC(R), S3 = ΠBCD(R), hence R⊆ S1 ⋈ S2 ⋈ S3 Need to check: R ⊇ S1 ⋈ S2 ⋈ S3

The Chase Test for Lossless Join Example from textbook Ch. 3.4.2 The Chase Test for Lossless Join R(A,B,C,D) = S1(A,D) ⋈ S2(A,C) ⋈ S3(B,C,D) R satisfies: AB, BC, CDA S1 = ΠAD(R), S2 = ΠAC(R), S3 = ΠBCD(R), hence R⊆ S1 ⋈ S2 ⋈ S3 Need to check: R ⊇ S1 ⋈ S2 ⋈ S3 Suppose (a,b,c,d) ∈ S1 ⋈ S2 ⋈ S3 Is it also in R?

The Chase Test for Lossless Join Example from textbook Ch. 3.4.2 The Chase Test for Lossless Join R(A,B,C,D) = S1(A,D) ⋈ S2(A,C) ⋈ S3(B,C,D) R satisfies: AB, BC, CDA S1 = ΠAD(R), S2 = ΠAC(R), S3 = ΠBCD(R), hence R⊆ S1 ⋈ S2 ⋈ S3 Need to check: R ⊇ S1 ⋈ S2 ⋈ S3 Suppose (a,b,c,d) ∈ S1 ⋈ S2 ⋈ S3 Is it also in R? R must contain the following tuples: A B C D Why ? a b1 c1 d (a,d) ∈S1 = ΠAD(R)

The Chase Test for Lossless Join Example from textbook Ch. 3.4.2 The Chase Test for Lossless Join R(A,B,C,D) = S1(A,D) ⋈ S2(A,C) ⋈ S3(B,C,D) R satisfies: AB, BC, CDA S1 = ΠAD(R), S2 = ΠAC(R), S3 = ΠBCD(R), hence R⊆ S1 ⋈ S2 ⋈ S3 Need to check: R ⊇ S1 ⋈ S2 ⋈ S3 Suppose (a,b,c,d) ∈ S1 ⋈ S2 ⋈ S3 Is it also in R? R must contain the following tuples: A B C D Why ? a b1 c1 d (a,d) ∈S1 = ΠAD(R) b2 c d2 (a,c) ∈S2 = ΠBD(R)

The Chase Test for Lossless Join Example from textbook Ch. 3.4.2 The Chase Test for Lossless Join R(A,B,C,D) = S1(A,D) ⋈ S2(A,C) ⋈ S3(B,C,D) R satisfies: AB, BC, CDA S1 = ΠAD(R), S2 = ΠAC(R), S3 = ΠBCD(R), hence R⊆ S1 ⋈ S2 ⋈ S3 Need to check: R ⊇ S1 ⋈ S2 ⋈ S3 Suppose (a,b,c,d) ∈ S1 ⋈ S2 ⋈ S3 Is it also in R? R must contain the following tuples: A B C D Why ? a b1 c1 d (a,d) ∈S1 = ΠAD(R) b2 c d2 (a,c) ∈S2 = ΠBD(R) a3 b (b,c,d) ∈S3 = ΠBCD(R)

The Chase Test for Lossless Join Example from textbook Ch. 3.4.2 The Chase Test for Lossless Join R(A,B,C,D) = S1(A,D) ⋈ S2(A,C) ⋈ S3(B,C,D) R satisfies: AB, BC, CDA S1 = ΠAD(R), S2 = ΠAC(R), S3 = ΠBCD(R), hence R⊆ S1 ⋈ S2 ⋈ S3 Need to check: R ⊇ S1 ⋈ S2 ⋈ S3 Suppose (a,b,c,d) ∈ S1 ⋈ S2 ⋈ S3 Is it also in R? R must contain the following tuples: “Chase” them (apply FDs): A B C D Why ? a b1 c1 d (a,d) ∈S1 = ΠAD(R) b2 c d2 (a,c) ∈S2 = ΠBD(R) a3 b (b,c,d) ∈S3 = ΠBCD(R) AB A B C D a b1 c1 d c d2 a3 b

The Chase Test for Lossless Join Example from textbook Ch. 3.4.2 The Chase Test for Lossless Join R(A,B,C,D) = S1(A,D) ⋈ S2(A,C) ⋈ S3(B,C,D) R satisfies: AB, BC, CDA S1 = ΠAD(R), S2 = ΠAC(R), S3 = ΠBCD(R), hence R⊆ S1 ⋈ S2 ⋈ S3 Need to check: R ⊇ S1 ⋈ S2 ⋈ S3 Suppose (a,b,c,d) ∈ S1 ⋈ S2 ⋈ S3 Is it also in R? R must contain the following tuples: “Chase” them (apply FDs): A B C D Why ? a b1 c1 d (a,d) ∈S1 = ΠAD(R) b2 c d2 (a,c) ∈S2 = ΠBD(R) a3 b (b,c,d) ∈S3 = ΠBCD(R) AB BC A B C D a b1 c1 d c d2 a3 b A B C D a b1 c d d2 a3 b

The Chase Test for Lossless Join Example from textbook Ch. 3.4.2 The Chase Test for Lossless Join R(A,B,C,D) = S1(A,D) ⋈ S2(A,C) ⋈ S3(B,C,D) R satisfies: AB, BC, CDA S1 = ΠAD(R), S2 = ΠAC(R), S3 = ΠBCD(R), hence R⊆ S1 ⋈ S2 ⋈ S3 Need to check: R ⊇ S1 ⋈ S2 ⋈ S3 Suppose (a,b,c,d) ∈ S1 ⋈ S2 ⋈ S3 Is it also in R? R must contain the following tuples: “Chase” them (apply FDs): A B C D Why ? a b1 c1 d (a,d) ∈S1 = ΠAD(R) b2 c d2 (a,c) ∈S2 = ΠBD(R) a3 b (b,c,d) ∈S3 = ΠBCD(R) AB BC CDA A B C D a b1 c1 d c d2 a3 b A B C D a b1 c d d2 a3 b A B C D a b1 c d d2 b Hence R contains (a,b,c,d)

Schema Refinements = Normal Forms 1st Normal Form = all tables are flat 2nd Normal Form = no FD with ”non-prime” attributes Obselete Prime attributes: attributes part of a key Boyce Codd Normal Form = no “bad” FDs Are there problems with BCNF?

Dependency Preservation Bookings(title,theatre,city) FD: theatre -> city title,city -> theatre What are the keys?

Dependency Preservation Bookings(title,theatre,city) FD: theatre -> city title,city -> theatre What are the keys? None of the single attributes {title,city},{theatre,title} BCNF?

Dependency Preservation Bookings(title,theatre,city) FD: theatre -> city title,city -> theatre What are the keys? None of the single attributes {title,city},{theatre,title} BCNF? No, {theatre} is neither a trivial dependency nor a superkey Decompose?

Dependency Preservation Bookings(title,theatre,city) FD: theatre -> city title,city -> theatre What are the keys? None of the single attributes {title,city},{theatre,title} BCNF? No, {theatre} is neither a trivial dependency nor a superkey Decompose? R1(theatre,city) R2(theatre,title) What’s wrong? (think of FDs)

Dependency Preservation Bookings(title,theatre,city) FD: theatre -> city title,city -> theatre What are the keys? None of the single attributes {title,city},{theatre,title} BCNF? No, {theatre} is neither a trivial dependency nor a superkey Decompose? R1(theatre,city) R2(theatre,title) What’s wrong? (think of FDs) We can’t guarantee title,city -> theatre with simple constraints if we join

Normal forms 3rd Normal form Allows tables with BCNF violations if a decomposition separates an FD Can result in redundancy 4th Normal form Multi-valued dependencies Incorporate info about attributes in neither A nor B All MVDs are also FDs Apply BCNF alg with for MVD and FD

Normal forms 5th Normal Form Join dependency 6th Normal Form Lossless/exact joining Join independent Tables 6th Normal Form Only allow trivial join dependencies Only need key/tuple constraints to represent all constraints

Forms/decomposition Produce and verify FDs, superkeys, keys Be able to decompose a table into BCNF Flaws of 1NF/BCNF Identify loss and be able to apply the chase test

Implementation We learned about how to normalize tables to avoid anomalies How can we implement normalization in SQL if we can’t modify existing tables? This might be due to legacy applications that rely on previous schemas to run

Views A view in SQL = A table computed from other tables, s.t., whenever the base tables are updated, the view is updated too More generally: A view is derived data that keeps track of changes in the original data Compare: A function computes a value from other values, but does not keep track of changes to the inputs

This is like a new table StorePrice(store,price) Purchase(customer, product, store) Product(pname, price) A Simple View StorePrice(store, price) Create a view that returns for each store the prices of products purchased at that store CREATE VIEW StorePrice AS SELECT DISTINCT x.store, y.price FROM Purchase x, Product y WHERE x.product = y.pname This is like a new table StorePrice(store,price)

We Use a View Like Any Table Purchase(customer, product, store) Product(pname, price) We Use a View Like Any Table StorePrice(store, price) A "high end" store is a store that sell some products over 1000. For each customer, return all the high end stores that they visit. SELECT DISTINCT u.customer, u.store FROM Purchase u, StorePrice v WHERE u.store = v.store AND v.price > 1000

Types of Views Virtual views Materialized views Computed only on-demand – slow at runtime Always up to date Materialized views Pre-computed offline – fast at runtime May have stale data (must recompute or update) Indexes are materialized views A key component of physical tuning of databases is the selection of materialized views and indexes Say: The key components of physical tunings are the selection of materialized views and indexes.

Materialized Views CREATE MATERIALIZED VIEW View_name BUILD [IMMEDIATE/DEFERRED] REFRESH [FAST/COMPLETE/FORCE] ON [COMMIT/DEMAND] AS Sql_query Immediate v deferred Build immediately, or after a query Fast v. Complete v. Force Level of refresh – log based v. complete rebuild Commit v. Demand Commit: after data is added Demand: after conditions are set (time is common)

Vertical Partitioning Resumes SSN Name Address Resume Picture 234234 Mary Huston Clob1… Blob1… 345345 Sue Seattle Clob2… Blob2… 345343 Joan Clob3… Blob3… 432432 Ann Portland Clob4… Blob4… T1 T2 T3 SSN Name Address 234234 Mary Huston 345345 Sue Seattle . . . SSN Resume 234234 Clob1… 345345 Clob2… SSN Picture 234234 Blob1… 345345 Blob2… T2.SSN is a key and a foreign key to T1.SSN. Same for T3.SSN

Vertical Partitioning T1(ssn,name,address) T2(ssn,resume) T3(ssn,picture) Resumes(ssn,name,address,resume,picture) Vertical Partitioning CREATE VIEW Resumes AS SELECT T1.ssn, T1.name, T1.address, T2.resume, T3.picture FROM T1,T2,T3 WHERE T1.ssn=T2.ssn AND T1.ssn=T3.ssn

Vertical Partitioning T1(ssn,name,address) T2(ssn,resume) T3(ssn,picture) Vertical Partitioning Resumes(ssn,name,address,resume,picture) CREATE VIEW Resumes AS SELECT T1.ssn, T1.name, T1.address, T2.resume, T3.picture FROM T1,T2,T3 WHERE T1.ssn=T2.ssn AND T1.ssn=T3.ssn SELECT address FROM Resumes WHERE name = ‘Sue’ How are queries over views processed?

Vertical Partitioning T1(ssn,name,address) T2(ssn,resume) T3(ssn,picture) Vertical Partitioning Resumes(ssn,name,address,resume,picture) CREATE VIEW Resumes AS SELECT T1.ssn, T1.name, T1.address, T2.resume, T3.picture FROM T1,T2,T3 WHERE T1.ssn=T2.ssn AND T1.ssn=T3.ssn SELECT address FROM Resumes WHERE name = ‘Sue’ Original query: SELECT T1.address FROM T1, T2, T3 WHERE T1.name = ‘Sue’ AND T1.SSN=T2.SSN AND T1.SSN = T3.SSN

Vertical Partitioning T1(ssn,name,address) T2(ssn,resume) T3(ssn,picture) Vertical Partitioning Resumes(ssn,name,address,resume,picture) CREATE VIEW Resumes AS SELECT T1.ssn, T1.name, T1.address, T2.resume, T3.picture FROM T1,T2,T3 WHERE T1.ssn=T2.ssn AND T1.ssn=T3.ssn SELECT address FROM Resumes WHERE name = ‘Sue’ Modified query: SELECT T1.address FROM T1, T2, T3 WHERE T1.name = ‘Sue’ AND T1.SSN=T2.SSN AND T1.SSN = T3.SSN Final query: SELECT T1.address FROM T1 WHERE T1.name = ‘Sue’

Vertical Partitioning Applications Advantages Speeds up queries that touch only a small fraction of columns Single column can be compressed effectively, reducing disk I/O Disadvantages Updates are expensive! Need many joins to access many columns Repeated key columns add overhead

Horizontal Partitioning Customers CustomersInHouston SSN Name City 234234 Mary Houston SSN Name City 234234 Mary Houston 345345 Sue Seattle 345343 Joan Ann Portland -- Frank Calgary Jean Montreal CustomersInSeattle SSN Name City 345345 Sue Seattle 345343 Joan . . . . .

Horizontal Partitioning CustomersInHouston(ssn,name,city) CustomersInSeattle(ssn,name,city) . . . . . Horizontal Partitioning Customers(ssn,name,city) CREATE VIEW Customers AS CustomersInHouston UNION ALL CustomersInSeattle . . .

Horizontal Partitioning CustomersInHouston(ssn,name,city) CustomersInSeattle(ssn,name,city) . . . . . Horizontal Partitioning Customers(ssn,name,city) SELECT name FROM Customers WHERE city = ‘Seattle’ Which tables are inspected by the system ? Answer: ALL ! Because the system doesn’t know where the customers in seattle are.

Horizontal Partitioning CustomersInHouston(ssn,name,city) CustomersInSeattle(ssn,name,city) . . . . . Customers(ssn,name,city) Horizontal Partitioning Better: remove CustomerInHouston.city etc CREATE VIEW Customers AS (SELECT SSN, name, ‘Houston’ as city FROM CustomersInHouston) UNION ALL (SELECT SSN, name, ‘Seattle’ as city FROM CustomersInSeattle) . . .

Horizontal Partitioning CustomersInHouston(ssn,name,city) CustomersInSeattle(ssn,name,city) . . . . . Horizontal Partitioning Customers(ssn,name,city) SELECT name FROM Customers WHERE city = ‘Seattle’ SELECT name FROM CustomersInSeattle

Horizontal Partitioning Applications Performance optimization Especially for data warehousing E.g., one partition per month E.g., archived applications and active applications Distributed and parallel databases Data integration

Conclusion Poor schemas can lead to performance inefficiencies E/R diagrams are means to structurally visualize and design relational schemas Normalization is a principled way of converting schemas into a form that avoid such problems BCNF is one of the most widely used normalized form in practice