Presentation is loading. Please wait.

Presentation is loading. Please wait.

Lecture 03: Normal Forms, Constraints, Views Wednesday, October 12, 2011 Dan Suciu -- CSEP544 Fall 20111.

Similar presentations


Presentation on theme: "Lecture 03: Normal Forms, Constraints, Views Wednesday, October 12, 2011 Dan Suciu -- CSEP544 Fall 20111."— Presentation transcript:

1 Lecture 03: Normal Forms, Constraints, Views Wednesday, October 12, 2011 Dan Suciu -- CSEP544 Fall 20111

2 2 Announcements HW1: was due Monday HW2: due next Monday Dan Suciu -- CSEP544 Fall 2011

3 Outline and Reading Material Schema Normalization: 19.1 – 19.7 –Read only about BCNF, skip 3NF –Note: you need BCNF for HW2 Constraints and triggers: 3.2, 3.3, 5.8 Views: 3.6 –Answering queries using views: Sec. 1,2,3 Dan Suciu -- CSEP544 Fall 20113 Some material is NOT covered in the book. Read the slides carefully: we will not cover all in class!

4 Schema Normalization Dan Suciu -- CSEP544 Fall 20114

5 5 Normal Forms 1st Normal Form = all tables are flat 2nd Normal Form = obsolete Boyce Codd Normal Form = will study 3rd Normal Form = see book 4 th Normal Form = golden standard Dan Suciu -- CSEP544 Fall 2011

6 6 First Normal Form (1NF) A database schema is in First Normal Form if all tables are flat NameGPA Course s Alice3.8 Bob3.7 Carol3.9 Math DB OS DB OS Math OS Student NameGPA Alice3.8 Bob3.7 Carol3.9 Student Course Math DB OS StudentCourse AliceMath CarolMath AliceDB BobDB AliceOS CarolOS Takes Course May need to add keys

7 7 Relational Schema Design Person buys Product name pricenamessn Conceptual Model: Relational Model: plus FD’s Normalization: Eliminates anomalies Dan Suciu -- CSEP544 Fall 2011

8 8 Data Anomalies When a database is poorly designed we get anomalies: Redundancy: data is repeated Updated anomalies: need to change in several places Delete anomalies: may lose data when we don’t want Dan Suciu -- CSEP544 Fall 2011

9 9 Relational Schema Design Anomalies: Redundancy = repeat data Update anomalies = Fred moves to “Bellevue” Deletion anomalies = Joe deletes his phone number: what is his city ? Recall set attributes (persons with several phones): NameSSNPhoneNumberCity Fred123-45-6789206-555-1234Seattle Fred123-45-6789206-555-6543Seattle Joe987-65-4321908-555-2121Westfield One person may have multiple phones, but lives in only one city

10 10 Relation Decomposition Break the relation into two: NameSSNCity Fred123-45-6789Seattle Joe987-65-4321Westfield SSNPhoneNumber 123-45-6789206-555-1234 123-45-6789206-555-6543 987-65-4321908-555-2121 Anomalies have gone: No more repeated data Easy to move Fred to “Bellevue” (how ?) Easy to delete all Joe’s phone number (how ?) NameSSNPhoneNumberCity Fred123-45-6789206-555-1234Seattle Fred123-45-6789206-555-6543Seattle Joe987-65-4321908-555-2121Westfield

11 11 Relational Schema Design (or Logical Design) Main idea: Start with some relational schema Find out its functional dependencies Use them to design a better relational schema Dan Suciu -- CSEP544 Fall 2011

12 12 Functional Dependencies What is a functional dependency? It is a kind of constraint In theory one should find the FDs during requirement analysis, then apply rigorously the normalization steps that we discuss next In practice one rarely follows this strict prescription Dan Suciu -- CSEP544 Fall 2011

13 13 Functional Dependencies Meaning: If two tuples agree on the attributes then they must also agree on the attributes Notation: A 1, A 2, …, A n  B 1, B 2, …, B m A 1, A 2, …, A n B 1, B 2, …, B m Dan Suciu -- CSEP544 Fall 2011

14 14 When Does an FD Hold Definition: A 1,..., A m  B 1,..., B n holds in R if:  t, t’  R, (t.A 1 =t’.A 1 ...  t.A m =t’.A m  t.B 1 =t’.B 1 ...  t.B n =t’.B n ) A1A1...AmAm B1B1 nmnm if t, t’ agree here then t, t’ agree here t t’ R

15 15 Examples EmpID  Name, Phone, Position Position  Phone but not Phone  Position An FD holds, or does not hold on an instance: EmpIDNamePhonePosition E0045Smith1234Clerk E3542Mike9876Salesrep E1111Smith9876Salesrep E9999Mary1234Lawyer Dan Suciu -- CSEP544 Fall 2011

16 16 Example Position  Phone EmpIDNamePhonePosition E0045Smith1234Clerk E3542Mike 9876  Salesrep E1111Smith 9876  Salesrep E9999Mary1234Lawyer Dan Suciu -- CSEP544 Fall 2011

17 17 Example EmpIDNamePhonePosition E0045Smith 1234  Clerk E3542Mike9876Salesrep E1111Smith9876Salesrep E9999Mary 1234  Lawyer but not Phone  Position Dan Suciu -- CSEP544 Fall 2011

18 18 Example FD’s are constraints: On some instances they hold On others they don’t namecategorycolordepartmentprice GizmoGadgetGreenToys49 TweaksGadgetGreenToys99 Does this instance satisfy all the FDs ? name  color category  department color, category  price name  color category  department color, category  price Dan Suciu -- CSEP544 Fall 2011

19 19 Example namecategorycolordepartmentprice GizmoGadgetGreenToys49 TweaksGadgetBlackToys99 GizmoStationaryGreen Office- supp. 59 What about this one ? name  color category  department color, category  price name  color category  department color, category  price

20 20 An Interesting Observation If all these FDs are true: name  color category  department color, category  price name  color category  department color, category  price Then this FD also holds: name, category  price Why ??

21 21 Goal: Find ALL Functional Dependencies Anomalies occur when certain “bad” FDs hold We know some of the FDs Need to find all FDs, then look for the bad ones Dan Suciu -- CSEP544 Fall 2011

22 22 Armstrong’s Rules (1/3) Is equivalent to Splitting rule and Combing rule A1...AmB1...Bm A 1, A 2, …, A n  B 1, B 2, …, B m A 1, A 2, …, A n  B 1 A 1, A 2, …, A n  B 2..... A 1, A 2, …, A n  B m A 1, A 2, …, A n  B 1 A 1, A 2, …, A n  B 2..... A 1, A 2, …, A n  B m Dan Suciu -- CSEP544 Fall 2011

23 23 Armstrong’s Rules (2/3) Trivial Rule Why ? A1A1 …AmAm where i = 1, 2,..., n A 1, A 2, …, A n  A i Dan Suciu -- CSEP544 Fall 2011

24 24 Armstrong’s Rules (3/3) Transitive Closure Rule If and then Why ? A 1, A 2, …, A n  B 1, B 2, …, B m B 1, B 2, …, B m  C 1, C 2, …, C p A 1, A 2, …, A n  C 1, C 2, …, C p Dan Suciu -- CSEP544 Fall 2011

25 25 A1A1 …AmAm B1B1 …BmBm C1C1...CpCp Dan Suciu -- CSEP544 Fall 2011

26 26 Example (continued) Start with these: Infer these: Inferred FD Which Rule did we apply ? 4. name, category  name 5. name, category  color 6. name, category  category 7. name, category  color, category 8. name, category  price 1. name  color 2. category  department 3. color, category  price 1. name  color 2. category  department 3. color, category  price

27 27 Example (continued) Answers: Inferred FD Which Rule did we apply ? 4. name, category  nameTrivial rule 5. name, category  colorTransitivity on 4, 1 6. name, category  categoryTrivial rule 7. name, category  color, categorySplit/combine on 5, 6 8. name, category  priceTransitivity on 3, 7 1. name  color 2. category  department 3. color, category  price 1. name  color 2. category  department 3. color, category  price THIS IS TOO HARD ! There is an easier way.

28 28 Closure of a set of Attributes Given a set of attributes A 1, …, A n The closure, {A 1, …, A n } + = the set of attributes B s.t. A 1, …, A n  B Given a set of attributes A 1, …, A n The closure, {A 1, …, A n } + = the set of attributes B s.t. A 1, …, A n  B name  color category  department color, category  price name  color category  department color, category  price Example: name + = {name, color} {name, category} + = {name, category, color, department, price} color + = {color}

29 29 Closure Algorithm X={A1, …, An}. repeat until X doesn’t change do: if B 1, …, B n  C is a FD and B 1, …, B n are all in X then add C to X. X={A1, …, An}. repeat until X doesn’t change do: if B 1, …, B n  C is a FD and B 1, …, B n are all in X then add C to X. {name, category} + = ? [ …in class…] name  color category  department color, category  price name  color category  department color, category  price Example:

30 30 Practice at Home… Compute {A,B} + X = {A, B, } Compute {A, F} + X = {A, F, } R(A,B,C,D,E,F) A, B  C A, D  E B  D A, F  B A, B  C A, D  E B  D A, F  B Dan Suciu -- CSEP544 Fall 2011

31 31 Why Do We Need Closure With closure we can find all FD’s easily To check if X  A –Compute X + –Check if A  X + Dan Suciu -- CSEP544 Fall 2011

32 Practice at Home A, B  C A, D  B B  D A, B  C A, D  B B  D Find all FD’s implied by:

33 Practice at Home A, B  C A, D  B B  D A, B  C A, D  B B  D Step 1: Compute X +, for every X: A+ = A, B+ = BD, C+ = C, D+ = D AB+ =ABCD, AC+=AC, AD+=ABCD, BC+=BCD, BD+=BD, CD+=CD ABC+ = ABD+ = ACD + = ABCD (no need to compute– why ?) BCD + = BCD, ABCD+ = ABCD A+ = A, B+ = BD, C+ = C, D+ = D AB+ =ABCD, AC+=AC, AD+=ABCD, BC+=BCD, BD+=BD, CD+=CD ABC+ = ABD+ = ACD + = ABCD (no need to compute– why ?) BCD + = BCD, ABCD+ = ABCD Find all FD’s implied by:

34 Practice at Home A, B  C A, D  B B  D A, B  C A, D  B B  D Step 1: Compute X +, for every X: A+ = A, B+ = BD, C+ = C, D+ = D AB+ =ABCD, AC+=AC, AD+=ABCD, BC+=BCD, BD+=BD, CD+=CD ABC+ = ABD+ = ACD + = ABCD (no need to compute– why ?) BCD + = BCD, ABCD+ = ABCD A+ = A, B+ = BD, C+ = C, D+ = D AB+ =ABCD, AC+=AC, AD+=ABCD, BC+=BCD, BD+=BD, CD+=CD ABC+ = ABD+ = ACD + = ABCD (no need to compute– why ?) BCD + = BCD, ABCD+ = ABCD Step 2: Enumerate all FD’s X  Y, s.t. Y  X + and X  Y =  : AB  CD, AD  BC, ABC  D, ABD  C, ACD  B Find all FD’s implied by:

35 Keys A superkey is a set of attributes X = A 1,..., A n s.t. for any other attribute B, we have X  B Note: X is a superkey if X + = [all attributes] A key is a minimal superkey X How do we compute all keys X ?

36 36 Example Product(name, price, category, color) name, category  price category  color name, category  price category  color What are the keys ? Dan Suciu -- CSEP544 Fall 2011

37 37 Example Product(name, price, category, color) name, category  price category  color name, category  price category  color (name, category) + = name, category, price, color Hence X = (name, category) is a key What are the keys ?

38 Practice at Home Dan Suciu -- CSEP544 Fall 201138 student  address room, time  course student, course  room, time student  address room, time  course student, course  room, time Find all keys Enrollment(student, address, course, room, time)

39 39 Read at Home We can have more than one key. Example: relation R(A,B,C) satisfies AB  C BC  A A  BC B  AC or Find all keys

40 40 Boyce-Codd Normal Form There are no “bad” FDs: A relation R is in BCNF if: Whenever X  B is a non-trivial dependency, then X is a superkey. A relation R is in BCNF if: Whenever X  B is a non-trivial dependency, then X is a superkey. Equivalently: Dan Suciu -- CSEP544 Fall 2011 A relation R is in BCNF if:  X, either X + = X or X + = [all attributes] A relation R is in BCNF if:  X, either X + = X or X + = [all attributes]

41 41 Example The only key is: {SSN, PhoneNumber} Hence SSN  Name, City is a “bad” dependency SSN  Name, City Alternatively: SSN+ = Name, City and is neither SSN nor all attributes NameSSNPhoneNumberCity Fred123-45-6789206-555-1234Seattle Fred123-45-6789206-555-6543Seattle Joe987-65-4321908-555-2121Westfield Joe987-65-4321908-555-1234Westfield

42 42 BCNF Decomposition Algorithm Normalize(R) find X s.t.: X ≠ X + ≠ [all attributes] if (not found) then “R is in BCNF” let Y = X + - X; Z = [all attributes] - X + decompose R into R1(X  Y) and R2(X  Z) Normalize(R1); Normalize(R2); YXZ X+X+

43 43 Example BCNF Decomposition Person(name, SSN, age, hairColor, phoneNumber) SSN  name, age age  hairColor Find X s.t.: X ≠X + ≠ [all attributes] Dan Suciu -- CSEP544 Fall 2011

44 44 Example BCNF Decomposition Person(name, SSN, age, hairColor, phoneNumber) SSN  name, age age  hairColor Find X s.t.: X ≠X + ≠ [all attributes] Dan Suciu -- CSEP544 Fall 2011 Iteration 1: Person: SSN+ = SSN, name, age, hairColor Decompose into: P(SSN, name, age, hairColor) Phone(SSN, phoneNumber) Iteration 1: Person: SSN+ = SSN, name, age, hairColor Decompose into: P(SSN, name, age, hairColor) Phone(SSN, phoneNumber) SSN name, age, hairColor phoneNumber

45 45 Example BCNF Decomposition Person(name, SSN, age, hairColor, phoneNumber) SSN  name, age age  hairColor Find X s.t.: X ≠X + ≠ [all attributes] Dan Suciu -- CSEP544 Fall 2011 Iteration 1: Person: SSN+ = SSN, name, age, hairColor Decompose into: P(SSN, name, age, hairColor) Phone(SSN, phoneNumber) Iteration 2: P: age+ = age, hairColor Decompose: People(SSN, name, age) Hair(age, hairColor) Phone(SSN, phoneNumber) Iteration 1: Person: SSN+ = SSN, name, age, hairColor Decompose into: P(SSN, name, age, hairColor) Phone(SSN, phoneNumber) Iteration 2: P: age+ = age, hairColor Decompose: People(SSN, name, age) Hair(age, hairColor) Phone(SSN, phoneNumber) What are the keys ?

46 46 Practice at Home A  B B  C R(A,B,C,D) A + = ABC ≠ ABCD R(A,B,C,D) Dan Suciu -- CSEP544 Fall 2011

47 47 Practice at Home What are the keys ? A  B B  C R(A,B,C,D) A + = ABC ≠ ABCD R(A,B,C,D) What happens if in R we first pick B + ? Or AB + ? R 1 (A,B,C) B + = BC ≠ ABC R 2 (A,D) R 11 (B,C) R 12 (A,B) Dan Suciu -- CSEP544 Fall 2011

48 48 Decompositions in General R 1 = projection of R on A 1,..., A n, B 1,..., B m R 2 = projection of R on A 1,..., A n, C 1,..., C p R(A 1,..., A n, B 1,..., B m, C 1,..., C p ) R 1 (A 1,..., A n, B 1,..., B m ) R 2 (A 1,..., A n, C 1,..., C p ) Dan Suciu -- CSEP544 Fall 2011

49 49 Theory of Decomposition Sometimes it is correct: NamePriceCategory Gizmo19.99Gadget OneClick24.99Camera Gizmo19.99Camera NamePrice Gizmo19.99 OneClick24.99 Gizmo19.99 NameCategory GizmoGadget OneClickCamera GizmoCamera Lossless decomposition

50 50 Incorrect Decomposition Sometimes it is not: NamePriceCategory Gizmo19.99Gadget OneClick24.99Camera Gizmo19.99Camera NameCategory GizmoGadget OneClickCamera GizmoCamera PriceCategory 19.99Gadget 24.99Camera 19.99Camera What’s incorrect ?? Lossy decomposition

51 51 Decompositions in General R(A 1,..., A n, B 1,..., B m, C 1,..., C p ) If A 1,..., A n  B 1,..., B m Then the decomposition is lossless R 1 (A 1,..., A n, B 1,..., B m ) R 2 (A 1,..., A n, C 1,..., C p ) BCNF decomposition is always lossless. WHY ? Note: don’t need A 1,..., A n  C 1,..., C p

52 Constraints Dan Suciu -- CSEP544 Fall 201152

53 Integrity Constraints in SQL Constraint = a property we want to hold System enforces constraints at updates Constraints in SQL: –Keys, foreign keys –Attribute-level constraints –Tuple-level constraints –Global constraints: assertions Dan Suciu -- CSEP544 Fall 201153

54 54 Keys OR: CREATE TABLE Product ( name CHAR(30) PRIMARY KEY, price INT) CREATE TABLE Product ( name CHAR(30) PRIMARY KEY, price INT) CREATE TABLE Product ( name CHAR(30), price INT, PRIMARY KEY (name)) CREATE TABLE Product ( name CHAR(30), price INT, PRIMARY KEY (name)) Product(name, price) Dan Suciu -- CSEP544 Fall 2011

55 55 Keys with Multiple Attributes CREATE TABLE Product ( name CHAR(30), category VARCHAR(20), price INT, PRIMARY KEY (name, category)) CREATE TABLE Product ( name CHAR(30), category VARCHAR(20), price INT, PRIMARY KEY (name, category)) NameCategoryPrice GizmoGadget10 CameraPhoto20 GizmoPhoto30 GizmoGadget40 Product(name, category, price)

56 56 Other Keys CREATE TABLE Product ( productID CHAR(10), name CHAR(30), category VARCHAR(20), price INT, PRIMARY KEY (productID), UNIQUE (name, category)) CREATE TABLE Product ( productID CHAR(10), name CHAR(30), category VARCHAR(20), price INT, PRIMARY KEY (productID), UNIQUE (name, category)) There is at most one PRIMARY KEY; there can be many UNIQUE Dan Suciu -- CSEP544 Fall 2011

57 57 Foreign Key Constraints CREATE TABLE Purchase ( buyer CHAR(30), seller CHAR(30), product CHAR(30) REFERENCES Product(name), store VARCHAR(30)) CREATE TABLE Purchase ( buyer CHAR(30), seller CHAR(30), product CHAR(30) REFERENCES Product(name), store VARCHAR(30)) Foreign key Purchase(buyer, seller, product, store) Product(name, price) Dan Suciu -- CSEP544 Fall 2011

58 58 NameCategory Gizmogadget CameraPhoto OneClickPhoto ProdNameStore GizmoWiz CameraRitz CameraWiz ProductPurchase Dan Suciu -- CSEP544 Fall 2011

59 Foreign Key Constraints 59 Purchase(buyer, seller, product, category, store) Product(name, category, price) CREATE TABLE Purchase( buyer VARCHAR(50), seller VARCHAR(50), product CHAR(20), category VARCHAR(20), store VARCHAR(30), FOREIGN KEY (product, category) REFERENCES Product(name, category) ); CREATE TABLE Purchase( buyer VARCHAR(50), seller VARCHAR(50), product CHAR(20), category VARCHAR(20), store VARCHAR(30), FOREIGN KEY (product, category) REFERENCES Product(name, category) ); Dan Suciu -- CSEP544 Fall 2011

60 60 NameCategory Gizmogadget CameraPhoto OneClickPhoto ProdNameStore GizmoWiz CameraRitz CameraWiz ProductPurchase What happens during updates ? Types of updates: In Purchase: insert/update In Product: delete/update

61 61 What happens during updates ? SQL has three policies for maintaining referential integrity: Reject violating modifications (default) Cascade: after a delete/update do a delete/update Set-null set foreign-key field to NULL Dan Suciu -- CSEP544 Fall 2011

62 Constraints on Attributes and Tuples 62 CREATE TABLE Purchase (... store VARCHAR(30) NOT NULL,... ) CREATE TABLE Purchase (... store VARCHAR(30) NOT NULL,... ) CREATE TABLE Product (... price INT CHECK (price >0 and price < 999)) CREATE TABLE Product (... price INT CHECK (price >0 and price < 999)) Attribute level constraints: Tuple level constraints: Dan Suciu -- CSEP544 Fall 2011... CHECK (price * quantity < 10000)...

63 63 CREATE TABLE Purchase ( prodName CHAR(30) CHECK (prodName IN SELECT Product.name FROM Product), date DATETIME NOT NULL) CREATE TABLE Purchase ( prodName CHAR(30) CHECK (prodName IN SELECT Product.name FROM Product), date DATETIME NOT NULL) What is the difference from Foreign-Key ? Dan Suciu -- CSEP544 Fall 2011

64 64 General Assertions CREATE ASSERTION myAssert CHECK NOT EXISTS( SELECT Product.name FROM Product, Purchase WHERE Product.name = Purchase.prodName GROUP BY Product.name HAVING count(*) > 200) CREATE ASSERTION myAssert CHECK NOT EXISTS( SELECT Product.name FROM Product, Purchase WHERE Product.name = Purchase.prodName GROUP BY Product.name HAVING count(*) > 200) Dan Suciu -- CSEP544 Fall 2011

65 65 Comments on Constraints Can give them names, and alter later We need to understand exactly when they are checked We need to understand exactly what actions are taken if they fail Dan Suciu -- CSEP544 Fall 2011

66 Semantic Optimization using Constraints 66 SELECT Purchase.store FROM Product, Purchase WHERE Product.name=Purchase.product Purchase(buyer, seller, product, store) Product(name, price) SELECT Purchase.store FROM Purchase Why ? and When ?

67 Views Dan Suciu -- CSEP544 Fall 201167

68 Overview Views are ubiquitous in data management: Used in SQL as names for predefined queries More generally, any derived data is a view Dan Suciu -- CSEP544 Fall 201168

69 69 CREATE VIEW CustomerPrice AS SELECT DISTINCT x.customer, y.price FROM Purchase x, Product y WHERE x.product = y.pname CREATE VIEW CustomerPrice AS SELECT DISTINCT x.customer, y.price FROM Purchase x, Product y WHERE x.product = y.pname View Basics CustomerPrice(customer, price) “virtual table” Dan Suciu -- CSEP544 Fall 2011 Views are relations, defined by a query Purchase(customer, product, store) Product(pname, price) CustomerPrice(customer, price)

70 View Basics Dan Suciu -- CSEP544 Fall 201170 SELECT DISTINCT u.customer, v.store FROM CustomerPrice u, Purchase v WHERE u.customer = v.customer AND u.price > 100 SELECT DISTINCT u.customer, v.store FROM CustomerPrice u, Purchase v WHERE u.customer = v.customer AND u.price > 100 We can later use the view: Purchase(customer, product, store) Product(pname, price) CustomerPrice(customer, price)

71 71 View Basics SELECT DISTINCT u.customer, v.store FROM CustomerPrice u, Purchase v WHERE u.customer = v.customer AND u.price > 100 SELECT DISTINCT u.customer, v.store FROM CustomerPrice u, Purchase v WHERE u.customer = v.customer AND u.price > 100 CREATE VIEW CustomerPrice AS SELECT DISTINCT x.customer, y.price FROM Purchase x, Product y WHERE x.product = y.pname CREATE VIEW CustomerPrice AS SELECT DISTINCT x.customer, y.price FROM Purchase x, Product y WHERE x.product = y.pname View: Query: Purchase(customer, product, store) Product(pname, price) CustomerPrice(customer, price) Dan Suciu -- CSEP544 Fall 2011

72 72 View Basics SELECT DISTINCT u.customer, v.store FROM (SELECT DISTINCT x.customer, y.price FROM Purchase x, Product y WHERE x.product = y.pname) u, Purchase v WHERE u.customer = v.customer AND u.price > 100 SELECT DISTINCT u.customer, v.store FROM (SELECT DISTINCT x.customer, y.price FROM Purchase x, Product y WHERE x.product = y.pname) u, Purchase v WHERE u.customer = v.customer AND u.price > 100 Modified query: Dan Suciu -- CSEP544 Fall 2011 Purchase(customer, product, store) Product(pname, price) CustomerPrice(customer, price) Next, unnest the query…

73 73 View Basics SELECT DISTINCT x.customer, v.store FROM Purchase x, Product y, Purchase v, WHERE x.customer = v.customer AND y.price > 100 AND x.product = y.pname SELECT DISTINCT x.customer, v.store FROM Purchase x, Product y, Purchase v, WHERE x.customer = v.customer AND y.price > 100 AND x.product = y.pname Modified and unnested query: Dan Suciu -- CSEP544 Fall 2011 Purchase(customer, product, store) Product(pname, price) CustomerPrice(customer, price)

74 74 Practice at Home… SELECT DISTINCT u.customer, v.store FROM CustomerPrice u, Purchase v WHERE u.customer = v.customer AND u.price > 100 SELECT DISTINCT u.customer, v.store FROM CustomerPrice u, Purchase v WHERE u.customer = v.customer AND u.price > 100 ?? Dan Suciu -- CSEP544 Fall 2011 Purchase(customer, product, store) Product(pname, price) CustomerPrice(customer, price)

75 75 Answer SELECT DISTINCT u.customer, v.store FROM CustomerPrice u, Purchase v WHERE u.customer = v.customer AND u.price > 100 SELECT DISTINCT u.customer, v.store FROM CustomerPrice u, Purchase v WHERE u.customer = v.customer AND u.price > 100 Dan Suciu -- CSEP544 Fall 2011 Purchase(customer, product, store) Product(pname, price) CustomerPrice(customer, price) SELECT DISTINCT x.customer, v.store FROM Purchase x, Product y, Purchase v, WHERE x.customer = v.customer AND y.price > 100 AND x.product = y.pname SELECT DISTINCT x.customer, v.store FROM Purchase x, Product y, Purchase v, WHERE x.customer = v.customer AND y.price > 100 AND x.product = y.pname

76 76 Types of Views Virtual views: –Pros/cons ? Materialized views –Pros/cons ? Dan Suciu -- CSEP544 Fall 2011

77 77 Types of Views Virtual views: –Used in databases –Computed only on-demand – slow at runtime –Always up to date Materialized views –Used in databases and data warehouses –Pre-computed offline – fast at runtime –May have stale data or expensive synchronization Dan Suciu -- CSEP544 Fall 2011

78 Technical Aspects View inlining, or query modification Query answering using views Updating views Incremental view update DbView Answer V Q DbView Answer V Q DbView V Update ?? DbView V Update??

79 79 Applications Views have lots of applications Physical and logical data independence Security Indexes Denormalization Semantic caching … Dan Suciu -- CSEP544 Fall 2011

80 Physical Data Independence: Vertical Partitioning SSNNameAddressResumePicture 234234MaryHustonClob1…Blob1… 345345SueSeattleClob2…Blob2… 345343JoanSeattleClob3…Blob3… 234234AnnPortlandClob4…Blob4… Resumes SSNNameAddress 234234MaryHuston 345345SueSeattle... SSNResume 234234Clob1… 345345Clob2… SSNPicture 234234Blob1… 345345Blob2… T1 T2T3

81 81 Vertical Partitioning CREATE VIEW Resumes AS SELECT T1.ssn, T1.name, T1.address, T2.resume, T3.picture FROM T1,T2,T3 WHERE T1.ssn=T2.ssn and T2.ssn=T3.ssn CREATE VIEW Resumes AS SELECT T1.ssn, T1.name, T1.address, T2.resume, T3.picture FROM T1,T2,T3 WHERE T1.ssn=T2.ssn and T2.ssn=T3.ssn Dan Suciu -- CSEP544 Fall 2011

82 82 Vertical Partitioning SELECT address FROM Resumes WHERE name = ‘Sue’ SELECT address FROM Resumes WHERE name = ‘Sue’ We want the system to query only table T1. Will that happen ? Dan Suciu -- CSEP544 Fall 2011

83 Vertical Partitioning Hot trend in databases today for analytics Main idea: –Storage = Column(TID, value) pairs –Sort by TID  enables reconstructing the table –Compress  great compression, minimize I/O –Updates = VERY, VERY expensive Companies: C-Store and Vertica Dan Suciu -- CSEP544 Fall 201183

84 Horizontal Partitioning SSNNameCity 234234MaryHuston 345345SueSeattle 345343JoanSeattle 234234AnnPortland --FrankCalgary --JeanMontreal Customers SSNNameCity 234234MaryHuston CustomersInHuston SSNNameCity 345345SueSeattle 345343JoanSeattle CustomersInSeattle.....

85 85 Horizontal Partitioning CREATE VIEW Customers AS CustomersInHuston UNION ALL CustomersInSeattle UNION ALL... CREATE VIEW Customers AS CustomersInHuston UNION ALL CustomersInSeattle UNION ALL... Dan Suciu -- CSEP544 Fall 2011

86 86 Horizontal Partitioning SELECT name FROM Customers WHERE city = ‘Seattle’ SELECT name FROM Customers WHERE city = ‘Seattle’ Which tables are queried by the system ? WHY ??? Dan Suciu -- CSEP544 Fall 2011

87 87 Horizontal Partitioning SELECT name FROM Customers WHERE city = ‘Seattle’ SELECT name FROM Customers WHERE city = ‘Seattle’ Now even humans can’t tell which table contains customers in Seattle Dan Suciu -- CSEP544 Fall 2011 CREATE VIEW Customers AS CustomersInXXX UNION ALL CustomersInYYY UNION ALL... CREATE VIEW Customers AS CustomersInXXX UNION ALL CustomersInYYY UNION ALL...

88 88 Horizontal Partitioning CREATE VIEW Customers AS (SELECT SSN, name, ‘Huston’ as city FROM CustomersInHuston) UNION ALL (SELECT SSN, name, ‘Seattle’ as city FROM CustomersInSeattle) UNION ALL... CREATE VIEW Customers AS (SELECT SSN, name, ‘Huston’ as city FROM CustomersInHuston) UNION ALL (SELECT SSN, name, ‘Seattle’ as city FROM CustomersInSeattle) UNION ALL... A hack around the problem: Dan Suciu -- CSEP544 Fall 2011

89 89 Horizontal Partitioning SELECT name FROM Customers WHERE city = ‘Seattle’ SELECT name FROM Customers WHERE city = ‘Seattle’ SELECT name FROM CustomersInSeattle SELECT name FROM CustomersInSeattle Dan Suciu -- CSEP544 Fall 2011

90 Data Integration Terminology Local DB 1 Local DB k … Integrated Data Local DB 1 Local DB k … Integrated Data Global as View V V1V1 VkVk Local as View Which one needs query expansion, which one needs query answering using views ?

91 Horizontal Partitioning as LAV CREATE VIEW CustomersInSeattle AS (SELECT * FROM Customers WHERE city = ‘Seattle’) CREATE VIEW CustomersInHuston AS (SELECT * FROM Customers WHERE city = ‘Huston’) …. CREATE VIEW CustomersInSeattle AS (SELECT * FROM Customers WHERE city = ‘Seattle’) CREATE VIEW CustomersInHuston AS (SELECT * FROM Customers WHERE city = ‘Huston’) …. SELECT name FROM Customers WHERE city = ‘Seattle’ SELECT name FROM Customers WHERE city = ‘Seattle’ SELECT name FROM CustomersInSeattle

92 Views and Security NameAddressBalance MaryHuston450.99 SueSeattle-240 JoanSeattle333.25 AnnPortland-520 Customers: Fred is not allowed to see Balance CREATE VIEW PublicCustomers SELECT Name, Address FROM Customers CREATE VIEW PublicCustomers SELECT Name, Address FROM Customers NameAddressBalance MaryHuston450.99 SueSeattle-240 JoanSeattle333.25 AnnPortland-520 John is not allowed to see >0 balances CREATE VIEW BadCreditCustomers SELECT * FROM Customers WHERE Balance < 0 CREATE VIEW BadCreditCustomers SELECT * FROM Customers WHERE Balance < 0

93 93 Indexes REALLY important to speed up query processing time. SELECT * FROM Person WHERE name = 'Smith' SELECT * FROM Person WHERE name = 'Smith' CREATE INDEX myindex05 ON Person(name) Person (name, age, city) May take too long to scan the entire Person table Now, when we rerun the query it will be much faster

94 B+ Tree Index 94 AdamBettyCharles….Smith…. We will discuss them in detail in a later lecture. Dan Suciu -- CSEP544 Fall 2011

95 95 Creating Indexes Indexes can be created on more than one attribute: CREATE INDEX doubleindex ON Person (age, city) Example:

96 96 Creating Indexes Indexes can be created on more than one attribute: SELECT * FROM Person WHERE age = 55 AND city = 'Seattle' Helps in: CREATE INDEX doubleindex ON Person (age, city) Example:

97 97 Creating Indexes Indexes can be created on more than one attribute: SELECT * FROM Person WHERE age = 55 AND city = 'Seattle' Helps in: CREATE INDEX doubleindex ON Person (age, city) Example: SELECT * FROM Person WHERE age = 55 and even in:

98 98 Creating Indexes Indexes can be created on more than one attribute: SELECT * FROM Person WHERE age = 55 AND city = 'Seattle' Helps in: SELECT * FROM Person WHERE city = 'Seattle' But not in: CREATE INDEX doubleindex ON Person (age, city) Example: SELECT * FROM Person WHERE age = 55 and even in:

99 CREATE INDEX W ON Product(weight) CREATE INDEX P ON Product(price) CREATE INDEX W ON Product(weight) CREATE INDEX P ON Product(price) Indexes are Materialized Views SELECT weight, price FROM Product WHERE weight > 10 and price < 100 SELECT weight, price FROM Product WHERE weight > 10 and price < 100 Product(pid, name, weight, price, …) SELECT x.weight, y.price FROM W x, P y WHERE x.weight > 10 and y.price < 100 and x.pid = y.pid SELECT x.weight, y.price FROM W x, P y WHERE x.weight > 10 and y.price < 100 and x.pid = y.pid CREATE VIEW W AS SELECT weight, pid FROM Product y CREATE VIEW P AS SELECT price, pid FROM Product y CREATE VIEW W AS SELECT weight, pid FROM Product y CREATE VIEW P AS SELECT price, pid FROM Product y Indexes as LAV: “Covering indexes”: query uses only the indexes

100 Denormalization Compute a view that is the join of several tables The view is now a relation that is not in BCNF (why not?) Dan Suciu -- CSEP544 Fall 2011100 CREATE VIEW CustomerPurchase AS SELECT x.customer, x.store, y.pname, y.price FROM Purchase x, Product y WHERE x.product = y.pname CREATE VIEW CustomerPurchase AS SELECT x.customer, x.store, y.pname, y.price FROM Purchase x, Product y WHERE x.product = y.pname Purchase(customer, product, store) Product(pname, price)

101 101 Semantic Caching Queries Q1, Q2, … have been executed, and their results are stored at the client Now we need to compute a new query Q Sometimes we can use the prior results in answering Q These queries can be seen as materialized views The problem becomes: answer Q using the views Q1,Q2,…

102 Technical Aspects of Views Simplifying queries after the views have been in-lined –Query un-nesting –Query minimization Handling updates –Updating virtual views –Incremental update of materialized views 102Dan Suciu -- CSEP544 Fall 2011

103 103 Set v.s. Bag Semantics SELECT DISTINCT a,b,c FROM R, S, T WHERE... SELECT DISTINCT a,b,c FROM R, S, T WHERE... SELECT a,b,c FROM R, S, T WHERE... SELECT a,b,c FROM R, S, T WHERE... Set semantics Bag semantics Dan Suciu -- CSEP544 Fall 2011

104 104 Unnesting: Sets/Sets SELECT DISTINCT a,b,c FROM (SELECT DISTINCT u,v FROM R,S WHERE …), T WHERE... SELECT DISTINCT a,b,c FROM (SELECT DISTINCT u,v FROM R,S WHERE …), T WHERE... SELECT DISTINCT a,b,c FROM R, S, T WHERE... SELECT DISTINCT a,b,c FROM R, S, T WHERE... Dan Suciu -- CSEP544 Fall 2011

105 105 Unnesting: Sets/Bags SELECT DISTINCT a,b,c FROM (SELECT u,v FROM R,S WHERE …), T WHERE... SELECT DISTINCT a,b,c FROM (SELECT u,v FROM R,S WHERE …), T WHERE... SELECT DISTINCT a,b,c FROM R, S, T WHERE... SELECT DISTINCT a,b,c FROM R, S, T WHERE... Dan Suciu -- CSEP544 Fall 2011

106 106 Unnesting: Bags/Bags SELECT a,b,c FROM (SELECT u,v FROM R,S WHERE …), T WHERE... SELECT a,b,c FROM (SELECT u,v FROM R,S WHERE …), T WHERE... SELECT a,b,c FROM R, S, T WHERE... SELECT a,b,c FROM R, S, T WHERE... Dan Suciu -- CSEP544 Fall 2011

107 107 Unnesting: Bags/Sets SELECT a,b,c FROM (SELECT DISTINCT u,v FROM R,S WHERE …), T WHERE... SELECT a,b,c FROM (SELECT DISTINCT u,v FROM R,S WHERE …), T WHERE... NO Dan Suciu -- CSEP544 Fall 2011

108 Query Minimization Replace a query Q with Q’ having fewer tables in the FROM clause When Q has fewest number of tables in the FROM clause, then we say it is minimized Usually (but not always) users write queries that are already minimized But the result of rewriting a query over view is often not minimized Dan Suciu -- CSEP544 Fall 2011108

109 Query Minimization under Bag Semantics Rule 1: If: x, y are tuple variables over the same table and: The condition x.key = y.key is in the WHERE clause Then combine x, y into a single variable query Dan Suciu -- CSEP544 Fall 2011109

110 Query Minimization under Bag Semantics SELECT x.date, y.name FROM Order x, Product y, Order z WHERE x.pid = y.pid and y.price 150 and y.pid = z.pid and x.cid = z.cid SELECT x.date, y.name FROM Order x, Product y, Order z WHERE x.pid = y.pid and y.price 150 and y.pid = z.pid and x.cid = z.cid Order(cid, pid, weight, date) Product(pid, name, price) SELECT x.date, y.name FROM Order x, Product y WHERE x.pid = y.pid and y.price 150 SELECT x.date, y.name FROM Order x, Product y WHERE x.pid = y.pid and y.price 150 What constraints do we need to have for this optimization ?

111 111 Query Minimization under Bag Semantics Rule 2: If x ranges over S, y ranges over T, and The condition x.fk = y.key is in the WHERE clause, and there is a not null constraint on x.fk y is not used anywhere else, and Then remove T (and y) from the query Dan Suciu -- CSEP544 Fall 2011

112 Query Minimization under Bag Semantics Order(cid, pid, weight, date) Product(pid, name, price) SELECT x.cid, x.date FROM Order x WHERE x.weight > 20 SELECT x.cid, x.date FROM Order x WHERE x.weight > 20 SELECT x.cid, x.date FROM Order x, Product y WHERE x.pid = y.pid and x.weight > 20 SELECT x.cid, x.date FROM Order x, Product y WHERE x.pid = y.pid and x.weight > 20 Q: Where do we encounter non- minimized queries ? What constraints do we need to have for this optimization ?

113 Query Minimization under Bag Semantics CREATE VIEW CheapOrders AS SELECT x.cid,x.pid,x.date,y.name,y.price FROM Order x, Product y WHERE x.pid = y.pid and y.price < 99 CREATE VIEW HeavyOrders AS SELECT a.cid,a.pid,a.date,b.name,b.price FROM Order a, Product b WHERE a.pid = b.pid and a.weight > 150 CREATE VIEW CheapOrders AS SELECT x.cid,x.pid,x.date,y.name,y.price FROM Order x, Product y WHERE x.pid = y.pid and y.price < 99 CREATE VIEW HeavyOrders AS SELECT a.cid,a.pid,a.date,b.name,b.price FROM Order a, Product b WHERE a.pid = b.pid and a.weight > 150 Order(cid, pid, weight, date) Product(pid, name, price) SELECT u.cid FROM CheapOrders u, HeavyOrders v WHERE u.pid = v.pid and u.cid = v.cid SELECT u.cid FROM CheapOrders u, HeavyOrders v WHERE u.pid = v.pid and u.cid = v.cid A: in queries resulting from view inlining Customers who ordered cheap, heavy products

114 Query Minimization CREATE VIEW CheapOrders AS SELECT x.cid,x.pid,x.date,y.name,y.price FROM Order x, Product y WHERE x.pid = y.pid and y.price < 99 CREATE VIEW HeavyOrders AS SELECT a.cid,a.pid,a.date,b.name,b.price FROM Order a, Product b WHERE a.pid = b.pid and a.weight > 150 CREATE VIEW CheapOrders AS SELECT x.cid,x.pid,x.date,y.name,y.price FROM Order x, Product y WHERE x.pid = y.pid and y.price < 99 CREATE VIEW HeavyOrders AS SELECT a.cid,a.pid,a.date,b.name,b.price FROM Order a, Product b WHERE a.pid = b.pid and a.weight > 150 Order(cid, pid, weight, date) Product(pid, name, price) SELECT u.cid FROM CheapOrders u, HeavyOrders v WHERE u.pid = v.pid and u.cid = v.cid SELECT u.cid FROM CheapOrders u, HeavyOrders v WHERE u.pid = v.pid and u.cid = v.cid SELECT a.cid FROM Order x, Product y Order a, Product b WHERE.... SELECT a.cid FROM Order x, Product y Order a, Product b WHERE.... Redundant Orders and Products

115 SELECT a.cid FROM Order x, Product y, Order a, Product b WHERE x.pid = y.pid and a.pid = b.pid and y.price 150 and x.cid = a.cid and x.pid = a.pid SELECT a.cid FROM Order x, Product y, Order a, Product b WHERE x.pid = y.pid and a.pid = b.pid and y.price 150 and x.cid = a.cid and x.pid = a.pid SELECT x.cid FROM Order x, Product y, Product b WHERE x.pid = y.pid and x.pid = b.pid and y.price 150 SELECT x.cid FROM Order x, Product y, Product b WHERE x.pid = y.pid and x.pid = b.pid and y.price 150 x = a SELECT x.cid FROM Order x, Product y WHERE x.pid = y.pid and y.price 150 SELECT x.cid FROM Order x, Product y WHERE x.pid = y.pid and y.price 150 y = b

116 Query Minimization under Set Semantics Rules 1 and 2 still apply Rule 3 involves homomorphisms 116Dan Suciu -- CSEP544 Fall 2011

117 Definition of a Homomorphism A homomorphism from Q’ to Q is a mapping h : {y1, …, ym}  {x1, …, xk} such that: (a)If h(yi) = xj, then Ri’ = Rj (b)C logically implies h(C’) and (c)h(A’) = A A homomorphism from Q’ to Q is a mapping h : {y1, …, ym}  {x1, …, xk} such that: (a)If h(yi) = xj, then Ri’ = Rj (b)C logically implies h(C’) and (c)h(A’) = A SELECT DISTINCT A FROM R1 x1, …, Rk xk WHERE C SELECT DISTINCT A FROM R1 x1, …, Rk xk WHERE C SELECT DISTINCT A’ FROM R1’ y1, …, Rm’ ym WHERE C’ SELECT DISTINCT A’ FROM R1’ y1, …, Rm’ ym WHERE C’ Q Q’

118 Definition of a Homomorphism Theorem If there exists a homomorphism from Q’ to Q, then every answer returned by Q is also returned by Q’. We say that Q is contained in Q’ If there exists a homomorphism from Q’ to Q, and a homomorphism from Q to Q’, then Q and Q’ are equivalent

119 Find Homomorphism Dan Suciu -- CSEP544 Fall 2011119 SELECT DISTINCT x.cid FROM Order x, Product y WHERE x.pid = y.pid and y.price 150 SELECT DISTINCT x.cid FROM Order x, Product y WHERE x.pid = y.pid and y.price 150 Q SELECT DISTINCT x.cid FROM Order x, Product y, Order z WHERE x.pid = y.pid and y.pid = z.pid and y.price 150 and z.weight > 100 SELECT DISTINCT x.cid FROM Order x, Product y, Order z WHERE x.pid = y.pid and y.pid = z.pid and y.price 150 and z.weight > 100 Q’ Order(cid, pid, weight, date) Product(pid, name, price)

120 Homomorphism Q  Q’ Dan Suciu -- CSEP544 Fall 2011120 SELECT DISTINCT x.cid FROM Order x, Product y WHERE x.pid = y.pid and y.price 150 SELECT DISTINCT x.cid FROM Order x, Product y WHERE x.pid = y.pid and y.price 150 Q SELECT DISTINCT x.cid FROM Order x, Product y, Order z WHERE x.pid = y.pid and y.pid = z.pid and y.price 150 and z.weight > 100 SELECT DISTINCT x.cid FROM Order x, Product y, Order z WHERE x.pid = y.pid and y.pid = z.pid and y.price 150 and z.weight > 100 Q’ Order(cid, pid, weight, date) Product(pid, name, price) Every answer to Q is also an answer to Q’ WHY ?

121 Homomorphism Q  Q’ Dan Suciu -- CSEP544 Fall 2011121 SELECT DISTINCT x.cid FROM Order x, Product y WHERE x.pid = y.pid and y.price 150 SELECT DISTINCT x.cid FROM Order x, Product y WHERE x.pid = y.pid and y.price 150 Q SELECT DISTINCT x.cid FROM Order x, Product y, Order z WHERE x.pid = y.pid and y.pid = z.pid and y.price 150 and z.weight > 100 SELECT DISTINCT x.cid FROM Order x, Product y, Order z WHERE x.pid = y.pid and y.pid = z.pid and y.price 150 and z.weight > 100 Q’ Order(cid, pid, weight, date) Product(pid, name, price) Q and Q’ are equivalent !

122 Query Minimization under Set Semantics SELECT DISTINCT x.pid FROM Product x, Product y, Product z WHERE x.category = y.category and y.price > 100 and x.category = z.category and z.price > 500 and z.weight > 10 SELECT DISTINCT x.pid FROM Product x, Product y, Product z WHERE x.category = y.category and y.price > 100 and x.category = z.category and z.price > 500 and z.weight > 10 SELECT DISTINCT x.pid FROM Product x, Product z WHERE x.category = z.category and z.price > 500 and z.weight > 10 SELECT DISTINCT x.pid FROM Product x, Product z WHERE x.category = z.category and z.price > 500 and z.weight > 10 Same as:

123 123 Query Minimization under Set Semantics Rule 3: Let Q’ be the query obtained by removing the tuple variable x from Q. If: Q has set semantics (and same for Q’) there exists a homomorphism from Q to Q’ Then Q’ is equivalent to Q. Hence one can safely remove x. Dan Suciu -- CSEP544 Fall 2011

124 Example SELECT DISTINCT x.pid FROM Product x, Product y, Product z WHERE x.category = y.category and y.price > 100 and x.category = z.category and z.price > 500 and z.weight > 10 SELECT DISTINCT x.pid FROM Product x, Product y, Product z WHERE x.category = y.category and y.price > 100 and x.category = z.category and z.price > 500 and z.weight > 10 SELECT DISTINCT x’.pid FROM Product x’, Product z’ WHERE x’.category = z’.category and z’.price > 500 and z’.weight > 10 SELECT DISTINCT x’.pid FROM Product x’, Product z’ WHERE x’.category = z’.category and z’.price > 500 and z’.weight > 10 Q Q’ Find a homomorphism h: Q  Q’

125 Example SELECT DISTINCT x.pid FROM Product x, Product y, Product z WHERE x.category = y.category and y.price > 100 and x.category = z.category and z.price > 500 and z.weight > 10 SELECT DISTINCT x.pid FROM Product x, Product y, Product z WHERE x.category = y.category and y.price > 100 and x.category = z.category and z.price > 500 and z.weight > 10 SELECT DISTINCT x’.pid FROM Product x’, Product z’ WHERE x’.category = z’.category and z’.price > 500 and z’.weight > 10 SELECT DISTINCT x’.pid FROM Product x’, Product z’ WHERE x’.category = z’.category and z’.price > 500 and z’.weight > 10 Q Q’ Answer: H(x) = x’, H(y) = H(z) = z’

126 126 CREATE VIEW Expensive-Product AS SELECT pname FROM Product WHERE price > 100 CREATE VIEW Expensive-Product AS SELECT pname FROM Product WHERE price > 100 Updating Views INSERT INTO Expensive-Product VALUES(‘Gizmo’) INSERT INTO Expensive-Product VALUES(‘Gizmo’) Purchase(customer, product, store) Product(pname, price) Updateable view Dan Suciu -- CSEP544 Fall 2011

127 Updatable Views Have a virtual view V(A1, A2, …) over tables R1, R2, … User wants to update a tuple in V –Insert/modify/delete Can we translate this into updates to R1, R2, … ? If yes: V = “an updateable view” If not: V = “a non-updateable view” Dan Suciu -- CSEP544 Fall 2011127

128 128 CREATE VIEW Expensive-Product AS SELECT pname FROM Product WHERE price > 100 CREATE VIEW Expensive-Product AS SELECT pname FROM Product WHERE price > 100 Updating Views INSERT INTO Product VALUES(‘Gizmo’, NULL) INSERT INTO Product VALUES(‘Gizmo’, NULL) Purchase(customer, product, store) Product(pname, price) Updateable view Dan Suciu -- CSEP544 Fall 2011 INSERT INTO Expensive-Product VALUES(‘Gizmo’) INSERT INTO Expensive-Product VALUES(‘Gizmo’)

129 129 CREATE VIEW AcmePurchase AS SELECT customer, product FROM Purchase WHERE store = ‘AcmeStore’ CREATE VIEW AcmePurchase AS SELECT customer, product FROM Purchase WHERE store = ‘AcmeStore’ Updating Views INSERT INTO AcmePurchase VALUES(‘Joe’, ‘Gizmo’) INSERT INTO AcmePurchase VALUES(‘Joe’, ‘Gizmo’) Purchase(customer, product, store) Product(pname, price) Updateable view Dan Suciu -- CSEP544 Fall 2011

130 130 CREATE VIEW AcmePurchase AS SELECT customer, product FROM Purchase WHERE store = ‘AcmeStore’ CREATE VIEW AcmePurchase AS SELECT customer, product FROM Purchase WHERE store = ‘AcmeStore’ Updating Views INSERT INTO AcmePurchase VALUES(‘Joe’, ‘Gizmo’) INSERT INTO AcmePurchase VALUES(‘Joe’, ‘Gizmo’) INSERT INTO Purchase VALUES(‘Joe’,’Gizmo’,NULL) INSERT INTO Purchase VALUES(‘Joe’,’Gizmo’,NULL) Note this Purchase(customer, product, store) Product(pname, price) Updateable view Dan Suciu -- CSEP544 Fall 2011

131 131 Updating Views INSERT INTO CustomerPrice VALUES(‘Joe’, 200) ? ? ? ? ? Non-updateable view Most views are non-updateable CREATE VIEW CustomerPrice AS SELECT x.customer, y.price FROM Purchase x, Product y WHERE x.product = y.pname CREATE VIEW CustomerPrice AS SELECT x.customer, y.price FROM Purchase x, Product y WHERE x.product = y.pname Purchase(customer, product, store) Product(pname, price) Dan Suciu -- CSEP544 Fall 2011

132 132 Incremental View Update Also known as view synchronization Immediate synchronization = after each update Deferred synchronization –Lazy = at query time –Periodic –Forced = manual

133 133 CREATE VIEW FullOrder AS SELECT x.cid,x.pid,x.date,y.name,y.price FROM Order x, Product y WHERE x.pid = y.pid CREATE VIEW FullOrder AS SELECT x.cid,x.pid,x.date,y.name,y.price FROM Order x, Product y WHERE x.pid = y.pid Incremental View Update UPDATE Product SET price = price / 2 WHERE pid = ‘12345’ UPDATE Product SET price = price / 2 WHERE pid = ‘12345’ Order(cid, pid, date) Product(pid, name, price) UPDATE FullOrder SET price = price / 2 WHERE pid = ‘12345’ UPDATE FullOrder SET price = price / 2 WHERE pid = ‘12345’ No need to recompute the entire view ! Dan Suciu -- CSEP544 Fall 2011

134 134 CREATE VIEW Categories AS SELECT DISTINCT category FROM Product CREATE VIEW Categories AS SELECT DISTINCT category FROM Product Incremental View Update DELETE Product WHERE pid = ‘12345’ DELETE Product WHERE pid = ‘12345’ Product(pid, name, category, price) DELETE Categories WHERE category in (SELECT category FROM Product WHERE pid = ‘12345’) DELETE Categories WHERE category in (SELECT category FROM Product WHERE pid = ‘12345’) It doesn’t work ! Why ? How can we fix it ?

135 135 CREATE VIEW Categories AS SELECT category, count(*) as c FROM Product GROUP BY category CREATE VIEW Categories AS SELECT category, count(*) as c FROM Product GROUP BY category Incremental View Update DELETE Product WHERE pid = ‘12345’ DELETE Product WHERE pid = ‘12345’ Product(pid, name, category, price) UPDATE Categories SET c = c-1 WHERE category in (SELECT category FROM Product WHERE pid = ‘12345’); DELETE Categories WHERE c = 0 UPDATE Categories SET c = c-1 WHERE category in (SELECT category FROM Product WHERE pid = ‘12345’); DELETE Categories WHERE c = 0

136 136 Answering Queries Using Views [See the paper] We have several materialized views: –V1, V2, …, Vn Given a query Q –Answer it by using views instead of base tables Variation: Query rewriting using views –Answer it by rewriting it to another query first Example: if the views are indexes, then we rewrite the query to use indexes Dan Suciu -- CSEP544 Fall 2011

137 Rewriting Queries Using Views 137 Purchase(buyer, seller, product, store) Person(pname, city) CREATE VIEW SeattleView AS SELECT y.buyer, y.seller, y.product, y.store FROM Person x, Purchase y WHERE x.city = ‘Seattle’ AND x.pname = y.buyer CREATE VIEW SeattleView AS SELECT y.buyer, y.seller, y.product, y.store FROM Person x, Purchase y WHERE x.city = ‘Seattle’ AND x.pname = y.buyer SELECT y.buyer, y.seller FROM Person x, Purchase y WHERE x.city = ‘Seattle’ AND x..pname = y.buyer AND y.product=‘gizmo’ SELECT y.buyer, y.seller FROM Person x, Purchase y WHERE x.city = ‘Seattle’ AND x..pname = y.buyer AND y.product=‘gizmo’ Goal: rewrite this query in terms of the view Have this materialized view:

138 Rewriting Queries Using Views 138 SELECT y.buyer, y.seller FROM Person x, Purchase y WHERE x.city = ‘Seattle’ AND x..pname = y.buyer AND y.product=‘gizmo’ SELECT y.buyer, y.seller FROM Person x, Purchase y WHERE x.city = ‘Seattle’ AND x..pname = y.buyer AND y.product=‘gizmo’ SELECT buyer, seller FROM SeattleView WHERE product= ‘gizmo’ SELECT buyer, seller FROM SeattleView WHERE product= ‘gizmo’ Dan Suciu -- CSEP544 Fall 2011

139 Rewriting is not always possible 139 CREATE VIEW DifferentView AS SELECT y.buyer, y.seller, y.product, y.store FROM Person x, Purchase y, Product z WHERE x.city = ‘Seattle’ AND x.pname = y.buyer AND y.product = z.name AND z.price < 100 CREATE VIEW DifferentView AS SELECT y.buyer, y.seller, y.product, y.store FROM Person x, Purchase y, Product z WHERE x.city = ‘Seattle’ AND x.pname = y.buyer AND y.product = z.name AND z.price < 100 SELECT y.buyer, y.seller FROM Person x, Purchase y WHERE x.city = ‘Seattle’ AND x..pname = y.buyer AND y.product=‘gizmo’ SELECT y.buyer, y.seller FROM Person x, Purchase y WHERE x.city = ‘Seattle’ AND x..pname = y.buyer AND y.product=‘gizmo’ SELECT buyer, seller FROM DifferentView WHERE product= ‘gizmo’ SELECT buyer, seller FROM DifferentView WHERE product= ‘gizmo’ “Maximally contained rewriting”


Download ppt "Lecture 03: Normal Forms, Constraints, Views Wednesday, October 12, 2011 Dan Suciu -- CSEP544 Fall 20111."

Similar presentations


Ads by Google