Databases 1 8th lecture. Topics of the lecture Multivalued Dependencies Fourth Normal Form Datalog 2.

Slides:



Advertisements
Similar presentations
1 Datalog: Logic Instead of Algebra. 2 Datalog: Logic instead of Algebra Each relational-algebra operator can be mimicked by one or several Database Logic.
Advertisements

Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 16 Relational Database Design Algorithms and Further Dependencies.
4NF and 5NF Prof. Sin-Min Lee Department of Computer Science.
1 Loss-Less Joins. 2 Decompositions uDependency-preservation property: enforce constraints on original relation by enforcing some constraints on resulting.
Chapter 3 Notes. 3.1 Functional Dependencies A functional dependency is a statement that – two tuples of a relation that agree on some particular set.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 227 Database Systems I Design Theory for Relational Databases.
1 Datalog Logical Rules Recursion SQL-99 Recursion.
4NF. PTypes Planes HasType Employees MServices Auth. MWorks Assignment AppliedOn States Dates PTypes(model, capacity,…) Planes(regno, model) Employees(sin,…)
Multivalued Dependency Prof. Sin-Min Lee Department of Computer Science.
1 Multi-valued Dependencies Salman Azhar Multi-valued Dependencies Fourth Normal Form These slides use some figures, definitions, and explanations from.
1 Multivalued Dependencies Fourth Normal Form Source: Slides by Jeffrey Ullman.
Winter 2002Arthur Keller – CS 18015–1 Schedule Today: Feb. 28 (TH) u Datalog and SQL Recursion, ODL. u Read Sections , Project Part 6.
1 Multivalued Dependencies Fourth Normal Form. 2 Definition of MVD uA multivalued dependency (MVD) on R, X ->->Y, says that if two tuples of R agree on.
Winter 2002Arthur Keller – CS 18014–1 Schedule Today: Feb. 26 (T) u Datalog. u Read Sections Assignment 6 due. Feb. 28 (TH) u Datalog and SQL.
1 Multivalued Dependencies Fourth Normal Form Sources: Slides by Jeffrey Ullman book by Ramakrishnan & Gehrke.
CSE 636 Data Integration Datalog Rules / Programs / Negation Slides by Jeffrey D. Ullman.
1 Datalog Logical Rules Recursion SQL-99 Recursion.
Multivalued Dependency Prof. Sin-Min Lee Department of Computer Science.
Winter 2002Arthur Keller – CS 1804–1 Schedule Today: Jan. 15 (T) u Normal Forms, Multivalued Dependencies. u Read Sections Assignment 1 due. Jan.
1 Multivalued Dependencies Fourth Normal Form. 2 A New Form of Redundancy uMultivalued dependencies (MVD’s) express a condition among tuples of a relation.
Fall 2001Arthur Keller – CS 1805–1 Schedule Today Oct. 9 (T) Multivalued Dependencies, Relational Algebra u Read Sections 3.7, Assignment 2 due.
Let remember from the previous lesson what is Knowledge representation
Logical Rules Recursion
Fall 2001Arthur Keller – CS 1804–1 Schedule Today Oct. 4 (TH) Functional Dependencies and Normalization. u Read Sections Project Part 1 due. Oct.
1 Datalog Logical Rules Recursion. 2 Logic As a Query Language uIf-then logical rules have been used in many systems. wMost important today: EII (Enterprise.
Credit: Slides are an adaptation of slides from Jeffrey D. Ullman 1.
Logical Query Languages Motivation: 1.Logical rules extend more naturally to recursive queries than does relational algebra. u Used in SQL recursion. 2.Logical.
Chapter 10 Functional Dependencies and Normalization for Relational Databases.
Database Systems Normal Forms. Decomposition Suppose we have a relation R[U] with a schema U={A 1,…,A n } – A decomposition of U is a set of schemas.
Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2.
1 Normalization Anomalies Boyce-Codd Normal Form 3 rd Normal Form.
Normalization Goal = BCNF = Boyce-Codd Normal Form = all FD’s follow from the fact “key  everything.” Formally, R is in BCNF if for every nontrivial FD.
Copyright © Curt Hill Schema Refinement III 4 th NF and 5 th NF.
CS143 Review: Normalization Theory Q: Is it a good table design? We can start with an ER diagram or with a large relation that contain a sample of the.
SCUJ. Holliday - coen 1784–1 Schedule Today: u Normal Forms. u Section 3.6. Next u Relational Algebra. Read chapter 5 to page 199 After that u SQL Queries.
Logical Query Languages Motivation: 1.Logical rules extend more naturally to recursive queries than does relational algebra. u Used in SQL recursion. 2.Logical.
BCNF & Lossless Decomposition Prof. Sin-Min Lee Department of Computer Science.
Datalog Inspired by the impedance mismatch in relational databases. Main expressive advantage: recursive queries. More convenient for analysis: papers.
4NF (Multivalued Dependency), and 5NF (Join Dependency)
Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,
IST 210 Normalization 2 Todd Bacastow IST 210. Normalization Methods Inspection Closure Functional dependencies are key.
1 Multivalued Dependencies Fourth Normal Form Reasoning About FD’s + MVD’s.
1 Multivalued Dependencies Fourth Normal Form Reasoning About FD’s + MVD’s.
3 Spring Chapter Normalization of Database Tables.
Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.
Multivalued Dependencies and 4th NF CIS 4301 Lecture Notes Lecture /21/2006.
CS 338Database Design and Normal Forms9-1 Database Design and Normal Forms Lecture Topics Measuring the quality of a schema Schema design with normalization.
Databases 1 Sixth lecture. 2 Functional Dependencies X -> A is an assertion about a relation R that whenever two tuples of R agree on all the attributes.
© D. Wong Functional Dependencies (FD)  Given: relation schema R(A1, …, An), and X and Y be subsets of (A1, … An). FD : X  Y means X functionally.
Final Review Zaki Malik November 20, Basic Operators Covered.
Design Theory for Relational Databases Functional Dependencies Decompositions Normal Forms: BCNF, Third Normal Form Introduction to Multivalued Dependencies.
4NF & MULTIVALUED DEPENDENCY By Kristina Miguel. Review  Superkey – a set of attributes which will uniquely identify each tuple in a relation  Candidate.
1 Database Design: DBS CB, 2 nd Edition Physical RDBMS Model: Schema Design and Normalization Ch. 3.
Datalog Rules / Programs / Negation Slides by Jeffrey D. Ullman
CPSC-310 Database Systems
Schedule Today: Jan. 23 (wed) Week of Jan 28
3.1 Functional Dependencies
CPSC-608 Database Systems
Semantics of Datalog With Negation
CPSC-310 Database Systems
Multivalued Dependencies & Fourth Normal Form (4NF)
Mulitvalued Dependencies
Multivalued Dependencies & Fourth Normal Form
Multivalued Dependencies & Fourth Normal Form
Logic Based Query Languages
Functional Dependencies
Datalog Inspired by the impedance mismatch in relational databases.
Multivalued Dependencies
Chapter 3: Multivalued Dependencies
Rules Programs Negation
Presentation transcript:

Databases 1 8th lecture

Topics of the lecture Multivalued Dependencies Fourth Normal Form Datalog 2

3 A New Form of Redundancy Multivalued dependencies (MVD’s) express a condition among tuples of a relation that exists when the relation is trying to represent more than one many-many relationship. Then certain attributes become independent of one another, and their values must appear in all combinations.

4 Example Drinkers(name, addr, phones, beersLiked) A drinker’s phones are independent of the beers they like. Thus, each of a drinker’s phones appears with each of the beers they like in all combinations. This repetition is unlike redundancy due to FD’s, of which name->addr is the only one.

5 Tuples Implied by Independence If we have tuples: Then these tuples must also be in the relation. nameaddrphonesbeersLiked Sueap1b1 Sueap2b2 Sueap1b2 Sueap2b1

6 Definition of MVD A multivalued dependency (MVD) X ->->Y is an assertion that if two tuples of a relation agree on all the attributes of X, then their components in the set of attributes Y may be swapped, and the result will be two tuples that are also in the relation.

7 Example The name-addr-phones-beersLiked example illustrated the MVD name->->phones and the MVD name ->-> beersLiked.

8 Picture of MVD X ->->Y XY others equal exchange

9 MVD Rules Every FD is an MVD. ▫If X ->Y is a FD, then swapping Y ’s between two tuples that agree on X doesn’t change the tuples. ▫Therefore, the “new” tuples are surely in the relation, and we know X ->->Y. Complementation : If X ->->Y, and Z is all the other attributes, then X ->->Z.

10 Splitting Doesn’t Hold Like FD’s, we cannot generally split the left side of an MVD. But unlike FD’s, we cannot split the right side either --- sometimes you have to leave several attributes on the right side.

11 Example Consider a drinkers relation: Drinkers(name, areaCode, phone, beersLiked, manf) A drinker can have several phones, with the number divided between areaCode and phone (last 7 digits). A drinker can like several beers, each with its own manufacturer.

12 Example, Continued Since the areaCode-phone combinations for a drinker are independent of the beersLiked-manf combinations, we expect that the following MVD’s hold: name ->-> areaCode phone name ->-> beersLiked manf

13 Example Data Here is possible data satisfying these MVD’s: nameareaCodephonebeersLikedmanf Sue BudA.B. Sue WickedAlePete’s Sue BudA.B. Sue WickedAlePete’s But we cannot swap area codes or phones my themselves. That is, neither name ->-> areaCode nor name ->-> phone holds for this relation.

14 Fourth Normal Form The redundancy that comes from MVD’s is not removable by putting the database schema in BCNF. There is a stronger normal form, called 4NF, that (intuitively) treats MVD’s as FD’s when it comes to decomposition, but not when determining keys of the relation.

15 4NF Definition A relation R is in 4NF if whenever X ->->Y is a nontrivial MVD, then X is a superkey. ▫“Nontrivial means that: 1.Y is not a subset of X, and 2.X and Y are not, together, all the attributes. ▫Note that the definition of “superkey” still depends on FD’s only.

16 BCNF Versus 4NF Remember that every FD X ->Y is also an MVD, X ->->Y. Thus, if R is in 4NF, it is certainly in BCNF. ▫Because any BCNF violation is a 4NF violation. But R could be in BCNF and not 4NF, because MVD’s are “invisible” to BCNF.

17 Decomposition and 4NF If X ->->Y is a 4NF violation for relation R, we can decompose R using the same technique as for BCNF. 1.XY is one of the decomposed relations. 2.All but Y – X is the other.

18 Example Drinkers(name, addr, phones, beersLiked) FD: name -> addr MVD’s: name ->-> phones name ->-> beersLiked Key is {name, phones, beersLiked}. All dependencies violate 4NF.

19 Example, Continued Decompose using name -> addr: 1.Drinkers1(name, addr) ▫In 4NF, only dependency is name -> addr. 2.Drinkers2(name, phones, beersLiked) ▫Not in 4NF. MVD’s name ->-> phones and name ->-> beersLiked apply. No FD’s, so all three attributes form the key.

20 Example: Decompose Drinkers2 Either MVD name ->-> phones or name ->-> beersLiked tells us to decompose to: ▫Drinkers3(name, phones) ▫Drinkers4(name, beersLiked)

21 DATALOG: Logic As a Query Language If-then logical rules have been used in many systems. ▫Most important today: EII (Enterprise Information Integration). Nonrecursive rules are equivalent to the core relational algebra. Recursive rules extend relational algebra --- have been used to add recursion to SQL-99.

22 A Logical Rule Our first example of a rule uses the relations Frequents(drinker,bar), Likes(drinker,beer), and Sells(bar,beer,price). The rule is a query asking for “happy” drinkers -- - those that frequent a bar that serves a beer that they like.

23 Anatomy of a Rule Happy(d) <- Frequents(d,bar) AND Likes(d,beer) AND Sells(bar,beer,p) Body = “antecedent” = AND of subgoals. Head = “consequent,” a single subgoal Read this symbol “if”

24 Subgoals Are Atoms An atom is a predicate, or relation name with variables or constants as arguments. The head is an atom; the body is the AND of one or more atoms. Convention: Predicates begin with a capital, variables begin with lower-case.

25 Example: Atom Sells(bar, beer, p) The predicate = name of a relation Arguments are variables

26 Interpreting Rules A variable appearing in the head is called distinguished ; otherwise it is nondistinguished. Rule meaning: The head is true of the distinguished variables if there exist values of the nondistinguished variables that make all subgoals of the body true.

27 Example: Interpretation Happy(d) <- Frequents(d,bar) AND Likes(d,beer) AND Sells(bar,beer,p) Distinguished variable Nondistinguished variables Interpretation: drinker d is happy if there exist a bar, a beer, and a price p such that d frequents the bar, likes the beer, and the bar sells the beer at price p.

28 Arithmetic Subgoals In addition to relations as predicates, a predicate for a subgoal of the body can be an arithmetic comparison. ▫We write such subgoals in the usual way, e.g.: x < y.

29 Example: Arithmetic A beer is “cheap” if there are at least two bars that sell it for under $2. Cheap(beer) <- Sells(bar1,beer,p1) AND Sells(bar2,beer,p2) AND p1 < 2.00 AND p2 bar2

30 Negated Subgoals We may put NOT in front of a subgoal, to negate its meaning. Example: Think of Arc(a,b) as arcs in a graph. ▫S(x,y) says the graph is not transitive from x to y ; i.e., there is a path of length 2 from x to y, but no arc from x to y. S(x,y) <- Arc(x,z) AND Arc(z,y) AND NOT Arc(x,y)

31 Safe Rules A rule is safe if: 1.Each distinguished variable, 2.Each variable in an arithmetic subgoal, 3.Each variable in a negated subgoal, also appears in a nonnegated, relational subgoal. We allow only safe rules.

32 Example: Unsafe Rules Each of the following is unsafe and not allowed: 1.S(x) <- R(y) 2.S(x) <- R(y) AND NOT R(x) 3.S(x) <- R(y) AND x < y In each case, an infinity of x ’s can satisfy the rule, even if R is a finite relation.

33 Algorithms for Applying Rules Two approaches: 1.Variable-based : Consider all possible assignments to the variables of the body. If the assignment makes the body true, add that tuple for the head to the result. 2.Tuple-based : Consider all assignments of tuples from the non-negated, relational subgoals. If the body becomes true, add the head’s tuple to the result.

34 Example: Variable-Based S(x,y) <- Arc(x,z) AND Arc(z,y) AND NOT Arc(x,y) Arc(1,2) and Arc(2,3) are the only tuples in the Arc relation. Only assignments to make the first subgoal Arc(x,z) true are: 1.x = 1; z = 2 2.x = 2; z = 3

35 Example: Variable-Based; x=1, z=2 S(x,y) <- Arc(x,z) AND Arc(z,y) AND NOT Arc(x,y) is the only value of y that makes all three subgoals true. Makes S(1,3) a tuple of the answer

36 Example: Variable-Based; x=2, z=3 S(x,y) <- Arc(x,z) AND Arc(z,y) AND NOT Arc(x,y) No value of y makes Arc(3,y) true. Thus, no contribution to the head tuples; S = {(1,3)}

37 Tuple-Based Assignment Start with the non-negated, relational subgoals only. Consider all assignments of tuples to these subgoals. ▫Choose tuples only from the corresponding relations. If the assigned tuples give a consistent value to all variables and make the other subgoals true, add the head tuple to the result.

38 Example: Tuple-Based S(x,y) <- Arc(x,z) AND Arc(z,y) AND NOT Arc(x,y) Arc(1,2), Arc(2,3) Four possible assignments to first two subgoals: Arc(x,z)Arc(z,y) (1,2) (1,2) (1,2) (2,3) (2,3) (1,2) (2,3) Only assignment with consistent z-value. Since it also makes NOT Arc(x,y) true, add S(1,3) to result.

39 Datalog Programs A Datalog program is a collection of rules. In a program, predicates can be either 1.EDB = Extensional Database = stored table. 2.IDB = Intensional Database = relation defined by rules. wNever both! No EDB in heads.

40 Evaluating Datalog Programs As long as there is no recursion, we can pick an order to evaluate the IDB predicates, so that all the predicates in the body of its rules have already been evaluated. If an IDB predicate has more than one rule, each rule contributes tuples to its relation.

41 Example: Datalog Program Using EDB Sells(bar, beer, price) and Beers(name, manf), find the manufacturers of beers Joe doesn’t sell. JoeSells(b) <- Sells(’Joe’’s Bar’, b, p) Answer(m) <- Beers(b,m) AND NOT JoeSells(b)

42 Expressive Power of Datalog Without recursion, Datalog can express all and only the queries of core relational algebra. ▫The same as SQL select-from-where, without aggregation and grouping. But with recurson, Datalog can express more than these languages. Yet still not Turing-complete.

43 Recursive Example EDB: Par(c,p) = p is a parent of c. Generalized cousins: people with common ancestors one or more generations back: Sib(x,y) y Cousin(x,y) <- Sib(x,y) Cousin(x,y) <- Par(x,xp) AND Par(y,yp) AND Cousin(xp,yp)

44 Definition of Recursion Form a dependency graph whose nodes = IDB predicates. Arc X ->Y if and only if there is a rule with X in the head and Y in the body. Cycle = recursion; no cycle = no recursion.

45 Example: Dependency Graphs Cousin Sib Answer JoeSells Recursive Nonrecursive

46 Evaluating Recursive Rules The following works when there is no negation: 1.Start by assuming all IDB relations are empty. 2.Repeatedly evaluate the rules using the EDB and the previous IDB, to get a new IDB. 3.End when no change to IDB.

47 The “Naïve” Evaluation Algorithm Start: IDB = 0 Apply rules to IDB, EDB Change to IDB? no yes done

48 Example: Evaluation of Cousin We’ll proceed in rounds to infer Sib facts (red) and Cousin facts (green). Remember the rules: Sib(x,y) y Cousin(x,y) <- Sib(x,y) Cousin(x,y) <- Par(x,xp) AND Par(y,yp) AND Cousin(xp,yp)

49 Seminaive Evaluation Since the EDB never changes, on each round we only get new IDB tuples if we use at least one IDB tuple that was obtained on the previous round. Saves work; lets us avoid rediscovering most known facts. ▫A fact could still be derived in a second way.

50 Par Data: Parent Above Child adbcefghjkiadbcefghjki

51 Recursion Plus Negation “Naïve” evaluation doesn’t work when there are negated subgoals. In fact, negation wrapped in a recursion makes no sense in general. Even when recursion and negation are separate, we can have ambiguity about the correct IDB relations.

52 Stratified Negation Stratification is a constraint usually placed on Datalog with recursion and negation. It rules out negation wrapped inside recursion. Gives the sensible IDB relations when negation and recursion are separate.

53 Problematic Recursive Negation P(x) <- Q(x) AND NOT P(x) EDB: Q(1), Q(2) Initial:P = { } Round 1:P = {(1), (2)} Round 2:P = { } Round 3:P = {(1), (2)}, etc., etc. …

54 Strata Intuitively, the stratum of an IDB predicate P is the maximum number of negations that can be applied to an IDB predicate used in evaluating P. Stratified negation = “finite strata.” Notice in P(x) <- Q(x) AND NOT P(x), we can negate P an infinite number of times deriving P(x).

55 Stratum Graph To formalize strata use the stratum graph : ▫Nodes = IDB predicates. ▫Arc A ->B if predicate A depends on B. ▫Label this arc “–” if the B subgoal is negated.

56 Stratified Negation Definition The stratum of a node (predicate) is the maximum number of – arcs on a path leading from that node. A Datalog program is stratified if all its IDB predicates have finite strata.

57 Example P(x) <- Q(x) AND NOT P(x) --P

58 Another Example EDB = Source(x), Target(x), Arc(x,y). Rules for “targets not reached from any source”: Reach(x) <- Source(x) Reach(x) <- Reach(y) AND Arc(y,x) NoReach(x) <- Target(x) AND NOT Reach(x)

59 The Stratum Graph NoReach Reach -- Stratum 0: No – arcs on any path out. Stratum 1: <= 1 arc on any path out.

60 Models A model is a choice of IDB relations that, with the given EDB relations makes all rules true regardless of what values are substituted for the variables. ▫Remember: a rule is true whenever its body is false. ▫But if the body is true, then the head must be true as well.

61 Minimal Models When there is no negation, a Datalog program has a unique minimal model (one that does not contain any other model). But with negation, there can be several minimal models. The stratified model is the one that “makes sense.”

62 The Stratified Model When the Datalog program is stratified, we can evaluate IDB predicates lowest-stratum-first. Once evaluated, treat it as EDB for higher strata.

63 Example: Multiple Models Reach(x) <- Source(x) Reach(x) <- Reach(y) AND Arc(y,x) NoReach(x) <- Target(x) AND NOT Reach(x) Source Target Target Arc Stratum 0: Reach(1), Reach(2) Stratum 1: NoReach(3)

64 Example: Multiple Models Reach(x) <- Source(x) Reach(x) <- Reach(y) AND Arc(y,x) NoReach(x) <- Target(x) AND NOT Reach(x) Source Target Target Arc Another model! Reach(1), Reach(2), Reach(3), Reach(4); NoReach is empty.