
1 Relational Data Model Overview
– introduced by E. F. Codd (1970)
– strong theoretical foundations
– simplicity
– numerous commercial systems
– has a single data-modeling concept: the relation, informally a table of values
  Each column in the table has a column header called an attribute.
  Each row is called a tuple.
– loosely speaking, represents a database as a collection of relations and constraints

2 Formal Relational Concepts (1)
Domain
– a set of atomic (indivisible) values
Attribute
– a name that suggests the role a domain plays in a particular relation
– each attribute Ai has a domain dom(Ai)
Relation schema
– a relation name R and a set of attributes Ai that define the relation
– denoted by R(A1, A2, …, An)
– (ex) Movie(title, year, length, filmType)
Degree of a relation
– the number of attributes

3 Formal Relational Concepts (2)
Tuple t (of R(A1, A2, …, An))
– an ordered list of values t = ⟨v1, v2, …, vn⟩, where each value vi is an element of dom(Ai); also called an n-tuple
Relation instance r(R)
– a set of tuples
– r(R) = {t1, t2, …, tm}, or alternatively
– r(R) ⊆ dom(A1) × dom(A2) × … × dom(An)
Relational database schema
– a set S of relation schemas that belong to the same database
– S = {R1, R2, …, Rn}

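4 Suppliers relation S
S#   SNAME   STATUS   CITY
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       Paris
S4   Clark   20       London
S5   Adams   30       Athens
(Figure labels: the relation S; the attributes S#, SNAME, STATUS and CITY, each drawn from its own domain; the primary key S#; the tuples; the cardinality (number of tuples, here 5); the degree (number of attributes, here 4).)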
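As a rough sketch (not from the slides), the Suppliers relation can be written in Python as a schema plus a set of tuples. A Python set has no duplicate members and no order, which matches the definition of a relation instance on slide 3. The helper names `degree`, `cardinality` and `active_domain` are illustrative only; `active_domain` returns the values that occur in this instance, which is a subset of the full domain.

```python
# Relation S as a schema (tuple of attribute names) plus an instance (set of tuples).
S_SCHEMA = ("S#", "SNAME", "STATUS", "CITY")

S = {
    ("S1", "Smith", 20, "London"),
    ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"),
    ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens"),
}


def degree(schema):
    """Number of attributes of the relation schema."""
    return len(schema)


def cardinality(instance):
    """Number of tuples in the relation instance."""
    return len(instance)


def active_domain(schema, instance, attr):
    """Values of attr that actually occur in the instance (a subset of dom(attr))."""
    i = schema.index(attr)
    return {t[i] for t in instance}


print(degree(S_SCHEMA), cardinality(S))            # 4 5
print(sorted(active_domain(S_SCHEMA, S, "CITY")))  # ['Athens', 'London', 'Paris']
```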

5 Characteristics of Relations
Ordering of tuples in a relation r(R)
– the tuples are not considered to be ordered, even though they appear to be in tabular form
Ordering of attributes in a relation schema R (and of values within each tuple) is immaterial, as long as the correspondence between attributes and values is maintained
All values are considered atomic (indivisible); relational databases do not allow repeating groups
– repeating group: a column or combination of columns that contains several data values in a single row
A special null value is used to represent missing values

6 Relational terminology

7 From ODL to relational schema
Basically, one relation for each class, and one attribute for each property
    interface Movie {
        attribute string title;
        attribute integer year;
        attribute integer length;
        attribute enum Film {color, blackAndWhite} filmType;
    };
Movie(title, year, length, filmType)

8 Non-atomic attribute in class
Remember, the relational model allows only atomic values to appear in a relation
For a record structure, expand it by making one attribute of the relation for each field of the structure
    interface Star {
        attribute string name;
        attribute Struct Addr {string street, string city} address;
    };
Star(name, street, city)

9 Set attribute
One approach is to make one tuple for each value of a set-valued attribute
    interface Star {
        attribute string name;
        attribute Set<Struct Addr {string street, string city}> address;
    };
Star(name, street, city)
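A minimal sketch of the one-tuple-per-element approach, assuming each star object is given as a name plus a Python set of (street, city) pairs. The function name `flatten_set_attribute` and the sample data are hypothetical.

```python
def flatten_set_attribute(stars):
    """stars: iterable of (name, set of (street, city)) pairs.
    Returns the Star(name, street, city) relation as a set of tuples."""
    rows = set()
    for name, addresses in stars:
        for street, city in addresses:
            rows.add((name, street, city))  # name is repeated for every address
    return rows


# Hypothetical sample objects
stars = [
    ("Carrie Fisher", {("123 Maple St.", "Hollywood"), ("5 Locust Ln.", "Malibu")}),
    ("Mark Hamill", {("456 Oak Rd.", "Brentwood")}),
]

for row in sorted(flatten_set_attribute(stars)):
    print(row)
```

The star's name is repeated once per address, which is the redundancy this approach accepts in exchange for atomic values.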

10 Other type constructors (1)
For a bag (multiset)
– add to the relation schema another attribute “count” representing the number of times each element is a member of the bag
    interface Star {
        attribute string name;
        attribute Bag<Struct Addr {string street, string city}> address;
    };
Star(name, street, city, count)
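The bag case can be sketched the same way, assuming the bag arrives as a Python list that may contain duplicates; `collections.Counter` supplies the multiplicity stored in the count attribute. The function name and data are made up for illustration.

```python
from collections import Counter


def flatten_bag_attribute(name, addresses):
    """addresses: list of (street, city) pairs, possibly with duplicates (a bag).
    Returns Star(name, street, city, count) tuples."""
    counts = Counter(addresses)  # element -> number of occurrences in the bag
    return {(name, street, city, n) for (street, city), n in counts.items()}


rows = flatten_bag_attribute(
    "Carrie Fisher",
    [("123 Maple St.", "Hollywood"), ("5 Locust Ln.", "Malibu"), ("123 Maple St.", "Hollywood")],
)
for row in sorted(rows):
    print(row)
```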

11 Other type constructors (2)
For a list, add a new attribute “position” indicating the position of each element in the list
A fixed-length array can be represented by one attribute for each position in the array

12 Single-valued relationship
The relational model does not support the notion of a pointer, so we simulate the effect of pointers by values that represent the related objects
Include the key attribute(s) of the referenced class
    interface Movie {
        ...
        attribute enum Film {color, blackAndWhite} filmType;
        relationship Studio ownedBy inverse Studio::owns;
    };
    // assume studioName is the key of the Studio class

13 Multivalued relationship
We may represent a set of related objects by creating one tuple for each related object
Note that this introduces redundancy

14 What if there is no key
ODL permits two objects in a class to have exactly the same values for all properties
Invent a new attribute that can be used as a key

15 Relationship and its inverse
In the ODL model, both the relationship and its inverse are needed!
– we cannot follow a pointer backwards
However, representing both a relationship and its inverse in a relational schema is redundant!
– WHY???

16 Steps in Logical Database Design using the ER Model Approach
Major steps
– identify entity types
– identify relationships between entity types
– determine appropriate attributes for entity and relationship types
– convert the ER diagram into the system-dependent model: hierarchical, network, relational, or object-oriented model
– normalization

17 Regular entity type
Create a relation
For a composite attribute, include only its simple components
Choose a primary key
Example
– Movies(title, year, length, filmType)
– Stars(name, address) or Stars(name, street, city)
(ER diagram: entity sets Movies, with attributes title, year, length and filmType, and Stars, with attributes name and address, connected by the relationship Stars-in)

18 Binary M:N relationship type R
Create a new relation S
Include all attributes of R in S
Include the primary keys of the participating relations as foreign keys in S
Example
– key of the Stars entity type: starName
– key of the Movies entity type: title + year
– key of the Studios entity type: studioName
– Owns(title, year, studioName)
– Stars-In(title, year, starName)
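To make the mapping concrete, here is a sketch using Python's built-in `sqlite3` module with an in-memory database. The column types, the table name `StarsIn` (a hyphen is not allowed in an unquoted SQL identifier) and the sample rows are assumptions for illustration; `starName` is the key of Stars, as on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE Movies (
    title    TEXT,
    year     INTEGER,
    length   INTEGER,
    filmType TEXT,
    PRIMARY KEY (title, year)
);
CREATE TABLE Stars (
    starName TEXT PRIMARY KEY,
    address  TEXT
);
-- Relation for the M:N relationship Stars-In: the keys of both sides form its key
CREATE TABLE StarsIn (
    title    TEXT,
    year     INTEGER,
    starName TEXT,
    PRIMARY KEY (title, year, starName),
    FOREIGN KEY (title, year) REFERENCES Movies (title, year),
    FOREIGN KEY (starName) REFERENCES Stars (starName)
);
""")

conn.execute("INSERT INTO Movies VALUES ('Star Wars', 1977, 124, 'color')")
conn.execute("INSERT INTO Stars VALUES ('Carrie Fisher', 'Hollywood')")
conn.execute("INSERT INTO Stars VALUES ('Mark Hamill', 'Malibu')")
# One tuple per (movie, star) pair
conn.execute("INSERT INTO StarsIn VALUES ('Star Wars', 1977, 'Carrie Fisher')")
conn.execute("INSERT INTO StarsIn VALUES ('Star Wars', 1977, 'Mark Hamill')")

try:
    # Rejected: 'Nobody' is not a key value in Stars
    conn.execute("INSERT INTO StarsIn VALUES ('Star Wars', 1977, 'Nobody')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The composite primary key (title, year, starName) lets a movie have many stars and a star appear in many movies, while each pairing is stored once.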

19 Regular binary 1:N or 1:1 relationship type R
For 1:1 or 1:N relationship types, we may optionally avoid creating a new relation
Let S and T be the participating entity types (S on the N side)
– include the primary key of T as a foreign key in S
– include all attributes of R in S
Example
– Movies(title, year, length, filmType, studioName), where studioName is a foreign key
– Studios(name, address)
– no schema for the Owns relationship

20 Multivalued attribute A
Create a new relation R that includes the multivalued attribute
Include the primary key attribute K of the relation that has A as an attribute
– primary key: the combination of A and K
– Department(number, name)
– DeptLocation(number, location)
(ER diagram: entity type department with attributes number and name and the multivalued attribute location)

21 N-ary relationship type (1)
Create a new relation S
Include the primary keys of the participating entity types as foreign keys in S
Include all attributes of the n-ary relationship
Primary key of S: usually the combination of all foreign keys
– Contracts(starName, title, year, studioOfStar, producingStudio)
(ER diagram: the relationship Contracts connects Stars, Movies and Studios, with Studios appearing in two roles, "studio of star" and "producing studio")

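22 N-ary relationship type (2)
(ER diagram: the ternary relationship SUPPLY, with attribute Quantity, connects SUPPLIER (key SName), PART (key PartNo) and PROJECT (key ProName))
Relational schema: relations SUPPLIER, PART and PROJECT, plus SUPPLY(SName, ProName, PartNo, Quantity), where SName, ProName and PartNo are foreign keys (FK) referencing SUPPLIER, PROJECT and PART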

23 Weak entity type
Create a relation R
Include the primary key of the owner as a foreign key in R
Primary key of R
– primary key of the owner + partial key of the weak entity
– Studios(name, addr)
– Crews(number, studioName)
Note: there is no separate relation for the identifying relationship type
(ER diagram: weak entity set Crews, with partial key number, identified through the relationship Unit-of by Studios, with attributes name and addr)

24 Converting subclasses to relations
Recall the distinction
– In ODL, an object belongs to exactly one class. An object inherits properties from all its superclasses but technically is not a member of the superclasses
– In the ER model, an entity belongs to several entity sets that are related by isa relationships. The linked entities together represent the object and give that object its properties: attributes and relationships

25 Relational representation of ODL subclasses
Every subclass has its own relation that represents all the properties of that subclass, including all its inherited properties
    interface Cartoon: Movie {
        relationship Set<Star> voices;
    };
    interface MurderMystery: Movie {
        attribute string weapon;
    };
    interface Cartoon-MurderMystery: Cartoon, MurderMystery { };
Movie(title, …, starName)
Cartoon(title, …, starName, voices)
MurderMystery(title, …, starName, weapon)
Cartoon-MurderMystery(title, …, starName, voices, weapon)

26 Representing ISA in the relational model
For each entity set, create a relation that has the attributes of that entity set alone, as well as the key attributes of related entity sets
No relation is created for an isa relationship
Comparison
– the ODL translation keeps all properties of an object together in one relation
– the ER translation repeats the key of an object once for each entity set or relationship to which that object belongs

27
– Movies(title, year, length, filmType)
– MurderMysteries(title, year, weapon)
– Cartoons(title, year)
– Voices(title, year, starName)
Why do we need the Cartoons relation, whose attributes are a subset of those of Movies?
– How would we represent a silent cartoon without the Cartoons relation?
What about Cartoon-MurderMystery?
(ER diagram: Movies (title, year, length, filmType) with isa subclasses Cartoons and MurderMysteries (weapon); Cartoons is connected to Stars by the relationship Voices)

28 Using null values
    interface Cartoon: Movie {
        relationship Set<Star> voices;
    };
    interface MurderMystery: Movie {
        attribute string weapon;
    };
    interface Cartoon-MurderMystery: Cartoon, MurderMystery { };
Movie(title, year, length, filmType, studioName, starName, voice, weapon)
– use a null value where an attribute is not applicable!

29 Functional Dependency (FD)
Definition
– For any two tuples t1 and t2 in r(R) such that t1[X] = t2[X], we must have t1[Y] = t2[Y], where X and Y are sets of attributes of R
– The values of the X component of a tuple uniquely (functionally) determine the values of the Y component
Notation: X → Y
An FD is a property of the relation schema (intension) of R and must hold in all relation instances (extensions) at all times
Example
– every key K functionally determines any subset of attributes Y of R, i.e. K → Y

30 Example on FD
– title year → length
– title year → filmType
– title year → studioName
– title year → length filmType studioName
– title year → starName (false)
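The FD definition on slide 29 can be checked mechanically against a single instance. The sketch below (function name and sample rows hypothetical) reports whether any two rows agree on the left side but differ on the right side. An instance can only refute an FD; whether an FD holds remains a property of the schema.

```python
def fd_holds(rows, lhs, rhs):
    """True if, in this instance, any two rows that agree on lhs also agree on rhs.

    rows: list of dicts (attribute -> value); lhs, rhs: lists of attribute names.
    """
    seen = {}  # lhs value combination -> rhs value combination first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # two rows agree on lhs but differ on rhs
    return True


movies = [
    {"title": "Star Wars", "year": 1977, "length": 124, "filmType": "color",
     "studioName": "Fox", "starName": "Carrie Fisher"},
    {"title": "Star Wars", "year": 1977, "length": 124, "filmType": "color",
     "studioName": "Fox", "starName": "Mark Hamill"},
    {"title": "Wayne's World", "year": 1992, "length": 95, "filmType": "color",
     "studioName": "Paramount", "starName": "Dana Carvey"},
]

print(fd_holds(movies, ["title", "year"], ["length", "filmType", "studioName"]))  # True
print(fd_holds(movies, ["title", "year"], ["starName"]))                          # False
```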

31 Key, superkey, primary key, candidate key
Superkey of a relation schema R = {A1, A2, …, An}: a set of attributes S such that no two distinct tuples t1 and t2 in any relation instance of R have t1[S] = t2[S]
Candidate key: a minimal superkey
Candidate key, primary key
– If a relation schema has more than one minimal superkey, each is called a candidate key
– One of the candidate keys is designated as the primary key
Example
– (title, year, starName): key
– (title, year, starName, length): superkey
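Given a set of FDs, the superkey test reduces to checking whether the closure of a set of attributes contains every attribute of R (the attribute-closure algorithm appears on slide 37). The brute-force sketch below, with hypothetical helper names, enumerates attribute sets by increasing size so that only minimal superkeys are kept; it is exponential and meant for small schemas only. It assumes the attributes of slide 30 (title, year, length, filmType, studioName, starName) and the FD title year → length filmType studioName.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    return closure(attrs, fds) >= set(schema)


def candidate_keys(schema, fds):
    """Try attribute sets in order of size, keeping only minimal superkeys."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(schema, size):
            s = set(combo)
            if any(k <= s for k in keys):  # contains a smaller key: not minimal
                continue
            if is_superkey(s, schema, fds):
                keys.append(s)
    return keys


MOVIE = ["title", "year", "length", "filmType", "studioName", "starName"]
F = [({"title", "year"}, {"length", "filmType", "studioName"})]

print(candidate_keys(MOVIE, F))
print(is_superkey({"title", "year", "starName", "length"}, MOVIE, F))  # True
```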

32 Discovering keys for relations
If a relation comes from an entity set (or class), then the key of the relation consists of the key attributes of that entity set
If a relation R comes from a relationship
– many-many: the keys of both connected entity sets are key attributes of R
– many-one from entity set E1 to entity set E2: the key attributes of E1 are key attributes of R, but those of E2 are not
– one-one: the key attributes of either connected entity set form a key of R
If a multiway relationship R has an arrow to entity set E, then the corresponding relation has at least one key that excludes the key of E
Movies(title, year, length, filmType)
Stars(name, address)
Owns(title, year, studioName) // many-one
Stars-in(title, year, starName) // many-many

33 Keys for relations derived from ODL
If an ODL class has no key at all, introduce an attribute that serves as a surrogate for the object identifier
In certain cases the key attributes of the class are not a key of the relation
– WHY???
– In general, if the relation for class C represents several multivalued relationships from C, then the keys of all the classes that these relationships connect to C must be added to the key of C
– The result is the key of the relation for C

34 Rules for FDs
Splitting rule
– we can replace an FD A1 A2 … An → B1 B2 … Bm by the set of FDs A1 A2 … An → Bi for i = 1, …, m
Combining rule
– we can replace a set of FDs A1 A2 … An → Bi for i = 1, …, m by the single FD A1 A2 … An → B1 B2 … Bm
A1 A2 … An → B1 B2 … Bm is
– trivial if the B's are a subset of the A's
– nontrivial if at least one of the B's is not among the A's
– completely nontrivial if none of the B's is also one of the A's
We can always remove from the right side of an FD those attributes that appear on the left
– title year → year length // the second “year” may be dropped

35 Armstrong's axioms (inference rules)
Used to infer new dependencies from a given set F of dependencies, i.e. F ⊨ X → Y
Let X, Y, Z be sets of attributes
– Reflexive rule: if Y ⊆ X then X → Y
– Augmentation rule: if X → Y then XZ → YZ
– Transitive rule: if X → Y and Y → Z then X → Z
Def: F+ (the closure of F) is the set of all dependencies logically implied by F
A set of inference rules is complete if, given F, the rules allow us to derive all dependencies in F+
A set of rules is sound if using them we cannot derive any dependency not in F+
Armstrong's axioms are sound and complete

36 Additional inference rules
– Decomposition rule: if X → YZ then X → Y
  (proof) 1. X → YZ (given) 2. YZ → Y (reflexive) 3. X → Y (transitive on 1 and 2)
– Union rule: if X → Y and X → Z then X → YZ
  (proof) 1. X → Y (given) 2. X → Z (given) 3. X → XY (augmentation of 1 with X) 4. XY → YZ (augmentation of 2 with Y) 5. X → YZ (transitive on 3 and 4)
– Pseudotransitive rule: if X → Y and WY → Z then WX → Z
  (proof) 1. X → Y (given) 2. WY → Z (given) 3. WX → WY (augmentation of 1 with W) 4. WX → Z (transitive on 3 and 2)

37 Closure of attributes
X+ (the closure of X with respect to F) is the set of attributes that are functionally determined by X
Algorithm (compute X+)
    X+ := X;
    repeat
        oldX+ := X+;
        for each functional dependency Y → Z in F do
            if Y ⊆ X+ then X+ := X+ ∪ Z;
    until (oldX+ = X+);
(Figure: the closure "pushing out" from the initial set of attributes)
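A direct Python transcription of the loop above, offered as a sketch: attributes are assumed to be single letters, so a string such as "BD" stands for a set of attributes, and each FD is a (left side, right side) pair. It is run on the FDs of slides 38 and 39.

```python
def closure(x, fds):
    """X+ with respect to fds, following the repeat-until loop of the algorithm.

    x: iterable of attribute names; fds: list of (lhs, rhs) pairs of iterables.
    """
    x_plus = set(x)
    while True:
        old = set(x_plus)
        for lhs, rhs in fds:
            if set(lhs) <= x_plus:   # Y is contained in X+
                x_plus |= set(rhs)   # X+ := X+ ∪ Z
        if x_plus == old:            # until oldX+ = X+
            return x_plus


# Slide 38
F1 = [("AB", "C"), ("D", "EG"), ("C", "A"), ("BE", "C"),
      ("BC", "D"), ("CG", "BD"), ("ACD", "B"), ("CE", "AG")]
print("".join(sorted(closure("BD", F1))))  # ABCDEG

# Slide 39
F2 = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]
print("".join(sorted(closure("AB", F2))))  # ABCDE
```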

38 Example of attribute closure (1)
R(A, B, C, D, E, G)
FDs: 1. AB → C  2. D → EG  3. C → A  4. BE → C  5. BC → D  6. CG → BD  7. ACD → B  8. CE → AG
Let X = BD; then
X0 = BD
X1 = BDEG (by 2)
X2 = BCDEG (by 4)
X3 = ABCDEG (by 3, 5, 6, 8)
X4 = X3
Hence X+ = X4 = ABCDEG, so X is a superkey; since B+ = B and D+ = DEG, X is also minimal, i.e. a key

39 Example of attribute closure (2)
R(A, B, C, D, E, F)
FDs: 1. AB → C  2. BC → AD  3. D → E  4. CF → B
Let X = AB; then
X0 = AB
X1 = ABC (by 1)
X2 = ABCD (by 2)
X3 = ABCDE (by 3)
X4 = X3
Hence X+ = X4 = ABCDE, which does not contain F, so X is not a key

40 Minimal cover
Two sets of functional dependencies E and F are equivalent if E+ = F+
A functional dependency f in F is redundant if (F − {f})+ = F+
F' is a nonredundant cover (or minimal cover, minimal base) of F if F'+ = F+ and F' contains no redundant functional dependencies
The closure F+ is usually too big to handle, so use the attribute-closure algorithm to detect redundant functional dependencies
– X → Y follows from a set of dependencies S if and only if Y ⊆ X+ (computed with respect to S)
There may exist several minimal covers for a set of FDs!

41 Minimal cover example (1)
R(A, B, C, D, E, F)
S: 1. AB → C  2. BC → AD  3. D → E  4. CF → B
Does AB → D follow from S?
– {A, B}+ = ABCDE and D ∈ {A, B}+
– AB → D follows from S
– AB → D would be redundant if added to S
Does D → A follow from S?
– {D}+ = DE and A ∉ {D}+
– D → A does not follow from S
– D → A would not be redundant

42 Minimal cover example (2)
R(A, B, C)
S: 1. A → B  2. A → C  3. B → A  4. B → C  5. C → A  6. C → B  7. AB → C  8. BC → A  9. AC → B
Minimal cover 1: {A → B, B → A, B → C, C → B}
Minimal cover 2: {A → B, B → C, C → A}
Other minimal covers exist, too. Issue: how do we find a minimal cover?
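One common way to compute a minimal cover is sketched below; it is a standard textbook procedure, not necessarily the one the slides intend, and it targets the stricter notion that also requires single-attribute right sides and no extraneous left-side attributes. It (1) splits right sides with the splitting rule, (2) removes extraneous left-side attributes, and (3) removes FDs that follow from the others, using attribute closure throughout. Attributes are assumed to be single letters; the function names are hypothetical.

```python
def closure(x, fds):
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    """fds: list of (lhs, rhs) strings of single-letter attributes.
    Returns a list of (frozenset lhs, frozenset rhs), one attribute per right side."""
    # Step 1: splitting rule (duplicates dropped)
    g = []
    for lhs, rhs in fds:
        for a in rhs:
            fd = (frozenset(lhs), frozenset(a))
            if fd not in g:
                g.append(fd)
    # Step 2: drop left-side attributes that are not needed to derive the right side
    reduced = []
    for lhs, rhs in g:
        lhs = set(lhs)
        for a in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {a}, g):
                lhs.discard(a)
        fd = (frozenset(lhs), rhs)
        if fd not in reduced:
            reduced.append(fd)
    # Step 3: drop FDs implied by the remaining ones
    cover = list(reduced)
    for fd in reduced:
        others = [f for f in cover if f != fd]
        if fd[1] <= closure(fd[0], others):
            cover = others
    return cover


S = [("A", "B"), ("A", "C"), ("B", "A"), ("B", "C"), ("C", "A"),
     ("C", "B"), ("AB", "C"), ("BC", "A"), ("AC", "B")]
for lhs, rhs in minimal_cover(S):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
# prints A -> C, B -> C, C -> A, C -> B (one per line)
```

The result differs from both covers listed on the slide, another illustration that a minimal cover is not unique; which cover comes out depends on the order in which the FDs are examined.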

43 Normalization
One phase in database design
First proposed by E. F. Codd (1972)
A process during which unsatisfactory relation schemas are decomposed into smaller relation schemas that possess desirable properties
Uses functional dependencies, multivalued dependencies and join dependencies

44 Bad relational schema
Anomalies
– Insertion anomalies: we cannot record a movie's filmType without a starName
– Deletion anomalies: if we delete the last star of a movie, we also lose the movie's information
– Modification (update) anomalies

45 Decomposing relations
Given a relation R(A1, A2, …, An), we may decompose R into two relations S(B1, B2, …, Bm) and T(C1, C2, …, Ck) such that
– {A1, A2, …, An} = {B1, B2, …, Bm} ∪ {C1, C2, …, Ck}
– the tuples of S are the projections of the tuples of R onto {B1, B2, …, Bm}
– similarly for T

46 Decomposition example

47 Normal forms
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce/Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)

48 First normal form (1NF)
The only attribute values permitted are single atomic (indivisible) values
Sets, lists, bags, etc. are not allowed
1NF is considered part of the formal definition of a relation
Nested relation concept (?)

49 Various functional dependencies
Prime and nonprime attributes
– an attribute of a relation schema R is called a prime attribute of R if it is a member of some candidate key of R
– an attribute is called nonprime if it is not prime, i.e. not a member of any candidate key of R
Full functional dependency
– Y is fully dependent on X if X → Y and Z ↛ Y for every proper subset Z ⊂ X
– i.e. Y is fully dependent on X if and only if Y is functionally dependent on X and not functionally dependent on any proper subset of X
Partial functional dependency
– Y is partially dependent on X if some attribute can be removed from X and the dependency still holds
Transitive functional dependency

50 Second Normal Form (2NF)
A relation schema R is in 2NF if it is in 1NF and every nonprime attribute A in R is fully functionally dependent on every key of R
R(SSN, pNumber, hours, eName, pName, pLocation)
SSN pNumber → hours
SSN → eName
pNumber → pName pLocation
Decomposed into
R1(SSN, pNumber, hours)
R2(SSN, eName)
R3(pNumber, pName, pLocation)

51 Example for 1NF and 2NF
FIRST(S#, P#, Status, City, Qty)
S# P# → Status City Qty
S# → City Status
City → Status
Anomalies
– insertion: cannot record the City of a supplier until the supplier supplies some part
– modification: the City of a supplier appears many times
– deletion: deleting the last tuple for an S# loses that supplier's City
FIRST is not in 2NF (Status and City depend on only part of the key {S#, P#}), so decompose into
SECOND(S#, Status, City) with S# → City Status, City → Status
SP(S#, P#, Qty) with S# P# → Qty

52 Third Normal Form (3NF) (1)
A relation schema R is in 3NF if it is in 2NF and no nonprime attribute of R is transitively dependent on any key
EMP_DEPT(SSN, Ename, Bdate, Addr, D#, Dname, Dmgrssn)
FDs: SSN → Ename Bdate Addr D#
     D# → Dname Dmgrssn
Since Dname and Dmgrssn are transitively dependent on the key SSN (via D#), EMP_DEPT is not in 3NF; decompose into
ED1(SSN, Ename, Bdate, Addr, D#)
ED2(D#, Dname, Dmgrssn)

53 Third Normal Form (2)
Alternatively
– A relation schema R is in 3NF if, whenever a nontrivial functional dependency X → A holds in R, either (a) X is a superkey of R or (b) A is a prime attribute of R
Violation of (a) implies that X is not a superset of any key, i.e.
– X may contain a nonprime attribute, which typically results in a transitive dependency
– X may be a proper subset of a key, which results in a partial dependency, i.e. a violation of 2NF!
Violation of (b) implies that A is a nonprime attribute
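The alternative definition translates into a simple test. The sketch below (hypothetical helper names, brute-force key search, small schemas only) checks each given FD X → A and flags it when X is not a superkey and A is not prime. It examines only the FDs given for R, with right sides split attribute by attribute, rather than all of F+. It is run on EMP_DEPT from slide 52 and on Bookings from slide 59.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(schema, size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= set(schema):
                keys.append(s)
    return keys


def violations_3nf(schema, fds):
    """Given FDs X -> A with X not a superkey (a fails) and A nonprime (b fails)."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = set()
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(schema):  # (a) X is a superkey
            continue
        for a in rhs - lhs:                   # nontrivial part of the FD
            if a not in prime:                # (b) A is not prime either
                bad.add((frozenset(lhs), a))
    return bad


EMP_DEPT = ["SSN", "Ename", "Bdate", "Addr", "D#", "Dname", "Dmgrssn"]
F_EMP = [({"SSN"}, {"Ename", "Bdate", "Addr", "D#"}), ({"D#"}, {"Dname", "Dmgrssn"})]
print(sorted(a for _, a in violations_3nf(EMP_DEPT, F_EMP)))  # ['Dmgrssn', 'Dname']

BOOKINGS = ["title", "theater", "city"]
F_BOOK = [({"theater"}, {"city"}), ({"title", "city"}, {"theater"})]
print(violations_3nf(BOOKINGS, F_BOOK))  # set(): Bookings is in 3NF
```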

54 Example for 2NF and 3NF
SECOND(S#, Status, City), SP(S#, P#, Qty)
S# → City Status
City → Status
Anomalies
– insertion: cannot record the Status of a city until some supplier is located in that city
– modification: the Status of a City appears in many tuples
– deletion: deleting the only SECOND tuple for a particular City also loses that City's Status
SECOND is not in 3NF, so decompose into
SC(S#, City)
CS(City, Status)

55 Boyce/Codd Normal Form (BCNF)
A relation schema R is in BCNF if, whenever a nontrivial functional dependency X → A holds in R, X is a superkey of R
Difference from 3NF
– drop the second condition of 3NF, which allows A to be prime when X is not a superkey

56 3NF and BCNF example (1)
MovieStudio(title, year, length, filmType, studioName, studioAddr)
title year → length filmType studioName
studioName → studioAddr
Key: {title, year}
studioAddr is transitively dependent on the key via studioName, hence MovieStudio is not in 3NF
Decompose into
MovieStudio1(title, year, length, filmType, studioName)
MovieStudio2(studioName, studioAddr)
Then we get a schema in BCNF.
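Dropping condition (b) gives an even simpler BCNF test, sketched below with hypothetical names: report every given nontrivial FD whose left side is not a superkey. For the decomposed relations, the FDs that hold in each part are supplied by hand here; in general they would be obtained by projecting FDs as on slide 63.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Given nontrivial FDs whose left side is not a superkey of the schema."""
    bad = []
    for lhs, rhs in fds:
        if rhs <= lhs:                            # trivial FD: always allowed
            continue
        if not closure(lhs, fds) >= set(schema):  # lhs is not a superkey
            bad.append((lhs, rhs))
    return bad


MOVIE_STUDIO = ["title", "year", "length", "filmType", "studioName", "studioAddr"]
FD_KEY = ({"title", "year"}, {"length", "filmType", "studioName"})
FD_STUDIO = ({"studioName"}, {"studioAddr"})
print(bcnf_violations(MOVIE_STUDIO, [FD_KEY, FD_STUDIO]))  # [({'studioName'}, {'studioAddr'})]

# After decomposition, with the FDs of each part supplied by hand
MS1 = ["title", "year", "length", "filmType", "studioName"]
MS2 = ["studioName", "studioAddr"]
print(bcnf_violations(MS1, [FD_KEY]))     # []
print(bcnf_violations(MS2, [FD_STUDIO]))  # []
```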

57 BCNF example (2)

58 All binary relations are in BCNF
Let A and B be the only attributes
There are four possible cases in total
– no nontrivial FD at all: {A, B} is the only key, so the relation is in BCNF
– A → B holds and B → A does not: {A} is the only key, so in BCNF
– B → A holds and A → B does not: similarly
– both A → B and B → A hold: {A} and {B} are both keys, so in BCNF
Note that such mutual dependencies are plausible

59 Why 3NF, not BCNF???
Bookings(title, theater, city)
theater → city
title city → theater
Two candidate keys: {title, city}, {title, theater}
Bookings is in 3NF (city is prime) but not in BCNF (theater is not a superkey), so decompose into {theater, city} and {theater, title}
Consider the following two instances
When the two relations are joined, “title city → theater” does not hold!

60 Recovering information from a decomposition
We need to make sure that the projections of the original tuples can be joined again to produce all and only the original tuples
Now recover the original relation with the join operation
There are two spurious tuples
Attribute B is not a key in either of the relations decomposed into
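The effect can be reproduced with a small hypothetical instance of R(A, B, C) in which B is not a key. The sketch below, with illustrative `project` and `natural_join` helpers over sets of tuples, projects R onto (A, B) and (B, C) and joins the two projections again.

```python
def project(schema, rows, attrs):
    """π_attrs(r): keep only the columns in attrs (duplicates vanish in the set)."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in rows}


def natural_join(s1, r1, s2, r2):
    """r1 ⋈ r2 on the attributes the two schemas share; returns (schema, rows)."""
    common = [a for a in s1 if a in s2]
    extra = [a for a in s2 if a not in s1]
    out = set()
    for t in r1:
        for u in r2:
            if all(t[s1.index(a)] == u[s2.index(a)] for a in common):
                out.add(t + tuple(u[s2.index(a)] for a in extra))
    return tuple(s1) + tuple(extra), out


R_SCHEMA = ("A", "B", "C")
R = {(1, 2, 3), (4, 2, 5)}  # hypothetical instance; B is not a key

AB = project(R_SCHEMA, R, ("A", "B"))
BC = project(R_SCHEMA, R, ("B", "C"))
schema, joined = natural_join(("A", "B"), AB, ("B", "C"), BC)

print(schema)              # ('A', 'B', 'C')
print(sorted(joined - R))  # [(1, 2, 5), (4, 2, 3)]  <- the two spurious tuples
```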

61 Lossless (non-additive) join
Prevents spurious tuples from being generated when a natural join is applied
A decomposition D = {R1, …, Rm} of R has the lossless join property with respect to a set of FDs F if, for every legal state r of R,
π_R1(r) ⋈ … ⋈ π_Rm(r) = r

62 Property of lossless join
D = {R1, R2} of R has the lossless join property with respect to a set of FDs F if and only if (R1 ∩ R2) → (R1 − R2) or (R1 ∩ R2) → (R2 − R1) is in F+
That is, R1 ∩ R2 is a superkey of R1 or of R2
Example
PCZ(phone, company, zip)
phone → company
zip → company
Decomposed into PC(phone, company) and ZC(zip, company)
Since company (= PC ∩ ZC) is a key of neither PC nor ZC, the join is lossy
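The binary test can be checked with attribute closure: compute (R1 ∩ R2)+ and see whether it contains R1 − R2 or R2 − R1. In this sketch (hypothetical function names), the second decomposition, into {phone, company} and {phone, zip}, is an extra example not taken from the slide.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """R1 ∩ R2 must functionally determine R1 - R2 or R2 - R1."""
    common_plus = closure(r1 & r2, fds)
    return (r1 - r2) <= common_plus or (r2 - r1) <= common_plus


F = [({"phone"}, {"company"}), ({"zip"}, {"company"})]
print(lossless_binary({"phone", "company"}, {"zip", "company"}, F))  # False (lossy)
print(lossless_binary({"phone", "company"}, {"phone", "zip"}, F))    # True
```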

63 Projecting FDs
How can we find the FDs that hold in a decomposed relation schema?
Suppose relation R is decomposed into relation S and some other relation, and F is a set of FDs known to hold for R
– Let X be a set of attributes contained in the attributes of S
– Compute X+
– For each attribute B such that
  B is an attribute of S,
  B is in X+, and
  B is not in X,
  the FD X → B holds in S

64 Example 3.39
R(A, B, C, D), A → B, B → C
Let S(A, C) be a relation in a decomposition of R
We need to compute the closure of each subset of {A, C}
Compute {A}+
– {A}+ = ABC
– C is in S
– so A → C holds in S
Compute {C}+
– {C}+ = C, no new FD
Compute {AC}+
– {AC}+ = ABC, no new FD
Hence, A → C is the only nontrivial FD for S

65 Example 3.40
R(A, B, C, D, E), A → D, B → E, DE → C
Let S(A, B, C) be a relation in a decomposition of R
We need to compute the closure of each subset of {A, B, C}
– {A}+ = AD, no new FD
– {B}+ = BE, no new FD
– {C}+ = C, no new FD
– {AB}+ = ABCDE, so AB → C holds in S
– {BC}+ = BCE, no new FD
– {AC}+ = ACD, no new FD
– {ABC}+ = ABCDE, no new FD
Hence, AB → C is the only nontrivial FD for S
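The procedure of slide 63 can be sketched as follows, assuming single-letter attributes; the function names `project_fds` and `show` are hypothetical. Subsets are visited by increasing size, and X → B is skipped when a smaller left side already yields B, since such FDs follow by augmentation. It reproduces Examples 3.39 and 3.40.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def project_fds(s_attrs, fds):
    """Nontrivial FDs X -> B (single B) that hold in S; S's attributes given as a string."""
    s = sorted(s_attrs)
    found = []
    for size in range(1, len(s) + 1):
        for combo in combinations(s, size):
            x = frozenset(combo)
            for b in sorted((closure(x, fds) & set(s)) - x):
                # X -> B follows by augmentation if a smaller left side already gives B
                if any(y < x and a == b for y, a in found):
                    continue
                found.append((x, b))
    return found


def show(fdlist):
    return ["".join(sorted(x)) + " -> " + b for x, b in fdlist]


# Example 3.39: R(A, B, C, D), A -> B, B -> C; S(A, C)
print(show(project_fds("AC", [("A", "B"), ("B", "C")])))                # ['A -> C']
# Example 3.40: R(A, B, C, D, E), A -> D, B -> E, DE -> C; S(A, B, C)
print(show(project_fds("ABC", [("A", "D"), ("B", "E"), ("DE", "C")])))  # ['AB -> C']
```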

66 Dependency preservation
A decomposition D = {R1, …, Rm} of R is dependency-preserving with respect to a set F of FDs if
(π_F(R1) ∪ … ∪ π_F(Rm))+ = F+
where π_F(Ri) denotes the set of FDs X → Y in F+ such that all attributes in X ∪ Y are contained in Ri
We do not want FDs to be lost in the decomposition
It is always possible to find a dependency-preserving decomposition D such that each Ri in D is in 3NF
It is not always possible to find a dependency-preserving decomposition into BCNF
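Computing π_F(Ri) directly requires F+, which can be huge. A widely used alternative, sketched below with hypothetical names, tests each X → Y in F by growing a set Z from X: for every Ri, add (Z ∩ Ri)+ ∩ Ri to Z until nothing changes; X → Y is preserved exactly when Y ⊆ Z. The examples are the Bookings decomposition of slide 59 and a small R(A, B, C) with A → B, B → C.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves_dependencies(fds, decomposition):
    """fds: list of (lhs, rhs) sets; decomposition: list of attribute sets Ri."""
    for lhs, rhs in fds:
        z = set(lhs)
        while True:
            old = set(z)
            for ri in decomposition:
                z |= closure(z & ri, fds) & ri  # what Ri alone can derive from Z
            if z == old:
                break
        if not rhs <= z:
            return False  # lhs -> rhs cannot be enforced within the parts
    return True


bookings_fds = [({"theater"}, {"city"}), ({"title", "city"}, {"theater"})]
print(preserves_dependencies(bookings_fds, [{"theater", "city"}, {"theater", "title"}]))  # False

abc_fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(preserves_dependencies(abc_fds, [{"A", "B"}, {"B", "C"}]))  # True
print(preserves_dependencies(abc_fds, [{"A", "B"}, {"A", "C"}]))  # False
```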

67 Multivalued dependency example (1)
EMP(eName, pName, depName)
Smith {X, Y} {John, Anna}
– because of 1NF, every combination of project and dependent must be repeated
– two independent one-to-many relationships are mixed in the same relation
– eName -->> pName
– eName -->> depName

68 Multivalued dependency example (2)
    interface Star {
        attribute string name;
        attribute Set<Struct Addr {string street, string city}> address;
        relationship Set<Movie> starredIn inverse Movie::stars;
    };
Note that the resulting relation has no nontrivial FDs, hence it is in BCNF

69 Multivalued dependency example (3)
A1 A2 … An -->> B1 B2 … Bm holds if, for each pair of tuples t and u of relation R that agree on all the A's, we can find in R some tuple v that agrees:
– with both t and u on the A's,
– with t on the B's, and
– with u on all attributes of R that are not among the A's or B's
In the previous example,
– first tuple: t
– second tuple: u
– then the third tuple of the previous instance plays the role of v
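The tuple-based definition above can be checked directly on an instance. In this sketch (function name hypothetical), for every pair t, u that agrees on X we build v from u by copying t's Y values, then look v up in the relation. The EMP rows follow slide 67: Smith with projects X and Y and dependents John and Anna.

```python
def mvd_holds(schema, rows, x, y):
    """Check X -->> Y on an instance using the t, u, v definition."""
    ix = [schema.index(a) for a in x]
    iy = [schema.index(a) for a in y]
    present = set(rows)
    for t in rows:
        for u in rows:
            if all(t[i] == u[i] for i in ix):  # t and u agree on the A's
                v = list(u)                    # v agrees with u on the remaining attributes
                for i in iy:
                    v[i] = t[i]                # v agrees with t on the B's
                if tuple(v) not in present:
                    return False
    return True


EMP = ("eName", "pName", "depName")
rows = [
    ("Smith", "X", "John"),
    ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"),
    ("Smith", "Y", "John"),
]
print(mvd_holds(EMP, rows, ["eName"], ["pName"]))      # True
print(mvd_holds(EMP, rows[:3], ["eName"], ["pName"]))  # False: (Smith, Y, John) is missing
```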

70 Multivalued dependency definition
Let X, Y be sets of attributes in R
Let Z be the complement of X ∪ Y, i.e. R − (X ∪ Y)
The multivalued dependency (MVD) X -->> Y [X multidetermines Y] holds in R if and only if each X-value in R is associated with a set of Y-values in a way that does not depend on the Z-values
Formally, the MVD X -->> Y holds in R iff Y_xz = Y_xz' for all x, z, z' such that Y_xz and Y_xz' are nonempty, where Y_xz = {y | ⟨x, y, z⟩ ∈ R}

71 MVD and FD-MVD rules
Complementation
– if X -->> Y, then X -->> T − X − Y (T: all attributes of R)
Augmentation
– if X -->> Y and V ⊆ W, then XW -->> YV
Transitivity
– if X -->> Y and Y -->> Z, then X -->> Z − Y
If X → Y, then X -->> Y (i.e. an FD is a special case of an MVD)
Coalescence rule
– if X -->> Y, Z ⊆ Y, and for some W disjoint from Y we have W → Z, then X → Z

72 Fourth normal form (4NF)
An MVD A1 A2 … An -->> B1 B2 … Bm for a relation R is trivial if:
– the B's are a subset of the A's, or
– the A's together with the B's are all the attributes of R
A relation R is in 4NF if, whenever A1 A2 … An -->> B1 B2 … Bm is a nontrivial MVD, {A1, A2, …, An} is a superkey

73 4NF example
Star(name, street, city, title, year)
name -->> street city
The Star relation is not in 4NF (name is not a superkey), hence decompose into
Star1(name, street, city)
Star2(name, title, year)
Note that both new relations are in 4NF

74 Relationship among normal forms (1)
In practice, it is best to have relation schemas in BCNF or in 3NF
(Figure: nested regions, from outermost to innermost: all relation schemas, 1NF, 2NF, 3NF, BCNF, 4NF, 5NF)

75 Relationship among normal forms (2)
