The Relational Data Model. Why Relational Model? Most of the current DBMS are based on the relational data model: –simplicity –mathematically based: expressions.

Presentation transcript:

The Relational Data Model

Why the Relational Model?
Most current DBMSs are based on the relational data model:
– simplicity
– mathematical foundation: expressions (queries) can be analyzed by the DBMS and transformed automatically into equivalent expressions (query optimization)

Basics
The relational data model is a particular way of structuring data (relations).
Each relation is a two-dimensional table.
The relation Movies:
title          year  length  film type
Star Wars      1977  …       color
Mighty Ducks   1991  …       color
Wayne’s World  1992  95      color

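Attributes, Schemas, and Tuples
The relation Movies:
title          year  length  filmType
Star Wars      1977  …       color
Mighty Ducks   1991  …       color
Wayne’s World  1992  95      color
– Attributes: the column names title, year, length, filmType
– Schema: Movies(title, year, length, filmType)
– Tuple: a row of the table, e.g. (Wayne’s World, 1992, 95, color)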

Domains
The relation Movies:
title          year  length  filmType
Star Wars      1977  …       color
Mighty Ducks   1991  …       color
Wayne’s World  1992  95      color
– Attributes: the column names title, year, length, filmType
– Schema: Movies(title, year, length, filmType)
– Tuple: a row of the table
Each attribute is associated with a domain.

Requirements
– Each component of a tuple is atomic (similar to the atomicity requirement in E/R diagrams)
– Each attribute is associated with a domain

Equivalent representation of a Relation
title          year  length  filmType
Star Wars      1977  …       color
Mighty Ducks   1991  …       color
Wayne’s World  1992  95      color
IS THE SAME AS
year  length  title          filmType
1977  …       Star Wars      color
1991  …       Mighty Ducks   color
1992  95      Wayne’s World  color
Movies(title, year, length, filmType) is equivalent to Movies(year, length, title, filmType)
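A small Python sketch (not part of the slides) of why column order and row order do not matter: if each tuple is modeled as a set of (attribute, value) pairs and an instance as a set of tuples, the two tables above compare equal. The helper name `relation` is hypothetical, and the length column is left out because some of its values are missing in the transcript.

```python
def relation(attributes, rows):
    """Build an order-independent instance from a header and its rows."""
    # Each tuple becomes a frozenset of (attribute, value) pairs, so the
    # position of a column no longer matters; the outer set ignores row order.
    return {frozenset(zip(attributes, row)) for row in rows}

movies_a = relation(
    ("title", "year", "filmType"),
    [("Star Wars", 1977, "color"), ("Wayne's World", 1992, "color")],
)
movies_b = relation(
    ("year", "title", "filmType"),
    [(1992, "Wayne's World", "color"), (1977, "Star Wars", "color")],
)

same = movies_a == movies_b
print(same)  # True
```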

Relation Instance
A set of tuples for a relation is an instance.
NOTE: An instance is not a schema.
The instance changes when
– a new tuple is added
– a tuple is deleted
– some attributes of a tuple are modified
The relation schema does not change often, but it can be changed; when it changes, a lot of things need to be done. What?

Some Exercises
acctNo  type      balance
12345   savings   …
…       savings   …
…       checking  25
firstName  lastName  idNo  account
Robie      Banks     …     …
Lena       Hood      …     …
Lennon     John      …     …
What are the attributes? What are the tuples? Schema? Domain?

Steps in designing a database (Remember?)
Analysis:
– What information needs to be stored?
– What are the relationships between different components of the stored information?
– What is a suitable database structure (or schema)?
Design: design the database structure (using a database design language or notation suitable for expressing designs)
Implementation: implement in a DBMS once committed to the design

From E/R Diagrams to Relational Designs
Why?
– We often start out with an E/R diagram and then convert it to a relational model
How?
– Entity sets → relations with the same set of attributes
– Relationships → relations whose attributes are the keys of the connecting entity sets
What about weak entity sets and isa relationships?
Can it be simplified? It might be.

E/R diagram: entity sets Movies (title, year, length, film type), Stars (name, address), and Studios (name, address); relationships Stars_in (between Movies and Stars) and Owns (between Movies and Studios).
Movies:
Basic instinct  1990  …  Drama
Total recall    1989  …  Mystery
Stars:
Sharon Stone           a1
Arnold Schwarzenegger  a2
Stars_in:
Basic instinct  1990  Sharon Stone
Total recall    1989  Sharon Stone
Total recall    1989  Arnold Schwarzenegger
Studio:
Universal Studio  b
Dream World       a
Owns:
Basic instinct  1990  Universal Studio
Total recall    1989  Universal Studio
Attributes, Schemas!!! Missing something?

Entity sets to relations
An entity set E with attributes $a_1,\ldots,a_n$ that is not a weak entity set is converted to a schema $E(a_1,\ldots,a_n)$.
Examples:
– Movies(title, year, length, filmType)
– Stars(name, address)
– Studios(name, address)

Relationships to relations
A relationship R is converted to a relation with
– the key attributes of the involved entity sets (those connected to R)
– the attributes of R
Examples:
– Stars_in(title, year, name)
– Owns(title, year, name)
What happens if an entity set E is involved in R more than once?
– the key attributes of E have to appear as many times as E is involved in R; rename the attributes of E each time they are added
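The rule above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function `relationship_schema`, the role names, and the renaming pattern `attribute_role` are hypothetical choices, not something the slides prescribe.

```python
from collections import Counter

def relationship_schema(participants, relationship_attrs=()):
    """Schema of the relation for a relationship.

    participants: list of (entity_set, role, key_attributes).
    Key attributes of an entity set that participates more than once
    are renamed with the role, so every attribute name stays unique.
    """
    times = Counter(entity for entity, _role, _key in participants)
    schema = []
    for entity, role, key in participants:
        for attr in key:
            schema.append(f"{attr}_{role}" if times[entity] > 1 else attr)
    schema.extend(relationship_attrs)  # attributes of the relationship itself
    return tuple(schema)

stars_in = relationship_schema([
    ("Movies", "movie", ("title", "year")),
    ("Stars", "star", ("name",)),
])
contracts = relationship_schema([
    ("Movies", "movie", ("title", "year")),
    ("Stars", "star", ("name",)),
    ("Studios", "producing_studio", ("name",)),
    ("Studios", "studio_of_star", ("name",)),
])
print(stars_in)
print(contracts)
```

Only Studios takes part twice in the second call, so only its key attribute is renamed.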

Example
E/R diagram: the relationship Contracts connects Movies, Stars, and Studios; Studios is involved twice, in the roles “Producing studio” and “Studio of star”.
Contracts(title, year, name, name_producing_studio, name_studio_of_star)

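Combining Relations
So far: rules for converting E/R diagrams to relations
Sometimes: not optimal
E/R diagram: entity set E (attributes C, D), relationship R, entity set F (attributes A, B)
The rules give: F(A,B), R(A,C), E(C,D)
Because C determines A, E and R can be combined into ER(C,D,A)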

Combining Relations
If E is an entity set and R is a many-one relationship from E to F, then E and R can be combined into one relation.
The attributes of the relation that combines E and R consist of
– all attributes of E
– the key attributes of F
– all attributes of R
This cannot be done if R is a many-many relationship.

Weak Entity Sets to Relations
W is a weak entity set; three components are involved:
– the entity set W
– relationships from W to other entity sets (single border)
– supporting relationships (double border)
The relation that corresponds to W has the following attributes:
– the attributes of W
– the attributes of other entity sets that help form the key of W
If R is a relationship connecting W, then the attributes of R must contain the key attributes of W.
If R is a supporting relationship from W to another entity set, then no relation corresponding to R is needed.

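Example
E/R diagram: weak entity set Crews (number), supporting relationship Unit_of, entity set Studios (name, address)
Crews(number, name) & Studios(name, address)
Unit_of(number, name, nameStudio) is redundant: the studio name is already part of Crews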

Subclasses to Relations
Assumptions about the isa-hierarchy:
– there is a root entity set for the hierarchy
– the root’s key identifies every entity in the hierarchy
– an entity in the hierarchy might have attributes belonging to different entity sets in the hierarchy
Several choices:
– E/R viewpoint: each entity set in the hierarchy corresponds to a relation
– Object-oriented viewpoint: each subtree that includes the root corresponds to a relation
– Use null values: the whole tree corresponds to one relation

Diagram: isa hierarchy with the entity sets Movies, Cartoons, Murder Mysteries, Children, and Story Books
– E/R: relations correspond to Movies, Cartoons, Murder_Mysteries, Children, Story_Books
– Object-oriented: relations correspond to the subtrees containing the root, e.g. the subtree with Cartoons
– Null values: one relation, Movies

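E/R style conversion
Diagram: Movies (title, year, length, film type) with the subclasses Cartoons and Murder Mysteries (isa); Murder Mysteries has the attribute weapon; the relationship Voices connects Cartoons to Stars.
– Movies(title, year, length, filmType)
– MurderMysteries(title, year, weapon)
– Cartoons(title, year)
No relations for isa.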

OO conversion
Diagram: Movies (title, year, length, film type) with the subclasses Cartoons and Murder Mysteries (isa); Murder Mysteries has the attribute weapon; the relationship Voices connects Cartoons to Stars.
4 subtrees, one relation each:
– Movies: Movies(title, year, length, filmType)
– Movies & MurderMysteries: MoviesMurderMysteries(title, year, length, filmType, weapon)
– Movies & Cartoons: MoviesCartoons(title, year, length, filmType)
– All three: MoviesAll(title, year, length, filmType, weapon)
No relations for isa.
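The enumeration of subtrees can be sketched in Python. This is a minimal illustration that assumes a one-level hierarchy (every subclass is a direct child of the root); the function name `oo_schemas` and the dictionary layout are hypothetical. With k subclasses it yields 2^k relations, which is where the exponential growth of the OO approach comes from.

```python
from itertools import combinations

def oo_schemas(root, root_attrs, subclass_attrs):
    """Map every subtree that contains the root to its relation schema.

    Assumption: all subclasses are direct children of the root.
    """
    names = sorted(subclass_attrs)
    schemas = {}
    for size in range(len(names) + 1):
        for chosen in combinations(names, size):
            attrs = list(root_attrs)
            for name in chosen:
                attrs.extend(subclass_attrs[name])  # add the subclass's own attributes
            schemas[(root,) + chosen] = tuple(attrs)
    return schemas

schemas = oo_schemas(
    "Movies",
    ("title", "year", "length", "filmType"),
    {"Cartoons": (), "MurderMysteries": ("weapon",)},
)
for subtree, schema in schemas.items():
    print(" & ".join(subtree), schema)
```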

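Null values
Diagram: Movies (title, year, length, film type) with the subclasses Cartoons and Murder Mysteries (isa); Murder Mysteries has the attribute weapon; the relationship Voices connects Cartoons to Stars.
– Movies(title, year, length, filmType, weapon)
No relations for isa.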

Comparison
It depends, again, because each approach has its own advantages and disadvantages!
Query answering:
– Null values: good for queries related to several entity sets in the hierarchy
– E/R: good for queries related to one single entity set
– OO: good for queries related to a subhierarchy
Number of relations: null values – only one; E/R – one per entity set; OO – exponential in the number of entity sets
Space: OO needs the least, because each entity is stored once with only the attributes that apply to it; whether null values need more or less than E/R depends on the specific situation

Relational Design
So far:
– how to get relational designs from E/R diagrams
– different approaches produce different relational designs
– each design has advantages and disadvantages
Questions:
– Can we do relational designs directly from applications?
– Can we improve the relational designs? If yes, how?
Yes, but much more difficult.

Improving Relational Designs
– Using functional dependencies: redesign of relations, remove redundancy
– Multivalued dependencies and integrity constraints: create good database schemas

Functional Dependencies (FD)
Definition: A functional dependency on a relation R is a statement of the form “if two tuples of R agree on attributes $A_1,\ldots,A_n$ then they must also agree on other attributes $B_1,\ldots,B_m$.”
Notation: $A_1 \ldots A_n \to B_1$, $A_1 \ldots A_n \to B_2$, …, $A_1 \ldots A_n \to B_m$
Shorthand: $A_1 \ldots A_n \to B_1 \ldots B_m$ (left-hand side → right-hand side)
FD in picture: for two tuples t and u, if the A’s of t equal the A’s of u, then the B’s of t must equal the B’s of u.

Example
Movies(title, year, length, filmType, studioName, starName)
title          year  length  studioName  starName        filmType
Star Wars      1977  …       Fox         Mark Hamill     color
Star Wars      1977  …       Fox         Harrison Ford   color
Star Wars      1977  …       Fox         Carrie Fisher   color
Mighty Ducks   1991  …       Disney      Emilio Estevez  color
Wayne’s World  1992  95      Paramount   Dana Carvey     color
Wayne’s World  1992  95      Paramount   Mike Meyers     color
Some of the FDs:
– title year → length
– title year → studioName
– title year → filmType
– title year → starName? NO
Shorthand: title year → length filmType studioName
NOTE: the shorthand is used to combine FDs with the same left side.
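The definition can be tested mechanically on a given instance. The Python sketch below is illustrative only: the function name `satisfies` and the dict-based rows are assumptions, and the length and filmType columns are left out to keep it short. Note that an instance can only refute an FD; an FD is a statement about all legal instances, so one instance that happens to satisfy it does not prove it.

```python
def satisfies(rows, lhs, rhs):
    """True if no two tuples in `rows` agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[b] for b in rhs)
        # Remember the first rhs values seen for these lhs values and
        # compare every later tuple with the same lhs values against them.
        if seen.setdefault(left, right) != right:
            return False
    return True

columns = ("title", "year", "studioName", "starName")
data = [
    ("Star Wars", 1977, "Fox", "Mark Hamill"),
    ("Star Wars", 1977, "Fox", "Harrison Ford"),
    ("Star Wars", 1977, "Fox", "Carrie Fisher"),
    ("Mighty Ducks", 1991, "Disney", "Emilio Estevez"),
    ("Wayne's World", 1992, "Paramount", "Dana Carvey"),
    ("Wayne's World", 1992, "Paramount", "Mike Meyers"),
]
movies = [dict(zip(columns, values)) for values in data]

holds_studio = satisfies(movies, ("title", "year"), ("studioName",))
holds_star = satisfies(movies, ("title", "year"), ("starName",))
print(holds_studio, holds_star)  # True False
```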

Keys of Relations
Definition: $\{A_1,\ldots,A_n\}$ is a key for a relation R if
– $\{A_1,\ldots,A_n\}$ functionally determines all other attributes of R (UNIQUENESS), and
– no proper subset of $\{A_1,\ldots,A_n\}$ functionally determines all other attributes of R (MINIMALITY).
What is the key for Movies(title, year, length, filmType, studioName, starName)?

Example
Movies(title, year, length, filmType, studioName, starName)
title          year  length  studioName  starName        filmType
Star Wars      1977  …       Fox         Mark Hamill     color
Star Wars      1977  …       Fox         Harrison Ford   color
Star Wars      1977  …       Fox         Carrie Fisher   color
Mighty Ducks   1991  …       Disney      Emilio Estevez  color
Wayne’s World  1992  95      Paramount   Dana Carvey     color
Wayne’s World  1992  95      Paramount   Mike Meyers     color
FDs: title year → length filmType studioName
Keys?
– {title, year}: NO, cannot functionally determine ‘starName’
– {title, year, starName}: YES
Notation: underlining attributes to specify the primary key in a relation schema.

Superkeys
A set of attributes that contains a key is called a superkey.
– every key is itself a superkey
– a superkey need not be a key (it may not be minimal)
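Keys and superkeys can be checked mechanically once the FDs are known. The sketch below is an illustration, not part of the slides; it relies on the closure of a set of attributes, which the slides define further on, and the names `closure`, `is_superkey`, and `is_key` are hypothetical.

```python
def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(attrs, schema, fds):
    return closure(attrs, fds) >= set(schema)

def is_key(attrs, schema, fds):
    attrs = set(attrs)
    # Minimality: dropping any single attribute must break the superkey
    # property (enough, because every superset of a superkey is a superkey).
    return is_superkey(attrs, schema, fds) and not any(
        is_superkey(attrs - {a}, schema, fds) for a in attrs
    )

schema = ("title", "year", "length", "filmType", "studioName", "starName")
fds = [(("title", "year"), ("length", "filmType", "studioName"))]

print(is_key(("title", "year", "starName"), schema, fds))  # True
print(is_superkey(("title", "year"), schema, fds))         # False
print(is_key(schema, schema, fds))                         # False: superkey, not minimal
```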

Finding Keys for Relations
Relations are obtained from E/R diagrams. A key for a relation R can be determined as follows:
– R is obtained from an entity set E: a key of E is a key for R
– R is obtained from a binary relationship from E to F:
  – many-many: a key for R is the union of a key of E and a key of F
  – many-one: a key of E is a key for R
  – one-one: a key of E or a key of F can be used as a key for R
– R is obtained from a multiway relationship: situation dependent; needs to be considered carefully

Example
E/R diagram: the relationship Contracts connects Movies, Stars, and Studios; Studios is involved twice, in the roles “Producing studio” and “Studio of star”.
Contracts(title, year, name, name_producing_studio, name_studio_of_star)
What can be a key for Contracts?
{title, year, name, name_producing_studio, name_studio_of_star}? No: the first four attributes functionally determine the last one!
Claim: if a multiway relationship has an arrow to the entity set E, then there exists a key for the relation that excludes the key of E.

Rules about Functional Dependencies
– FDs are good for database schema design, so knowledge of FDs is needed
– FDs determine the set of legal instances of a database schema
– Rules about FDs allow us to reason about FDs:
  – checking whether an FD holds
  – constructing new FDs

Example
Given R(A,B,C) with the FDs $A \to B$ and $A \to C$:
– No instance of R can contain the two tuples (1,3,5) and (1,3,4)
– No instance of R can contain two different tuples with the same A
– Does $B \to C$ hold as well? NO

Rules about FDs
– Two sets of FDs S and T are equivalent if the set of relation instances satisfying S is exactly the set of relation instances satisfying T
– A set of FDs S follows from a set of FDs T if every relation instance that satisfies all the FDs in T also satisfies all the FDs in S
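Both notions can be decided without enumerating instances, by using the closure of attribute sets that the slides define further on: an FD X → Y follows from T exactly when Y is contained in the closure of X under T. The Python sketch below is illustrative; the function names are hypothetical, and attribute sets are written as strings of single-letter attribute names.

```python
def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def follows(fd, fds):
    """True if the single FD `fd` follows from the set `fds`."""
    lhs, rhs = fd
    return set(rhs) <= closure(lhs, fds)

def equivalent(s, t):
    """True if every FD of s follows from t and vice versa."""
    return all(follows(fd, t) for fd in s) and all(follows(fd, s) for fd in t)

given = [("A", "B"), ("A", "C")]          # A -> B and A -> C on R(A,B,C)
print(follows(("B", "C"), given))          # False: B -> C does not follow
print(follows(("A", "BC"), given))         # True
print(equivalent(given, [("A", "BC")]))    # True
```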

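Rules about FDs
The FD $A_1 \ldots A_n \to B_1 \ldots B_m$ is equivalent to the set of FDs $A_1 \ldots A_n \to B_1$, $A_1 \ldots A_n \to B_2$, …, $A_1 \ldots A_n \to B_m$.
– Splitting: from the single FD to the m FDs with one attribute on the right
– Combining: from the m FDs back to the single FD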

Trivial FDs
Given $A_1 \ldots A_n \to B_1 \ldots B_m$:
– trivial: if $B_j \in \{A_1,\ldots,A_n\}$ for j = 1,…,m (e.g. title year → title year)
– nontrivial: some $B_j$ does not belong to $\{A_1,\ldots,A_n\}$ (e.g. title year → title length)
– completely nontrivial: none of the $B_j$ belongs to $\{A_1,\ldots,A_n\}$ (e.g. title year → length filmType)
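The three cases translate directly into set tests. A minimal Python sketch (the function name `classify` is hypothetical; it returns the most specific label, so a completely nontrivial FD is not also reported as nontrivial):

```python
def classify(lhs, rhs):
    """Classify the FD lhs -> rhs by comparing the two attribute sets."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:
        return "trivial"                # every B_j is among the A's
    if rhs & lhs:
        return "nontrivial"             # some, but not all, B_j are among the A's
    return "completely nontrivial"      # no B_j is among the A's

print(classify(("title", "year"), ("title", "year")))
print(classify(("title", "year"), ("title", "length")))
print(classify(("title", "year"), ("length", "filmType")))
```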

Closure
The closure of $\{B_1,\ldots,B_k\}$ under the set of FDs S, denoted by $\{B_1,\ldots,B_k\}^+$, is defined as follows:
$\{B_1,\ldots,B_k\}^+ = \{B \mid \text{every relation that satisfies } S \text{ also satisfies } B_1 \ldots B_k \to B\}$

The meaning of closure
Assume that $A_1 \ldots A_n \to B_1 \ldots B_m$ holds.
– Then we know that $A_1 \ldots A_n \to B_1$ holds.
– More: for every set $\{C_1,\ldots,C_k\}$ that is a subset of $\{B_1,\ldots,B_m\}$, $A_1 \ldots A_n \to C_1 \ldots C_k$ holds.
In picture: for two tuples t and u, if the A’s of t equal the A’s of u, then the B’s of t must equal the B’s of u. Since the C’s are among the B’s, this means that the C’s of t must equal the C’s of u.

Computing the closure
Given: the set S of FDs and $\{A_1,\ldots,A_n\}$
Compute: $\{A_1,\ldots,A_n\}^+$ (denote this set by X)
Step 1: $X = \{A_1,\ldots,A_n\}$
Step 2: find an FD $B_1 \ldots B_k \to B$ in S such that $\{B_1,\ldots,B_k\} \subseteq X$ and $B \notin X$; then $X = X \cup \{B\}$
Step 3: repeat Step 2 until nothing more can be added to X, then go to Step 4
Step 4: return X

Example
$S = \{AB \to C,\ BC \to AD,\ D \to E,\ CF \to B\}$. Compute $\{A,B\}^+$.
Step 1: $X = \{A,B\}$
Step 2: $X = X \cup \{C\} = \{A,B,C\}$ because $AB \to C$
Step 3, back to Step 2: $X = X \cup \{D\}$ because $BC \to AD$
Step 3, back to Step 2: $X = X \cup \{E\}$ because $D \to E$
Step 3, back to Step 2: nothing more can be added ($CF \to B$ does not apply because F is not in X)
Step 3, go to Step 4: return $\{A,B,C,D,E\}$
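The four steps can be sketched in Python and run on this example. The sketch is an illustration under two assumptions: FDs are given as (left side, right side) pairs written as strings of single-letter attributes, and an FD with several attributes on the right is applied in one go (which the splitting rule justifies) instead of one attribute at a time.

```python
def closure(attrs, fds):
    """Closure of the attribute set `attrs` under the FDs `fds`."""
    x = set(attrs)                      # Step 1: start with the given attributes
    changed = True
    while changed:                      # Step 3: repeat until nothing is added
        changed = False
        for lhs, rhs in fds:            # Step 2: look for an applicable FD
            new = set(rhs) - x
            if set(lhs) <= x and new:
                x |= new
                changed = True
    return x                            # Step 4

s = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]
result = closure("AB", s)
print(sorted(result))  # ['A', 'B', 'C', 'D', 'E']
```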

Correctness of the closure algorithm
It computes only true functional dependencies:
– Claim: if B belongs to $\{A_1,\ldots,A_n\}^+$ then $A_1 \ldots A_n \to B$ holds.
– Proof: by induction on the number of steps n used in adding an attribute B to the set X.
– n = 0: B belongs to $\{A_1,\ldots,A_n\}$, and so $A_1 \ldots A_n \to B$ is a trivial functional dependency.
– n → n+1: if B is added to X in step n+1 using $B_1 \ldots B_k \to B$, then $A_1 \ldots A_n \to B_j$ holds for all j by the inductive hypothesis; this, together with $B_1 \ldots B_k \to B$, implies that $A_1 \ldots A_n \to B$.
It computes all functional dependencies:
– Claim: if B does not belong to $\{A_1,\ldots,A_n\}^+$ then $A_1 \ldots A_n \to B$ does not hold.
– Proof: construct an instance I of the relation R that satisfies S but in which the FD does not hold: I consists of two tuples that agree (value 1) on all attributes in the closure and differ on all other attributes.

Simple questions
– What is $\{A_1,\ldots,A_n\}^+$ if $\{A_1,\ldots,A_n\}$ is a key of the relation?
– Can $\{A_1,\ldots,A_n\}^+ = \{A_1,\ldots,A_n\}$?
– Does $\{B_1,\ldots,B_m\} \subseteq \{A_1,\ldots,A_n\}$ imply $\{B_1,\ldots,B_m\}^+ \subseteq \{A_1,\ldots,A_n\}^+$?

Transitive Rule
Given $A_1 \ldots A_n \to B_1 \ldots B_m$ and $B_1 \ldots B_m \to C_1 \ldots C_k$, then $A_1 \ldots A_n \to C_1 \ldots C_k$.

Closing sets of FDs
Given a set of FDs, we can derive other FDs using the rules about FDs (e.g. combining, splitting, and transitivity).
For a relation R, a set of FDs is called a basis for R if all other FDs of R can be derived from it.
A basis is minimal if none of its proper subsets is a basis.

Projecting FDs
Given:
– R with a set of FDs F
– S (a new relation) obtained by removing the attributes $\{B_1,\ldots,B_m\}$ from R
Question: What are the FDs of S?
Answer: if $A_1 \ldots A_n \to C_1 \ldots C_k$ is an FD of R and none of the B’s appears on its left or right side ($\{B_1,\ldots,B_m\} \cap \{A_1,\ldots,A_n,C_1,\ldots,C_k\} = \emptyset$), then it is an FD of S.

Projecting - Example
Given R(A,B,C,D) with the FDs $A \to B$, $B \to C$, and $C \to D$. Removing the attribute B from R, we obtain a new relation S(A,C,D). What are the FDs of S?
– $A \to C$?
– $A \to D$?
– $C \to D$?
We can compute this as follows: compute the closure of every subset of {A,C,D} using all the FDs of R, and keep the resulting FDs that do not contain B.
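A Python sketch of this computation, for illustration only. The function name `project_fds` is hypothetical, attributes are single letters, and the result lists every nontrivial projected FD with one attribute on the right, including redundant ones such as AC → D. The closures are taken under all FDs of R, which is what lets A → C be found through B.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def project_fds(kept, fds):
    """Nontrivial FDs (single attribute on the right) over the kept attributes."""
    kept = sorted(kept)
    projected = []
    for size in range(1, len(kept) + 1):
        for lhs in combinations(kept, size):
            # Closure under the FDs of R, restricted to the attributes of S.
            for attr in sorted(closure(lhs, fds) & set(kept)):
                if attr not in lhs:
                    projected.append(("".join(lhs), attr))
    return projected

fds_r = [("A", "B"), ("B", "C"), ("C", "D")]
fds_s = project_fds("ACD", fds_r)
print(fds_s)
```

For this input the answer to all three questions on the slide is yes.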

Homework
Consider a relation with schema R(A,B,C,D) and FDs $AB \to C$, $C \to D$, and $D \to A$.
– What are all the nontrivial FDs that follow from the given FDs? List only the FDs with one attribute on the right. (5pt)
– What are the keys of R? (5pt)
– What are the superkeys that are not keys? (5pt)
Show that the following rule holds (5pt): if $A_1 \ldots A_n \to B_1 \ldots B_m$ and $C_1 \ldots C_k \to D_1 \ldots D_t$ hold, then $A_1 \ldots A_n C_1 \ldots C_k \to B_1 \ldots B_m D_1 \ldots D_t$ also holds.

For those who like fun
Do the following hold?
– if $A \to B$ then $B \to A$
– if $AB \to C$ and $A \to C$ then $B \to C$
A set of attributes X is closed if $X^+ = X$. What are the FDs of a relation R(A,B,C,D) if
– all sets of the four attributes are closed
– the only closed sets are {} and {A,B,C,D}
– the closed sets are {}, {A,B}, and {A,B,C,D}
(note: the cases are considered separately)
Stars: try the exercises with stars.

Design of Relational Database Schema
title          year  length  studioName  starName        filmType
Star Wars      1977  …       Fox         Mark Hamill     color
Star Wars      1977  …       Fox         Harrison Ford   color
Star Wars      1977  …       Fox         Carrie Fisher   color
Mighty Ducks   1991  …       Disney      Emilio Estevez  color
Wayne’s World  1992  95      Paramount   Dana Carvey     color
Wayne’s World  1992  95      Paramount   Mike Meyers     color
Some observations:
– the value of studioName is the same in several tuples
– the value of filmType is also repeated
What is wrong with it?
– redundancy: the same value is stored unnecessarily several times
– update anomalies: an update might require several changes
– deletion anomalies: deleting a value may lose other information
CAN WE AVOID THESE ANOMALIES?

Possible ways to avoid anomalies (Intuition)
– The bad way: start again (Oh, no!)
– The natural way: try to decompose the given relation into two or more relations that
  – contain the same information
  – avoid the anomalies

Example
MovieStudioStar(title, year, length, studioName, starName, filmType):
title          year  length  studioName  starName        filmType
Star Wars      1977  …       Fox         Mark Hamill     color
Star Wars      1977  …       Fox         Harrison Ford   color
Star Wars      1977  …       Fox         Carrie Fisher   color
Mighty Ducks   1991  …       Disney      Emilio Estevez  color
Wayne’s World  1992  95      Paramount   Dana Carvey     color
Wayne’s World  1992  95      Paramount   Mike Meyers     color
is decomposed into 2 relations, MovieStudio(title, year, length, studioName, filmType):
title          year  length  studioName  filmType
Star Wars      1977  …       Fox         color
Mighty Ducks   1991  …       Disney      color
Wayne’s World  1992  95      Paramount   color
and StarsIn(title, year, starName):
title          year  starName
Star Wars      1977  Mark Hamill
Star Wars      1977  Harrison Ford
Star Wars      1977  Carrie Fisher
Mighty Ducks   1991  Emilio Estevez
Wayne’s World  1992  Dana Carvey
Wayne’s World  1992  Mike Meyers

Decomposition
Given a relation R with schema $\{A_1,\ldots,A_n\}$. A decomposition of R consists of two relations S and T with schemas $\{B_1,\ldots,B_m\}$ and $\{C_1,\ldots,C_k\}$, respectively, such that
1. $\{A_1,\ldots,A_n\} = \{B_1,\ldots,B_m\} \cup \{C_1,\ldots,C_k\}$
2. The tuples in S are the projections onto $\{B_1,\ldots,B_m\}$ of all the tuples in R.
3. The tuples in T are the projections onto $\{C_1,\ldots,C_k\}$ of all the tuples in R.
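The three conditions can be illustrated with a short Python sketch. The function name `project` and the dict-based rows are assumptions, and the length and filmType columns of the running movie example are left out for brevity. Because the result is a set, duplicate projected tuples collapse, which is exactly how the decomposition removes the repeated studio names.

```python
def project(rows, attrs):
    """Projection of the instance `rows` onto `attrs` (duplicates collapse)."""
    return {tuple(row[a] for a in attrs) for row in rows}

r_schema = ("title", "year", "studioName", "starName")
data = [
    ("Star Wars", 1977, "Fox", "Mark Hamill"),
    ("Star Wars", 1977, "Fox", "Harrison Ford"),
    ("Star Wars", 1977, "Fox", "Carrie Fisher"),
    ("Mighty Ducks", 1991, "Disney", "Emilio Estevez"),
    ("Wayne's World", 1992, "Paramount", "Dana Carvey"),
    ("Wayne's World", 1992, "Paramount", "Mike Meyers"),
]
r = [dict(zip(r_schema, values)) for values in data]

s_schema = ("title", "year", "studioName")   # MovieStudio without length, filmType
t_schema = ("title", "year", "starName")     # StarsIn

covers = set(s_schema) | set(t_schema) == set(r_schema)  # condition 1
movie_studio = project(r, s_schema)                      # condition 2
stars_in = project(r, t_schema)                          # condition 3
print(covers, len(movie_studio), len(stars_in))  # True 3 6
```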

Example – Projections
MovieStudioStar(title, year, length, studioName, starName, filmType):
title          year  length  studioName  starName        filmType
Star Wars      1977  …       Fox         Mark Hamill     color
Star Wars      1977  …       Fox         Harrison Ford   color
Star Wars      1977  …       Fox         Carrie Fisher   color
Mighty Ducks   1991  …       Disney      Emilio Estevez  color
Wayne’s World  1992  95      Paramount   Dana Carvey     color
Wayne’s World  1992  95      Paramount   Mike Meyers     color
is decomposed into 2 relations, MovieStudio(title, year, length, studioName, filmType):
title          year  length  studioName  filmType
Star Wars      1977  …       Fox         color
Mighty Ducks   1991  …       Disney      color
Wayne’s World  1992  95      Paramount   color
and StarsIn(title, year, starName):
title          year  starName
Star Wars      1977  Mark Hamill
Star Wars      1977  Harrison Ford
Star Wars      1977  Carrie Fisher
Mighty Ducks   1991  Emilio Estevez
Wayne’s World  1992  Dana Carvey
Wayne’s World  1992  Mike Meyers
How do we come up with this decomposition?

Boyce-Codd Normal Form (BCNF)
BCNF: a relation R is in BCNF iff whenever there is a nontrivial FD $A_1 \ldots A_n \to B$ for R, it is the case that $\{A_1,\ldots,A_n\}$ is a superkey for R.
Why this definition? Answer: if a relation is in BCNF, then there is no anomaly.
Example:
– MovieStudioStar(title, year, length, studioName, starName, filmType): not in BCNF
– MovieStudio(title, year, length, studioName, filmType): in BCNF
– StarsIn(title, year, starName): in BCNF
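The BCNF condition can be checked with attribute closures. The Python sketch below is illustrative: the name `bcnf_violations` is hypothetical, and it assumes that the list of FDs passed in is a basis for the FDs of the relation being checked (for a relation produced by a decomposition, those FDs first have to be obtained by projecting).

```python
def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Nontrivial FDs whose left side is not a superkey of `schema`."""
    schema = set(schema)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and not closure(lhs, fds) >= schema
    ]

fd = (("title", "year"), ("length", "filmType", "studioName"))
movie_studio_star = ("title", "year", "length", "studioName", "starName", "filmType")
movie_studio = ("title", "year", "length", "studioName", "filmType")
stars_in = ("title", "year", "starName")

print(bcnf_violations(movie_studio_star, [fd]))  # [fd]: {title, year} is not a superkey
print(bcnf_violations(movie_studio, [fd]))       # []
print(bcnf_violations(stars_in, []))             # []: no nontrivial FDs at all
```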

Decomposition into BCNF
Suppose that we decompose a relation R into two relations S and T that are in BCNF.
The requirements for S and T:
– S and T form a decomposition of R
– it is possible to reconstruct R from S and T
Will every decomposition of R satisfy these two conditions?
What are the FDs of the new relations?

Algorithm
Given a relation R with the attributes $\{A_1,\ldots,A_n\}$.
Step 1: If for every nontrivial FD $B_1 \ldots B_m \to B$ the set $\{B_1,\ldots,B_m\}$ is a superkey, then return R (no decomposition is needed).
Step 2: Otherwise, take a nontrivial FD $B_1 \ldots B_m \to B$ such that $\{B_1,\ldots,B_m\}$ is not a superkey, and decompose R into two relations S and T with the following schemas:
– S’s schema: $\{B_1,\ldots,B_m\}^+$
– T’s schema: $\{B_1,\ldots,B_m\} \cup (\{A_1,\ldots,A_n\} \setminus \{B_1,\ldots,B_m\}^+)$
Repeat Steps 1 and 2 for S and T until no decomposition is needed for any new relation; return the set of new relations as the result.

Example
The ‘new’ movie relation has the attributes {title, year, studioName, president, presAddress} (we call this set ALL) and the FDs {title year → studioName, studioName → president, president → presAddress}.
Only one key: {title, year}
studioName → president violates BCNF.
Step 2: take studioName → president and decompose into
– S with the schema $\{studioName\}^+$ = {studioName, president, presAddress}
– T with the schema {studioName, title, year} = {studioName} ∪ (ALL \ $\{studioName\}^+$)
Check:
– {studioName, title, year} is in BCNF (its only nontrivial FD, title year → studioName, has a key on the left)
– {studioName, president, presAddress} is not in BCNF
Continue with the decomposition of S using president → presAddress; we get the two relation schemas {president, presAddress} and {president, studioName}, and both are in BCNF.
The final result: {studioName, title, year}, {president, presAddress}, {president, studioName}
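A Python sketch of the decomposition, for illustration. It is a brute-force variant rather than a literal transcription of the algorithm: instead of projecting FDs onto each new relation, it tries every subset of the current schema as a left side and uses closures under the original FDs, which is exponential and only suitable for small schemas. The function names are hypothetical. It may pick violating FDs in a different order than the slide (here president → presAddress first), but for this example it arrives at the same three schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_decompose(schema, fds):
    """Set of BCNF schemas (frozensets) obtained by decomposing `schema`."""
    schema = frozenset(schema)
    ordered = sorted(schema)
    for size in range(1, len(ordered)):
        for lhs in combinations(ordered, size):
            closed = closure(lhs, fds) & schema   # closure within this schema
            # Violation: lhs determines something new, yet is not a superkey.
            if closed != set(lhs) and closed != schema:
                s = frozenset(closed)                    # S: the closure of lhs
                t = frozenset(lhs) | (schema - closed)   # T: lhs plus the rest
                return bcnf_decompose(s, fds) | bcnf_decompose(t, fds)
    return {schema}                               # no violation: already in BCNF

all_attrs = ("title", "year", "studioName", "president", "presAddress")
fds = [
    (("title", "year"), ("studioName",)),
    (("studioName",), ("president",)),
    (("president",), ("presAddress",)),
]
result = bcnf_decompose(all_attrs, fds)
for schema in sorted(sorted(s) for s in result):
    print(schema)
```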

Recovering information from a decomposition
Suppose that R with the schema $\{A_1,\ldots,A_n\}$ is decomposed according to the algorithm into two relations S and T whose attributes are $\{B_1,\ldots,B_m\}^+$ and $\{B_1,\ldots,B_m\} \cup (\{A_1,\ldots,A_n\} \setminus \{B_1,\ldots,B_m\}^+)$.
The tuples of R can be obtained by joining all possible pairs of tuples of S and T in which $\{B_1,\ldots,B_m\}$ have the same values.

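Recovering …
A tuple t of R has three groups of components:
– the B’s: $\{B_1,\ldots,B_m\}$
– the rest of the closure: $\{B_1,\ldots,B_m\}^+ \setminus \{B_1,\ldots,B_m\}$
– the others: $\{A_1,\ldots,A_n\} \setminus \{B_1,\ldots,B_m\}^+$
Projection: t yields t’ in S (the B’s and the rest of the closure) and t’’ in T (the B’s and the others).
Join: t’ and t’’ agree on the B’s and are joined back into t.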

Example – Decomposition and Recovering
MovieStudioStar(title, year, length, studioName, starName, filmType) is not in BCNF:
title          year  length  studioName  starName        filmType
Star Wars      1977  …       Fox         Mark Hamill     color
Star Wars      1977  …       Fox         Harrison Ford   color
Star Wars      1977  …       Fox         Carrie Fisher   color
Mighty Ducks   1991  …       Disney      Emilio Estevez  color
Wayne’s World  1992  95      Paramount   Dana Carvey     color
Wayne’s World  1992  95      Paramount   Mike Meyers     color
It is decomposed into 2 relations that are in BCNF, MovieStudio(title, year, length, studioName, filmType):
title          year  length  studioName  filmType
Star Wars      1977  …       Fox         color
Mighty Ducks   1991  …       Disney      color
Wayne’s World  1992  95      Paramount   color
and StarsIn(title, year, starName):
title          year  starName
Star Wars      1977  Mark Hamill
Star Wars      1977  Harrison Ford
Star Wars      1977  Carrie Fisher
Mighty Ducks   1991  Emilio Estevez
Wayne’s World  1992  Dana Carvey
Wayne’s World  1992  Mike Meyers

Some remarks
– The algorithm will stop and output a set of BCNF relations.
– Not every decomposition according to the algorithm is good.
– The FDs for the new relations are determined by ‘projecting’.
– If a decomposition is based on FDs (according to the algorithm), then the recovering process will give us exactly the original relation.
– If a decomposition is not based on FDs, then we might not be able to recover the original relation from the new ones.
Example: R(A,B,C) with $A \to B$, decomposed into S(A,B) and T(B,C). (Tables on the slide: R over A, B, C; S over A, B; T over B, C; and the recovered relation over A, B, C.)
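The last remark can be made concrete with a small Python sketch. The instance below is hypothetical (the values of the slide's tables are not available); it satisfies A → B. Joining S(A,B) with T(B,C) produces tuples that were never in R, whereas the FD-based decomposition into S(A,B) and T(A,C) recovers R exactly. The helper names `row`, `project`, and `natural_join` are illustrative.

```python
def row(**values):
    """A tuple as a frozenset of (attribute, value) pairs."""
    return frozenset(values.items())

def project(rows, attrs):
    return {frozenset((a, v) for a, v in r if a in attrs) for r in rows}

def natural_join(left, right):
    joined = set()
    for l in left:
        for r in right:
            merged = dict(l)
            # Join two tuples only if they agree on every shared attribute.
            if all(merged.get(a, v) == v for a, v in r):
                merged.update(r)
                joined.add(frozenset(merged.items()))
    return joined

# Hypothetical instance of R(A,B,C); it satisfies A -> B.
r_inst = {row(A=1, B=2, C=3), row(A=4, B=2, C=5)}

by_b = natural_join(project(r_inst, {"A", "B"}), project(r_inst, {"B", "C"}))
by_a = natural_join(project(r_inst, {"A", "B"}), project(r_inst, {"A", "C"}))

print(len(by_b))       # 4: contains spurious tuples such as (A=1, B=2, C=5)
print(by_a == r_inst)  # True: the decomposition based on A -> B is lossless
```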

Third Normal Form (3NF)
So far: if a relation is not in BCNF, then anomalies arise.
Given a relation Bookings with the attributes:
– title: name of the movie
– theater: name of the theater where the movie is being shown
– city: the city where the theater is located
(a tuple (m,t,c) represents the fact that movie m is shown at theater t in city c)

Bookings(title, theater, city)
The FDs of the relation:
– theater → city
– title city → theater
theater → city violates the BCNF condition. Why?
Decomposition yields {theater, city} and {theater, title}.
Consider the relations (possible relations according to the FDs of each schema):
theater  city
Guild    Menlo
Park     Menlo
and
theater  title
Guild    Net
Park     Net
Recovering gives:
theater  title  city
Guild    Net    Menlo
Park     Net    Menlo
This violates the FD title city → theater.

3NF
A relaxation of the BCNF condition: a relation R is in 3NF if whenever there is a nontrivial FD $A_1 \ldots A_n \to B$, either $\{A_1,\ldots,A_n\}$ is a superkey or B is a member of some key.
Bookings(title, theater, city) is in 3NF.
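The 3NF condition can be checked for Bookings with a brute-force Python sketch. It is an illustration under stated assumptions: the names `keys` and `is_3nf` are hypothetical, all keys are found by trying every attribute subset (fine for three attributes, exponential in general), and only the given FDs are tested, assuming they form a basis for the relation.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def keys(schema, fds):
    """All keys of `schema`, found by trying subsets from small to large."""
    found = []
    for size in range(1, len(schema) + 1):
        for cand in combinations(sorted(schema), size):
            is_superkey = closure(cand, fds) >= set(schema)
            # Skip candidates that contain an already found (smaller) key.
            if is_superkey and not any(set(k) <= set(cand) for k in found):
                found.append(cand)
    return found

def is_3nf(schema, fds):
    prime = {a for k in keys(schema, fds) for a in k}  # members of some key
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(schema):
            continue  # left side is a superkey
        if any(b not in lhs and b not in prime for b in rhs):
            return False
    return True

bookings = ("title", "theater", "city")
fds = [(("theater",), ("city",)), (("title", "city"), ("theater",))]

print(keys(bookings, fds))    # [('city', 'title'), ('theater', 'title')]
print(is_3nf(bookings, fds))  # True
```

Here theater → city is allowed in 3NF because city is a member of the key {title, city}, even though {theater} is not a superkey.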