Download presentation
Presentation is loading. Please wait.
Published byEsmond Melton Modified over 8 years ago
1
1 Normalization Theory
2
2 Limitations of E-R Designs Provides a set of guidelines, does not result in a unique database schema Does not provide a way of evaluating alternative schemas Normalization theory provides a mechanism for analyzing and refining the schema produced by an E-R design
3
3 Redundancy Dependencies between attributes cause redundancy –Ex. All addresses in the same town have the same zip code SSN Name Town Zip 1234 Joe Stony Brook 11790 4321 Mary Stony Brook 11790 5454 Tom Stony Brook 11790 …………………. Redundancy
4
4 Redundancy and Other Problems Set valued attributes in the E-R diagram result in multiple rows in corresponding table PersonExample: Person (SSN, Name, Address, Hobbies) Person –A person entity with multiple hobbies yields multiple rows in table Person Hence, the association between Name and Address for the same person is stored redundantly –SSN is key of entity set, but (SSN, Hobby) is key of corresponding relation PersonThe relation Person can’t describe people without hobbies
5
5 Example SSN Name Address Hobby 1111 Joe 123 Main biking 1111 Joe 123 Main hiking ……………. SSN Name Address Hobby 1111 Joe 123 Main {biking, hiking} ER Model Relational Model Redundancy
6
6 Anomalies Redundancy leads to anomalies: –Update anomaly: A change in Address must be made in several places –Deletion anomaly: Suppose a person gives up all hobbies. Do we have to: Set Hobby attribute to null? No, since Hobby is part of key Delete the entire row? No, since we lose other information in the row –Insertion anomaly: Hobby value must be supplied for any inserted row since Hobby is part of key
7
7 Decomposition PersonSolution: use two relations to store Person information –Person1 –Person1 (SSN, Name, Address) –Hobbies –Hobbies (SSN, Hobby) The decomposition is more general: people with hobbies can now be described No update anomalies: –Name and address stored once –A hobby can be separately supplied or deleted
8
8 Normalization Theory Result of E-R analysis need further refinement Appropriate decomposition can solve problems normalization theoryfunctional dependencies multi-valued dependenciesThe underlying theory is referred to as normalization theory and is based on functional dependencies (and other kinds, like multi-valued dependencies) Functional DependencyFunctional Dependency –Main tool for formally measuring the appropriateness of attribute groupings into relation schema
9
9 Functional Dependencies functional dependencyDefinition: A functional dependency (FD), denoted by X Y, between two sets of attributes X and Y that are subsets of R specifies a constraint on the possible tuples that can form a relation instance r of R. The constraint states that, for any two tuples t1 and t2 in r such that t1[X] = t2[X], we must also have t1[Y] = t2[Y] This means that the values of the Y component of a tuple in r depend on the values of the X component. –Key constraint is a special kind of functional dependency: all attributes of relation occur on the right-hand side of the FD: SSN SSN, Name, Address
10
10 Functional Dependencies - Example Address ZipCode –Stony Brook’s ZIP is 11733 –“given a value of Address, we know the value of ZipCode” ArtistName BirthYear –Picasso was born in 1881 Autobrand Manufacturer, Engine type –Pontiac is built by General Motors with gasoline engine Author, Title PublDate –Shakespeare’s Hamlet published in 1600
11
11 Entailment, Closure, Equivalence entailsDefinition: If F is a set of FDs on schema R and f is another FD on R, then F entails f if every instance r of R that satisfies every FD in F also satisfies f –Ex: F is {A B, B C} and f is A C If Streetaddr Town and Town Zip then Streetaddr Zip closureDefinition: The closure of F, denoted F +, is the set of all FDs entailed by F equivalentDefinition: F and G are equivalent if F entails G and G entails F
12
12 Entailment (cont’d) Satisfaction, entailment, and equivalence are semantic concepts – defined in terms of the actual relations in the “real world”. –They define what these notions are, not how to compute them How to check if F entails f or if F and G are equivalent? –Apply the respective definitions for all possible relations? Bad idea: might be infinite number for infinite domains Even for finite domains, we have to look at relations –Solution: find algorithmic, syntactic ways to compute these notions Important: The syntactic solution must be “correct” with respect to the semantic definitions soundnesscompletenessCorrectness has two aspects: soundness and completeness – see later
13
13 Armstrong’s Axioms for FDs This is the syntactic way of computing/testing the various properties of FDs Reflexivity: If Y X then X Y (trivial FD) –A set of attributes always determines itself –Name, Address Name Augmentation: If X Y then X Z YZ –If Town Zip then Town, Name Zip, Name –Also, Augmentation: If X Y then X Z Y (reflexivity) Transitivity: If X Y and Y Z then X Z
14
14 Additional Inference Rules for FDs Decomposition: If X YZ then X Z –If SSN Name, Age then SSN Age Additive : If X Y and X Z then X YZ Pseudo-transitive : If X Y and WY Z then WX Z –( because of Additive WX WY )
15
15 Soundness soundAxioms 1 to 3 are sound: If an FD f: X Y can be derived from a set of FDs F using the axioms, then f holds in every relation that satisfies every FD in F. Example: Given X Y and X Z then –Thus, X Y Z is satisfied in every relation where both X Y and X Z are satisfied Additive ruleTherefore, we have derived the Additive rule for FDs: we can take the union of the RHSs of FDs that have the same LHS X XY Augmentation by X YX YZ Augmentation by Y X YZ Transitivity (XY == YX)
16
16 Completeness completeAxioms 1 to 3 are complete: If F entails f, then f can be derived from F using the axioms A consequence of completeness is the following (naïve) algorithm to determining if F entails f: –Algorithm –Algorithm: Use the axioms in all possible ways to generate F + (the set of possible FD’s is finite so this can be done) and see if f is in F +
17
17 Correctness The notions of soundness and completeness link the syntax (Armstrong’s axioms) with semantics (the definitions in terms of relational instances) This is a precise way of saying that the algorithm for entailment based on the axioms is “correct” with respect to the definitions
18
18 Generating F + F AB C AB BCD A D AB BD AB BCDE AB CDE D E BCD BCDE Thus, AB BD, AB BCD, AB BCDE, and AB CDE are all elements of F + union aug trans aug decomp
19
19 Attribute Closure Calculating attribute closure leads to a more efficient way of checking entailment attribute closureThe attribute closure of a set of attributes, X, with respect to a set of functional dependencies, F, (denoted X + F ) is the set of all attributes, A, such that X A –X + F1 is not necessarily the same as X + F2 if F1 F2 Attribute closure and entailment: –Algorithm –Algorithm: Given a set of FDs, F, then X Y if and only if X + F Y
20
20 Example - Computing Attribute Closure F: AB C A D D E AC B X X F + A {A, D, E} AB {A, B, C, D, E} (Hence AB is a key) B {B} D {D, E} Is AB E entailed by F? Yes Is D C entailed by F? No Result: X F + allows us to determine FDs of the form X Y entailed by F
21
21 Computation of Attribute Closure X + F closure := X; // since X X + F repeat old := closure; if there is an FD Y Z in F such that Y closure and Z closure then closure := closure Z until old = closure – If T closure then X T is entailed by F
22
22 Example: Computation of Attribute Closure AB C (a) A D (b) D E (c) AC B (d) Problem: Compute the attribute closure of AB with respect to the set of FDs : Initially closure = {AB} Using (a) closure = {ABC} Using (b) closure = {ABCD} Using (c) closure = {ABCDE} Solution:
23
23 Denormalization Tradeoff: Judiciously introduce redundancy to improve performance of certain queries TranscriptExample: Add attribute Name to Transcript –Join is avoided Transcript –If queries are asked more frequently than Transcript is modified, added redundancy might improve average performance Transcript –But, Transcript is no longer in BCNF since key is (StudId, CrsCode, Semester) and StudId Name SELECT T.Name, T.Grade Transcript FROM Transcript T WHERE T.CrsCode = ‘CS305’ AND T.Semester = ‘S2002’
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.