1
Schema Refinement: What and Why
Copyright © Curt Hill
2
What is a good schema?
A good schema should:
    Represent all the data needed
    Group the data into relations that make sense
    Have little or no redundancy
    Make common operations efficient
This is not just a common-sense notion
    We have some objective ways of determining whether a schema is indeed good
3
Redundancy
What is wrong with redundant data?
    Space and access tradeoff
    Update anomaly
        One copy is changed and the others are not
    Insert anomaly
        An insertion requires that unrelated information also be inserted
    Delete anomaly
        Deleting something also deletes unrelated information
4
Normalization
Design activities to preclude redundancy and the anomalies it causes
There is a series of normal forms that are contained within one another:
    5thNF = PJ ⇒ 4thNF ⇒ BCNF ⇒ 3rdNF ⇒ 2ndNF ⇒ 1stNF
    Here ⇒ reads "implies" (is contained in)
NF = Normal Form
PJ = Project-Join, a form of 5thNF
BC = Boyce-Codd
BCNF is a slight strengthening of 3rdNF
5
How will we do this?
We will start with the simplest and work up to the most complicated
    Show how to determine the particular normal form
    Show what problems the next normal form solves
The literature describes an 18th Normal Form
    We will stop at 5th Normal Form
Warning: Mathematics ahead
    If there is no math, this is not science
6
First Normal Form
Default case in a relational database
    Rectangular tables
    Fixed number of fields
A file is not in 1stNF if it allows repeating groups
    Such as a variable number of fields
A relational database may allow variable-length fields, but that is an implementation consideration
    The field is still considered atomic
7
1st NF and non 1st NF
Not in 1st Normal Form (repeated groups):
    1013  Joe Smith   Biology  English
    1043  Jon Smith   CIS
    1152  Jane Jones  Math
1st Normal Form:
    1013  Joe Smith   Biology
    1013  Joe Smith   English
    1043  Jon Smith   CIS
    1152  Jane Jones  Math
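The step from repeated groups to 1st Normal Form can be sketched in a few lines of Python. This is only an illustration: the name `subjects` for the repeating field is an assumption, since the slide does not label its columns.

```python
# Each input record carries a repeating group: a variable-length list.
# The field name "subjects" is hypothetical; the slide does not name it.
records = [
    (1013, "Joe Smith", ["Biology", "English"]),
    (1043, "Jon Smith", ["CIS"]),
    (1152, "Jane Jones", ["Math"]),
]

def to_first_normal_form(rows):
    """Emit one fixed-width row per (student, subject) pair."""
    return [
        (student_id, name, subject)
        for student_id, name, subjects in rows
        for subject in subjects
    ]

flat = to_first_normal_form(records)
for row in flat:
    print(row)
```

Every output row has the same three fields, at the cost of repeating the ID and name for student 1013.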
8
An example in 1st NF
Attributes:
    SID - numeric student ID
    SNAME - student name
    LCODE - location (campus)
    STATUS - numeric status of the location
    CID - course ID (numeric)
    CNAME - course name
    SITE - location of the course
    GRADE - grade this student received
Key is SID and CID
9
A picture
[Table with columns SID, SName, LCode, Status, CID, CName, Site, Grade. First row: 21, Jones, A1, 1, 170, C Lit, MCF, 89. The remaining rows are not recoverable; they involve student 32 (Smith) and courses 160 (C++, RSC) and 385 (DB I, VNG).]
10
What problems exist?
Twos:
    Locations (student and course)
    Names
    IDs
    Both of these depend on part, but not all, of the key
Looks like two tables, not one
Table is in 1stNF but not 2ndNF
11
Anomalies
Update anomaly
    Changing a course number requires changing several records
    Changing the LCode requires several updates
Insert anomaly
    We cannot have a student without their taking at least one class
Delete anomaly
    Deleting the first record destroys all that we know about 170
12
Problem again
The real problem is that things like CName are not dependent on the entire key
    CName is dependent on CID
    Just part of the key
We need to consider functional dependencies
13
Functional Dependencies (FD)
If field A determines field B then B is functionally dependent on A
    In other words: if we know A we know B
Notation: A → B
    This is read: A determines B
A does not have to be an atomic attribute
Every field is functionally dependent on every candidate key
    This includes every field with the uniqueness property
14
Full Functional Dependency
Somewhat stronger than the previous
B is fully functionally dependent on A iff
    B is functionally dependent on A
    B is not functionally dependent on any proper subset of A
If A is atomic, FD = FFD
Notation is A ↠ B
15
Observations
We cannot tell FDs by just looking at the data
    We must understand the data relationships
    Small tables may have apparent FDs that are not actually FDs
If every A → B were projected onto its own relation then A would be the key
Each FD represents an integrity constraint
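A small Python sketch can illustrate why data can refute an FD but never prove one. The helper `violates_fd` and the three sample rows are hypothetical, made up for illustration; only the attribute names come from the student example.

```python
def violates_fd(rows, lhs, rhs):
    """Return True if two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault stores the first right-hand value seen for this left value
        if seen.setdefault(left, right) != right:
            return True
    return False

# Hypothetical sample data, not taken from the slides.
rows = [
    {"SID": 21, "SNAME": "Jones", "CID": 170},
    {"SID": 21, "SNAME": "Jones", "CID": 160},
    {"SID": 32, "SNAME": "Smith", "CID": 170},
]

print(violates_fd(rows, ["SID"], ["CID"]))    # data refutes SID -> CID
print(violates_fd(rows, ["SID"], ["SNAME"]))  # no violation found
print(violates_fd(rows, ["SNAME"], ["SID"]))  # no violation found either
```

The last check finds no violation of SNAME → SID, yet two students could share a name. That is an apparent FD of a small table, not a real constraint.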
16
Closure of a Set of FDs
The closure (denoted F+) of a set F of FDs is a set that includes:
    All FDs in F
    Every FD that can be derived from the given FDs
FDs obey some properties that allow us to find FDs implied by other FDs
    These properties are called Armstrong’s Axioms
17
Armstrong’s Axioms
There are three basic rules:
    Reflexivity
    Augmentation
    Transitivity
Two additional rules may be derived using these three:
    Union
    Decomposition
18
Reflexivity
If Y is a subset of X then X → Y
    A set of fields determines all of its members
Examples:
    A → A
    AB → B
Trivial FDs are any FD where the right hand side is a subset of the left hand side
19
Augmentation
If X determines Y
Then XZ determines YZ
    It is always possible to add a field to both sides of a functional dependency
Example:
    If A → B then AC → BC
20
Transitivity
If X determines Y and Y determines Z
Then X determines Z
    We can chain FDs together
Example:
    If: A → B, B → C, C → D
    then: A → C, A → D
21
Union
If a field determines two separate fields it determines both of them together
If X determines Y and X determines Z
Then X determines YZ
Example:
    If: A → B, A → C
    then: A → BC
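The Union rule follows from the three basic axioms. One possible derivation, written out step by step (the layout is a sketch, not taken from the slides):

```latex
\begin{align*}
X  &\to Y  && \text{(given)} \\
X  &\to XY && \text{(augment the line above with } X\text{, using } XX = X\text{)} \\
X  &\to Z  && \text{(given)} \\
XY &\to YZ && \text{(augment the line above with } Y\text{)} \\
X  &\to YZ && \text{(transitivity on } X \to XY \text{ and } XY \to YZ\text{)}
\end{align*}
```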
22
An Example
Suppose that a table has six fields: ABCDEF
The following dependencies exist:
    AC → B
    C → DE
    F → AC
How many dependencies can be derived?
What dependencies are contained in the closure?
23
Closure
The closure is the union of all dependencies that may be derived from the original set:
    AC → B, C → DE, F → AC
Reflexivity (AKA trivial)
    A → A, B → B, AB → B, ABC → C, …
Augmentation
    CA → ADE, ACD → BD, …
Transitive
    F → B, F → DE
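Listing all of F+ by hand grows quickly. A common shortcut is to compute the closure of a set of attributes instead: X → Y is in F+ exactly when Y is inside the attribute closure of X. The sketch below uses that attribute-closure technique, not a full enumeration of F+, and the function name is an assumption.

```python
def attribute_closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever lhs is already inside the closure.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

# The three dependencies from the example: AC -> B, C -> DE, F -> AC
FDS = [("AC", "B"), ("C", "DE"), ("F", "AC")]

print(sorted(attribute_closure("F", FDS)))   # ['A', 'B', 'C', 'D', 'E', 'F']
print(sorted(attribute_closure("C", FDS)))   # ['C', 'D', 'E']
print(sorted(attribute_closure("AC", FDS)))  # ['A', 'B', 'C', 'D', 'E']
```

Since B, D and E are in the closure of F, the transitive results F → B and F → DE from the slide are confirmed.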
24
Keys and Dependencies
A key is any set of fields that determines all other fields
    Either directly or transitively
A candidate key must be minimal
    No field may be removed and still leave a key
In the above:
    The entire relation is a key by reflexivity but is not minimal
    F is the key: it determines every other field directly or using transitivity
Super key: a set of fields that contains a key
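The claim that F is the only candidate key of the ABCDEF example can be checked by brute force. This is a minimal sketch assuming the three FDs AC → B, C → DE, F → AC; the function names are hypothetical, and the search is exponential, so it suits only small schemas.

```python
from itertools import combinations

def attribute_closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def candidate_keys(attributes, fds):
    """Try attribute sets smallest first, keeping only the minimal keys."""
    all_attrs = set(attributes)
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            if any(set(key) <= set(combo) for key in keys):
                continue  # contains a smaller key: a super key, not minimal
            if attribute_closure(combo, fds) == all_attrs:
                keys.append(combo)
    return keys

FDS = [("AC", "B"), ("C", "DE"), ("F", "AC")]
print(candidate_keys("ABCDEF", FDS))  # [('F',)]
```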
25
Decomposition
If a field determines two combined fields it determines both of them separately
If X determines YZ
Then X determines Y and X determines Z
This is the reverse of Union
Example:
    If: A → BC
    then: A → B, A → C
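Like Union, the Decomposition rule can be derived from the basic axioms. One short derivation (a sketch; the same steps with Z in place of Y give X → Z):

```latex
\begin{align*}
X  &\to YZ && \text{(given)} \\
YZ &\to Y  && \text{(reflexivity, since } Y \subseteq YZ\text{)} \\
X  &\to Y  && \text{(transitivity)}
\end{align*}
```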
26
Decompositions
Use projections to subdivide a table into several tables in order to move to a higher normal form
However, can all projections be done without problems?
    No
    There are both lossless and lossy projections
The desired kind of projections are called lossless join decompositions
    This kind allows us to exactly reconstruct the original table
27
Lossless Join Decomposition
How may we subdivide one relation into two without losing anything?
There must be some attributes in common in the two tables
    Otherwise the relationship between a key and attribute is broken
The decomposition is lossless if the attributes in common contain a key of either table
28
Lossless Decomposition Again
Let
    R be a set of fields in a relation
    F be a set of FDs that hold over R
The decomposition of R into R1 and R2 is lossless if and only if F+ contains either
    R1 ∩ R2 → R1 or
    R1 ∩ R2 → R2
The attributes in common must contain the key for R1 or the key for R2
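The test above can be run mechanically with an attribute closure: take the closure of the common attributes and see whether it covers R1 or R2. The sketch below reuses the ABCDEF example with FDs AC → B, C → DE, F → AC. The two decompositions tried are hypothetical, chosen only to show one pass and one failure.

```python
def attribute_closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def is_lossless(r1, r2, fds):
    """Binary test: does (R1 ∩ R2) determine all of R1 or all of R2?"""
    common = set(r1) & set(r2)
    closure = attribute_closure(common, fds)
    return set(r1) <= closure or set(r2) <= closure

FDS = [("AC", "B"), ("C", "DE"), ("F", "AC")]

# Hypothetical decompositions of ABCDEF, both sharing only C.
print(is_lossless("CDE", "ABCF", FDS))  # True: C determines all of CDE
print(is_lossless("ABC", "CDEF", FDS))  # False: C determines neither side
```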
29
Example
Original:
    S   P   D
    S1  P1  D1
    S2  P2  D2
    S3      D3
Decomposed into two:
    S   P          P   D
    S1  P1         P1  D1
    S2  P2         P2  D2
    S3                 D3
Join is larger than original, some information lost
[The P value in the third row is missing in this copy of the slide.]
30
Why did that not work?
The common field was P
    P is not the key
Recall:
    The functional dependencies cannot be determined from looking at the data
    The data may only show what is not an FD
In this case either S or D or both could be the key
31
Example Revisited
Original:
    S   P   D
    S1  P1  D1
    S2  P2  D2
    S3      D3
Decomposed into two better tables:
    S   P          S   D
    S1  P1         S1  D1
    S2  P2         S2  D2
    S3             S3  D3
Reconstructed table is the same as the original
This works now, but may not work with more data.
[The P value in the third row is missing in this copy of the slide.]
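The two decompositions of the S/P/D example can be compared directly with Python sets. One loud assumption: the P value of the third row does not survive in this copy of the slides, so the sketch takes it to be P1. Any repeated P value would show the same effect.

```python
# ASSUMPTION: the third row's P value is taken to be P1 (a repeat of an
# earlier value); the slide's actual value is not legible.
original = {("S1", "P1", "D1"), ("S2", "P2", "D2"), ("S3", "P1", "D3")}

# Projections onto pairs of columns.
sp = {(s, p) for s, p, d in original}
pd_rows = {(p, d) for s, p, d in original}
sd = {(s, d) for s, p, d in original}

# Natural join of SP and PD on the shared field P (not a key).
join_on_p = {(s, p, d) for s, p in sp for p2, d in pd_rows if p == p2}

# Natural join of SP and SD on the shared field S (unique in this data).
join_on_s = {(s, p, d) for s, p in sp for s2, d in sd if s == s2}

print(len(join_on_p))                # 5 rows, two more than the original
print(sorted(join_on_p - original))  # [('S1', 'P1', 'D3'), ('S3', 'P1', 'D1')]
print(join_on_s == original)         # True
```

The join on P invents two rows that were never in the original. The join on S happens to be exact for this data, which is all the data can show. Whether S really is a key depends on the meaning of the data.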
32
Other Notes
This generalizes to decomposing a table into more than two tables
    Decompose R1 into R1A and R1B
    We can then reconstruct R1 if needed
From the viewpoint of lossless decomposition:
    The common fields must include the key, but may include other fields
From the viewpoint of decomposing into higher normal forms:
    The common fields are usually only key fields
    Non-key fields are just redundant data
33
Second Normal Form (2ndNF)
A table is in Second Normal Form if and only if
    It is in 1st NF and
    Every non-key attribute is fully functionally dependent on the whole key
No partial dependencies
34
Partial Dependencies
X → A, where A is a non-key attribute
    X is part of the key but not all of it
Violation of 2nd NF
35
Student Table
Our previous student table was in 1stNF but not 2ndNF
    The key is SID and CID
    LCODE is dependent on SID
    CNAME is dependent on CID
The fix is projecting it into two (or more) tables
    This must be dependency preserving
36
What dependencies?
SID → SNAME
SID → LCODE
LCODE → STATUS
CID → CNAME
SID,CID → GRADE
CID → SITE
SID,CID → Everything
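Given these dependencies, the partial dependencies that break 2nd NF can be found by taking the closure of each proper part of the key. This is a minimal sketch with hypothetical function names. It treats every attribute outside the key as non-key, which is safe here because (SID, CID) is the only candidate key.

```python
from itertools import combinations

def attribute_closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def partial_dependencies(key, attrs, fds):
    """Map each proper part of the key to the non-key fields it determines."""
    non_key = set(attrs) - set(key)
    found = {}
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            determined = attribute_closure(part, fds) & non_key
            if determined:
                found[part] = determined
    return found

ATTRS = {"SID", "SNAME", "LCODE", "STATUS", "CID", "CNAME", "SITE", "GRADE"}
KEY = {"SID", "CID"}
FDS = [
    ({"SID"}, {"SNAME"}),
    ({"SID"}, {"LCODE"}),
    ({"LCODE"}, {"STATUS"}),
    ({"CID"}, {"CNAME"}),
    ({"CID"}, {"SITE"}),
    ({"SID", "CID"}, {"GRADE"}),
]

violations = partial_dependencies(KEY, ATTRS, FDS)
for part, fields in sorted(violations.items()):
    print(part, "->", sorted(fields))
```

SID alone reaches SNAME, LCODE and (transitively) STATUS, and CID alone reaches CNAME and SITE. Only GRADE needs the whole key.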
37
Now what?
The two piece key implies three tables:
    One where SID is the key
    One where CID is the key
    One with both SID and CID as the key
Each table has only fields dependent on the whole key
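The three-table split can be sketched as a grouping step: each non-key field goes to the single key field that determines it, and whatever is left stays with the full key. The function below is a hypothetical helper that only handles keys whose parts are single fields, as in this example. It is not a general 2NF algorithm.

```python
def attribute_closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def decompose_to_2nf(key, attrs, fds):
    """Group non-key fields under the single key field that determines them."""
    key = set(key)
    remaining = set(attrs) - key
    relations = {}
    for part in sorted(key):
        owned = attribute_closure({part}, fds) & remaining
        relations[(part,)] = {part} | owned
        remaining -= owned
    # Fields that need the whole key stay with the full key.
    relations[tuple(sorted(key))] = key | remaining
    return relations

ATTRS = {"SID", "SNAME", "LCODE", "STATUS", "CID", "CNAME", "SITE", "GRADE"}
FDS = [
    ({"SID"}, {"SNAME"}),
    ({"SID"}, {"LCODE"}),
    ({"LCODE"}, {"STATUS"}),
    ({"CID"}, {"CNAME"}),
    ({"CID"}, {"SITE"}),
    ({"SID", "CID"}, {"GRADE"}),
]

tables = decompose_to_2nf({"SID", "CID"}, ATTRS, FDS)
for table_key, fields in tables.items():
    print(table_key, sorted(fields))
```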
38
Original 1NF Table
[Same table as on slide 9: columns SID, SName, LCode, Status, CID, CName, Site, Grade. First row: 21, Jones, A1, 1, 170, C Lit, MCF, 89.]
39
New Relations
Student (SID, SName, LCode, Status)
    21  Jones  A1  1
    32  Smith
Course (CID, CName, Site)
    170  C Lit  MCF
    160  C++    RSC
    385  DB I   VNG
Enroll (SID, CID, Grade)
    21  170  89
    [The remaining Enroll rows are not recoverable; the values shown are 32, 160, 68, 91, 385, 76, 62.]
40
The new schema is better
Used a three-way lossless join decomposition
Now at Second Normal Form
Lost some anomalies
    The insertion and deletion anomalies
        We may have a student without a class
    The update anomaly
        Changing a course title needs only one update
One anomaly still exists:
    Changing LCode of one requires changing other LCodes as well
More work to be done
41
Finally
Dependencies are a mathematical concept
    Strongly related to the concept of a key
We can use dependencies to determine a table’s normal form
    Second, third and Boyce-Codd
First is any rectangular table
Second has no partial dependencies
    A 1NF table with a single field for a key must be in 2NF