Published by Nancy Lisby. Modified over 9 years ago.
1
Chapter 8: Normal Forms Based on Functional Dependencies
Data Modeling and Database Design Chapter 8: Normal Forms Based on Functional Dependencies
2
Normalization
Normalization is a technique that facilitates systematic validation of the participation of attributes in a relation schema from a perspective of data redundancy. Normal forms (NFs) provide a stepwise progression towards the goal of a fully normalized relation schema that, from a functional dependency perspective, is guaranteed to be free of the data redundancies that cause update anomalies.
3
Desirable Versus Undesirable FDs
Desirable FDs in a relation schema R are those where the determinant is a candidate key of R. Undesirable FDs in a relation schema R are those where the determinant is not a candidate key of R; such FDs cause data redundancy and the consequent modification anomalies in R.
4
Normal Forms: An Overview
A relation schema is said to be in a particular normal form if it satisfies certain prescribed criteria for that normal form. First normal form (1NF) reflects one of the properties of a relation schema – i.e., by definition a relation schema is in 1NF. The normal forms associated with functional dependencies are second (2NF), third (3NF), and Boyce-Codd (BCNF) normal forms.
5
Normal Forms: An Overview (continued)
The violation of each of these normal forms signals the presence of a specific type of ‘undesirable’ FD. A violation of a normal form can be interpreted as an inadvertent mixing of entity types belonging to two different entity classes in a single entity type.
6
Normal Forms: The History
E.F. Codd first proposed the 1NF, 2NF, and 3NF in 1972. Later it was discovered that under certain conditions (i.e., FDs) a relation schema in 3NF continues to have data redundancies causing modification anomalies. A revised, stronger definition of the 3NF was then proposed by Boyce and Codd in 1974, which came to be known as Boyce-Codd normal form (BCNF).
7
First Normal Form (1NF)
First normal form (1NF) imposes conditions so that a base relation that is physically stored as a file does not contain records with a variable number of fields. This is accomplished by prohibiting multi-valued attributes, composite attributes, and combinations thereof in a relation schema. Such a constraint, in effect, prevents relations from containing other relations. In essence, 1NF, by definition, requires that the domain of an attribute include only atomic values and that the value of an attribute in a relation’s tuple be a single value from the domain of that attribute.
8
1NF Violation – An Example
Note 1: Album_no is the primary key of ALBUM.
Note 2: Artist_nm is a multi-valued attribute causing a first normal form violation.
9
Resolution of 1NF Violation
10
Second Normal Form (2NF)
A relation schema R is in 2NF if every non-prime attribute in R is fully functionally dependent on the primary key of R. This means a non-prime attribute is not functionally dependent on a proper subset of the primary key of R.
The Second Normal Form (2NF) is based on a concept known as full functional dependency. A functional dependency of the form Z → A is a ‘full functional dependency’ if and only if no proper subset of Z functionally determines A. In other words, if Z → A and X → A, and X is a proper subset of Z, then Z does not fully functionally determine A, i.e., Z → A is not a full functional dependency; it is a partial dependency.
11
Violation of 2NF
Note: The primary key and the attributes A, Y, and Z can be atomic or composite. A is a prime attribute, whereas Y and Z are non-prime attributes. In order for a partial dependency to exist here, Attribute A must be a proper subset of the primary key of R1.
12
2NF Violation - An Example
What is the cause of the 2NF violation in NEW_ALBUM?
13
2NF Violation Explained
F: fd1: Album_no → Price; fd2: Album_no → Stock
F+: F; fd12: Album_no → {Price, Stock}; fd12x: {Album_no, Artist_nm} → {Price, Stock}
Fc = F
Candidate Key of NEW_ALBUM: (Album_no, Artist_nm); Primary Key: (Album_no, Artist_nm)
fd1 and fd2 (fd12) violate 2NF in NEW_ALBUM.
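The partial dependency in NEW_ALBUM can also be checked mechanically. The following Python sketch is illustrative only: the helper names `closure` and `is_full_fd` are assumptions introduced here and do not come from the slides. It encodes fd1 and fd2 and tests whether the primary key (Album_no, Artist_nm) fully determines Price and Stock.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_full_fd(z, a, fds):
    """True if Z -> A holds and no proper subset of Z determines A."""
    z = set(z)
    if a not in closure(z, fds):
        return False
    for size in range(len(z)):  # sizes 0 .. |Z|-1, i.e. proper subsets only
        for subset in combinations(sorted(z), size):
            if a in closure(subset, fds):
                return False
    return True

# fd1 and fd2 as listed for NEW_ALBUM
F = [({"Album_no"}, {"Price"}), ({"Album_no"}, {"Stock"})]
primary_key = {"Album_no", "Artist_nm"}

price_full = is_full_fd(primary_key, "Price", F)   # Album_no alone suffices
stock_full = is_full_fd(primary_key, "Stock", F)
album_full = is_full_fd({"Album_no"}, "Price", F)
print(price_full, stock_full, album_full)  # False False True
```

Both Price and Stock turn out to be only partially dependent on the primary key, which is the 2NF violation described above.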
14
Modification Anomalies Resulting from 2NF Violation
Update anomaly: Suppose we want to change the value of Price or Stock of Album_no BTL007 in NEW_ALBUM. Multiple tuples require update, and failure to update some of them will change the semantics of the scenario.
Insertion anomaly: Suppose we want to add a new tuple (Album_no: XY111, Price: 17.95, and Stock: 100) to NEW_ALBUM. Unless we also know the artist(s), the insert is not possible, since Artist_nm is part of the primary key of NEW_ALBUM.
Deletion anomaly: Suppose we want to delete Album_no BTL007 from NEW_ALBUM. This entails deletion of multiple tuples.
15
Resolution of 2NF Violation
The resolution of a 2NF violation is a two-step process that decomposes the target relation schema with the undesirable FDs into multiple relation schemas such that the undesirable FDs are rendered desirable.
Step 1: Pull out the undesirable FD(s) from the target relation schema as separate relation schema(s).
Step 2: Retain the determinant of the pulled-out relation schema as an attribute(s) in the leftover target relation schema to facilitate reconstruction of the original target relation schema.
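The two-step process can be sketched at the level of attribute sets. This is a minimal illustration, not the textbook's procedure; the function name `pull_out` is hypothetical, and the attribute list of NEW_ALBUM is taken from the FDs and key given for it.

```python
def pull_out(schema, determinant, dependents):
    """Apply the two-step resolution to one undesirable FD.

    Step 1: the FD becomes its own relation schema (determinant + dependents).
    Step 2: the determinant stays behind in the leftover schema, so the two
            schemas can be joined back together.
    """
    determinant, dependents = set(determinant), set(dependents)
    pulled = determinant | dependents
    leftover = set(schema) - (dependents - determinant)
    return pulled, leftover

new_album = {"Album_no", "Artist_nm", "Price", "Stock"}
pulled, leftover = pull_out(new_album, {"Album_no"}, {"Price", "Stock"})
print(sorted(pulled))    # ['Album_no', 'Price', 'Stock']
print(sorted(leftover))  # ['Album_no', 'Artist_nm']
```

In the pulled-out schema the determinant Album_no is now a candidate key, so the formerly undesirable FDs have become desirable, and the shared attribute Album_no allows the original relation to be reconstructed.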
16
Resolution of 2NF Violation Demonstrated
Note: Album_no → Price; Album_no → Stock
Note: No non-trivial FD present
17
Third Normal Form – 3NF
A relation schema R is in 3NF if it is in 2NF and no non-prime attribute is functionally dependent on another non-prime attribute in R. The Third Normal Form (3NF) is based on the concept of transitive dependency.
Given a relation schema R (X, A, B) where X, A, and B are pair-wise disjoint atomic or composite attributes, X is the primary key of R, and A and B are non-prime attributes: if A → B (or B → A) in R, then B (or A) is said to be ‘transitively dependent’ on X, the primary key of R.
18
Violation of 3NF
Note: X, Y, and Z can be atomic or composite attributes. Each is a non-prime attribute.
19
Another Possible Violation of 3NF
Note: The primary key and the attributes A, X, Y, and Z can be atomic or composite. A is a prime attribute, whereas X, Y, and Z are non-prime attributes. The fact that the non-prime attribute Y is functionally dependent on the non-prime attribute X constitutes the third normal form violation.
20
3NF Violation – An Example
Question: What is it about the following FDs that allows FLIGHT to be free of a second normal form violation but contain a third normal form violation?
fd1: Flight# → Origin; fd2: Flight# → Destination; fd3: (Origin, Destination) → Mileage
21
3NF Violation Explained
Given R (Flight#, Origin, Destination, Mileage) and F prevailing over R where
F: fd1: Flight# → Origin; fd2: Flight# → Destination; fd3: {Origin, Destination} → Mileage
F+: F; fd12: Flight# → {Origin, Destination}; fd3x: Flight# → Mileage; fd123: Flight# → {Origin, Destination, Mileage}
Fc = F
Candidate Key of FLIGHT: Flight#
Primary Key of FLIGHT: Flight#
A 2NF violation requires the presence of a partial dependency. Note: when the primary key is an atomic attribute, a partial dependency is impossible.
fd3 violates 3NF in FLIGHT.
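The reasoning above can be replayed with an attribute-closure computation. The sketch below is illustrative; the `closure` helper is an assumption introduced here, and FLIGHT is assumed to carry the Mileage attribute that fd3 refers to.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

FLIGHT = {"Flight#", "Origin", "Destination", "Mileage"}  # Mileage assumed present
F = [
    ({"Flight#"}, {"Origin"}),                 # fd1
    ({"Flight#"}, {"Destination"}),            # fd2
    ({"Origin", "Destination"}, {"Mileage"}),  # fd3
]
primary_key = {"Flight#"}
route = {"Origin", "Destination"}

# Flight# reaches every attribute (including Mileage, via fd3), so it is a key.
is_key = closure(primary_key, F) == FLIGHT

# An atomic primary key has no non-empty proper subset, so no partial dependency.
partial_possible = len(primary_key) > 1

# fd3: a non-prime determinant that is not a superkey determines the
# non-prime attribute Mileage, i.e. Mileage depends on Flight# transitively.
fd3_transitive = not (route & primary_key) and closure(route, F) != FLIGHT

print(is_key, partial_possible, fd3_transitive)  # True False True
```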
22
Modification Anomalies Resulting from 3NF Violation
Insertion anomaly: Suppose we want to add the information (Origin: Cincinnati, Destination: Houston, and Mileage: 1100) to FLIGHT. It is not possible to do so without this route being assigned to a Flight#.
Deletion anomaly: Suppose Flight# DL507 is removed from FLIGHT. The information that Seattle to Denver is 1537 miles is inadvertently lost.
23
Resolution of 3NF Violation
The resolution of a 3NF violation is accomplished by applying the same two-step process used earlier to resolve the 2NF violation. The two-step process is:
Step 1: Pull out the undesirable FD(s) from the target relation schema as a separate relation schema.
Step 2: Retain the determinant of the pulled-out relation schema as an attribute(s) in the leftover target relation schema to facilitate reconstruction of the original target relation schema.
24
Resolution 3NF Violation Demonstrated
Note: fd1: Flight# → Origin; fd2: Flight# → Destination
Note: fd3: (Origin, Destination) → Mileage
25
Boyce-Codd Normal Form (BCNF)
Data redundancies and the consequent update anomalies due to functional dependencies may persist even after a relation schema R is normalized to 3NF. R is then said to violate BCNF.
Conditions conducive to modification anomalies in a 3NF relation schema:
the relation schema has at least two candidate keys,
two of the candidate keys are composite, and
there is an attribute overlap between those two candidate keys.
The presence of these conditions does not necessarily trigger modification anomalies or a BCNF violation.
26
BCNF Defined
A relation schema R is in BCNF if, for every non-trivial functional dependency in R, the determinant is a superkey of R.
Remember: An FD in R is trivial if and only if the dependent is a subset of the determinant.
Note: By this definition, violations of 2NF or 3NF also imply a violation of BCNF.
27
Violation of BCNF
Note: The primary key and the attributes A, Y, and Z can be atomic or composite. A is a prime attribute, whereas Y and Z are non-prime attributes. The fact that Y is a determinant here but not a candidate key of R4 constitutes the Boyce-Codd Normal Form violation.
28
BCNF Violation – An Example
Question: What is it about the following FDs that allows STU_SUB to be free of second and third normal form violations but contain a BCNF violation?
fd1: (Stu#, Subject) → Teacher; fd2: (Stu#, Subject) → Ap_score; fd3: Teacher → Subject
29
BCNF Violation Explained
F: fd1: {Stu#, Subject} → Teacher; fd2: {Stu#, Subject} → Ap_score; fd3: Teacher → Subject
Candidate Keys of STU_SUB are: (Stu#, Subject); (Stu#, Teacher)
Primary Key of STU_SUB: (Stu#, Subject) [Chosen for this example]
fd3 violates BCNF since Teacher is not a candidate key of STU_SUB.
Note: In the absence of fd3, the overlapping composite keys do not violate BCNF.
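The candidate keys and the BCNF violation of STU_SUB can be derived by brute force from the three FDs. The sketch below is one possible way to do this; the helper names `closure`, `candidate_keys`, and `bcnf_violations` are assumptions introduced for illustration.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure covers all of attrs (exhaustive search)."""
    attrs = set(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            combo = set(combo)
            if any(key <= combo for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == attrs:
                keys.append(combo)
    return keys

def bcnf_violations(attrs, fds):
    """Non-trivial FDs whose determinant is not a superkey of attrs."""
    attrs = set(attrs)
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs) and closure(lhs, fds) != attrs]

STU_SUB = {"Stu#", "Subject", "Teacher", "Ap_score"}
F = [
    ({"Stu#", "Subject"}, {"Teacher"}),   # fd1
    ({"Stu#", "Subject"}, {"Ap_score"}),  # fd2
    ({"Teacher"}, {"Subject"}),           # fd3
]

keys = candidate_keys(STU_SUB, F)
violations = bcnf_violations(STU_SUB, F)
print([sorted(k) for k in keys])  # [['Stu#', 'Subject'], ['Stu#', 'Teacher']]
print(len(violations))            # 1 (fd3: Teacher -> Subject)
```

The exhaustive search is exponential in the number of attributes, which is acceptable for schemas of textbook size.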
30
Modification Anomalies Resulting from BCNF Violation
Insertion anomaly: Suppose we want to add a new Teacher for a Subject (e.g., Teacher: ‘Salter’, Subject: ‘English’). The addition is not possible without a Stu# associated with this insertion, since Stu# is part of the primary key of STU_SUB.
Update anomaly: Suppose we want to replace Campbell with Smith. Multiple tuples require update, and failure to update even one changes the semantics of the scenario.
31
Resolution of a BCNF Violation
The two-step process is:
Step 1: Pull out the undesirable FD(s) from the target relation schema as a separate relation schema.
Step 2: Retain the determinant of the pulled-out relation schema as an attribute(s) in the leftover target relation schema to facilitate reconstruction of the original target relation schema.
32
Resolution of BCNF Violation Demonstrated
Note: Teacher → Subject
Note: (Stu#, Teacher) → Ap_score
33
Possible Side Effects of Normalization
Attribute Preservation: no attribute should be lost in the decomposition process.
Dependency Preservation: any decomposition should continue to preserve the minimal cover of F.
Lossless-Join (Non-additive or Non-loss Join) Property: the decomposition should be strictly reversible, in that the reversal should yield the original target relation intact, with no loss of tuples or addition of spurious tuples.
34
Dependency Preservation Explained
Any decomposition, D, of a relation schema, R, should continue to preserve the minimal cover (Fc) of F prevailing over R, i.e., the union of the FDs that hold on the individual relation schemas of D should be a cover for F.
Each FD in Fc should either directly appear in a single relation schema in D or be inferable from the FDs that appear in single relation schemas in D.
If there is a need to join two or more relation schemas in D to ascertain an FD in Fc, then D is not a dependency-preserving solution.
Failure to preserve specified FDs amounts to failure to honor the semantics of the specified business rules.
35
Dependency Preservation: An Example
FLIGHT is in violation of 3NF.
F: fd1: Flight# → Origin; fd2: Flight# → Destination; fd3: {Origin, Destination} → Mileage
F+: F; fd12: Flight# → {Origin, Destination}; fd3x: Flight# → Mileage; fd123: Flight# → {Origin, Destination, Mileage}
36
Dependency-Preserving Solution
D1 [R1, R2] (A 3NF Solution to R) where R1: FLIGHT (Flight#, Origin, Destination); R2: DISTANCE (Origin, Destination, Mileage)
Is D1 attribute-preserving? Yes, since the union of all attributes in the decomposition D1 is exactly the same as the attributes in R.
Is D1 dependency-preserving? Yes, since fd1 and fd2 appear explicitly in FLIGHT (R1) and fd3 appears explicitly in DISTANCE (R2).
37
D1: A Dependency-Preserving Decomposition Demonstrated
Suppose we want to add a new flight (Flight#: DL111, Origin: Seattle, Destination: Denver, Mileage: 1300).
Note: The shaded tuple in FLIGHT is a “legal” insertion. The struck-out tuple in DISTANCE is an “illegal” insertion because a tuple with (Origin: Seattle, Destination: Denver) already exists in DISTANCE, and {Origin, Destination} is the primary key of DISTANCE. Observe that the addition of this tuple, if legal, would mean that the distance between the same two cities has two values, thus failing to preserve fd3: {Origin, Destination} → Mileage.
38
Three Possible 3NF Solutions for R
Given R (Flight#, Origin, Destination, Mileage) and F prevailing over R where
F: fd1: Flight# → Origin; fd2: Flight# → Destination; fd3: {Origin, Destination} → Mileage
Solutions:
D1: R1: FLIGHT (Flight#, Origin, Destination); R2: DISTANCE (Origin, Destination, Mileage). Note: D1 is a dependency-preserving solution.
D2: R1a: FLIGHT_A (Flight#, Origin, Destination); R2a: DISTANCE_A (Flight#, Mileage)
D3: R1b: FLIGHT_B (Flight#, Mileage); R2b: DISTANCE_B (Origin, Destination, Mileage)
39
Evaluation of the 3NF Solution D2 for Dependency Preservation
D2: R1a: FLIGHT_A (Flight#, Origin, Destination); R2a: DISTANCE_A (Flight#, Mileage)
Is the decomposition D2 attribute preserving? Yes, since the union of all attributes in D2 is exactly the same as the attributes in R.
Is the decomposition D2 dependency preserving? No. It is impossible to deduce fd3: (Origin, Destination) → Mileage from D2, i.e., fd3 is not preserved in the solution D2.
40
D2: A Non-Dependency-Preserving Decomposition
Suppose we want to add a new flight (Flight#: DL111, Origin: Seattle, Destination: Denver, Mileage: 1300).
Note: Clearly, fd3: {Origin, Destination} → Mileage is not preserved in this decomposition, since fd3 cannot be verified from either FLIGHT_A or DISTANCE_A. The consequence is demonstrated by the addition of tuples to FLIGHT_A and DISTANCE_A as desired. The shaded tuples in FLIGHT_A and DISTANCE_A are “legal” insertions. A join of the two tables reveals that the distance between the two cities Seattle and Denver has two values (1300 and 1537), thus proving the failure to preserve fd3, i.e., the database has been contaminated.
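The contamination can be reproduced with a few lines of Python. This is a sketch using plain lists rather than a DBMS; it assumes, as the deletion-anomaly slide implies, that DL507 is the existing Seattle to Denver flight with a mileage of 1537.

```python
from collections import defaultdict

# FLIGHT_A (Flight#, Origin, Destination) and DISTANCE_A (Flight#, Mileage)
flight_a = [("DL507", "Seattle", "Denver")]
distance_a = [("DL507", 1537)]

# Both insertions are "legal": each table only enforces its own key, Flight#.
flight_a.append(("DL111", "Seattle", "Denver"))
distance_a.append(("DL111", 1300))

# Natural join of FLIGHT_A and DISTANCE_A on Flight#
mileage_of = dict(distance_a)
joined = [(f, o, d, mileage_of[f]) for f, o, d in flight_a if f in mileage_of]

# Collect the Mileage values per route to test fd3: {Origin, Destination} -> Mileage
mileages = defaultdict(set)
for _, origin, destination, miles in joined:
    mileages[(origin, destination)].add(miles)

fd3_broken = {route: m for route, m in mileages.items() if len(m) > 1}
print(fd3_broken)  # the Seattle-Denver route now maps to two mileage values
```

Only the join exposes the conflict, which is exactly why D2 is not dependency preserving: neither table can reject the insertion on its own.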
41
Evaluation of the 3NF Solution D3 for Dependency Preservation
D3: R1b: FLIGHT_B (Flight#, Mileage); R2b: DISTANCE_B (Origin, Destination, Mileage)
Is the decomposition D3 attribute preserving? Yes, since the union of all attributes in D3 is exactly the same as the attributes in R.
Is the decomposition D3 dependency preserving? No. It is impossible to derive fd1: Flight# → Origin and fd2: Flight# → Destination from D3, i.e., fd1 and fd2 are not preserved in the solution D3.
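The three evaluations above can be automated with the standard closure-based test for dependency preservation, which checks each FD using only what can be inferred inside individual schemas, without joining them. This is a sketch under the assumption that the usual textbook algorithm is an acceptable stand-in for the manual reasoning on the slides; the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_preserved(fd, decomposition, fds):
    """True if fd follows from the FDs that hold within single schemas."""
    lhs, rhs = fd
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for schema in decomposition:
            # What the attributes known so far determine inside this schema
            gained = closure(z & schema, fds) & schema
            if not gained <= z:
                z |= gained
                changed = True
    return set(rhs) <= z

def unpreserved(decomposition, named_fds):
    fds = list(named_fds.values())
    return [name for name, fd in named_fds.items()
            if not is_preserved(fd, decomposition, fds)]

F = {
    "fd1": ({"Flight#"}, {"Origin"}),
    "fd2": ({"Flight#"}, {"Destination"}),
    "fd3": ({"Origin", "Destination"}, {"Mileage"}),
}
D1 = [{"Flight#", "Origin", "Destination"}, {"Origin", "Destination", "Mileage"}]
D2 = [{"Flight#", "Origin", "Destination"}, {"Flight#", "Mileage"}]
D3 = [{"Flight#", "Mileage"}, {"Origin", "Destination", "Mileage"}]

results = {name: unpreserved(d, F) for name, d in [("D1", D1), ("D2", D2), ("D3", D3)]}
print(results)  # {'D1': [], 'D2': ['fd3'], 'D3': ['fd1', 'fd2']}
```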
42
Lossless-Join Explained
A decomposition of a relation schema R should be strictly (losslessly) reversible.
The term “loss” in “lossless” implies loss of information, not necessarily loss of tuples. Generation of spurious tuples resulting from the natural join of the projections of R entails loss of information.
The lossless-join property is always predicated on:
a set of FDs (F) prevailing over R
the premise that the join attributes in the decomposition are non-null values
Lossless join is also referred to as non-additive or non-loss join.
43
Test of Lossless-Join Property
A binary decomposition D: {R1, R2} of a relation schema, R, is a lossless-join decomposition with respect to a set of FDs, F, that holds on R if and only if F+ contains:
either the FD (R1 ∩ R2) → R1
or the FD (R1 ∩ R2) → R2
In other words, if the attribute(s) common to R1 and R2 contain a candidate key of either R1 or R2, then {R1, R2} is a lossless-join decomposition of R.
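The binary test translates directly into a closure check on the common attributes. The sketch below applies it to the three decompositions D1, D2, and D3 of FLIGHT introduced earlier; the helper names `closure` and `is_lossless` are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """Binary test: (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 must be in F+."""
    common = closure(r1 & r2, fds)
    return r1 <= common or r2 <= common

F = [
    ({"Flight#"}, {"Origin"}),                 # fd1
    ({"Flight#"}, {"Destination"}),            # fd2
    ({"Origin", "Destination"}, {"Mileage"}),  # fd3
]

d1 = is_lossless({"Flight#", "Origin", "Destination"},
                 {"Origin", "Destination", "Mileage"}, F)
d2 = is_lossless({"Flight#", "Origin", "Destination"},
                 {"Flight#", "Mileage"}, F)
d3 = is_lossless({"Flight#", "Mileage"},
                 {"Origin", "Destination", "Mileage"}, F)
print(d1, d2, d3)  # True True False
```

For D3 the only common attribute is Mileage, which determines nothing else, so neither required FD is in F+.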
44
Lossless-Join: An Example
FLIGHT is in violation of 3NF.
F: fd1: Flight# → Origin; fd2: Flight# → Destination; fd3: {Origin, Destination} → Mileage
F+: F; fd12: Flight# → {Origin, Destination}; fd3x: Flight# → Mileage; fd123: Flight# → {Origin, Destination, Mileage}
45
Three Possible 3NF Solutions for R
Given R (Flight#, Origin, Destination, Mileage) and F prevailing over R where
F: fd1: Flight# → Origin; fd2: Flight# → Destination; fd3: {Origin, Destination} → Mileage
Solutions:
D1: R1: FLIGHT (Flight#, Origin, Destination); R2: DISTANCE (Origin, Destination, Mileage). Note: D1 is a dependency-preserving solution.
D2: R1a: FLIGHT_A (Flight#, Origin, Destination); R2a: DISTANCE_A (Flight#, Mileage)
D3: R1b: FLIGHT_B (Flight#, Mileage); R2b: DISTANCE_B (Origin, Destination, Mileage)
46
D1: A 3NF Lossless-Join Solution
D1: R1: FLIGHT (Flight#, Origin, Destination); R2: DISTANCE (Origin, Destination, Mileage)
Note 1: Natural join of R1 and R2 strictly (losslessly) yields the original R.
Note 2: D1 is also a dependency-preserving solution.
47
D2: A 3NF Lossless-Join Solution
D2: R1a: FLIGHT_A (Flight#, Origin, Destination); R2a: DISTANCE_A (Flight#, Mileage)
Note 1: Natural join of R1a and R2a strictly (losslessly) yields the original R.
Note 2: D2, however, is not a dependency-preserving decomposition, since {Origin, Destination} → Mileage is not preserved in this solution.
48
D3: A 3NF Lossy-Join Solution
D3: R1b: FLIGHT_B (Flight#, Mileage); R2b: DISTANCE_B (Origin, Destination, Mileage)
Note 1: Natural join of R1b and R2b does not strictly (losslessly) yield the original R.
Note 2: D3 is not a dependency-preserving decomposition either, since Flight# → {Origin, Destination} is not preserved in this solution.
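A small data example makes the information loss in D3 visible. In the sketch below, the DL507 row follows the figures quoted on the earlier slides, while the ZZ999 row is entirely hypothetical and exists only to give two routes the same mileage.

```python
# FLIGHT (Flight#, Origin, Destination, Mileage)
flight = [
    ("DL507", "Seattle", "Denver", 1537),
    ("ZZ999", "Springfield", "Shelbyville", 1537),  # hypothetical row
]

# Project FLIGHT onto the two schemas of D3
flight_b = {(f, m) for f, _, _, m in flight}    # FLIGHT_B (Flight#, Mileage)
distance_b = {(o, d, m) for _, o, d, m in flight}  # DISTANCE_B (Origin, Destination, Mileage)

# Natural join of the projections on the only common attribute, Mileage
rejoined = {(f, o, d, m)
            for f, m in flight_b
            for o, d, m2 in distance_b
            if m == m2}

spurious = rejoined - set(flight)
print(len(rejoined), len(spurious))  # 4 2
```

No original tuple is missing from the join, yet two spurious tuples appear, so it is no longer possible to tell which flight serves which route. This is the sense in which "loss" means loss of information rather than loss of tuples.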
49
Normal Forms in a Nutshell
                                           1NF   2NF   3NF   BCNF
Modification anomalies (due to FDs)        Yes   Yes   Yes   No
Lossless-join decomposition                N/A   Yes   Yes   Yes
Dependency preservation on decomposition   Yes   Yes   Yes   Yes when*
* BCNF designs that result from 2NF and 3NF resolutions requiring no further normalization.
50
Normalization: Solution Options
BCNF; lossless join; dependency preservation
The above is achieved only when a relational schema is in 3NF and there are no BCNF violations in the relational schema, because then the 3NF relational schema is also in BCNF.
If the above design cannot be achieved, one may have to settle for:
Option 2: 3NF; lossless join; dependency preservation
Option 1+ (presently, the best option): Option 1 plus a materialized view for each unpreserved FD in the minimal cover of F
51
Materialized View Defined
A materialized view (also known as a snapshot), despite the similarity in name, is not a view. Like a view, it is derived via the evaluation of a specified relational expression; unlike a view, a materialized view is stored in the database and refreshed on every update to the source relations from which it is generated, i.e., it is kept current by the DBMS.
Materialized views are used to:
freeze data as of a certain moment without preventing updates from continuing on the ‘real’ data. Note: Some applications (e.g., accounting) often require data as of a particular point in time (e.g., end of accounting cycle).
save large amounts of data resulting from complex queries over a certain period of time, again without locking out updates on the source relations.
52
Summary Notes on Normalization
Normalization eliminates data redundancies that cause modification anomalies by appropriately decomposing the target relation schema.
Undesirable FDs in the target relation schema are not discarded, but are rendered “desirable” in the decomposed relational schema.
There is no specific merit in resolving a 2NF violation before a 3NF violation; the ordering is only historical.
BCNF subsumes 2NF and 3NF in that 2NF and 3NF violations are also BCNF violations. Likewise, a relation schema in BCNF is also in 2NF and 3NF.
A BCNF solution that also has the lossless-join property and is dependency preserving may not be achievable at all times. In this case a lossless join is preferred, and dependency preservation is accomplished via application programs or materialized views.
53
Example Revised
54
Example Revised
Given URS STOCK (Store, Location, Sq_ft, Manager, Product, Price, Quantity, Discount) and F [fd1, fd2, fd3, fd4, fd5, fd6] where
fd1: Store → Location
fd2: Store → Sq_ft
fd3: Store → Manager
fd4: Product → Price
fd5: {Store, Product} → Quantity
fd6: Quantity → Discount
fd1, fd2, fd3, and fd4 violate 2NF and fd6 violates 3NF.
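The classification of the six FDs can be checked with a short script. This sketch assumes that (Store, Product) is the primary key of STOCK, which the slide does not state explicitly; the code verifies that this pair is indeed a key. The `classify` function is a hypothetical helper that follows the definitions of 2NF and 3NF given earlier.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

STOCK = {"Store", "Location", "Sq_ft", "Manager",
         "Product", "Price", "Quantity", "Discount"}
F = {
    "fd1": ({"Store"}, {"Location"}),
    "fd2": ({"Store"}, {"Sq_ft"}),
    "fd3": ({"Store"}, {"Manager"}),
    "fd4": ({"Product"}, {"Price"}),
    "fd5": ({"Store", "Product"}, {"Quantity"}),
    "fd6": ({"Quantity"}, {"Discount"}),
}
fd_list = list(F.values())

primary_key = {"Store", "Product"}  # assumption, checked on the next line
assert closure(primary_key, fd_list) == STOCK

def classify(lhs, rhs):
    """Lowest normal form violated by lhs -> rhs, or 'none'."""
    lhs, rhs = set(lhs), set(rhs)
    if closure(lhs, fd_list) == STOCK:
        return "none"        # determinant is a superkey: a desirable FD
    if rhs <= primary_key:
        return "BCNF only"   # dependent is prime, so 2NF and 3NF are unaffected
    if lhs < primary_key:
        return "2NF"         # partial dependency on the primary key
    return "3NF"             # non-prime attribute determined by a non-key

report = {name: classify(lhs, rhs) for name, (lhs, rhs) in F.items()}
print(report)
# {'fd1': '2NF', 'fd2': '2NF', 'fd3': '2NF', 'fd4': '2NF', 'fd5': 'none', 'fd6': '3NF'}
```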
55
Decomposition of URS STOCK
R1: STORE_LOC (Store, Location)
R2: STORE_SIZE (Store, Sq_ft)
R3: STORE_MGR (Store, Manager)
R4: PRODUCT (Product, Price)
R5: INVENTORY (Store, Product, Quantity, Discount)
R5 violates 3NF, therefore:
R5a: DISC_STRUCTURE (Quantity, Discount)
R5b: INVENTORY (Store, Product, Quantity)
56
A Parsimonious Consolidation of the Decomposition
R123: STORE (Store, Location, Sq_ft, Manager)
R4: PRODUCT (Product, Price)
R5a: DISC_STRUCTURE (Quantity, Discount)
R5b: INVENTORY (Store, Product, Quantity)
57
Denormalization
Entails combining relations to enhance query efficiency.
Reintroduces data redundancies eliminated by normalization.
General misunderstanding: Denormalization always improves data retrieval performance.
Formal definition: Replacing a set of (often normalized) relation schemas D = {R1, R2, ..., Rn} by their join R such that the projection of R over the set of attributes of R1, R2, ..., Rn respectively is guaranteed to yield the original set D.
58
Review: Keys
Primary key
Candidate key
Superkey
Foreign key
Alternate key
Surrogate key
Partial key
59
A Normalization Exercise
Given F [fd1, fd2, fd3, fd4, fd5, fd6, fd7, fd8] that prevails over URS1 where
fd1: {Store, Branch} → Location
fd2: Customer → Address
fd3: Vendor → Product
fd4: {Store, Branch} → Sq_ft
fd5: Product → Price
fd6: {Store, Branch} → Manager
fd7: Manager → {Store, Branch}
fd8: Store → Type
60
Normalization
Step 1: Identify the candidate keys of URS1, given F: (Store, Branch, Customer, Vendor) and (Manager, Customer, Vendor)
Step 2: Choose a primary key: (Manager, Customer, Vendor)
Step 3: Identify violations of normal forms for every FD in F
Step 4: Resolve 2NF and 3NF violations and decompose
Step 5: Resolve BCNF violations and decompose
61
Normalization (continued)
Step 6: Is the decomposition attribute preserving? Is the decomposition dependency preserving? Does the decomposition exhibit the lossless-join property? Cross-check the decomposition!
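Step 1 of the exercise can be cross-checked by exhaustive search. The sketch below assumes that URS1 consists of exactly the attributes mentioned in fd1 through fd8, since the slides do not list the schema separately; the helper names are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure covers all of attrs (exhaustive search)."""
    attrs = set(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            combo = set(combo)
            if any(key <= combo for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == attrs:
                keys.append(combo)
    return keys

F = [
    ({"Store", "Branch"}, {"Location"}),  # fd1
    ({"Customer"}, {"Address"}),          # fd2
    ({"Vendor"}, {"Product"}),            # fd3
    ({"Store", "Branch"}, {"Sq_ft"}),     # fd4
    ({"Product"}, {"Price"}),             # fd5
    ({"Store", "Branch"}, {"Manager"}),   # fd6
    ({"Manager"}, {"Store", "Branch"}),   # fd7
    ({"Store"}, {"Type"}),                # fd8
]

# Assumption: URS1 is the union of all attributes appearing in F
URS1 = set()
for lhs, rhs in F:
    URS1 |= lhs | rhs

keys = candidate_keys(URS1, F)
print([sorted(k) for k in keys])
# [['Customer', 'Manager', 'Vendor'], ['Branch', 'Customer', 'Store', 'Vendor']]
```

Customer and Vendor appear on no right-hand side, so they must belong to every candidate key, which is why only these two keys exist.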
62
Reverse Engineering
Reverse engineering is common practice in hardware development, where it is intended to ‘discover’ how a particular product works.
Software engineering, over decades, has been preoccupied with developing “clearly understood” new systems.
A major concern of practitioners has been upgrading “poorly understood” old systems.
The criticality of the issue was acknowledged only in the 1990s (CACM, May 1994).
A working definition of reverse engineering: working backward in a systematic manner.
Goal: Understand how existing software works in a system development environment.
63
Reverse Engineering in a Database Environment
Precursor to migration from legacy systems.
Sheds light on poorly documented (often, operational) database systems.
In data modeling: reverse engineering within the forward engineering process, to understand if and how the original (conceptual) schema is flawed so that the occurrence of such flaws can be preemptively stopped.
Reverse engineer a normalized relational schema to discover conceptual modeling errors.
64
Heuristic to Reverse Engineer a Relational Schema to ER Diagram
Step 1: Translate the normalized relational schema to an information-preserving logical schema based on available information.
Step 2: Transform the information-preserving logical schema to a design-specific ER diagram, of either coarse or fine granularity or a hybrid thereof, depending on the available information.
Step 3: Abstract the design-specific ER diagram up to a presentation layer ERD.
65
Reverse Engineering a Relation Schema Step 1
Denote alternate keys as unique [Q]. Enclose composite attributes in braces [ ].
Indicate the parent label on top of the foreign key along with the parent-side structural constraints of the relationship type. The default values for min and max, when information is not available, are 0 and n respectively (see Section of chapter 6 for grammar).
Indicate the child-side structural constraints of the relationship type right below the foreign key along with the deletion rule. The default value for min, when information is not available, is 0, and the default deletion rule is R except when the cardinality ratio is 1:1; in that case, leave it unfilled if the rule is not specified (see Section of chapter 6 for grammar).
66
Reverse Engineering a Relation Schema Step 2
a. Map each logical scheme to an entity type.
b. Represent the foreign key attribute(s) in a logical scheme by a relationship type in the ERD. Note: The foreign key attribute is removed from the referencing entity type (child) unless the attribute also plays some other role in the entity type.
c. Map attributes of individual logical schemes to corresponding entity types in the ERD.
67
Step 2a Map each logical scheme to an entity type
If the primary key of a logical scheme Lx is a proper subset of the primary key of another logical scheme Ly, map Ly as a weak entity type in the ERD with Lx as its identifying parent.
If the primary key of a logical scheme Lx is a concatenation of the primary keys of multiple logical schemes Ly, Lz, etc., map Lx as a gerund entity type in the ERD with Ly, Lz, etc. as its identifying parents.
All other logical schemes are mapped as base entity types.
68
Step 2b Map the foreign key attribute(s) to a relationship type
Establish the relationship type.
Connect the relationship type to the parent (referenced) and child (referencing) entity types. If the child is a weak entity type, then the relationship type is mapped as an identifying relationship type.
Map the (min, max) to the appropriate edge of the relationship type.
69
Step 2c Map attributes to entity types in the ERD
Map atomic and composite attributes.
Underline unique identifiers. (The primary key and alternate keys are unique identifiers of the corresponding entity type.)
The partial key of a weak entity is denoted by a dotted underline.
70
Reverse Engineering a Relation Schema Step 3
Transform gerund entity types to n-way relationship types. Attributes of the gerund entity type remain as attributes of the relationship type.
Any relationship with a gerund entity type is transformed to a relationship type with the cluster entity type that the gerund represents.
A weak entity type not participating in any relationship other than the identifying relationship is transformed to a multi-valued (atomic/composite) attribute of the parent entity type.