Systems Analysis & Design Methods
III Classic normalization rules for relational databases
Systems Analysis III: Normalization Rules

Contents
- Introduction
- First Normal Form: Columns should not repeat.
- Second Normal Form: Non-key columns should depend on the whole primary key, not just on a part.
- Third Normal Form: Non-key columns should not depend on other non-key columns.

Note: Excellent reading:
Introduction

By using normalization rules, we try to
- avoid inconsistencies,
- avoid waste of space,
- enhance flexible use of data (easy SQL queries),
- minimize the effect of application changes on the database structure.

The normalization rules are to be read cumulatively: e.g. your database is in 3NF if it complies with the rules given by 1NF, 2NF and 3NF.
Some vocabulary

- candidate key = a minimal combination of columns that uniquely determines a table row
- primary key = the candidate key chosen to identify the rows of the table
- alternate key = a candidate key not chosen as primary key
- foreign key = a column (or combination of columns) holding the primary key value of a row in another table; it is used to reference that specific row in the other table
- By non-key I mean a column that is not part of any candidate key, whether primary or alternate.
First Normal Form: Columns should not repeat

This means that you are not allowed to store an array or a collection of the same kind of information in one table row. Such an attempt can take two forms, which result in two subrules:
- You cannot have several columns holding similar information, e.g. three columns child1, child2, child3 (see also the violation example that follows).
- Nor can you put multiple values in one column, e.g. one column children which contains a string of concatenated first names like ‘David-Ben-Joe’.
Violation example:

Appl_id  Appl_Name  Refphone1  Refphone2
1237     Smithers   ...        ...
1238     Simpson    ...        ...
Problems when violating 1NF:
- Every time more repeated fields are needed, the structure of the table changes, and existing code and queries must be rewritten.
- Each of the repeated columns must be named explicitly when querying or programming.
- Rows that do not need many contacts waste space.
Solution:

Appl_id  Appl_Name
1237     Smithers
1238     Simpson

Cand_id  RefPhone
...      ...
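The 1NF solution above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table name applicant_reference, the use of appl_id as the referencing column, and all phone numbers are made up for the example and do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE applicant (
        appl_id   INTEGER PRIMARY KEY,
        appl_name TEXT NOT NULL
    );
    -- One row per reference phone instead of Refphone1, Refphone2, ...
    -- (SQLite only enforces REFERENCES when PRAGMA foreign_keys = ON.)
    CREATE TABLE applicant_reference (
        appl_id  INTEGER NOT NULL REFERENCES applicant(appl_id),
        refphone TEXT    NOT NULL,
        PRIMARY KEY (appl_id, refphone)
    );
""")

conn.executemany(
    "INSERT INTO applicant VALUES (?, ?)",
    [(1237, "Smithers"), (1238, "Simpson")],
)
# Hypothetical phone numbers, invented for this sketch.
conn.executemany(
    "INSERT INTO applicant_reference VALUES (?, ?)",
    [(1237, "555-0101"), (1237, "555-0102"), (1238, "555-0103")],
)

# A third reference for Smithers is just another row: no new column needed.
conn.execute("INSERT INTO applicant_reference VALUES (?, ?)", (1237, "555-0104"))


def phones_for(appl_id):
    """Return all reference phones of one applicant, sorted."""
    rows = conn.execute(
        "SELECT refphone FROM applicant_reference WHERE appl_id = ? ORDER BY refphone",
        (appl_id,),
    )
    return [row[0] for row in rows]
```

The query in phones_for never names Refphone1 or Refphone2, so it keeps working however many references an applicant has.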
Second Normal Form: Non-key columns should depend on the whole primary key, not just on a part.

A field y depends on a field x if there is only one possible value for y, given a value for x.

E.g. in the violation example that follows: the applicant table at the top has the primary key Appl_id. The applicant/reference table below it tells you which applicants have which references; its primary key is Appl_id + RefPhone.

Within the applicant/reference table we can say this: when you know the value of the Appl_id column, you know which Appl_Name goes with it. So the Appl_Name field depends on Appl_id alone, which is only a part of the primary key. That is why we say the database violates 2NF.
Violation example:

Appl_id  Appl_Name
1237     Smithers
1238     Simpson

Appl_id  Appl_Name  RefPhone
1237     Smithers   ...
1237     Smithers   ...
1238     Simpson    ...
1238     Simpson    ...
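The definition of "y depends on x" can be tried out on the applicant/reference table above. The following is a minimal sketch: the helper depends_on and the phone numbers are assumptions made for illustration. Note that a check on sample rows can only refute a dependency; whether it really holds is a matter of what the data means.

```python
def depends_on(rows, x, y):
    """Return True if, in these rows, each value of column x goes with exactly one value of column y."""
    seen = {}
    for row in rows:
        # Remember the first y seen for this x; any different y later breaks the dependency.
        if seen.setdefault(row[x], row[y]) != row[y]:
            return False
    return True


# Rows of the applicant/reference table; phone numbers are hypothetical.
rows = [
    {"Appl_id": 1237, "Appl_Name": "Smithers", "RefPhone": "555-0101"},
    {"Appl_id": 1237, "Appl_Name": "Smithers", "RefPhone": "555-0102"},
    {"Appl_id": 1238, "Appl_Name": "Simpson", "RefPhone": "555-0103"},
    {"Appl_id": 1238, "Appl_Name": "Simpson", "RefPhone": "555-0104"},
]

# Appl_Name is fixed by Appl_id alone, i.e. by only a part of the primary key.
name_depends_on_id = depends_on(rows, "Appl_id", "Appl_Name")

# RefPhone is not fixed by Appl_id: one applicant has several references.
phone_depends_on_id = depends_on(rows, "Appl_id", "RefPhone")
```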
Problems when violating 2NF:
- The part of the primary key on which the column depends is usually a foreign key. This means that the dependent column contains information that is probably already available in the record (in another table) to which this foreign key points. So the dependent column contains copied information (from another table) that needs to be kept consistent with the original. (Danger of anomalies.)
Problems when violating 2NF (continued):
- The part of the primary key on which the column depends probably contains the same value in several rows. This means that the dependent column also contains the same value in those rows. So the dependent column contains copied information (from the same table) that needs to be kept consistent with the original.
- Copying information (see above) is a waste of space.
Solution:

Appl_id  Appl_Name
1237     Smithers
1238     Simpson

Appl_id  RefPhone
...      ...
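With Appl_Name stored only in the applicant table, a name change is a single-row update, and the name is looked up through a join when it is needed next to the references. The sketch below uses Python's sqlite3 module; the table name applicant_reference, the phone numbers and the new name are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE applicant (
        appl_id   INTEGER PRIMARY KEY,
        appl_name TEXT NOT NULL
    );
    CREATE TABLE applicant_reference (
        appl_id  INTEGER NOT NULL REFERENCES applicant(appl_id),
        refphone TEXT    NOT NULL,
        PRIMARY KEY (appl_id, refphone)
    );
    INSERT INTO applicant VALUES (1237, 'Smithers'), (1238, 'Simpson');
    -- Hypothetical phone numbers.
    INSERT INTO applicant_reference VALUES
        (1237, '555-0101'), (1237, '555-0102'),
        (1238, '555-0103'), (1238, '555-0104');
""")

# The name lives in exactly one row, so one UPDATE keeps everything consistent.
conn.execute("UPDATE applicant SET appl_name = 'Smithers-Jones' WHERE appl_id = 1237")

# The join puts the (single) name next to every reference of the applicant.
names_with_phones = conn.execute("""
    SELECT a.appl_name, r.refphone
    FROM applicant_reference AS r
    JOIN applicant AS a ON a.appl_id = r.appl_id
    ORDER BY r.refphone
""").fetchall()
```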
Third Normal Form: Non-key columns should not depend on other non-key columns.

Violation example:

Client_id  Client_Name  Zip     City
1          Smith        B-1000  Brussels
2          Jones        B-2000  Antwerp
3          Vacarello    B-2000  Antwerp
4          Peters       B-3000  Leuven
Problems when violating 3NF:
- The dependent column contains information that is also available in other rows of the same table. So the dependent column contains copied information that needs to be kept consistent with the original. (Danger of update anomalies.)
- Copied information wastes space.
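The update anomaly can be reproduced on the client table from the violation example. In this sketch (Python's sqlite3 module) the partial update and the spelling 'Antwerpen' are hypothetical; they only serve to show how the copies of City drift apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Not in 3NF: city depends on zip, and zip is a non-key column.
    CREATE TABLE client (
        client_id   INTEGER PRIMARY KEY,
        client_name TEXT NOT NULL,
        zip         TEXT NOT NULL,
        city        TEXT NOT NULL
    );
    INSERT INTO client VALUES
        (1, 'Smith',     'B-1000', 'Brussels'),
        (2, 'Jones',     'B-2000', 'Antwerp'),
        (3, 'Vacarello', 'B-2000', 'Antwerp'),
        (4, 'Peters',    'B-3000', 'Leuven');
""")

# Hypothetical mistake: the city is changed for one client only.
conn.execute("UPDATE client SET city = 'Antwerpen' WHERE client_id = 2")

# Zip codes that now go with more than one city name.
inconsistent_zips = conn.execute("""
    SELECT zip, COUNT(DISTINCT city)
    FROM client
    GROUP BY zip
    HAVING COUNT(DISTINCT city) > 1
""").fetchall()
```

After the partial update, zip B-2000 appears with two different city names, which is exactly the inconsistency the rule is meant to prevent.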
Solution:

Client_id  Client_Name  Zip
1          Smith        B-1000
2          Jones        B-2000
3          Vacarello    B-2000
4          Peters       B-3000

Zip     City
B-1000  Brussels
B-2000  Antwerp
B-3000  Leuven
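The 3NF solution can be sketched in the same way with Python's sqlite3 module. The table name zip_city and the spelling 'Antwerpen' are assumptions for illustration; the rows are those of the tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE zip_city (
        zip  TEXT PRIMARY KEY,
        city TEXT NOT NULL
    );
    CREATE TABLE client (
        client_id   INTEGER PRIMARY KEY,
        client_name TEXT NOT NULL,
        zip         TEXT NOT NULL REFERENCES zip_city(zip)
    );
    INSERT INTO zip_city VALUES
        ('B-1000', 'Brussels'), ('B-2000', 'Antwerp'), ('B-3000', 'Leuven');
    INSERT INTO client VALUES
        (1, 'Smith', 'B-1000'), (2, 'Jones', 'B-2000'),
        (3, 'Vacarello', 'B-2000'), (4, 'Peters', 'B-3000');
""")

# Each city name is stored once, so a single-row update reaches every client.
conn.execute("UPDATE zip_city SET city = 'Antwerpen' WHERE zip = 'B-2000'")

# Map each client name to its city by joining on the zip code.
city_of_client = dict(conn.execute("""
    SELECT c.client_name, z.city
    FROM client AS c
    JOIN zip_city AS z ON z.zip = c.zip
"""))
```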
Third Normal Form: Remarks

Columns may depend on alternate keys. So they may depend on a field or combination of fields that uniquely identifies a record, but that was not chosen as the primary key.

E.g. in the table that follows, Appl_Name depends on the alternate key. This is okay, since an alternate key uniquely identifies the whole record. (In other words, we could just as well have chosen this alternate key as the primary key.)
Appl_id        Social_Security_Number  Appl_Name
(primary key)  (alternate key)
1237           ...                     Smithers
1238           ...                     Simpson
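In SQL an alternate key is commonly declared with a UNIQUE constraint (here combined with NOT NULL). The sketch below uses Python's sqlite3 module; the values 'SSN-A' and 'SSN-B' and the third applicant are placeholders invented for the example, not real data from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE applicant (
        appl_id                INTEGER PRIMARY KEY,   -- primary key
        social_security_number TEXT NOT NULL UNIQUE,  -- alternate key
        appl_name              TEXT NOT NULL
    )
""")
conn.execute("INSERT INTO applicant VALUES (1237, 'SSN-A', 'Smithers')")
conn.execute("INSERT INTO applicant VALUES (1238, 'SSN-B', 'Simpson')")

# The alternate key must stay unique, just like the primary key.
try:
    conn.execute("INSERT INTO applicant VALUES (1239, 'SSN-A', 'Other')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Because it is unique, the alternate key identifies exactly one record.
name_by_ssn = conn.execute(
    "SELECT appl_name FROM applicant WHERE social_security_number = ?",
    ("SSN-B",),
).fetchone()[0]
```

Looking up Appl_Name through Social_Security_Number returns one row at most, which is why a dependency on the alternate key does not violate 3NF.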