Normalizing Database Files Professor Ralph Westfall May, 2011
Concepts Covered Using SQL Server Management Studio to take unnormalized data in a file and splitting it into two normalized files For example, repeated data in a field that has a one to many or many to many relationship with the other fields in the table Doing the same thing with Excel
Create Non-Redundant Table CREATE TABLE currenciesCKey ( Currency VARCHAR(15)) INSERT INTO currencies (Currency) SELECT DISTINCT Currency from nations Word DISTINCT gets rid of duplicated entries in the normalized new table
Finish Creating New Table Click Execute button Click in Object Explorer window, then click Refresh icon and expand Tables node for this database to see the new table May need to right-click Tables>Refresh for the database used to see table that was just created
Add Normalized New Table Key Right-click this new table name>Design Right-click top field>Insert Column Change Data Type to int (or something else appropriate) Right-click this new column>Set Primary Key Set (Is Identity) to Yes in Column Properties Set Identity Seed to a value large enough so all current and expected rows will have same number of digits in their key Click diskette icon to save this new table
Create Table with Foreign Key Define a new table that has a foreign key to reference the data in the second table Do a join that includes all fields from both tables EXCEPT for the field that went into the separate normalized table the key field from the second table gets added to the new table but the data the key field references doesn't go into it Results of this join fill the new table with data
Create Table with Foreign Key CREATE TABLE [table] ([field [datatype], [field] [datatype], etc.) INSERT INTO [table] ([field], [field], [field],[table].[field], etc.) WHERE [table].[field]=[other_table].[field] Will have all fields (excluding duplicates) in the new table Will need to qualify fields with same names in both tables by name of the table they come from Execute, then Refresh to see the new table See sample code in Notes
Verify Data Use SQL Server Management Studio to Select Top 1000 Rows to see data in new table visually check foreign keys against values in other table Or run a SQL query to join and view the data
Backup Backup data from the original table Right-click database name>Tasks>Export Data>Next Data source is SQL Server Native Client 10.0>Use Windows Authentication>Database is the one that was being used>Next Create a new.txt file on desktop>Destination is Flat File Destination>Browse for that file>Format: Delimited>Next Copy data from one or more etc.>Next Source table or view: [original table]>Next>Next> Finish
Finish Backup and Delete Verify that the new table has the same number of rows as the original table SELECT COUNT([key_field]) FROM [original_table] and then repeat with new table Delete the original table Detach the database
Excel Could easily normalize small and middle sized data sets in Excel See Filter for unique values or remove duplicate values for how to get unique values for second tableFilter for unique values or remove duplicate values Copy the column to be normalized into a new table add sequence of numbers in column to the left put captions over each of these two columns follow instructions in the link above
Excel - 2 Add an extra column to the main table create a VLOOKUP column on the second table you just created that converts the data in the un- normalized column in the main table to the key value use Paste Special to replace the actual with key values save both tables as.csv files to load into a database