1
Should This Be Normalized?
When Database Normalization Seems Abnormal
2
About Me

Professional side
Data modeler/architect at Community Care of North Carolina
Worked with SQL Server for 8 years (started with 2008 R2)
Started as a web/data analyst and QA person, then a database developer, and have shifted between analysis and architecture since

Personal side
From Raleigh via Philadelphia
Avid runner (2x marathoner, age group Cary Pancakes & Beer 5k)
Autism spectrum advocate
Lover of obscure pop culture references

LinkedIn:
3
What is this about? Normalization vs. denormalization
Primer on the normal forms and how they work: first, second/third, and Boyce-Codd
A forum on when normalization actually works in a BI context
Audience participation! Hint: all questions will ultimately have the same answer
4
A definition What is normalization anyway?
5
The structuring of a relational database to increase integrity and reduce redundancy
Concept introduced by Edgar F. Codd in 1970 in his work on relational data storage
Involves facts and dimensions, where transactions look up their reference data in separate tables
6
Normalization: The Advantages and The Disadvantages

The Advantages
Less duplication means the database size is smaller
In many cases, that reduced duplication leads to data models optimized for applications & products
Only the necessary tables need to be joined when querying
New data can easily be inserted

The Disadvantages
Many fact tables may contain codes upon codes, so frequent joins to lookup tables are needed
As the normal forms progress and the dimensions increase, performance will be affected
What about all those aggregates?
7
Normal forms First, second, third, and Boyce-Codd
9
The Problem
We have a limited set of race data in a file. A string of race participants is included with each event instance. If we are going to process future results, we'll have to see what works with our current system so the runners won't complain about how their results appear. Let's look through the normal forms. When should this be normalized?
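To make the scenario concrete, here is a minimal sketch of what that file might look like if it were loaded into SQL Server as-is; the table name, column names, and the packed participant string format are assumptions for illustration, not part of the deck.

    -- Hypothetical staging table: one row per race event, participants packed into a single string
    CREATE TABLE dbo.RaceResultsRaw (
        RaceName     varchar(100),
        RaceState    char(2),
        RaceDistance varchar(20),
        SponsorCo    varchar(100),
        Participants varchar(max)  -- e.g. 'Jane Doe 21:34; John Smith 25:02; ...'
    );

Each normal form below peels a bit more structure out of this one wide row.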
10
First Normal Form “The key”
Elimination of repeating groups and columns
No two rows are identical
All records have the same number of fields
Use a one-to-many relationship instead of multiple repeating columns
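As a rough sketch (assumed names, continuing the hypothetical race file above), first normal form replaces the packed participant string with one row per participant, so there are no repeating groups and every row can be told apart:

    -- 1NF sketch: one row per race/participant combination, no repeating groups
    CREATE TABLE dbo.RaceResult_1NF (
        RaceName        varchar(100) NOT NULL,
        RaceState       char(2)      NOT NULL,
        RaceDistance    varchar(20)  NOT NULL,
        SponsorCo       varchar(100) NULL,
        ParticipantName varchar(100) NOT NULL,
        ChipTime        time         NULL,
        CONSTRAINT PK_RaceResult_1NF
            PRIMARY KEY (RaceName, RaceDistance, ParticipantName)  -- assumed key
    );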
11
Second Normal Form “The whole key”
Everything from first normal form still applies
Duplicate data sets are removed
Every non-key attribute depends on the whole primary key (no partial dependencies)
Cardinality reduction
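A minimal sketch of the same idea against the hypothetical race table: attributes that depend on only part of the composite key (RaceState and SponsorCo depend on the race alone, not on the distance or the participant) move to their own table, so each race-level fact is stored once.

    -- 2NF sketch: race-level attributes no longer repeat on every participant row
    CREATE TABLE dbo.Race (
        RaceID    int IDENTITY PRIMARY KEY,
        RaceName  varchar(100) NOT NULL,
        RaceState char(2)      NOT NULL,
        SponsorCo varchar(100) NULL
    );

    CREATE TABLE dbo.RaceResult_2NF (
        RaceID          int          NOT NULL REFERENCES dbo.Race (RaceID),
        RaceDistance    varchar(20)  NOT NULL,
        ParticipantName varchar(100) NOT NULL,
        ChipTime        time         NULL,
        CONSTRAINT PK_RaceResult_2NF PRIMARY KEY (RaceID, RaceDistance, ParticipantName)
    );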
12
Third Normal Form “Nothing but the key”
Everything from the first and second normal forms still applies
Essentially an extension of second normal form
Figuring out whether a determinant is really its own entity
If the key A relates to C, then C cannot determine another attribute B (no transitive dependencies)
Applies best for prototypes in a BI environment
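Continuing the hypothetical model: the participant's address attributes are determined by the participant, not by the race result they appear on, so they move to a Participant table and the result row keeps only a key, which removes the transitive dependency.

    -- 3NF sketch: participant attributes depend only on the participant key
    CREATE TABLE dbo.Participant (
        ParticipantID      int IDENTITY PRIMARY KEY,
        ParticipantName    varchar(100) NOT NULL,
        ParticipantAddress varchar(200) NULL,
        ParticipantCity    varchar(50)  NULL,
        ParticipantState   char(2)      NULL,
        ParticipantZip     varchar(10)  NULL
    );

    CREATE TABLE dbo.RaceResult_3NF (
        RaceID        int         NOT NULL REFERENCES dbo.Race (RaceID),
        RaceDistance  varchar(20) NOT NULL,
        ParticipantID int         NOT NULL REFERENCES dbo.Participant (ParticipantID),
        ChipTime      time        NULL,
        CONSTRAINT PK_RaceResult_3NF PRIMARY KEY (RaceID, RaceDistance, ParticipantID)
    );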
13
Boyce-Codd Normal Form
Now the transitive dependencies are gone
Every row has a unique identity
If A determines B, it's because A is a key!
You can usually go straight from first normal form to BCNF by looking at determinants

Race: RaceName, RaceState
Distance: DistanceCode, RaceDistance
Sponsor: SponsorCo
Participant: ParticipantName, ParticipantAddress, ParticipantCity, ParticipantState, ParticipantZip
Candidate key: ChipTime (RaceID, ParticipantID, DistanceID)
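Building on the Race and Participant sketches above, the remaining entities the slide lists could be declared as follows; the surrogate keys and data types are assumptions, while the entity and column names come from the slide.

    -- BCNF sketch: every determinant is a key
    CREATE TABLE dbo.Distance (
        DistanceID   int IDENTITY PRIMARY KEY,
        DistanceCode varchar(10) NOT NULL,
        RaceDistance varchar(20) NOT NULL
    );

    CREATE TABLE dbo.Sponsor (
        SponsorID int IDENTITY PRIMARY KEY,
        SponsorCo varchar(100) NOT NULL
    );

    -- A chip time is identified by (RaceID, ParticipantID, DistanceID) and nothing else
    CREATE TABLE dbo.ChipTime (
        RaceID        int  NOT NULL REFERENCES dbo.Race (RaceID),
        ParticipantID int  NOT NULL REFERENCES dbo.Participant (ParticipantID),
        DistanceID    int  NOT NULL REFERENCES dbo.Distance (DistanceID),
        FinishTime    time NULL,
        CONSTRAINT PK_ChipTime PRIMARY KEY (RaceID, ParticipantID, DistanceID)
    );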
14
Time to ask the question…
Should This Be Normalized?
15
Why denormalize? The Advantages and The Disadvantages

The Advantages
Reporting environments often require great performance for frequent pulls
Some calculations can be readily applied
Analytics and data science teams may have an easier time connecting variables

The Disadvantages
All three types of write anomalies (insert, update, and delete) come along for the ride (see the sketch below)
If more write operations are involved, everything could actually take longer
Do we know all the rules, or do we need to document more?
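As one concrete (and entirely hypothetical) illustration of the update anomaly: if the sponsor name is copied onto every row of a wide reporting table, changing a sponsor means touching every copy, and any rows the statement misses silently disagree with the rest.

    -- Denormalized reporting table (hypothetical): SponsorCo repeats on every result row,
    -- so a sponsor rename has to update every copy or the data drifts apart
    UPDATE dbo.RaceResultsWide
    SET    SponsorCo = 'New Sponsor, Inc.'
    WHERE  RaceName  = 'Cary Pancakes & Beer 5k';

In the normalized model the same change is a single-row update against the Sponsor table.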
16
Further use cases A forum on (de)normalization, where we run through scenarios
17
Address in the box
A free text field includes city and state, and whether the address is permanent. This allows for tracking business geography.
Should this be normalized? For applications? For reporting? What should we consider?
Abbreviated city names
Reporting on the phone number
Whether a house is on the census
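If the group does decide to normalize it, one possible target shape (all names and types here are assumptions, not from the deck) is to split the free text into typed columns so that geography and the permanent-address flag can be queried directly, while keeping the raw text for auditing the parse:

    -- One possible normalized target for the free-text address field (hypothetical)
    CREATE TABLE dbo.CustomerAddress (
        CustomerID  int          NOT NULL,
        City        varchar(50)  NULL,
        StateCode   char(2)      NULL,
        IsPermanent bit          NULL,   -- previously buried in the free text
        RawAddress  varchar(200) NULL    -- original string, kept to verify the parse
    );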
18
The phone number
You have a table with phone numbers, split into the area code and then the first 3 and last 4 digits. The audience is customer service, directly accessing the database through an application.
Should this be normalized?

Country Code   Area   Office Prefix   Line Number
1              215    834             5858
               972    976             0227
44             0114   807             6591
               305    117             7076
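One option, sketched here with assumed names and types (and requiring SQL Server 2012 or later for CONCAT), is to keep the split components for searching while exposing a computed column that gives customer service the full number in one field:

    -- Sketch: store the components, derive the full number
    CREATE TABLE dbo.PhoneNumber (
        CountryCode varchar(3) NULL,
        AreaCode    varchar(4) NOT NULL,
        Prefix      varchar(4) NOT NULL,
        LineNumber  varchar(4) NOT NULL,
        FullNumber  AS CONCAT(CountryCode, AreaCode, Prefix, LineNumber)
    );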
19
Customer history
A customer CRM has a history of patient transactions, with previous names and addresses included. The Power BI gurus want to use this for a model on turnover.
Does denormalization apply here? What should we consider?
Access to PHI data
Storage space
Scalability
Partitions
20
The cardinality makes a difference
Inverse relationship to normalization
Preferences for simple star schemas
The context of normalization for "Power" models
Do you want to normalize dates? Numbers?
Experimental models are concerned more with the rows than the columns
[obligatory slide about Tabular, Power Pivot, Power View, and Power BI]
21
IT DEPENDS. It’s all about the entity’s data plan
22
More Questions and Answers?
23
Thanks for coming! Ceedubvoss.com