Should This Be Normalized?


1 Should This Be Normalized?
When Database Normalization Seems Abnormal

2 About Me, Chris Voss
Professional side:
- SQL/Database Developer at Fresenius Medical Care
- Worked with SQL Server for 9 years (started with 2008 R2)
- Started as a web/data analyst and QA person, then a BI developer, then shifted between analysis and architecture
Personal side:
- From Raleigh via Philadelphia
- Avid runner (2x marathoner, 2nd place age group winner @ Nags Head Woods 5k)
- Autism spectrum advocate (Autism Society of NC Human Rights Committee)
- Lover of obscure pop culture references
LinkedIn:

3 What is this about?
- Normalization vs. denormalization
- About the forms, and how they work: first, second, third…Boyce-Codd if you're lucky
- A forum on when normalization actually works, versus when it doesn't
- Audience participation! Hint: all questions will ultimately have the same answer

4 A definition: What is normalization anyway?

5 Normalization
- The structuring of a relational database to increase integrity and reduce redundancy
- Concept introduced by Edgar F. Codd in 1970 while working on data storage
- Involves facts and dimensions to look up transactions and references

6 What Makes Normalization Good
- Less duplication means the database size is smaller
- In many cases, the first point leads to data models optimized for applications and products
- Only need to join the necessary tables when querying
- New data can easily be inserted with integrity

7 Normal forms: first, second, third, and Boyce-Codd

8 First Normal Form: “The key”
- Elimination of repeating groups and columns
- No two rows are identical
- Use of composite keys, with a primary key to identify each row
- Use a one-to-many relationship instead of repeating columns

9 1NF
RacerID | Last Name | First Name | Gender | City | State | Event | Race Type
1 | Johnson | Jordan | F | Waltham | MA | Patriot 5k | 5k
2 | Grant | Dennis | M | Easton | PA | Boston Marathon | Marathon
3 | | Denise | | | | |
4 | | | | | | Cooper River Bridge Run | 10k
5 | Quincy | Sarah | | Manchester | NH | |
6 | Fenton | Mitchell | | Raleigh | NC | Oktoberfest 4 Miler | 4 Miler
7 | | | | Durham | | |
8 | | | | | | Cape Dash Run |
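To make the shape concrete, here is a minimal T-SQL sketch of the table above. The column types and the table name are assumptions; the slides only show values.

CREATE TABLE dbo.RacerEvent1NF (
    RacerID   INT          NOT NULL,  -- one row per racer/event entry in the slide's table
    LastName  VARCHAR(50)  NULL,
    FirstName VARCHAR(50)  NULL,
    Gender    CHAR(1)      NULL,
    City      VARCHAR(50)  NULL,
    State     CHAR(2)      NULL,
    Event     VARCHAR(100) NULL,      -- event details still repeat per racer row
    RaceType  VARCHAR(20)  NULL,
    CONSTRAINT PK_RacerEvent1NF PRIMARY KEY (RacerID)
);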

10 Second Normal Form: “The whole key”
- Everything from First Normal Form still applies
- Single-column primary keys
- Duplicate data sets are removed
- Determinants are based on the primary key
- Cardinality reduction

11 2NF
Racer:
RacerID | Last Name | First Name | Gender | City | State
1 | Johnson | Jordan | F | Waltham | MA
2 | Grant | Dennis | M | Easton | PA
3 | | Denise | | |
4 | Quincy | Sarah | | Manchester | NH
5 | Fenton | Mitchell | | Raleigh | NC
6 | | | | Durham |

Event:
EventID | Event | Race Type
1 | Boston Marathon | Marathon
2 | Cooper River Bridge Run | 10k
3 | Patriot 5k | 5k
4 | Oktoberfest 4 Miler | 4 Miler
5 | Cape Dash Run |
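A sketch of the 2NF split in T-SQL, continuing the example; types are assumed, and a results/bridge table linking racers to events is implied by the slides but not shown.

CREATE TABLE dbo.Racer (
    RacerID   INT         NOT NULL CONSTRAINT PK_Racer PRIMARY KEY,
    LastName  VARCHAR(50) NULL,
    FirstName VARCHAR(50) NULL,
    Gender    CHAR(1)     NULL,
    City      VARCHAR(50) NULL,
    State     CHAR(2)     NULL
);

CREATE TABLE dbo.Event (
    EventID  INT          NOT NULL CONSTRAINT PK_Event PRIMARY KEY,
    Event    VARCHAR(100) NOT NULL,
    RaceType VARCHAR(20)  NULL
);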

12 Third Normal Form: “Nothing but the key”
- Everything from the first and second normal forms applies
- Essentially an extension of second normal form
- Figuring out whether a determinant is not an entity in its own right
- No transitive dependencies: if the key A relates to B and C, a non-key column C cannot be the real determinant of B

13 3NF
Racer:
RacerID | Last Name | First Name | Gender | LocationID
1 | Johnson | Jordan | F | 1
2 | Grant | Dennis | M | 2
3 | | Denise | |
4 | Quincy | Sarah | | 3
5 | Fenton | Mitchell | | 4
6 | | | | 5

Event:
EventID | Event | Race Type
1 | Boston Marathon | Marathon
2 | Cooper River Bridge Run | 10k
3 | Patriot 5k | 5k
4 | Oktoberfest 4 Miler | 4 Miler
5 | Cape Dash Run |

Location:
LocationID | City | State
1 | Waltham | MA
2 | Easton | PA
3 | Manchester | NH
4 | Raleigh | NC
5 | Durham |
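One way to express the 3NF step in T-SQL, continuing the sketch above: City and State now depend on LocationID rather than on the racer. Types and constraint names are assumed.

CREATE TABLE dbo.Location (
    LocationID INT         NOT NULL CONSTRAINT PK_Location PRIMARY KEY,
    City       VARCHAR(50) NOT NULL,
    State      CHAR(2)     NULL
);

-- Move the transitively dependent columns out of Racer and point to Location instead.
ALTER TABLE dbo.Racer DROP COLUMN City, State;
ALTER TABLE dbo.Racer ADD LocationID INT NULL
    CONSTRAINT FK_Racer_Location REFERENCES dbo.Location (LocationID);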

14 Boyce-Codd Normal Form
- Now the transitive dependencies are gone
- Every row has a unique identity
- If A determines B, it's because A is a superkey
- You can usually go straight from first normal form to BCNF by looking at determinants
Example decomposition:
- Race: RaceName, RaceState
- Distance: DistanceCode, RaceDistance
- Participant: ParticipantName, ParticipantAddress, ParticipantCity, ParticipantState, ParticipantZip
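A T-SQL sketch of that decomposition. The key choices and all types are assumptions for illustration; in each table the only determinant is its key.

CREATE TABLE dbo.Race (
    RaceName  VARCHAR(100) NOT NULL CONSTRAINT PK_Race PRIMARY KEY,
    RaceState CHAR(2)      NULL
);

CREATE TABLE dbo.Distance (
    DistanceCode VARCHAR(10)  NOT NULL CONSTRAINT PK_Distance PRIMARY KEY,
    RaceDistance DECIMAL(6,2) NULL     -- assumed miles
);

CREATE TABLE dbo.Participant (
    ParticipantName    VARCHAR(100) NOT NULL CONSTRAINT PK_Participant PRIMARY KEY,
    ParticipantAddress VARCHAR(200) NULL,
    ParticipantCity    VARCHAR(50)  NULL,
    ParticipantState   CHAR(2)      NULL,
    ParticipantZip     VARCHAR(10)  NULL
);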

15 That is normalization, but our question is…
Should This Be Normalized?

16 When Normalization Fails
- JOIN A…JOIN B…JOIN C: too many joins slow performance
- Every table comes with its own varying data types
- Too many dimensions slow performance
- What about all those aggregates?
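For example, answering a single reporting question against the normalized sketches above already takes a chain of joins; dbo.Entry is an assumed bridge table between racers and events.

SELECT r.LastName, e.Event, e.RaceType, l.City, l.State
FROM dbo.Racer r
JOIN dbo.Location l ON l.LocationID = r.LocationID  -- join 1: demographics
JOIN dbo.Entry   en ON en.RacerID   = r.RacerID     -- join 2: assumed bridge table
JOIN dbo.Event   e  ON e.EventID    = en.EventID;   -- join 3: event details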

17 So, we should denormalize?
Let’s think about it.

18 Why denormalize?
The Advantages:
- Reporting environments often require great performance for frequent pulls
- Some calculations can be readily applied
- Analytics and data science teams may have an easier time connecting variables
The Disadvantages:
- The three types of write anomalies (insert, update, delete) come back into play
- If more write operations are included, everything could actually take longer
- Do we know all the rules, or do we need to document more?
(Write anomalies arise where a duplicate of the same fact can be changed in one place but not another.)
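A quick illustration of the update-anomaly risk, using an assumed denormalized results table (dbo.RaceResultDenorm, with the racer's city repeated on every result row):

UPDATE dbo.RaceResultDenorm
SET City = 'Cambridge'
WHERE RacerName = 'Jordan Johnson'
  AND Event = 'Patriot 5k';
-- Her other result rows still say 'Waltham': the same fact now exists in two states.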

19 Ways to denormalize
- Has anyone tried to make a view out of the information they needed?
- How many ways can you get to a single fact?
- Are there derivable values that can be stored? (See the indexed view sketch below.)
- Can you collapse the records into one table?
- Are there more possibilities for indexes? An index can be more useful than duplicating data.
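On the view and derivable-value questions, SQL Server's indexed views are one middle path: the view materializes an aggregate while the base tables stay normalized. The names here (dbo.RaceResult, FinishSeconds) are assumptions, and FinishSeconds is assumed NOT NULL, which indexed views require under SUM.

CREATE VIEW dbo.vEventTotals
WITH SCHEMABINDING                          -- required for an indexed view
AS
SELECT EventID,
       COUNT_BIG(*)       AS ResultCount,   -- COUNT_BIG is required with GROUP BY
       SUM(FinishSeconds) AS TotalSeconds
FROM dbo.RaceResult
GROUP BY EventID;
GO
CREATE UNIQUE CLUSTERED INDEX IX_vEventTotals
    ON dbo.vEventTotals (EventID);          -- materializes the aggregate on disk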

20 Further use cases: a forum on (de)normalization, where we run through scenarios

21 Address in the box
- An address table is populated via trigger when members are inserted
- The table only has a primary key clustered index, and there are complaints about slowness
- Should this be normalized? For applications? For reporting?
- What should we consider? How many addresses per member, other member demographics, and how often the table is being called
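Before restructuring anything, one low-risk experiment for this scenario is a covering nonclustered index, assuming lookups come in by member; dbo.MemberAddress and its columns are hypothetical names for the scenario.

CREATE NONCLUSTERED INDEX IX_MemberAddress_MemberID
    ON dbo.MemberAddress (MemberID)
    INCLUDE (AddressLine1, City, StateCode, PostalCode);  -- covers the common lookup without touching the clustered index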

22 The phone number
- You have a table of phone numbers, split into country code, area code, office prefix (the first 3 digits), and line number (the last 4)
- The audience is customer service, directly accessing the database through an application
- Should this be normalized?

Country Code | Area | Office Prefix | Line Number
1 | 215 | 834 | 5858
| 972 | 976 | 0227
44 | 0114 | 807 | 6591
| 305 | 117 | 7076
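If customer service usually searches on the whole number, one option that keeps the split storage is a persisted computed column; dbo.CustomerPhone and its part columns (assumed varchar, to preserve leading zeros like 0114) are hypothetical.

ALTER TABLE dbo.CustomerPhone
    ADD FullNumber AS CONCAT(CountryCode, AreaCode, OfficePrefix, LineNumber) PERSISTED;
    -- CONCAT treats NULL parts as empty strings, so rows with no country code still assemble

CREATE NONCLUSTERED INDEX IX_CustomerPhone_FullNumber
    ON dbo.CustomerPhone (FullNumber);  -- supports equality search on the assembled number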

23 Customer history
- A Garmin tracker has a history of race results, with previous names and addresses included
- The Power BI gurus want to use this for a model on performance across different events
- Does denormalization apply here?
- What should we consider? Computing race paces (see the sketch below), storage space, scalability of results, and partitions
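On the race-pace point: pace is a derivable value, and one option is to store it once at the source instead of recomputing it in every Power BI measure. dbo.RaceResult, FinishSeconds, and DistanceMiles are assumed names, with DistanceMiles assumed to be a decimal type.

ALTER TABLE dbo.RaceResult
    ADD PaceSecondsPerMile AS (FinishSeconds / NULLIF(DistanceMiles, 0)) PERSISTED;
-- NULLIF guards against divide-by-zero; PERSISTED stores the value so reads don't recompute it.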

24 Key Factors
- If you split transactional from analytical, the former can usually stay normalized
- Bad performance is often easier to fix than bad data
- If you can get to a fact, and then get to the same fact using a different method, you have a redundancy

25 IT DEPENDS. Normalize until it hurts; denormalize until it works.

26 More Questions and Answers?

27 Thanks for coming! Ceedubvoss.com

