Should This Be Normalized?


Should This Be Normalized? When Database Normalization Seems Abnormal

About Me

Professional side:
- Data modeler/architect at Community Care of North Carolina
- Worked with SQL Server for 8 years (started with 2008 R2)
- Started as a web/data analyst and QA person, then a database developer, then shifted between analysis and architecture since

Personal side:
- From Raleigh via Philadelphia
- Avid runner (2x marathoner, 2018 30-34 age group winner @ Cary Pancakes & Beer 5k)
- Autism spectrum advocate
- Lover of obscure pop culture references

Twitter: @ceedubvee | LinkedIn: www.linkedin.com/in/cwvoss

What is this about?
- Normalization vs. denormalization
- A primer on the normal forms and how they work: first, second/third, Boyce-Codd
- A forum on when normalization actually works in a BI context
- Audience participation! (Hint: all questions will ultimately have the same answer)

A definition: What is normalization anyway?

Normalization
- The structuring of a relational database to increase integrity and reduce redundancy
- The concept was introduced by Edgar F. Codd in 1970 while he was working on data storage
- Involves facts and dimensions to look up transactions and references

Normalization

The Advantages:
- Less duplication means the database size is smaller
- In many cases, that first point leads to data models optimized for applications and products
- You only need to join the necessary tables when querying
- New data can easily be inserted

The Disadvantages:
- Many fact tables may contain codes upon codes, so frequent joins to lookup tables are needed
- As the normal forms progress and the number of dimensions increases, performance will be affected
- What about all those aggregates?

Normal forms: first, second, third, and Boyce-Codd

The Problem
We have a limited set of race data in a file. A string of race participants is included with each event instance. If we are going to process future results, we'll have to see what works with our current system so that runners won't complain about how their results appear. Let's look through the normal forms. When should this be normalized?
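To ground the scenario, here is a minimal T-SQL sketch of what the flat file might look like once loaded into a staging table. The table and column names are my assumptions, not from the deck:

```sql
-- Hypothetical staging table for the flat race file. Each row is one event,
-- with all participants crammed into a single delimited string -- a classic
-- repeating group.
CREATE TABLE dbo.RaceFileStaging (
    RaceName     varchar(100),
    RaceState    char(2),
    RaceDistance varchar(30),    -- e.g. '5K', 'Marathon'
    SponsorCo    varchar(100),
    Participants varchar(max)    -- 'Ann Smith:21:34, Bob Jones:24:10, ...'
);
```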

First Normal Form
“The key”
- Elimination of repeating groups and columns
- No two rows are identical
- All records have the same number of fields
- Use a one-to-many relationship instead of multiple repeating columns (see the sketch below)
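As a hedged illustration of first normal form applied to the race file (names remain hypothetical), the repeating participant string becomes one row per runner per race:

```sql
-- 1NF sketch: the delimited Participants string is split out so that each
-- row holds exactly one runner's result for one race.
CREATE TABLE dbo.RaceResult1NF (
    RaceName        varchar(100),
    RaceState       char(2),
    RaceDistance    varchar(30),
    SponsorCo       varchar(100),
    ParticipantName varchar(100),
    ChipTime        time,
    PRIMARY KEY (RaceName, RaceDistance, ParticipantName)
);
```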

Second Normal Form
“The whole key”
- Everything from first normal form still applies
- Duplicate data sets are removed
- Every non-key attribute depends on the whole primary key, not just part of it
- Cardinality reduction
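A sketch of the same data in second normal form (still with assumed names): the state and sponsor depend only on the race, not on the full (race, distance, participant) key, so they move to their own table:

```sql
-- 2NF sketch: attributes that depend on only part of the composite key
-- move out to the table they actually describe.
CREATE TABLE dbo.Race2NF (
    RaceName  varchar(100) PRIMARY KEY,
    RaceState char(2),
    SponsorCo varchar(100)
);

CREATE TABLE dbo.RaceResult2NF (
    RaceName        varchar(100) REFERENCES dbo.Race2NF (RaceName),
    RaceDistance    varchar(30),
    ParticipantName varchar(100),
    ChipTime        time,
    PRIMARY KEY (RaceName, RaceDistance, ParticipantName)
);
```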

Third Normal Form
“Nothing but the key”
- Everything from the first and second normal forms still applies
- Essentially an extension of second normal form
- Figuring out whether a determinant is really an entity of its own
- A non-key column cannot determine another non-key column (no transitive dependencies)
- Applies best for prototypes in a BI environment
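For example (a hedged sketch with assumed names), a participant's ZIP code determines city and state, so in third normal form those columns move out to break the transitive dependency ParticipantID → Zip → City/State:

```sql
-- 3NF sketch: City and State depend on Zip, a non-key column, so they get
-- their own lookup table.
CREATE TABLE dbo.ZipCode (
    Zip   char(5) PRIMARY KEY,
    City  varchar(60),
    State char(2)
);

CREATE TABLE dbo.Participant3NF (
    ParticipantID   int PRIMARY KEY,
    ParticipantName varchar(100),
    AddressLine     varchar(120),
    Zip             char(5) REFERENCES dbo.ZipCode (Zip)
);
```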

Boyce-Codd Normal Form
- Now the transitive dependencies are gone
- Every row has a unique identity
- If A determines B, it's because A is a key!
- You can usually go straight from first normal form to BCNF by looking at determinants

The resulting entities for the race data:
- Race: RaceName, RaceState
- Distance: DistanceCode, RaceDistance
- Sponsor: SponsorCo
- Participant: ParticipantName, ParticipantAddress, ParticipantCity, ParticipantState, ParticipantZip
- ChipTime, with candidate key (RaceID, ParticipantID, DistanceID)
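One way to realize that design in T-SQL (a sketch; the surrogate keys, data types, and constraints are my assumptions):

```sql
CREATE TABLE dbo.Race (
    RaceID    int IDENTITY PRIMARY KEY,
    RaceName  varchar(100) NOT NULL,
    RaceState char(2)      NOT NULL
);

CREATE TABLE dbo.Distance (
    DistanceID   int IDENTITY PRIMARY KEY,
    DistanceCode varchar(10) NOT NULL UNIQUE,
    RaceDistance varchar(30) NOT NULL
);

CREATE TABLE dbo.Sponsor (
    SponsorID int IDENTITY PRIMARY KEY,
    SponsorCo varchar(100) NOT NULL   -- relationship to Race left open, as on the slide
);

CREATE TABLE dbo.Participant (
    ParticipantID      int IDENTITY PRIMARY KEY,
    ParticipantName    varchar(100) NOT NULL,
    ParticipantAddress varchar(120),
    ParticipantCity    varchar(60),
    ParticipantState   char(2),
    ParticipantZip     char(5)
);

-- The fact: one chip time per (race, participant, distance) combination.
CREATE TABLE dbo.ChipTime (
    RaceID        int  NOT NULL REFERENCES dbo.Race (RaceID),
    ParticipantID int  NOT NULL REFERENCES dbo.Participant (ParticipantID),
    DistanceID    int  NOT NULL REFERENCES dbo.Distance (DistanceID),
    ChipTime      time NOT NULL,
    PRIMARY KEY (RaceID, ParticipantID, DistanceID)
);
```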

Time to ask the question… Should This Be Normalized?

Why denormalize?

The Advantages:
- Reporting environments often require great performance for frequent pulls
- Some calculations can be readily applied
- Analytics and data science teams may have an easier time connecting variables

The Disadvantages:
- The three types of write anomalies (modification, insertion, deletion) come back
- If more write operations are included, everything could actually take longer
- Do we know all the rules, or do we need to document more?
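As a hedged sketch, denormalizing for reporting might flatten the BCNF tables above into one wide row per result, so report queries avoid the joins entirely:

```sql
CREATE TABLE dbo.RaceResultReporting (
    RaceName        varchar(100),
    RaceState       char(2),
    RaceDistance    varchar(30),
    ParticipantName varchar(100),
    ParticipantCity varchar(60),
    ChipTime        time
);

-- Populate the wide table once; the write anomalies now live here.
INSERT INTO dbo.RaceResultReporting
SELECT r.RaceName, r.RaceState, d.RaceDistance,
       p.ParticipantName, p.ParticipantCity, ct.ChipTime
FROM dbo.ChipTime AS ct
JOIN dbo.Race        AS r ON r.RaceID        = ct.RaceID
JOIN dbo.Distance    AS d ON d.DistanceID    = ct.DistanceID
JOIN dbo.Participant AS p ON p.ParticipantID = ct.ParticipantID;
```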

Further use cases: a forum on (de)normalization, where we run through scenarios

Address in the box
- A free-text field includes city and state, plus whether the address is permanent
- This allows for tracking business geography
- Should this be normalized? For applications? For reporting?
- What should we consider? Abbreviated city names; reporting on the phone number; whether a house is on the census
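One hedged option (table and column names assumed): keep the raw text for auditing and store the parsed parts in their own columns for reporting:

```sql
CREATE TABLE dbo.CustomerAddress (
    CustomerID  int          NOT NULL,
    RawAddress  varchar(200) NOT NULL,  -- the original free text, untouched
    City        varchar(60),            -- parsed and standardized spelling
    State       char(2),
    IsPermanent bit                     -- the 'permanent address' flag
);
```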

The phone number
You have a table with phone numbers split into country code, area code, and the first 3 and last 4 digits. The audience is customer service, directly accessing the database through an application. Should this be normalized?

Country Code   Area   Office Prefix   Line Number
1              215    834             5858
               972    976             0227
44             0114   807             6591
               305    117             7076
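A quick sketch of the customer-service view (dbo.CustomerPhone and its varchar columns are my assumptions): reassemble the split parts, tolerating a missing country code:

```sql
SELECT COALESCE('+' + CountryCode + ' ', '')   -- NULL country code drops out
       + AreaCode + ' ' + OfficePrefix + '-' + LineNumber AS DisplayPhone
FROM dbo.CustomerPhone;
```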

Customer history
- A customer CRM has a history of patient transactions, with previous names and addresses included
- The Power BI gurus want to use this for a model on turnover
- Does denormalization apply here?
- What should we consider? Access to PHI data; storage space; scalability; partitions
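The talk doesn't prescribe a design, but one common way to hold previous names and addresses for a turnover model is a type 2 slowly changing dimension, sketched here with assumed names:

```sql
CREATE TABLE dbo.CustomerDim (
    CustomerKey  int IDENTITY PRIMARY KEY,  -- surrogate key, one per version
    CustomerID   int          NOT NULL,     -- stable business key from the CRM
    CustomerName varchar(100) NOT NULL,
    City         varchar(60),
    State        char(2),
    ValidFrom    date NOT NULL,
    ValidTo      date NULL                  -- NULL marks the current version
);
```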

[Obligatory slide about Tabular, Power Pivot, Power View, and Power BI]
- The cardinality makes a difference: it has an inverse relationship to normalization
- Preferences for simple star schemas
- The context of normalization for “Power” models: do you want to normalize dates? Numbers?
- Experimental models are concerned more with the rows than the columns
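On the "normalize dates?" question, a hedged example of what that can mean in a Tabular/Power BI model: a small date dimension the fact table relates to, instead of repeating year/month/quarter columns on every fact row:

```sql
CREATE TABLE dbo.DateDim (
    DateKey     date PRIMARY KEY,
    [Year]      smallint,
    [Quarter]   tinyint,
    [Month]     tinyint,
    [MonthName] varchar(10)
);
```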

IT DEPENDS. It’s all about the entity’s data plan

More Questions and Answers?

Thanks for coming! Ceedubvoss.com Twitter: @ceedubvee