INNOVATIVE DATA MODELING: Make Data Warehousing Cool Again


Leslie Weed, Architect, RevGen Partners

Leslie Weed
Architect, RevGen Partners
All those Data things: While starting as an app developer 20 years ago, I quickly navigated to the data space and have enjoyed every minute of it.
Colorado: Love living right next to some of the best parts of the Rocky Mountains, enjoying both sun and snow.
Data Modeling is Fun: The best part of the job is organizing data and helping others organize their data for great performance and usage.
/leslieweedsql, @weederbug

Agenda
Hello Data Vault
Tables and Rules
Business Keys
Raw and Business Layer
Keyed Instance and Reference Tables
[Leslie]

Problems in Data Warehouses
It takes too long to build.
Once it is up, it is hard to add to or modify.
It simply hasn't been maintained and is outdated.
There is no history/archive/storage plan.
There is no well-defined usage (data mart vs. views vs. tabular vs. reporting).
There has got to be a better way!
[Leslie]

Hello Data Vault

Ensemble Patterns
Flavors of ensemble modeling: Focal Point, Anchor, Data Vault (DV 2.0), Hyper Agility, Temporal, and Your Style
Image concept from http://geneseeacademy.com/
[Leslie]

Enterprise Data Warehouse
Sources -> Stage -> Data Warehouse (Raw, BDV, EDW) -> Data Marts -> Cubes and Reports
Also known as decomposed modeling: different flavors of ensemble modeling, all with strong rules.
Image concept from http://geneseeacademy.com/
[Leslie, then Jeff]

Enterprise Data Warehouse with a Data Lake
Components: Sources, Stage, Data Lake, Abstraction Layer, Data Warehouse (Raw, BDV, EDW), Data Marts, Cubes, Reports
Image concept from http://geneseeacademy.com/
[Leslie, then Jeff]

Ensembles
An ensemble defines an associated set of data.
It holds the core business concepts: Event, Person, Thing, Place, Concept.
Breaking a Unit of Work apart causes the associations between source system entities to be lost.
Example ensembles: Player, Game, Season
[Leslie]

Ensemble Modeling/Data Vault
The Data Vault is a detail-oriented, history-tracking, uniquely linked set of normalized tables that supports one or more functional areas of business.
The design is flexible, scalable, consistent, and adaptable to the needs of the enterprise.
It is extremely agile (iterative and incremental) in nature.
It is strongly pattern-based, which suits automated builds and works well with BIML.
[Leslie]
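Because every hub follows the same column pattern, its DDL can be generated from metadata, which is the idea behind BIML-driven automation. The sketch below uses Python string templates rather than BIML; the `HUB_METADATA` dictionary and the `hub_ddl` function are hypothetical, and SQLite stands in for SQL Server so the generated text can actually be executed.

```python
import sqlite3

# Hypothetical metadata: hub name -> business key column(s).
HUB_METADATA = {
    "Team": ["TeamNickName"],
    "Customer": ["CustomerID", "CompanyID", "CustomerType"],
}


def hub_ddl(name: str, business_keys: list) -> str:
    """Render CREATE TABLE text for one hub, always using the same column pattern."""
    cols = [f"    {name}SK INTEGER PRIMARY KEY"]                 # surrogate key
    cols += [f"    {bk} TEXT NOT NULL" for bk in business_keys]  # business key column(s)
    cols += ["    LDTS TEXT NOT NULL", "    RS TEXT NOT NULL"]   # load date, record source
    cols.append(f"    CONSTRAINT UK_Hub{name} UNIQUE ({', '.join(business_keys)})")
    return f"CREATE TABLE Hub{name} (\n" + ",\n".join(cols) + "\n);"


con = sqlite3.connect(":memory:")
for hub, keys in HUB_METADATA.items():
    ddl = hub_ddl(hub, keys)
    print(ddl)
    con.execute(ddl)  # executing the text shows the generated DDL is valid SQLite
```

Adding a new hub then means adding one metadata entry, not writing new DDL by hand.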

Data Vault Pros
Better real-time load capabilities (mostly inserts)
Incremental builds are easy
Provides audit history and traceability
The ability to respond rapidly to changes in your physical model
Iterative development
Keeping control of, and reporting on, data quality issues
[Leslie]

Data Vault Cons
Con: The extra joins introduced by Data Vault modeling will impact query performance.
Response: It depends on size, hardware, database, and indexing strategy.
Con: Ad hoc reporting is difficult.
Response: Use views or another abstraction layer.
Con: Two data warehouses, twice the cost?
Response: With well-defined usage and purpose, the longevity of the systems quickly outruns the cost of implementation. The BDV is NOT a full duplication of the RDV.
[Leslie and Jeff]
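The "use views" response to the ad hoc reporting con can be sketched as a view that hides the hub-to-satellite join and exposes only current rows. SQLite stands in for SQL Server here; the columns follow the Hub and Sat example slides, while the view name `vTeamCurrent` and the `build_demo` helper are hypothetical.

```python
import sqlite3

DDL = """
CREATE TABLE HubTeam (
    TeamSK       INTEGER PRIMARY KEY,
    TeamNickName TEXT NOT NULL UNIQUE,   -- business key
    LDTS         TEXT NOT NULL,
    RS           TEXT NOT NULL
);
CREATE TABLE SatTeam (
    TeamSK   INTEGER NOT NULL REFERENCES HubTeam (TeamSK),
    LDTS     TEXT NOT NULL,
    TeamName TEXT,
    LEDTS    TEXT,                       -- NULL marks the current row
    RS       TEXT NOT NULL,
    PRIMARY KEY (TeamSK, LDTS)
);
-- Abstraction layer: ad hoc users query one flat view instead of writing the joins
CREATE VIEW vTeamCurrent AS
SELECT h.TeamSK, h.TeamNickName, s.TeamName
FROM HubTeam AS h
JOIN SatTeam AS s
  ON s.TeamSK = h.TeamSK
 AND s.LEDTS IS NULL;
"""


def build_demo() -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    con.executescript(DDL)
    rs = "STATS Sports Database"
    con.executemany("INSERT INTO HubTeam VALUES (?, ?, ?, ?)",
                    [(1, "Falcons", "2013-01-14 21:18", rs),
                     (2, "Bills", "2013-01-14 21:18", rs)])
    con.executemany("INSERT INTO SatTeam VALUES (?, ?, ?, ?, ?)",
                    [(1, "2013-01-14 21:24", "Atlanta", None, rs),
                     (2, "2013-01-14 21:24", "Buffalo", None, rs)])
    return con


con = build_demo()
print(con.execute("SELECT * FROM vTeamCurrent ORDER BY TeamSK").fetchall())
# [(1, 'Falcons', 'Atlanta'), (2, 'Bills', 'Buffalo')]
```

Report writers see a single table-like object, while the vault underneath keeps its hub/satellite split and full history.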

Tables and Rules

Terms You Need to Know (and Some Rules)
SK (or PK or SQN, e.g. CustomerSK) = Surrogate Key
LDTS = Load Date Time Stamp
LEDTS = Load End Date Time Stamp
RS = Record Source
[Leslie]

Data Vault Objects
Hubs: ensemble identifiers
Links: relationships
Satellites: descriptive information
[Leslie]

Ensembles and Relationships
Elements: Hub, Link, Satellite
Diagram: the Player, Season, and Game ensembles are each a Hub with its Satellites; a Link between them records a history of the interaction.
Image from LearnDataVault.com; Dan Linstedt

Hub
A hub is based on an identifiable business element.
An identifiable business element is an attribute that is used in the source systems to locate data, otherwise known as an ensemble identifier.
The ensemble identifier has a very low propensity to change and usually is not editable in the source systems.
Hubs are loaded first: they are the matcher.
(Diagram: a Player hub surrounded by its satellites.)

Example: Finding the Ensemble Identifier
TEAM_ID  TEAM_ABBREV  TEAM_NAME      TEAM_NICKNAME
323      Atl          Atlanta        Falcons
324      Buf          Buffalo        Bills
325      Hou          Houston        Texans
326      Chi          Chicago        Bears
327      Cin          Cincinnati     Bengals
329      Cle          Cleveland      Browns
331      Dal          Dallas         Cowboys
332      Den          Denver         Broncos
334      Det          Detroit        Lions
335      GB           Green Bay      Packers
336      Ten          Tennessee      Titans
338      Ind          Indianapolis   Colts
339      KC           Kansas City    Chiefs
341      Oak          Oakland        Raiders
343      StL          St. Louis      Rams
345      Mia          Miami          Dolphins
347      Min          Minnesota      Vikings
348      NE           New England    Patriots
350      NO           New Orleans    Saints
351      NYG          New York       Giants
352      NYJ                         Jets
354      Phi          Philadelphia   Eagles
355      Ari          Arizona        Cardinals
356      Pit          Pittsburgh     Steelers
357      SD           San Diego      Chargers
359      SF           San Francisco  49ers
361      Sea          Seattle        Seahawks
362      TB           Tampa Bay      Buccaneers
363      Was          Washington     Redskins
364      Car          Carolina       Panthers
365      Jac          Jacksonville   Jaguars
366      Bal          Baltimore      Ravens
The ensemble identifier can be thought of as the business key.

HUB Example
TeamSK  TeamNickName  LDTS             RS
1       Falcons       1/14/13 9:18 PM  STATS Sports Database
2       Bills
3       Texans
4       Bears
5       Bengals
6       Browns
7       Cowboys
8       Broncos
Ensemble identifier, business key, load date, and record source are mandatory.
All attributes in the business key form a UNIQUE index.
NEVER directly join a HUB to another HUB table.
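A minimal sketch of the hub rules above, assuming SQLite in place of SQL Server: the business key carries a unique constraint, loads only insert keys the hub has not seen, and existing surrogate keys never change. The function names `create_hub`, `load_hub`, and `hub_lookup` are hypothetical, and the second batch's load date is illustrative.

```python
import sqlite3


def create_hub(con):
    con.execute("""
        CREATE TABLE HubTeam (
            TeamSK       INTEGER PRIMARY KEY,   -- surrogate key, assigned on insert
            TeamNickName TEXT NOT NULL,         -- business key (mandatory)
            LDTS         TEXT NOT NULL,         -- load date time stamp (mandatory)
            RS           TEXT NOT NULL,         -- record source (mandatory)
            CONSTRAINT UK_HubTeam UNIQUE (TeamNickName)
        )""")


def load_hub(con, nicknames, ldts, rs):
    """Insert-only load: add business keys the hub has not seen; never update."""
    con.executemany(
        """INSERT INTO HubTeam (TeamNickName, LDTS, RS)
           SELECT ?, ?, ?
           WHERE NOT EXISTS (SELECT 1 FROM HubTeam WHERE TeamNickName = ?)""",
        [(nk, ldts, rs, nk) for nk in nicknames])


def hub_lookup(con):
    """Business key -> surrogate key map that later link and sat loads match against."""
    return dict(con.execute("SELECT TeamNickName, TeamSK FROM HubTeam"))


con = sqlite3.connect(":memory:")
create_hub(con)
load_hub(con, ["Falcons", "Bills", "Texans"], "2013-01-14 21:18", "STATS Sports Database")
# Illustrative second batch: "Bills" repeats, so only "Bears" is added.
load_hub(con, ["Bills", "Bears"], "2013-01-15 08:00", "STATS Sports Database")
print(con.execute("SELECT TeamSK, TeamNickName FROM HubTeam ORDER BY TeamSK").fetchall())
# [(1, 'Falcons'), (2, 'Bills'), (3, 'Texans'), (4, 'Bears')]
```

Note that the hub holds no reference to any other hub; associations between hubs belong in links.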

Link
A Link is an association of two or more business keys.
It is based on identifiable business element relationships.
It can contain Hub keys and other Link keys.
A Link's business key is a composite unique index.
A Link records a history of the interaction.

Link Example
Sequence Number, Business Key, Load Date, and Record Source are mandatory.
The relationship should not change over time: it is established as a fact that occurred at a specific point in time and will remain that way forever.
Diagram: the TEAM_GAME link (LNK) connects the SEASON, TEAM, and TEAM (Opponent) hubs.
TeamGameSK  GameDate   SeasonSK  TeamSK  OpponentSK  LDTS             RS
1           9/27/2012  33        6       32          1/15/13 7:11 PM  STATS Sports Database
2 3 9/30/2012 18 4 5 11 7 13 25 8
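The link rules can be sketched the same way, assuming the hubs are already loaded ("hubs are loaded first"). The load resolves each business key to its hub surrogate key and inserts the association only if that composite key is new. SQLite stands in for SQL Server; `HubSeason`, `SeasonYear`, `LnkTeamGame`, and `load_link` are hypothetical names, and Browns vs. Ravens is used purely for illustration.

```python
import sqlite3

RS = "STATS Sports Database"
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE HubTeam (TeamSK INTEGER PRIMARY KEY, TeamNickName TEXT NOT NULL UNIQUE,
                      LDTS TEXT NOT NULL, RS TEXT NOT NULL);
CREATE TABLE HubSeason (SeasonSK INTEGER PRIMARY KEY, SeasonYear INTEGER NOT NULL UNIQUE,
                        LDTS TEXT NOT NULL, RS TEXT NOT NULL);
CREATE TABLE LnkTeamGame (
    TeamGameSK INTEGER PRIMARY KEY,
    GameDate   TEXT    NOT NULL,
    SeasonSK   INTEGER NOT NULL REFERENCES HubSeason (SeasonSK),
    TeamSK     INTEGER NOT NULL REFERENCES HubTeam (TeamSK),
    OpponentSK INTEGER NOT NULL REFERENCES HubTeam (TeamSK),
    LDTS       TEXT    NOT NULL,
    RS         TEXT    NOT NULL,
    -- composite business key of the link
    CONSTRAINT UK_LnkTeamGame UNIQUE (GameDate, SeasonSK, TeamSK, OpponentSK)
);
""")
# Hubs are loaded first.
con.executemany("INSERT INTO HubTeam (TeamNickName, LDTS, RS) VALUES (?, ?, ?)",
                [(name, "2013-01-14 21:18", RS) for name in ("Browns", "Ravens")])
con.execute("INSERT INTO HubSeason (SeasonYear, LDTS, RS) VALUES (?, ?, ?)",
            (2012, "2013-01-14 21:18", RS))


def load_link(con, game_date, season, team, opponent, ldts, rs):
    """Resolve business keys to hub SKs, then insert the association only if it is new."""
    con.execute("""
        INSERT INTO LnkTeamGame (GameDate, SeasonSK, TeamSK, OpponentSK, LDTS, RS)
        SELECT ?, s.SeasonSK, t.TeamSK, o.TeamSK, ?, ?
        FROM HubSeason AS s, HubTeam AS t, HubTeam AS o
        WHERE s.SeasonYear = ? AND t.TeamNickName = ? AND o.TeamNickName = ?
          AND NOT EXISTS (SELECT 1 FROM LnkTeamGame AS l
                          WHERE l.GameDate = ? AND l.SeasonSK = s.SeasonSK
                            AND l.TeamSK = t.TeamSK AND l.OpponentSK = o.TeamSK)""",
        (game_date, ldts, rs, season, team, opponent, game_date))


load_link(con, "2012-09-27", 2012, "Browns", "Ravens", "2013-01-15 19:11", RS)
load_link(con, "2012-09-27", 2012, "Browns", "Ravens", "2013-01-16 19:11", RS)  # re-run adds nothing
print(con.execute(
    "SELECT TeamGameSK, GameDate, SeasonSK, TeamSK, OpponentSK FROM LnkTeamGame").fetchall())
# [(1, '2012-09-27', 1, 1, 2)]
```

Because the insert joins through the hubs, a business key missing from a hub silently produces no link row in this sketch; a production load would normally flag that case instead.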

Satellite
A Satellite is based on non-identifying business elements ("descriptive data").
Satellite data changes, sometimes rapidly, sometimes slowly.
Satellites are separated by type of information and rate of change.
(Diagram: a Player hub surrounded by its satellites.)

SAT Example
TeamSK  LDTS             STATSTeamID  TeamAbbrev  TeamName    LEDTS            RS
1       1/14/13 9:24 PM  323          Atl         Atlanta     NULL             STATS Sports Database
2                        324          Buf         Buffalo
3                        325          Hou         Houston
4                        326          Chi         Chicago
5                        327          Cin         Cincinnati
6                        329          Cle         Cleveland
7                        331          Dal         Dallas
8                        332          Den         Denver      4/1/18 12:01 AM
                                                  Donkeys     4/1/18 3:15 PM   Upset Fan
The Satellite is dependent on the Hub or Link key as a parent.
The Satellite is never dependent on more than one parent table.
The Satellite is not a parent table to any other table.
Sequence Number, Business Key, Load Date, Load End Date, Descriptive Data, and Record Source are mandatory.
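A sketch of a satellite load that honors the LDTS/LEDTS columns above, assuming SQLite in place of SQL Server: a new row is inserted only when the descriptive data changes, and the previous current row is end-dated in the same transaction. The `load_sat` name is hypothetical; the Denver and Donkeys values follow the slide, while the Donkeys load date, the repeated load, and the restoring "Denver" row are assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE SatTeam (
        TeamSK   INTEGER NOT NULL,   -- the one and only parent key (HubTeam)
        LDTS     TEXT    NOT NULL,   -- load date time stamp
        TeamName TEXT,               -- descriptive data
        LEDTS    TEXT,               -- load end date time stamp; NULL = current row
        RS       TEXT    NOT NULL,   -- record source
        PRIMARY KEY (TeamSK, LDTS)
    )""")


def load_sat(con, team_sk, team_name, ldts, rs):
    """Add a new version only if the descriptive data changed; end-date the old one."""
    current = con.execute(
        "SELECT TeamName FROM SatTeam WHERE TeamSK = ? AND LEDTS IS NULL",
        (team_sk,)).fetchone()
    if current is not None and current[0] == team_name:
        return False                      # unchanged: no new row
    with con:                             # close + insert commit together
        con.execute("UPDATE SatTeam SET LEDTS = ? WHERE TeamSK = ? AND LEDTS IS NULL",
                    (ldts, team_sk))
        con.execute("INSERT INTO SatTeam (TeamSK, LDTS, TeamName, LEDTS, RS) "
                    "VALUES (?, ?, ?, NULL, ?)", (team_sk, ldts, team_name, rs))
    return True


RS = "STATS Sports Database"
load_sat(con, 8, "Denver", "2013-01-14 21:24", RS)
load_sat(con, 8, "Denver", "2014-01-14 21:24", RS)              # assumed re-load: skipped
load_sat(con, 8, "Donkeys", "2018-04-01 00:01", "Upset Fan")    # LDTS assumed
load_sat(con, 8, "Denver", "2018-04-01 15:15", RS)              # assumed correction row
for row in con.execute("SELECT TeamName, LDTS, LEDTS, RS FROM SatTeam ORDER BY LDTS"):
    print(row)
```

The RS column makes the April Fools' edit traceable: the "Donkeys" version is kept, attributed to its source, and closed out rather than deleted.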

Business Key

One Source, One UK
SELECT CustomerId, CustomerName FROM Customers;
EXEC sp_help 'Customers';

-- Business key, LDTS and RS are mandatory, so they are NOT NULL here
CREATE TABLE [dbo].[HUBCustomer](
    [CustomerSK] [smallint] IDENTITY(1,1) NOT NULL,
    [CustomerID] [int] NOT NULL,
    [LDTS] [datetime] NOT NULL,
    [RS] [varchar](150) NOT NULL,
    CONSTRAINT [PK_HubCustomer] PRIMARY KEY CLUSTERED ([CustomerSK] ASC)
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
              ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
    -- Unique index on the business key
    CONSTRAINT [UK_HubCustomer] UNIQUE NONCLUSTERED ([CustomerID] ASC)
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
              ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY];

More Than One Source, Multiple UK
All columns: CustomerID, CompanyID, CustomerType
Concatenated columns: CustomerID|CompanyID|CustomerType
JSON
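The concatenated and JSON options can be sketched as functions that build one key string per record. The trimming and upper-casing in `concat_key` are an assumed normalization (source systems often disagree on case and padding), and both `concat_key` and `json_key` are hypothetical names.

```python
import json


def concat_key(customer_id, company_id, customer_type, sep="|"):
    """Delimited composite business key, e.g. '42|7|RETAIL'."""
    parts = [customer_id, company_id, customer_type]
    # Assumed normalization: NULL -> empty string, trim whitespace, upper-case text.
    return sep.join("" if p is None else str(p).strip().upper() for p in parts)


def json_key(customer_id, company_id, customer_type):
    """Same key as JSON; sort_keys and compact separators keep the text deterministic."""
    return json.dumps({"CustomerID": customer_id, "CompanyID": company_id,
                       "CustomerType": customer_type},
                      sort_keys=True, separators=(",", ":"))


print(concat_key(42, 7, " retail "))   # 42|7|RETAIL
print(json_key(42, 7, "RETAIL"))       # {"CompanyID":7,"CustomerID":42,"CustomerType":"RETAIL"}
```

A delimiter can collide with a value that itself contains "|"; the JSON form avoids that at the cost of a longer key.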

Hash Key
A numerical representation of a column or set of columns that represents the uniqueness of a record.
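One common way to produce such a key, assumed here rather than taken from the slides, is an MD5 hash (a typical Data Vault 2.0 choice) of the normalized, delimited business key. The digest is a 128-bit number written as 32 hex characters; `hash_key` is a hypothetical helper, and the trim/upper-case normalization is an assumption.

```python
import hashlib


def hash_key(*business_key_parts, sep="|"):
    """MD5 of the normalized, delimited business key, as 32 upper-case hex characters."""
    # Assumed normalization: NULL -> empty string, trim whitespace, upper-case text.
    text = sep.join("" if p is None else str(p).strip().upper()
                    for p in business_key_parts)
    return hashlib.md5(text.encode("utf-8")).hexdigest().upper()


hk = hash_key(42, 7, " retail ")
print(hk)                 # same 32-character value on every run and every server
print(int(hk, 16))        # the same key viewed as a number
```

Python's built-in `hash()` would be unsuitable here, because string hashes are randomized per process; a stable digest gives the same key on every load.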

Keyed Instance

Reference Tables
Referenced by Sats
Not bound by foreign keys
May or may not have history
Typical uses: translations, translating keys, providing descriptive information
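A sketch of a reference table used by a satellite, assuming SQLite in place of SQL Server. There is deliberately no foreign key, so a code missing from the reference table still loads and is translated at query time. Moving the team name into `RefTeamAbbrev` and the `SatTeamStats` table are hypothetical designs for illustration, and the `XYZ` row is made up.

```python
import sqlite3

RS = "STATS Sports Database"
con = sqlite3.connect(":memory:")
con.executescript("""
-- Reference table: no foreign key from the satellite, no history in this sketch
CREATE TABLE RefTeamAbbrev (TeamAbbrev TEXT PRIMARY KEY, TeamName TEXT NOT NULL);
CREATE TABLE SatTeamStats (
    TeamSK     INTEGER NOT NULL,
    LDTS       TEXT    NOT NULL,
    TeamAbbrev TEXT,               -- code only; translated through the reference table
    LEDTS      TEXT,
    RS         TEXT    NOT NULL,
    PRIMARY KEY (TeamSK, LDTS)
);
""")
con.executemany("INSERT INTO RefTeamAbbrev VALUES (?, ?)",
                [("Atl", "Atlanta"), ("Buf", "Buffalo"), ("Hou", "Houston")])
con.executemany("INSERT INTO SatTeamStats VALUES (?, ?, ?, NULL, ?)",
                [(1, "2013-01-14 21:24", "Atl", RS),
                 (2, "2013-01-14 21:24", "Buf", RS),
                 (9, "2013-01-14 21:24", "XYZ", RS)])  # hypothetical unknown code still loads

rows = con.execute("""
    SELECT s.TeamSK, s.TeamAbbrev, COALESCE(r.TeamName, 'Unknown')
    FROM SatTeamStats AS s
    LEFT JOIN RefTeamAbbrev AS r ON r.TeamAbbrev = s.TeamAbbrev
    ORDER BY s.TeamSK""").fetchall()
print(rows)  # [(1, 'Atl', 'Atlanta'), (2, 'Buf', 'Buffalo'), (9, 'XYZ', 'Unknown')]
```

The LEFT JOIN is what makes "not bound by FK" safe to query: untranslated codes surface as "Unknown" instead of disappearing.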

Loading the DV

Master Package Overview

JSON for RecordSource
Easy to parse with built-in server functions
Extends information about the where, why, or how of that record
Use it to report data lineage and source information
Pretty much any tracking information you need!
{
  "Source": { "DB": "Salesforce", "TBL": "Customer", "OriginalFieldName": "CUID" },
  "ETL": { "JobName": "LoadSALES", "PartofStep": "LoadHubCustomer" }
}
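In SQL Server the built-in `JSON_VALUE` function can pull a single value out of the RS column (for example `JSON_VALUE(RS, '$.Source.DB')`). The same kind of lineage report is sketched below with Python's standard `json` module; the `lineage` function and its output format are hypothetical.

```python
import json

# Record source from the slide, stored as text in the RS column.
RS = ('{"Source": {"DB": "Salesforce", "TBL": "Customer", "OriginalFieldName": "CUID"},'
      ' "ETL": {"JobName": "LoadSALES", "PartofStep": "LoadHubCustomer"}}')


def lineage(rs_text):
    """Flatten the RS JSON into a one-line lineage description."""
    rs = json.loads(rs_text)
    src, etl = rs.get("Source", {}), rs.get("ETL", {})
    return (f'{src.get("DB")}.{src.get("TBL")}.{src.get("OriginalFieldName")}'
            f' -> {etl.get("JobName")}/{etl.get("PartofStep")}')


print(lineage(RS))  # Salesforce.Customer.CUID -> LoadSALES/LoadHubCustomer
```

Because the keys are free-form, new tracking fields can be added to RS later without changing the table definition.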

Sat Temporal Tables
Automates creation of the historical record
Replaces the development time spent creating History (Type 2) tables
Reduces the code needed to load data
Makes current records easier to read
Maintains the data audit trail in the history table
BEST PART: the RDBMS does this for us!

How It Works
(Diagram: https://docs.microsoft.com/en-us/sql/relational-databases/tables/media/temporalusagescenario1.png?view=sql-server-2017)
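SQL Server maintains this automatically once a table is declared with `PERIOD FOR SYSTEM_TIME` and `SYSTEM_VERSIONING = ON`: an update moves the old row version into the history table, and `FOR SYSTEM_TIME AS OF` queries across both. The toy model below only mimics that behavior to show the mechanics; `TemporalTable` is a hypothetical class, not a database API, and the Denver/Donkeys values echo the SAT Example slide with the restore time assumed.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class TemporalTable:
    """Toy model of a system-versioned table: a current table plus a history table."""
    current: dict = field(default_factory=dict)   # key -> (value, valid_from)
    history: list = field(default_factory=list)   # (key, value, valid_from, valid_to)

    def upsert(self, key, value, now):
        """Write a row; an existing version is moved to history, closed at `now`."""
        if key in self.current:
            old_value, valid_from = self.current[key]
            self.history.append((key, old_value, valid_from, now))
        self.current[key] = (value, now)

    def as_of(self, key, ts):
        """Mimic FOR SYSTEM_TIME AS OF: the version where valid_from <= ts < valid_to."""
        if key in self.current:
            value, valid_from = self.current[key]
            if valid_from <= ts:
                return value
        for k, value, start, end in self.history:
            if k == key and start <= ts < end:
                return value
        return None


sat = TemporalTable()
sat.upsert(8, "Denver", datetime(2013, 1, 14, 21, 24))
sat.upsert(8, "Donkeys", datetime(2018, 4, 1, 0, 1))
sat.upsert(8, "Denver", datetime(2018, 4, 1, 15, 15))   # restore time assumed
print(sat.as_of(8, datetime(2018, 4, 1, 12, 0)))         # Donkeys
print(sat.as_of(8, datetime(2020, 1, 1)))                # Denver
```

The load code only ever writes the current value; the history and the point-in-time lookup come for free, which is exactly the work the RDBMS takes over.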

Master Data

Enterprise Code Example
Components: Master Data Management System, Data Vault, Data Marts, Reporting Tools
Enterprise code: "Chargers"
Source codes (from Sources 1, 2, and 3): "LAC", "L.A. Chargers", "LA Chargers", "San Diego Chargers"

Data Vault Linking Codes
Enterprise Link records:
Chargers <- LA Chargers
Chargers <- L.A. Chargers
Chargers <- LAC
Chargers <- San Diego Chargers
Same As Link records:
LA Chargers <- L.A. Chargers
LA Chargers <- LAC
Tables:
HubTeam (Team SK)
LnkTeam_Enterprise (Team SK, Team SK ENT)
LnkTeam_SameAs (Team SK, Team SK SAS)
SatTeam_Source1 (Team SK, LDTS)
SatTeam_Source2 (Team SK, LDTS)
SatTeam_Source3 (Team SK, LDTS)
SatTeam_Enterprise (Team SK, LDTS)
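Resolving a source code through the two link tables can be sketched with in-memory dictionaries that stand in for `LnkTeam_SameAs` and `LnkTeam_Enterprise`. The records come from the slide; `resolve` is a hypothetical helper that follows one same-as hop and then looks up the enterprise code, and "Raiders" is an illustrative unmapped code.

```python
# In-memory stand-ins for the two link tables; records copied from the slide.
SAME_AS = {                      # LnkTeam_SameAs: source code -> same-as target
    "L.A. Chargers": "LA Chargers",
    "LAC": "LA Chargers",
}
ENTERPRISE = {                   # LnkTeam_Enterprise: source code -> enterprise code
    "LA Chargers": "Chargers",
    "L.A. Chargers": "Chargers",
    "LAC": "Chargers",
    "San Diego Chargers": "Chargers",
}


def resolve(code):
    """Return (same-as target code, enterprise code) for a source code."""
    master = SAME_AS.get(code, code)   # one hop; unmapped codes are their own target
    enterprise = ENTERPRISE.get(code) or ENTERPRISE.get(master)
    return master, enterprise


for code in ("LAC", "L.A. Chargers", "San Diego Chargers", "Raiders"):
    print(code, "->", resolve(code))
```

Reporting can then group every source variant under "Chargers" while each source satellite keeps its original code unchanged.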

Resources
https://hanshultgren.wordpress.com/
http://geneseeacademy.com/
https://danlinstedt.com/solutions-2/data-vault-basics/
https://youtu.be/QbBmYMaQFec
Lots more on YouTube


Hub Table Loading

Sat Table Loading

Link Table Loading