Published by Juliana Francisca Visser. Modified over 5 years ago.
1
INNOVATIVE DATA MODELING Make Data Warehousing Cool Again
Leslie Weed, Architect, RevGen Partners
2
Leslie Weed
Architect, RevGen Partners
All those Data things: While starting as an app developer 20 years ago, I quickly navigated to the data space and have enjoyed every minute of it.
Colorado: Love living right next to some of the best parts of the Rocky Mountains. Enjoying both sun and snow.
Data Modeling is Fun: The best part of the job is organizing data and helping others organize their data for great performance and usage.
/leslieweedsql
@weederbug
3
Hello Data Vault
Tables and Rules
Business Keys
Raw and Business Layer
Keyed Instance and Reference Tables
4
Problems in Data Warehouses
Takes too long to build
Once it is up, it is hard to add to or modify
It simply hasn't been maintained and is outdated
No history/archive/storage plan
No well-defined usage (data mart vs. views vs. tabular vs. reporting)
There has got to be a better way
5
Hello Data Vault
6
Ensemble Patterns
[Diagram labels: Ensemble; Focal Point; Anchor; Data Vault; Your Style; DV 2.0; Hyper Agility; Temporal]
Image concept from
7
Enterprise Data Warehouse
[Diagram labels: Sources; Stage; Data Warehouse (Raw, BDV); EDW; Data Marts; Cubes; Reports]
aka decomposed modeling
Different flavors of ensemble modeling
Strong rules
Image concept from
8
Enterprise Data Warehouse
[Diagram labels: Sources; Stage; Data Lake; Abstraction Layer; Data Warehouse (Raw, BDV); EDW; Data Marts; Cubes; Reports]
aka decomposed modeling
Different flavors of ensemble modeling
Strong rules
Image concept from
9
Ensembles
An ensemble defines an associated set of data
Holds the Core Business Concepts of Event, Person, Thing, Place, Concept
Breaking a Unit of Work apart will cause associations between source system entities to be lost
[Diagram labels: Player; Game; Season]
10
Ensemble Modeling/Data Vault
The Data Vault is a detail-oriented, historical-tracking, and uniquely linked set of normalized tables that support one or more functional areas of business.
The design is flexible, scalable, consistent, and adaptable to the needs of the enterprise.
Extremely Agile (iterative and incremental) in nature
Strong in pattern for automated builds, and works well with BIML
11
Data Vault Pros
Better real-time load capabilities: mostly inserts
Incremental builds = easy
Provides audit history and traceability
The ability to respond rapidly to changes in your physical model
Iterative development
Keeping control of and reporting on data quality issues
12
Data Vault Cons
It is suggested that the extra joins introduced with Data Vault modeling will impact query performance
    Response: Depends on size, hardware, database, and indexing strategy
Ad hoc reporting is difficult
    Response: Use views or another abstraction layer
Two data warehouses: twice the cost?
    Response: With well-defined usage and purpose, the longevity of the systems quickly outruns the cost of implementation; the BDV is NOT a full duplication of the RDV
13
Tables and Rules
14
Terms you need to know and some rules
SK (or PK or SQN, e.g. CustomerSK) = Surrogate Key
LDTS = Load Date Time Stamp
LEDTS = Load End Date Time Stamp
RS = Record Source
15
Data Vault Objects
Hubs: Ensemble identifiers
Links: Relationships
Satellites: Descriptive information
16
Ensembles and Relationships
[Diagram: Hubs for Player, Season, and Game, each with Satellites, connected by a Link that records a history of the interaction]
Elements: Hub, Link, Satellite
Image from LearnDataVault.com; Dan Linstedt
17
Hub
[Diagram: Player hub with three satellites]
A hub is based on an identifiable business element
An identifiable business element is an attribute that is used in the source systems to locate data, otherwise known as an ensemble identifier
The ensemble identifier has a very low propensity to change, and usually is not editable on the source systems
Hubs are loaded first; they are the matcher
18
Example Finding the Ensemble Identifier
TEAM_ID  TEAM_ABBREV  TEAM_NAME      TEAM_NICKNAME
323      Atl          Atlanta        Falcons
324      Buf          Buffalo        Bills
325      Hou          Houston        Texans
326      Chi          Chicago        Bears
327      Cin          Cincinnati     Bengals
329      Cle          Cleveland      Browns
331      Dal          Dallas         Cowboys
332      Den          Denver         Broncos
334      Det          Detroit        Lions
335      GB           Green Bay      Packers
336      Ten          Tennessee      Titans
338      Ind          Indianapolis   Colts
339      KC           Kansas City    Chiefs
341      Oak          Oakland        Raiders
343      StL          St. Louis      Rams
345      Mia          Miami          Dolphins
347      Min          Minnesota      Vikings
348      NE           New England    Patriots
350      NO           New Orleans    Saints
351      NYG          New York       Giants
352      NYJ          New York       Jets
354      Phi          Philadelphia   Eagles
355      Ari          Arizona        Cardinals
356      Pit          Pittsburgh     Steelers
357      SD           San Diego      Chargers
359      SF           San Francisco  49ers
361      Sea          Seattle        Seahawks
362      TB           Tampa Bay      Buccaneers
363      Was          Washington     Redskins
364      Car          Carolina       Panthers
365      Jac          Jacksonville   Jaguars
366      Bal          Baltimore      Ravens

The ensemble identifier can be thought of as the business key
19
HUB Example

TeamSK  TeamNickName  LDTS             RS
1       Falcons       1/14/13 9:18 PM  STATS Sports Database
2       Bills
3       Texans
4       Bears
5       Bengals
6       Browns
7       Cowboys
8       Broncos

Ensemble Identifier, Business Key, Load Date, Record Source are mandatory
All attributes in the business key form a UNIQUE index
NEVER directly join a HUB to another HUB table
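The hub rules above (unique business key, mandatory load date and record source, insert-only loading) can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the deck's actual load package: the table name HubTeam, the functions create_hub and load_hub, and the use of SQLite instead of SQL Server are all assumptions made for the example.

```python
import sqlite3
from datetime import datetime, timezone


def create_hub(conn):
    # Hypothetical hub layout modeled on the slide: SK, business key, LDTS, RS.
    conn.execute(
        """
        CREATE TABLE HubTeam (
            TeamSK       INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
            TeamNickName TEXT NOT NULL UNIQUE,               -- business key
            LDTS         TEXT NOT NULL,                      -- load date time stamp
            RS           TEXT NOT NULL                       -- record source
        )
        """
    )


def load_hub(conn, nicknames, record_source):
    """Insert only business keys the hub has not seen; return how many were added."""
    ldts = datetime.now(timezone.utc).isoformat()
    added = 0
    for name in nicknames:
        found = conn.execute(
            "SELECT 1 FROM HubTeam WHERE TeamNickName = ?", (name,)
        ).fetchone()
        if found is None:
            conn.execute(
                "INSERT INTO HubTeam (TeamNickName, LDTS, RS) VALUES (?, ?, ?)",
                (name, ldts, record_source),
            )
            added += 1
    return added
```

Calling load_hub twice with overlapping nicknames adds each key once, which is the "mostly inserts" behavior the Pros slide describes: existing hub rows are never updated.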
20
Link
A Link is an association of two or more business keys
It is based on identifiable business element relationships
It can contain Hub keys and other Link keys
A Link's business key is a composite unique index
A Link records a history of the interaction
21
Link Example
Sequence Number, Business Key, Load Date, Record Source are mandatory
The relationship shouldn't change over time. It is established as a fact that occurred at a specific point in time and will remain that way forever.
[Diagram: LNK TEAM_GAME connected to HUB SEASON, HUB TEAM, and HUB TEAM (Opponent)]

TeamGameSK  GameDate   SeasonSK  TeamSK  OpponentSK  LDTS             RS
1           9/27/2012  33        6       32          1/15/13 7:11 PM  STATS Sports Database

Remaining values on the slide (cell positions not recoverable): 2 3 9/30/2012 18 4 5 11 7 13 25 8
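Because hubs are never joined to each other directly, a query that needs both team names goes through the link. The sketch below shows that join shape in sqlite3. The table names HubTeam and LnkTeamGame and both functions are hypothetical, LDTS and RS are left out for brevity, and mapping TeamSK 32 to "Ravens" is an assumption (the hub example only shows keys 1 through 8).

```python
import sqlite3


def build_demo(conn):
    # Illustrative tables only; LDTS and RS columns are omitted to keep the sketch short.
    conn.executescript(
        """
        CREATE TABLE HubTeam (
            TeamSK       INTEGER PRIMARY KEY,
            TeamNickName TEXT NOT NULL UNIQUE
        );
        CREATE TABLE LnkTeamGame (
            TeamGameSK INTEGER PRIMARY KEY,
            GameDate   TEXT NOT NULL,
            SeasonSK   INTEGER NOT NULL,
            TeamSK     INTEGER NOT NULL REFERENCES HubTeam (TeamSK),
            OpponentSK INTEGER NOT NULL REFERENCES HubTeam (TeamSK),
            UNIQUE (GameDate, SeasonSK, TeamSK, OpponentSK)  -- composite business key
        );
        INSERT INTO HubTeam VALUES (6, 'Browns'), (32, 'Ravens');
        INSERT INTO LnkTeamGame VALUES (1, '2012-09-27', 33, 6, 32);
        """
    )


def games_with_names(conn):
    """Resolve both sides of each game by joining hub -> link -> hub."""
    return conn.execute(
        """
        SELECT l.GameDate, t.TeamNickName, o.TeamNickName
        FROM LnkTeamGame AS l
        JOIN HubTeam AS t ON t.TeamSK = l.TeamSK
        JOIN HubTeam AS o ON o.TeamSK = l.OpponentSK
        ORDER BY l.TeamGameSK
        """
    ).fetchall()
```

The same hub table is joined twice under different aliases, once for each role (team and opponent) recorded in the link.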
22
Satellite
[Diagram: Player hub with three satellites]
A Satellite is based on non-identifying business elements ("descriptive data")
Satellite data changes, sometimes rapidly, sometimes slowly
Satellites are separated by type of information and rate of change
23
SAT Example

TeamSK  LDTS             STATSTeamID  TeamAbbrev  TeamName    LEDTS  RS
1       1/14/13 9:24 PM  323          Atl         Atlanta     NULL   STATS Sports Database
2                        324          Buf         Buffalo
3                        325          Hou         Houston
4                        326          Chi         Chicago
5                        327          Cin         Cincinnati
6                        329          Cle         Cleveland
7                        331          Dal         Dallas
8                        332          Den         Denver

Additional values following row 8 on the slide (cell positions not recoverable): 4/1/18 12:01 AM; Donkeys; 4/1/18 3:15 PM; Upset Fan

Satellite is dependent on the Hub or Link key as a parent
The Satellite is never dependent on more than one parent table
The Satellite is not a parent table to any other table
Sequence Number, Business Key, Load Date, Load End Date, Descriptive Data and Record Source are mandatory
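One way to read the LDTS and LEDTS columns is as a validity window: when descriptive data changes, the current row is end-dated and a new row is inserted. The sketch below shows that pattern in sqlite3. The table name SatTeam, the functions create_sat and load_sat, and the interpretation of the slide's "Denver" to "Donkeys" change are assumptions made for illustration.

```python
import sqlite3


def create_sat(conn):
    # Hypothetical satellite: the parent hub key plus LDTS identifies a row.
    conn.execute(
        """
        CREATE TABLE SatTeam (
            TeamSK   INTEGER NOT NULL,  -- parent hub key
            LDTS     TEXT NOT NULL,     -- load date time stamp
            TeamName TEXT NOT NULL,     -- descriptive data
            LEDTS    TEXT,              -- load end date; NULL marks the current row
            RS       TEXT NOT NULL,     -- record source
            PRIMARY KEY (TeamSK, LDTS)
        )
        """
    )


def load_sat(conn, team_sk, team_name, ldts, record_source):
    """End-date the current row and insert a new one if the data changed."""
    current = conn.execute(
        "SELECT LDTS, TeamName FROM SatTeam WHERE TeamSK = ? AND LEDTS IS NULL",
        (team_sk,),
    ).fetchone()
    if current is not None:
        if current[1] == team_name:
            return False  # nothing changed, so nothing is written
        conn.execute(
            "UPDATE SatTeam SET LEDTS = ? WHERE TeamSK = ? AND LDTS = ?",
            (ldts, team_sk, current[0]),
        )
    conn.execute(
        "INSERT INTO SatTeam (TeamSK, LDTS, TeamName, LEDTS, RS) "
        "VALUES (?, ?, ?, NULL, ?)",
        (team_sk, ldts, team_name, record_source),
    )
    return True
```

Reading the current state is then a filter on LEDTS IS NULL, while the end-dated rows keep the audit trail.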
24
Business Key
25
One source – One UK

SELECT CustomerId, CustomerName
FROM Customers

sp_help Customers

CREATE TABLE [dbo].[HUBCustomer](
    [CustomerSK] [smallint] IDENTITY(1,1) NOT NULL,  -- surrogate key
    [CustomerID] [int] NULL,                          -- business key from the source
    [LDTS] [datetime] NULL,                           -- load date time stamp
    [RS] [varchar](150) NULL,                         -- record source
    CONSTRAINT [PK_HubCustomer] PRIMARY KEY CLUSTERED
    (
        [CustomerSK] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
    -- the unique key (UK) enforces one hub row per business key
    CONSTRAINT [UK_HubCustomer] UNIQUE NONCLUSTERED
    (
        [CustomerID] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
26
More than one source – Multiple UK
All columns: CustomerID, CompanyID, CustomerType
Concatenate columns: CustomerID|CompanyID|CustomerType
JSON
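The concatenation and JSON options for a multi-column business key can be sketched as follows. The function names concat_key and json_key are hypothetical, and rejecting values that contain the separator is an added safeguard, not something the slide prescribes.

```python
import json


def concat_key(customer_id, company_id, customer_type, sep="|"):
    """Build a single business key by joining the columns with a delimiter."""
    parts = [str(customer_id), str(company_id), str(customer_type)]
    # Assumption: a value containing the separator would make the key ambiguous.
    if any(sep in part for part in parts):
        raise ValueError("column value contains the separator")
    return sep.join(parts)


def json_key(customer_id, company_id, customer_type):
    """Build the same key as compact JSON with a stable property order."""
    return json.dumps(
        {
            "CustomerID": customer_id,
            "CompanyID": company_id,
            "CustomerType": customer_type,
        },
        sort_keys=True,
        separators=(",", ":"),
    )
```

Sorting the JSON properties keeps the key text identical no matter which order the source supplies the columns in, which matters if the key is compared or indexed as a string.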
27
Hash Key
A numerical representation of a column or set of columns that represents the uniqueness of a record
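A hash key can be derived from the business key columns with a standard digest. The sketch below uses SHA-256 from Python's hashlib; the choice of algorithm, the "|" separator, and the trim-and-uppercase normalization are assumptions for illustration, not choices stated in the deck.

```python
import hashlib


def hash_key(*columns, sep="|"):
    """Return a hex digest that stands in for the combined business key."""
    # Assumption: normalize each column so formatting differences between
    # sources (case, stray whitespace) do not produce different keys.
    normalized = sep.join(str(column).strip().upper() for column in columns)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

The hex string can be stored as text or converted with int(digest, 16) when a numeric form is wanted. Unlike an identity-based surrogate key, the same inputs always give the same value, so it can be computed without a lookup against the hub.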
28
Keyed Instance
29
Reference Tables
Referenced by SATs
Not bound by FK
May or may not have history
Examples: translations, translating keys, providing descriptive information
30
Loading the DV
31
Master Package Overview
32
JSON for RecordSource
Easy to parse with built-in server functions
Extends information about the where, why, or how of that record
Use to report:
    Data lineage
    Source information
    Pretty much any tracking information you need!

{
  "Source": {
    "DB": "Salesforce",
    "TBL": "Customer",
    "OriginalFieldName": "CUID"
  },
  "ETL": {
    "JobName": "LoadSALES",
    "PartofStep": "LoadHubCustomer"
  }
}
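To show how a JSON record source could be read back for lineage reporting, here is a minimal sketch using Python's json module as a stand-in for the database server's built-in JSON functions. The document mirrors the slide's example; the function name lineage and the output format are assumptions.

```python
import json

# The record source document from the slide, stored as a JSON string in the RS column.
RECORD_SOURCE = json.dumps(
    {
        "Source": {"DB": "Salesforce", "TBL": "Customer", "OriginalFieldName": "CUID"},
        "ETL": {"JobName": "LoadSALES", "PartofStep": "LoadHubCustomer"},
    }
)


def lineage(record_source):
    """Turn an RS value into a one-line lineage description."""
    doc = json.loads(record_source)
    source = doc["Source"]
    etl = doc["ETL"]
    origin = f'{source["DB"]}.{source["TBL"]}.{source["OriginalFieldName"]}'
    step = f'{etl["JobName"]}/{etl["PartofStep"]}'
    return f"{origin} via {step}"
```

Because the structure is nested, new tracking properties can be added to the RS value later without changing the hub, link, or satellite table definitions.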
33
Sat Temporal Tables
Automation of creating the historical record
Replaces development time for History (Type 2) table creation
Reduces code for loading data
Easier to read current records
Maintains the data audit trail with the history table
BEST PART = the RDBMS is doing this for us!!
34
How it Works
35
Master Data
36
Enterprise Code Example
[Diagram labels: Source 1; Source 2; Source 3; Data Vault; Master Data Management System; Reporting Tools; Data Marts]
Enterprise code: "Chargers"
Source codes: "LAC", "L.A. Chargers", "LA Chargers", "San Diego Chargers"
37
Data Vault Linking Codes
Enterprise Link Records
    Chargers <- LA Chargers
    Chargers <- L.A. Chargers
    Chargers <- LAC
    Chargers <- San Diego Chargers
Same As Link Records
    LA Chargers <- L.A. Chargers
    LA Chargers <- LAC
Tables
    HubTeam (Team SK)
    LnkTeam_Enterprise (Team SK, Team SK ENT)
    LnkTeam_SameAs (Team SK, Team SK SAS)
    SatTeam_Source1 (Team SK, LDTS)
    SatTeam_Source2 (Team SK, LDTS)
    SatTeam_Source3 (Team SK, LDTS)
    SatTeam_Enterprise (Team SK, LDTS)
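The enterprise and same-as link records above amount to two lookups from a source code to another code. The sketch below models them as plain dictionaries to show how a report could resolve any source spelling to the enterprise code. The names ENTERPRISE_LINKS, SAME_AS_LINKS, enterprise_code, and same_as are hypothetical, and a real implementation would read these pairs from the link tables.

```python
# Source code -> enterprise code, taken from the enterprise link records above.
ENTERPRISE_LINKS = {
    "LA Chargers": "Chargers",
    "L.A. Chargers": "Chargers",
    "LAC": "Chargers",
    "San Diego Chargers": "Chargers",
}

# Source code -> equivalent source code, taken from the same-as link records above.
SAME_AS_LINKS = {
    "L.A. Chargers": "LA Chargers",
    "LAC": "LA Chargers",
}


def enterprise_code(source_code):
    """Return the enterprise code for a source code, or None if it is unmapped."""
    return ENTERPRISE_LINKS.get(source_code)


def same_as(source_code):
    """Return the code this source code duplicates, or None if there is none."""
    return SAME_AS_LINKS.get(source_code)
```

Note that "San Diego Chargers" has an enterprise link but no same-as link in the slide's records, so same_as returns None for it.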
38
Resources
https://hanshultgren.wordpress.com/
Lots on YouTube
42
Hub Table Loading
43
Sat Table Loading
44
Link Table Loading