Just Enough Database Theory for Power Pivot / Power BI

Presentation transcript:

Just Enough Database Theory for Power Pivot / Power BI
@sqlgene
http://sqlgene.com

About me
- BI Consultant for All-lines-technology
- Pluralsight Author
- PASS Summit speaker (SOON!)
- Co-leader, Pittsburgh Power BI user group
- Former Newbie

Why This Talk?
- Self-service BI encourages ad-hoc learning
- Power Pivot and Power BI reach a broader audience
- Not everyone is formally trained
- Accordingly, a quick refresher on database theory is due

Problems Caused by Lack of Theory
- Duplicate rows
- Incorrect joins
- Muddling through your modeling
- Aggregates not adding up properly
- Repeated values being counted twice

What Is This Talk About?
- Keys
- Normal form
- Granularity
- Cardinality
- OLTP versus OLAP

Keys
A key is a way of identifying a row, ideally uniquely.

Types of keys
- Primary key: a unique identifier for a row
- Composite key: a key made of multiple columns
- Natural key: a key based on the natural data
- Surrogate key: a key arbitrarily assigned, usually an integer
- Foreign key: a reference to the primary key of another table

Examples

Type of key     Example
Primary key     Username
Composite key   City & state; make & model
Natural key     Employee name; customer name; state name
Surrogate key   Employee number; social security number
Foreign key     N/A

What Does This Have to Do With Power BI?
- If you don't understand keys, it's hard to understand the rest of this talk
- Power BI doesn't require primary keys on tables
  - It does enforce uniqueness constraints on joins (see cardinality section)
  - Some joins won't work if your keys aren't unique
- Power BI does not support composite keys
  - You have to fake them
- Power BI can take advantage of foreign key relationships
- There is no clustered index; the engine sorts the data
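A minimal sketch of "faking" a composite key, in plain Python rather than in Power BI itself. The rows, the `duplicate_keys` and `add_combined_key` helpers, and the `|` separator are all hypothetical choices for illustration; the idea is only that the key columns get concatenated into a single column, which can then be checked for uniqueness.

```python
from collections import Counter

# Hypothetical rows: "city" alone does not identify a row, but city + state does.
cities = [
    {"city": "Pittsburgh", "state": "PA"},
    {"city": "Philadelphia", "state": "PA"},
    {"city": "Springfield", "state": "IL"},
    {"city": "Springfield", "state": "MO"},
]

def duplicate_keys(rows, columns):
    """Return the key values that appear on more than one row."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)

def add_combined_key(rows, columns, name="city_state_key", sep="|"):
    """Fake a composite key by concatenating its columns into one new column."""
    return [{**row, name: sep.join(str(row[c]) for c in columns)} for row in rows]

city_dupes = duplicate_keys(cities, ["city"])            # city alone repeats
pair_dupes = duplicate_keys(cities, ["city", "state"])   # city + state does not
keyed = add_combined_key(cities, ["city", "state"])      # single-column key to join on
```

The separator matters: without one, values such as "ab" + "c" and "a" + "bc" would collapse into the same key.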

Joins
- A join associates two tables via a primary key and the related foreign key
- If you've done a VLOOKUP(), you've done a join
- If you've written SQL before, you've done a join
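To make the VLOOKUP comparison concrete, here is a rough sketch of a join as a keyed lookup. The table contents and the `lookup_join` helper are assumptions made up for this example: `states` plays the role of the lookup range (keyed by its primary key) and `cities` carries the foreign key.

```python
# Hypothetical lookup table, keyed by its primary key ("state").
states = {
    "PA": {"state_bird": "Ruffed Grouse"},
}

# Hypothetical table holding the foreign key ("state").
cities = [
    {"city": "Pittsburgh", "state": "PA"},
    {"city": "Philadelphia", "state": "PA"},
]

def lookup_join(rows, foreign_key, lookup):
    """Left join: copy the matching lookup row's columns onto each row."""
    joined = []
    for row in rows:
        match = lookup.get(row[foreign_key], {})  # no match leaves the row unchanged
        joined.append({**row, **match})
    return joined

joined = lookup_join(cities, "state", states)
```

Because the lookup side is a dictionary, its key is unique by construction, which is the same property a primary key is meant to guarantee.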

Normal Form
Normal form is a measure of how tightly the data relates to the key.

Normal Form: An Analogy
- What if paper was expensive?
- What if ink was expensive?

City          State  State bird     City Population
Pittsburgh    PA     Ruffed Grouse  303,625
Philadelphia  PA     Ruffed Grouse  1,568,000

Levels of Normal Form
- First normal form
  - Fits in a table
  - No repeated groups
  - No comma-separated values
- Second normal form
  - Everything that relates to only part of the key goes in its own table
- Third normal form
  - Everything that can go into its own table does

What Level of Normal Form Is This?

City          State  State bird     City Population
Pittsburgh    PA     Ruffed Grouse  303,625
Philadelphia  PA     Ruffed Grouse  1,568,000
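One way to reason about the table, assuming its key is City plus State and that the State and State bird values are written out on every row: State bird depends only on State, which is part of the key, so by the second-normal-form rule it belongs in its own table. The sketch below performs that split in plain Python; the variable names are illustrative.

```python
# The flat table from the slide, with State and State bird repeated on each row.
flat = [
    {"city": "Pittsburgh", "state": "PA",
     "state_bird": "Ruffed Grouse", "city_population": 303_625},
    {"city": "Philadelphia", "state": "PA",
     "state_bird": "Ruffed Grouse", "city_population": 1_568_000},
]

# State table: one row per state, keyed by "state".
# The repeated state bird is stored once per state.
states = {}
for row in flat:
    states[row["state"]] = {"state_bird": row["state_bird"]}

# City table: keyed by city + state; "state" doubles as a foreign key to states.
cities = [
    {"city": row["city"], "state": row["state"],
     "city_population": row["city_population"]}
    for row in flat
]
```

After the split, "Ruffed Grouse" is written once instead of once per city, which is the paper-and-ink saving the analogy points at.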

Denormalization
- If normal form is optimized for space and writes, denormalization optimizes for reads
- If you've ever dumped a multi-table query to a CSV file, you've denormalized data
- Denormalized data is often preferred for analytical reporting

What Does This Have to Do With Power BI?
- You don't need to memorize the levels of normal form
- Third normal form is BAD for Power BI
  - Joins are expensive
- Wide tables are BAD for Power BI
  - More columns = worse compression
- Power BI is optimized for deep, skinny tables with few joins
- Power BI is optimized for star schema (we'll get to that at the end)

Granularity
Granularity, or grain, is what level of detail your key refers to.

The Key Question
- Can you answer the following question: what does this row "represent"?
  - In this table: City + State

City          State  State bird     City Population
Pittsburgh    PA     Ruffed Grouse  303,625
Philadelphia  PA     Ruffed Grouse  1,568,000

What Does This Have to Do With Power BI?
- Granularity helps you understand cardinality and relationships
- Aggregate on the finer granularity (e.g. Sales table)
- Filter on the coarser granularity (e.g. Customer table, Location table)
- Important for understanding facts vs. dimensions and star schema
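The "filter coarse, aggregate fine" rule can be sketched in a few lines of Python. All names and figures here are invented for illustration: `customers` has one row per customer (coarse grain), `sales` has one row per sale (fine grain), and `total_sales` is a hypothetical helper.

```python
# Coarse grain: one row per customer, keyed by customer_id.
customers = {
    1: {"name": "Customer A", "region": "East"},
    2: {"name": "Customer B", "region": "West"},
}

# Fine grain: one row per sale, with a foreign key to customers.
sales = [
    {"sale_id": 101, "customer_id": 1, "amount": 20},
    {"sale_id": 102, "customer_id": 1, "amount": 30},
    {"sale_id": 103, "customer_id": 2, "amount": 45},
]

def total_sales(region):
    """Filter on the coarse table, then aggregate the fine table."""
    wanted = {cid for cid, c in customers.items() if c["region"] == region}
    return sum(s["amount"] for s in sales if s["customer_id"] in wanted)

east_total = total_sales("East")
west_total = total_sales("West")
```

The filter picks customers; the sum runs over sales rows, so each sale is counted exactly once regardless of how many sales a customer has.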

Cardinality
How many rows from one table correspond directly to the rows in another table?

What Does This Have to Do With Power BI?
- Filters generally flow from the 1 side to the many side
- Power BI does not implicitly support many-to-many relationships
  - Used to have to manually filter data
  - Now easier with bi-directional filtering
  - https://www.sqlbi.com/articles/many-to-many-relationships-in-power-bi-and-excel-2016/ by Marco Russo
- Bad data can prevent joins
  - Keys need to be unique
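A rough way to see why keys need to be unique, sketched in Python with made-up data. The `cardinality` helper classifies a relationship by checking whether the join column is unique on each side, and `joined_total` mimics a plain SQL-style join followed by a sum; neither is how Power BI works internally, they only illustrate the failure mode of repeated values being counted twice.

```python
from collections import Counter

def is_unique(rows, column):
    """True when no value of the column appears on more than one row."""
    counts = Counter(row[column] for row in rows)
    return all(n == 1 for n in counts.values())

def cardinality(left, left_col, right, right_col):
    """Classify a relationship as one-to-one, one-to-many, and so on."""
    left_side = "one" if is_unique(left, left_col) else "many"
    right_side = "one" if is_unique(right, right_col) else "many"
    return f"{left_side}-to-{right_side}"

def joined_total(dimension, fact, key, measure):
    """Sum a measure after joining every fact row to every matching dimension row."""
    return sum(f[measure] for f in fact for d in dimension if d[key] == f[key])

sales = [
    {"customer_id": 1, "amount": 20},
    {"customer_id": 1, "amount": 30},
    {"customer_id": 2, "amount": 45},
]
customers_clean = [{"customer_id": 1}, {"customer_id": 2}]
customers_dirty = customers_clean + [{"customer_id": 1}]  # bad data: duplicate key

clean_kind = cardinality(customers_clean, "customer_id", sales, "customer_id")
dirty_kind = cardinality(customers_dirty, "customer_id", sales, "customer_id")
clean_total = joined_total(customers_clean, sales, "customer_id", "amount")
dirty_total = joined_total(customers_dirty, sales, "customer_id", "amount")
```

With the duplicate customer row, every sale for customer 1 matches twice, so the joined total is inflated even though the sales table never changed.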

OLTP versus OLAP
- OLTP (Online Transaction Processing)
  - Traditional relational design
  - Highly normalized
  - Optimized for modifying individual transactions
- OLAP (Online Analytical Processing)
  - Denormalized data with a fact table in the middle surrounded by dimensions
  - Known as star schema
  - Optimized for reading and analyzing lots of data

Fact vs. Dimensions
- A fact table is your central transaction table
  - Usually has a surrogate key
  - Usually has many foreign keys
  - Very fine granularity
  - Continually growing
- Dimension tables are your reference tables
  - Usually have a natural key
  - Coarse granularity
  - Slowly changing, slowly growing

What Does This Have to Do With Power BI?
- Use dimensions for filters
- Use facts for aggregates
- Try to aim for a star schema design
  - A fact table joined to multiple dimension tables
  - Leads to better performance
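As a small illustration of the star shape, the sketch below models one fact table with foreign keys into two dimension tables and totals a measure by a dimension attribute. The table names (`fact_sales`, `dim_product`, `dim_store`), the figures, and the `sum_by` helper are all hypothetical.

```python
from collections import defaultdict

# Dimension tables: coarse grain, keyed by a surrogate integer.
dim_product = {
    1: {"product": "Widget", "category": "Hardware"},
    2: {"product": "Gadget", "category": "Hardware"},
    3: {"product": "Manual", "category": "Books"},
}
dim_store = {
    10: {"city": "Pittsburgh", "state": "PA"},
    20: {"city": "Philadelphia", "state": "PA"},
}

# Fact table: fine grain, mostly foreign keys plus a numeric measure.
fact_sales = [
    {"product_id": 1, "store_id": 10, "amount": 20},
    {"product_id": 2, "store_id": 20, "amount": 35},
    {"product_id": 3, "store_id": 10, "amount": 15},
    {"product_id": 1, "store_id": 20, "amount": 10},
]

def sum_by(fact_rows, key_column, dimension, attribute, measure):
    """Group fact rows by a dimension attribute and sum the measure."""
    totals = defaultdict(int)
    for row in fact_rows:
        label = dimension[row[key_column]][attribute]  # one hop from fact to dimension
        totals[label] += row[measure]
    return dict(totals)

by_category = sum_by(fact_sales, "product_id", dim_product, "category", "amount")
by_city = sum_by(fact_sales, "store_id", dim_store, "city", "amount")
```

Every question is answered with a single hop from the fact table to one dimension, which is the "few joins" property the talk recommends.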

Summary
- Keys identify rows
- Tables are joined on primary keys and foreign keys
- Granularity helps identify relationships
  - Goes from coarse granularity to fine granularity (1-to-many cardinality)
- Cardinality helps organize those relationships
- Aim for star schema design
  - Central transactional/fact tables
  - Surrounded by dimension tables