CHAPTER OBJECTIVE: NORMALIZATION AND THE SNOWFLAKE SCHEMA

Presentation transcript:

CHAPTER 11: DIMENSIONAL MODELING: ADVANCED TOPICS


Normalization
Normalization is the process of efficiently organizing data in a database. Bringing your data and tables into line with the standards known as normal forms is called normalizing data, or data normalization. The normalization process has two goals: (1) eliminating redundant data and (2) ensuring that data dependencies make sense. Both are worthy goals, because they reduce the amount of space a database consumes and ensure that data is stored logically. When creating a database, normalization means organizing it into tables in such a way that the results of using the database are always unambiguous and as intended. In practice this usually means dividing large tables into smaller ones that are easier to maintain.

A simple example of normalizing data might consist of a table showing:

Customer        Item purchased   Purchase price
Thomas          Shirt            $40
Maria           Tennis shoes     $35
Evelyn Pajaro   Trousers         $25

If this table is used to keep track of item prices and you delete one of the customers, you also delete a price. Normalizing the data means recognizing this problem and solving it by dividing the table into two tables: one with information about each customer and the product they bought, and a second with each product and its price.
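The deletion problem described above can be reproduced in a few lines. The following is a minimal sketch using Python's built-in sqlite3 module; the table and column names (purchases_flat, products, purchases) are assumptions made for illustration and do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unnormalized table: the only record of an item's price lives in a purchase row.
conn.execute("CREATE TABLE purchases_flat (customer TEXT, item TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO purchases_flat VALUES (?, ?, ?)",
    [("Thomas", "Shirt", 40), ("Maria", "Tennis shoes", 35), ("Evelyn Pajaro", "Trousers", 25)],
)

# Deleting the customer also deletes the price of the item she bought.
conn.execute("DELETE FROM purchases_flat WHERE customer = 'Maria'")
price_after_flat_delete = conn.execute(
    "SELECT price FROM purchases_flat WHERE item = 'Tennis shoes'"
).fetchone()

# Normalized design (hypothetical names): prices are stored once, in their own table.
conn.execute("CREATE TABLE products (item TEXT PRIMARY KEY, price INTEGER)")
conn.execute("CREATE TABLE purchases (customer TEXT, item TEXT REFERENCES products(item))")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("Shirt", 40), ("Tennis shoes", 35), ("Trousers", 25)],
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [("Thomas", "Shirt"), ("Maria", "Tennis shoes"), ("Evelyn Pajaro", "Trousers")],
)

# Deleting the customer's purchase no longer touches the price.
conn.execute("DELETE FROM purchases WHERE customer = 'Maria'")
price_after_normalized_delete = conn.execute(
    "SELECT price FROM products WHERE item = 'Tennis shoes'"
).fetchone()

print(price_after_flat_delete)        # the price is gone from the flat table
print(price_after_normalized_delete)  # the price survives in the products table
```

In the flat table the price disappears together with the customer row; after the split it remains available in the products table.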

Normalization degrees

First normal form (1NF). This is the "basic" level of normalization and generally corresponds to the definition of any database:
- It contains two-dimensional tables with rows and columns.
- Each column corresponds to a sub-object or an attribute of the object represented by the entire table.
- Each row represents a unique instance of that sub-object or attribute and must differ in some way from every other row (that is, no duplicate rows are possible).
- All entries in any column must be of the same kind. For example, in the column labeled "Customer," only customer names or numbers are permitted.

Second normal form (2NF). At this level of normalization, each column in a table that is not a determiner of the contents of another column must itself be a function of the other columns in the table. In standard terms, every non-key column must depend on the whole key rather than on only part of it. For example, in a table with three columns containing customer ID, product sold, and price of the product when sold, the price would be a function of both the customer ID (the customer may be entitled to a discount) and the specific product.
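To make "the price is a function of the customer ID and the product" concrete, here is a small sketch in plain Python. The helper name determines and the sample rows are hypothetical. It only checks whether the sample data is consistent with a dependency; a real dependency is a design rule, not something a sample can prove.

```python
def determines(rows, determinant, dependent):
    """Return True if, in these rows, equal values in the determinant
    columns always go together with the same value in the dependent column."""
    seen = {}
    for row in rows:
        key = tuple(row[column] for column in determinant)
        value = row[dependent]
        # setdefault stores the first value seen for a key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sales rows: customer 2 is entitled to a discount.
sales = [
    {"customer_id": 1, "product": "Shirt", "price": 40},
    {"customer_id": 2, "product": "Shirt", "price": 36},
    {"customer_id": 1, "product": "Trousers", "price": 25},
    {"customer_id": 2, "product": "Trousers", "price": 22},
]

price_depends_on_both = determines(sales, ["customer_id", "product"], "price")
price_depends_on_product_only = determines(sales, ["product"], "price")
price_depends_on_customer_only = determines(sales, ["customer_id"], "price")

print(price_depends_on_both, price_depends_on_product_only, price_depends_on_customer_only)
```

With this sample, the price is determined by the pair (customer ID, product) but by neither column alone, which is the situation the 2NF example describes.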

Third normal form (3NF). At the second normal form, modification anomalies are still possible, because a change to one row in a table may affect data that other tables refer to. For example, using the customer table just cited, removing a row describing a customer purchase (because of a return, perhaps) also removes the fact that the product has a certain price. In the third normal form, this table would be divided into two tables so that product pricing is tracked separately.

Snowflake Schema
The snowflake schema is an extension of the star schema in which each point of the star explodes into more points. In a star schema, each dimension is represented by a single dimension table, whereas in a snowflake schema that dimension table is normalized into multiple lookup tables, each representing a level in the dimensional hierarchy. A snowflake schema therefore consists of a fact table surrounded by multiple dimension tables, which can in turn be connected to other dimension tables via many-to-one relationships. Instead of a few big dimension tables connected to the fact table, we have groups of smaller dimension tables. Normalizing the dimension tables increases the number of dimension and sub-dimension tables, so queries need more foreign key joins; they become more complex than the equivalent star schema queries and tend to run more slowly. The snowflake schema saves storage space, but at the cost of more dimension tables.

Star schema

Snowflake schema

Snowflake schema advantages:
- The snowflake schema is derived from the star schema by further normalizing the dimension tables to eliminate data redundancy.
- Normalizing the dimension tables saves storage space, although the savings are small.
- Normalized structures are easier to update and maintain.
The price of these advantages is usability: business users of a data warehouse built on a snowflake schema have to work with more tables than they would with a star schema.

Snowflake schema disadvantages:
- Normalizing the dimension tables increases the number of dimension and sub-dimension tables, which requires more foreign key joins when querying the data and therefore reduces query performance.
- Queries against a snowflake schema are more complex than queries against a star schema because of the multiple joins from dimension tables to sub-dimension tables.

Snowflake schema example

Let's examine the snowflake schema above in greater detail:
- The DIM_STORE dimension table is normalized to add one more dimension table, DIM_GEOGRAPHY.
- The DIM_PRODUCT dimension table is normalized to add two more dimension tables, DIM_BRAND and DIM_PRODUCT_CATEGORY.
- The DIM_DATE dimension table is now connected to three other dimension tables: DIM_DAY_OF_WEEK, DIM_MONTH and DIM_QUARTER.
- The fact table remains the same as in the star schema.
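As a rough illustration of the structure just described, the sketch below builds part of such a snowflake schema in an in-memory SQLite database. The dimension table names come from the example; the fact table name FACT_SALES, every column name and all sample rows are assumptions, and the DIM_DATE branch is left out to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Sub-dimension tables produced by normalizing DIM_STORE and DIM_PRODUCT.
CREATE TABLE DIM_GEOGRAPHY (geography_id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE DIM_BRAND (brand_id INTEGER PRIMARY KEY, brand_name TEXT);
CREATE TABLE DIM_PRODUCT_CATEGORY (category_id INTEGER PRIMARY KEY, category_name TEXT);

-- Dimension tables now hold foreign keys to their sub-dimensions.
CREATE TABLE DIM_STORE (
    store_id INTEGER PRIMARY KEY,
    store_name TEXT,
    geography_id INTEGER REFERENCES DIM_GEOGRAPHY(geography_id)
);
CREATE TABLE DIM_PRODUCT (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    brand_id INTEGER REFERENCES DIM_BRAND(brand_id),
    category_id INTEGER REFERENCES DIM_PRODUCT_CATEGORY(category_id)
);

-- The fact table is unchanged: it still references only the main dimensions.
CREATE TABLE FACT_SALES (
    store_id INTEGER REFERENCES DIM_STORE(store_id),
    product_id INTEGER REFERENCES DIM_PRODUCT(product_id),
    amount INTEGER
);

INSERT INTO DIM_GEOGRAPHY VALUES (1, 'Country A'), (2, 'Country B');
INSERT INTO DIM_BRAND VALUES (1, 'Brand P'), (2, 'Brand Q');
INSERT INTO DIM_PRODUCT_CATEGORY VALUES (1, 'Clothing');
INSERT INTO DIM_STORE VALUES (10, 'Store X', 1), (11, 'Store Y', 2);
INSERT INTO DIM_PRODUCT VALUES (100, 'Shirt', 1, 1), (101, 'Trousers', 2, 1);
INSERT INTO FACT_SALES VALUES (10, 100, 40), (10, 101, 25), (11, 100, 40);
""")

# Reporting by country and brand needs two joins per dimension branch.
sales_by_country_and_brand = conn.execute("""
    SELECT g.country, b.brand_name, SUM(f.amount)
    FROM FACT_SALES f
    JOIN DIM_STORE s ON s.store_id = f.store_id
    JOIN DIM_GEOGRAPHY g ON g.geography_id = s.geography_id
    JOIN DIM_PRODUCT p ON p.product_id = f.product_id
    JOIN DIM_BRAND b ON b.brand_id = p.brand_id
    GROUP BY g.country, b.brand_name
    ORDER BY g.country, b.brand_name
""").fetchall()

print(sales_by_country_and_brand)
```

In a star schema the country and brand would be plain columns of DIM_STORE and DIM_PRODUCT, so the same report would need only two joins instead of four.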

Star schema vs. Snowflake schema

Understandability
  Star schema: Easier for business users and analysts to query data.
  Snowflake schema: May be more difficult for business users and analysts because of the number of tables they have to deal with.
Dimension table
  Star schema: Has only one dimension table per dimension, grouping related attributes. Dimension tables are not in third normal form.
  Snowflake schema: May have more than one dimension table per dimension because each dimension table is further normalized.
Query complexity
  Star schema: Queries are simple and easy to understand.
  Snowflake schema: Queries are more complex because of multiple foreign key joins between dimension tables.
Query performance
  Star schema: High performance. The database engine can optimize and boost query performance based on a predictable framework.
  Snowflake schema: More foreign key joins, and therefore longer query execution time compared with the star schema.
When to use
  Star schema: When dimension tables store a relatively small number of rows and space is not a big issue.
  Snowflake schema: When dimension tables store a large number of rows with redundant data and space is an issue.
Foreign key joins
  Star schema: Fewer joins.
  Snowflake schema: Higher number of joins.
Data warehouse system
  Star schema: Works best in any data warehouse / data mart.
  Snowflake schema: Better for small data warehouses / data marts.

1. Data optimization: The snowflake model uses normalized data, i.e. the data is organized inside the database so as to eliminate redundancy, which helps to reduce the amount of data. The hierarchy of the business and its dimensions is preserved in the data model through referential integrity.
Figure 1 – Snowflake model

The star model, on the other hand, uses de-normalized data. In the star model, each dimension is linked directly to the fact table, and the business hierarchy is not implemented via referential integrity between dimensions.
Figure 2 – Star model

2. Business model: A primary key is a single unique key (data attribute) selected to identify a particular piece of data. In the 'advertiser' example, Advertiser_ID is the primary key (business key) of a dimension table. A foreign key (referential attribute) is simply a field in one table that matches the primary key of another dimension table. In our example, Advertiser_ID could be a foreign key in Account_dimension. In the snowflake model, the business hierarchy of the data model is represented by primary key/foreign key relationships between the various dimension tables. In the star model, the dimension tables are referenced only through foreign keys in the fact table.
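The primary key/foreign key relationship between dimension tables can be sketched as follows with Python's sqlite3 module. Advertiser_ID and Account_dimension are taken from the example; the table name Advertiser_dimension, the other columns and the sample values are assumptions. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

# Advertiser_ID is the primary key of the (hypothetically named) advertiser dimension.
conn.execute(
    "CREATE TABLE Advertiser_dimension ("
    " Advertiser_ID INTEGER PRIMARY KEY,"
    " Advertiser_Name TEXT)"
)
# In the snowflake model, Account_dimension carries Advertiser_ID as a foreign key.
conn.execute(
    "CREATE TABLE Account_dimension ("
    " Account_ID INTEGER PRIMARY KEY,"
    " Advertiser_ID INTEGER NOT NULL REFERENCES Advertiser_dimension(Advertiser_ID))"
)

conn.execute("INSERT INTO Advertiser_dimension VALUES (1, 'Advertiser A')")
conn.execute("INSERT INTO Account_dimension VALUES (500, 1)")  # parent row exists

# An account that points at a non-existent advertiser violates referential integrity.
try:
    conn.execute("INSERT INTO Account_dimension VALUES (501, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

account_count = conn.execute("SELECT COUNT(*) FROM Account_dimension").fetchone()[0]
print(orphan_rejected, account_count)
```

The rejected insert shows how the hierarchy between the two dimensions is enforced by the database itself rather than by convention.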

3. Performance: The third differentiator between the star schema and the snowflake schema is performance. The snowflake model has a higher number of joins between the dimension tables, and then again with the fact table, and hence performance is slower. For instance, if you want the advertiser details (such as advertiser name, ID and address), the advertiser and account tables need to be joined with each other and then joined with the fact table. The star model, on the other hand, has fewer joins between the dimension tables and the fact table. In this model, if you need information on the advertiser, you just join the advertiser dimension table with the fact table.
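The difference in join count can be seen by answering the same question against both layouts. This is only a sketch: the table names with sf_ and st_ prefixes, the clicks measure and the sample rows are all hypothetical, and the code compares query shape and results, not execution time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflake layout: fact -> account -> advertiser.
CREATE TABLE sf_advertiser (Advertiser_ID INTEGER PRIMARY KEY, Advertiser_Name TEXT);
CREATE TABLE sf_account (
    Account_ID INTEGER PRIMARY KEY,
    Advertiser_ID INTEGER REFERENCES sf_advertiser(Advertiser_ID)
);
CREATE TABLE sf_fact (Account_ID INTEGER REFERENCES sf_account(Account_ID), clicks INTEGER);

-- Star layout: the fact table references the advertiser dimension directly.
CREATE TABLE st_advertiser (Advertiser_ID INTEGER PRIMARY KEY, Advertiser_Name TEXT);
CREATE TABLE st_fact (
    Advertiser_ID INTEGER REFERENCES st_advertiser(Advertiser_ID),
    Account_ID INTEGER,
    clicks INTEGER
);

INSERT INTO sf_advertiser VALUES (1, 'Advertiser A'), (2, 'Advertiser B');
INSERT INTO sf_account VALUES (500, 1), (501, 1), (502, 2);
INSERT INTO sf_fact VALUES (500, 10), (501, 20), (502, 5);

INSERT INTO st_advertiser VALUES (1, 'Advertiser A'), (2, 'Advertiser B');
INSERT INTO st_fact VALUES (1, 500, 10), (1, 501, 20), (2, 502, 5);
""")

# Snowflake: two joins to reach the advertiser name.
snowflake_result = conn.execute("""
    SELECT adv.Advertiser_Name, SUM(f.clicks)
    FROM sf_fact f
    JOIN sf_account acc ON acc.Account_ID = f.Account_ID
    JOIN sf_advertiser adv ON adv.Advertiser_ID = acc.Advertiser_ID
    GROUP BY adv.Advertiser_Name
    ORDER BY adv.Advertiser_Name
""").fetchall()

# Star: a single join gives the same answer.
star_result = conn.execute("""
    SELECT adv.Advertiser_Name, SUM(f.clicks)
    FROM st_fact f
    JOIN st_advertiser adv ON adv.Advertiser_ID = f.Advertiser_ID
    GROUP BY adv.Advertiser_Name
    ORDER BY adv.Advertiser_Name
""").fetchall()

print(snowflake_result)
print(star_result)
```

Both queries return the same totals; the snowflake version simply has to walk one more table to get there.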

4. ETL: The snowflake model loads the data marts, and hence the ETL job is more complex in design and cannot be fully parallelized, because dependencies between the dimension tables restrict the load order. The star model loads dimension tables without dependencies between dimensions, and hence the ETL job is simpler and can achieve higher parallelism.

Extract, Transform, Load (ETL)
In managing databases, extract, transform, load (ETL) refers to three separate functions combined into a single programming tool. The extract function reads data from a specified source database and extracts a desired subset of data. The transform function works with the acquired data, using rules or lookup tables or creating combinations with other data, to convert it to the desired state. The load function writes the resulting data (either all of the subset or just the changes) to a target database, which may or may not previously exist.
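The three ETL functions can be sketched as three small Python functions working on SQLite databases. Everything here is illustrative: the tables raw_orders and orders, their columns, the filter and the cleaning rules are assumptions chosen only to show the extract, transform and load steps in order.

```python
import sqlite3


def extract(source):
    """Read the desired subset of rows from the source database."""
    return source.execute(
        "SELECT name, amount_cents FROM raw_orders WHERE amount_cents > 0"
    ).fetchall()


def transform(rows):
    """Apply simple rules: tidy up the name and convert cents to currency units."""
    return [(name.strip().title(), cents / 100) for name, cents in rows]


def load(target, rows):
    """Write the result to the target, creating the table if it does not exist yet."""
    target.execute("CREATE TABLE IF NOT EXISTS orders (customer TEXT, amount REAL)")
    target.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    target.commit()


# Hypothetical source data, including one row the extract step filters out.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE raw_orders (name TEXT, amount_cents INTEGER)")
source.executemany(
    "INSERT INTO raw_orders VALUES (?, ?)",
    [("  thomas ", 4000), ("maria", 3500), ("test", 0)],
)

target = sqlite3.connect(":memory:")
load(target, transform(extract(source)))

loaded = target.execute("SELECT customer, amount FROM orders ORDER BY customer").fetchall()
print(loaded)
```

In a snowflake schema the load step would have to run once per table in dependency order (sub-dimension tables before the dimensions that reference them), whereas star schema dimensions can be loaded independently of one another.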