UNIT-II Principles of dimensional modeling


1 UNIT-II Principles of dimensional modeling
Dimensional modeling: advanced topics
ETL
OLAP

2 Principles of dimensional modeling
From requirements to data design
STAR schema
STAR schema keys
Advantages of the STAR schema

3 From requirements to data design
Requirements gathering
Requirements definition document (with information packages)
Data design
Dimensional model (figure 10-1)

4 Figure 10-1

5 Design decisions
Choosing the process (subjects)
Choosing the grain (level of detail)
Identifying and conforming the dimensions
Choosing the facts
Choosing the duration of the database (duration of historical data)

6 Dimensional modeling basics
From the information package diagram:
The metrics or facts → fact table (figure 10-2)
Dimensions → dimension tables with attributes (figure 10-3)

7 Figure 10-2

8 Figure 10-3

9 Dimensional modeling
Dimensional model with the fact table in the middle and the dimension tables around it
Called a STAR schema (figure 10-4)

10 Figure 10-4

11 Dimensional Data Modeling (DDM)
DDM comprises one or more dimension tables and fact tables. A dimension table stores the records that describe one particular dimension, e.g. location, product or time. A fact (measure) table contains measures (sales gross value, total units sold) and dimension columns. These dimension columns are foreign keys that reference the respective dimension tables.
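The layout described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a schema taken from the slides: the table names (dim_location, dim_product, dim_time, fact_sales), the column names and the sample rows are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical star schema: three dimension tables and one fact table.
conn.executescript("""
CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_time     (time_key     INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);

CREATE TABLE fact_sales (
    -- dimension columns: foreign keys into the dimension tables
    location_key INTEGER NOT NULL REFERENCES dim_location(location_key),
    product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
    time_key     INTEGER NOT NULL REFERENCES dim_time(time_key),
    -- measures
    sales_gross_value REAL,
    total_units_sold  INTEGER,
    PRIMARY KEY (location_key, product_key, time_key)
);
""")

conn.execute("INSERT INTO dim_location VALUES (1, 'Pune', 'India')")
conn.execute("INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics')")
conn.execute("INSERT INTO dim_time VALUES (1, '2024-01-15', '2024-01', 2024)")
conn.execute("INSERT INTO fact_sales VALUES (1, 1, 1, 1200.0, 2)")

# A fact row that points at a non-existent location is rejected.
try:
    conn.execute("INSERT INTO fact_sales VALUES (99, 1, 1, 50.0, 1)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

fact_rows = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
```

The fact table carries no descriptive text of its own; every description is reached by joining through one of the three key columns.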

12 Example of Dimensional Data Model:

13 In the figure, the sales fact table is connected to the dimensions (location, product, time and organization). Data can therefore be sliced along any of these dimensions, and it can also be aggregated across several dimensions at once.

14 ‘Sales Dollar’ in the sales fact table can be calculated for each dimension independently or for several dimensions in combination, for example:
Sales Dollar value for a particular product
Sales Dollar value for a product in a location
Sales Dollar value for a product in a year within a location
Sales Dollar value for a product in a year within a location, sold or serviced by an employee
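The combinations listed above amount to grouping the same measure by a growing set of dimension columns. A small sketch in plain Python follows; the sample rows, the column names and the helper sales_dollar_by are hypothetical and only serve to make the idea concrete.

```python
from collections import defaultdict

# Hypothetical fact rows, already joined to their dimension attributes.
sales = [
    {"product": "Laptop", "location": "Pune",   "year": 2023, "employee": "Asha", "sales_dollar": 1000.0},
    {"product": "Laptop", "location": "Pune",   "year": 2024, "employee": "Ravi", "sales_dollar": 1500.0},
    {"product": "Laptop", "location": "Mumbai", "year": 2024, "employee": "Asha", "sales_dollar": 700.0},
    {"product": "Phone",  "location": "Pune",   "year": 2024, "employee": "Ravi", "sales_dollar": 400.0},
]

def sales_dollar_by(*dimensions):
    """Sum sales_dollar grouped by the given dimension columns."""
    totals = defaultdict(float)
    for row in sales:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row["sales_dollar"]
    return dict(totals)

by_product = sales_dollar_by("product")
by_product_location = sales_dollar_by("product", "location")
by_product_year_location = sales_dollar_by("product", "year", "location")
by_all_four = sales_dollar_by("product", "year", "location", "employee")
```

In SQL the same four results would come from one SELECT with SUM and a different GROUP BY list each time.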

15 Uses of DDM
DDM is used for calculating summarized data. For example, sales data could be collected on a daily basis and then aggregated to the week level, the week data could be aggregated to the month level, and so on. The result is referred to as aggregate or summarized data. Query performance on a dimensional model can improve significantly when materialized views are used.
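The daily-to-week-to-month roll-up can be sketched as follows. The figures are invented, and one design choice is an assumption of this sketch: both the weekly and the monthly totals are computed from the daily rows, because a calendar week can straddle two months and then cannot be assigned to a single month.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales totals.
daily_sales = {
    date(2024, 1, 29): 100.0,  # Monday, ISO week 5
    date(2024, 1, 31): 50.0,   # Wednesday, ISO week 5
    date(2024, 2, 1): 70.0,    # Thursday, ISO week 5 (same week, next month)
    date(2024, 2, 5): 30.0,    # Monday, ISO week 6
}

def roll_up(daily, key_fn):
    """Aggregate daily amounts to a coarser level chosen by key_fn."""
    totals = defaultdict(float)
    for day, amount in daily.items():
        totals[key_fn(day)] += amount
    return dict(totals)

# Week level: key is (ISO year, ISO week number).
weekly = roll_up(daily_sales, lambda d: d.isocalendar()[:2])
# Month level: key is (year, month).
monthly = roll_up(daily_sales, lambda d: (d.year, d.month))
```

ISO week 5 of 2024 contains days from both January and February, which is why the monthly totals here are not derived from the weekly ones.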

16 A materialized view is a pre-computed table comprising aggregated or joined data from fact tables and possibly dimension tables. It is also known as a summary or aggregate table.
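SQLite has no native materialized views, so the sketch below imitates one with an ordinary summary table built by CREATE TABLE ... AS SELECT. The table names (fact_sales, dim_time, agg_sales_by_month) and the data are assumptions for illustration; a database with real materialized views would also handle refreshing, which this sketch does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE fact_sales (time_key INTEGER, product_key INTEGER, sales_dollar REAL);

INSERT INTO dim_time VALUES (1, '2024-01'), (2, '2024-01'), (3, '2024-02');
INSERT INTO fact_sales VALUES (1, 10, 100.0), (2, 10, 50.0), (3, 10, 70.0), (3, 20, 30.0);

-- Pre-computed join and aggregation, stored as a regular table.
CREATE TABLE agg_sales_by_month AS
SELECT t.month AS month,
       f.product_key AS product_key,
       SUM(f.sales_dollar) AS sales_dollar
FROM fact_sales AS f
JOIN dim_time AS t ON t.time_key = f.time_key
GROUP BY t.month, f.product_key;
""")

# Monthly queries now read the small summary table instead of the fact table.
summary = conn.execute(
    "SELECT month, product_key, sales_dollar "
    "FROM agg_sales_by_month ORDER BY month, product_key"
).fetchall()
```

The summary table goes stale as soon as new fact rows arrive, so it has to be rebuilt or incrementally updated after each load.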

17 Dimension Table
A dimension table describes the business entities of an enterprise as hierarchical, categorical information such as time, departments, locations, and products. Dimension tables are sometimes called lookup or reference tables.

18 Relational vs Dimensional
Relational Data Model (RDM) is used in OLTP systems, which are transaction oriented, and DDM is used in OLAP systems, which are analysis oriented. In an OLTP environment, lookups are stored in detail as independent tables, whereas in a DW these independent tables are merged into a single dimension.

19 RDM | DDM
Data is stored in an RDBMS | Data is stored in an RDBMS or a multidimensional database
Tables are the units of storage | Cubes are the units of storage
Data is normalized and used for OLTP; optimized for OLTP processing | Data is denormalized and used in DW and data marts; optimized for OLAP
Several tables and chains of relationships among them | Few tables; fact tables are connected to dimension tables
Volatile (several updates) | Non-volatile and time variant
Detailed level of transactional data | Summary of bulky transactional data (aggregates and measures) used in business decisions
Normal reports | User-friendly, interactive, drag-and-drop multidimensional OLAP reports

20 DM Versus E-R modeling (figure 10-5, 10-6)

21 The STAR Schema
Star schema is a database schema for representing multi-dimensional data. It is the simplest form of DW schema and contains one or more dimension tables and fact tables.

22 The STAR Schema
It is called a star schema because the relationship between the dimension tables and the fact table resembles a star, with one fact table connected to multiple dimensions. The center of the star is a large fact table, which points to the surrounding dimension tables through its foreign keys. Simple STAR schema (figure 10-7)

23 Figure 10-7

24 Steps in designing Star Schema
Identify a business process for analysis (like sales).
Identify measures or facts.
Identify dimensions for facts.
List the columns that describe each dimension.
Determine the lowest level of detail (the grain) of the fact table.

25 Characteristics of Dimension Table
Dimension table key (PK)
Table is wide
Textual attributes
Attributes not directly related to one another
Not normalized
Drilling down, rolling up
Multiple hierarchies
Fewer records than the fact table

26 Inside a dimension table (figure 10-10)

27 Characteristics of Fact Table
Concatenated key
Data granularity
Measure types:
Fully additive: measures that can be added across all dimensions.
Non-additive: measures that cannot be meaningfully added across any dimension.
Semi-additive: measures that can be added across some dimensions but not others.
Table is deep, not wide
Sparse data
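The three measure types can be illustrated with two common textbook cases that are not taken from the slides: an account balance as a semi-additive measure and a profit margin as a non-additive one. All rows and names below are hypothetical.

```python
# Semi-additive: daily balance snapshots as (day, account, balance).
balances = [
    ("2024-01-01", "A", 100.0),
    ("2024-01-01", "B", 200.0),
    ("2024-01-02", "A", 120.0),
    ("2024-01-02", "B", 180.0),
]

# Adding across the account dimension for one day is meaningful.
total_on_jan_2 = sum(b for day, _, b in balances if day == "2024-01-02")

# Adding one account across the time dimension is not a real balance.
summed_over_time_a = sum(b for _, acct, b in balances if acct == "A")
# A sensible roll-up over time takes the last snapshot instead.
closing_balance_a = [b for day, acct, b in sorted(balances) if acct == "A"][-1]

# Non-additive: a margin ratio per sale as (revenue, profit).
sale_rows = [(100.0, 20.0), (300.0, 30.0)]
summed_margins = sum(profit / revenue for revenue, profit in sale_rows)

# Revenue and profit are fully additive, so sum them first, then divide.
total_revenue = sum(revenue for revenue, _ in sale_rows)
total_profit = sum(profit for _, profit in sale_rows)
overall_margin = total_profit / total_revenue
```

Storing the additive components (revenue, profit) in the fact table and deriving the ratio at query time avoids the non-additive measure altogether.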

28 Inside the fact table (figure 10-11)

29 Factless fact table (figure 10-12)

30 Data granularity: fact table at the lowest grain

31 Star schema keys
Primary key (dimension table)
Surrogate keys (system-generated sequence keys)
Avoid built-in meanings in keys
Do not use production system keys
Foreign key in fact table
Concatenated primary key in fact table
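The surrogate-key idea can be sketched as a small lookup that hands out meaningless sequence numbers and remembers which production key each one stands for. The class name SurrogateKeyGenerator and the sample SKU strings are hypothetical.

```python
import itertools

class SurrogateKeyGenerator:
    """Assigns system-generated sequence keys to production (natural) keys."""

    def __init__(self):
        self._sequence = itertools.count(1)
        self._by_natural_key = {}

    def key_for(self, natural_key):
        # Reuse the existing surrogate key if this natural key was seen before.
        if natural_key not in self._by_natural_key:
            self._by_natural_key[natural_key] = next(self._sequence)
        return self._by_natural_key[natural_key]

product_keys = SurrogateKeyGenerator()
k1 = product_keys.key_for("SKU-EL-0042")  # hypothetical production key
k2 = product_keys.key_for("SKU-FU-0007")
k3 = product_keys.key_for("SKU-EL-0042")  # same product, same surrogate key
```

The dimension table would use the integer as its primary key and keep the production key as an ordinary attribute, so a change in the source system's key format does not ripple into the fact table.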

32 Advantages of STAR schema
STAR schema is a relational model, but it is not a normalized model. Its advantages:
Easy for users to understand
Optimizes navigation
Most suitable for query processing

