
1 Designing a Data Warehousing System

2 Overview: Business Analysis Process; Data Warehousing System; Modeling a Data Warehouse; Choosing the Grain; Establishing Dimensions; Establishing a Fact Table; Implementing a Star Schema

3 Business Analysis Process: Identifying Business Drivers and Objectives; Gathering and Analyzing Information; Establishing a Conceptual Data Model; Identifying a Business Process; Identifying Sources and Performing Transformations; Establishing Duration

4 Data Warehousing System. Diagram labels: Data from Operational Systems (OLTP: Purchasing, Production, Accounting, Sales); Enterprise Data; Data Warehouse; Data Marts; Data in OLAP Environment

5 Comparing Database Modeling Environments
Operational (OLTP): Defines entities that are fully normalized; follows third normal form or greater; produces a complex database design; stores data at the lowest level of transactional detail; increases the number of joined tables in queries; is typically static.
Analytical (Data Warehouse): Defines entities that are denormalized; produces a simple database design that is more easily understood by users; stores data at the transactional level and at the summarized level; decreases the number of joined tables in queries; is dynamic.

6 Modeling a Data Warehouse: Data Warehouse Modeling Components; Using a Star Schema; Components of a Star Schema; Using a Snowflake Schema; Choosing a Schema

7 Data Warehouse Modeling Components. Diagram labels: Dimension Tables (Geographic, Product, Time); Fact Table with Measures (Units, $); Facts; Dimension

8 Using a Star Schema
Fact table: Sales_Fact (TimeKey, EmployeeKey, ProductKey, CustomerKey, ShipperKey, RequiredDate, ...)
Dimension tables: Time_Dim (TimeKey, TheDate, ...); Employee_Dim (EmployeeKey, EmployeeID, ...); Product_Dim (ProductKey, ProductID, ...); Customer_Dim (CustomerKey, CustomerID, ...); Shipper_Dim (ShipperKey, ShipperID, ...)
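
The star layout on this slide can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the course's lab code: only two of the five dimensions are created, and the Quantity measure and all sample rows are invented for the example.

```python
import sqlite3

# In-memory database. Table and key names follow the slide;
# the Quantity measure and the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Time_Dim    (TimeKey    INTEGER PRIMARY KEY, TheDate   TEXT);
CREATE TABLE Product_Dim (ProductKey INTEGER PRIMARY KEY, ProductID TEXT);
CREATE TABLE Sales_Fact (
    TimeKey    INTEGER REFERENCES Time_Dim(TimeKey),
    ProductKey INTEGER REFERENCES Product_Dim(ProductKey),
    Quantity   INTEGER  -- hypothetical measure
);
INSERT INTO Time_Dim    VALUES (1, '1998-01-01'), (2, '1998-01-02');
INSERT INTO Product_Dim VALUES (10, 'P-10'), (20, 'P-20');
INSERT INTO Sales_Fact  VALUES (1, 10, 5), (1, 20, 3), (2, 10, 7);
""")

# A typical star query: one join from the fact table to each dimension used.
rows = conn.execute("""
SELECT p.ProductID, SUM(f.Quantity)
FROM Sales_Fact AS f
JOIN Product_Dim AS p ON p.ProductKey = f.ProductKey
GROUP BY p.ProductID
ORDER BY p.ProductID
""").fetchall()
print(rows)
```

Each dimension is exactly one join away from the fact table, which is the property that gives the schema its star shape.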

9 Components of a Star Schema
Dimension tables: Employee_Dim (EmployeeKey, EmployeeID, ...); Time_Dim (TimeKey, TheDate, ...); Product_Dim (ProductKey, ProductID, ...); Customer_Dim (CustomerKey, CustomerID, ...); Shipper_Dim (ShipperKey, ShipperID, ...)
Fact table: Sales_Fact (TimeKey, EmployeeKey, ProductKey, CustomerKey, ShipperKey, RequiredDate, ...)
Diagram callouts: Dimensional Keys (TimeKey, CustomerKey, ShipperKey, ProductKey, EmployeeKey); Multipart Key; Measures

10 Using a Snowflake Schema
Fact table: Sales_Fact (TimeKey, EmployeeKey, ProductKey, CustomerKey, ShipperKey, RequiredDate, ...)
Primary dimension table: Product_Dim (ProductKey, Product Name, Product Size, Product Brand ID)
Secondary dimension tables: Product_Brand_Id (Product Brand, Product Category ID); Product_Category_Id (Product Category, Product Category ID)
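
To make the extra joins of a snowflake schema concrete, here is a small sqlite3 sketch. The table and column names are simplified stand-ins for the ones on the slide (Product_Brand, Product_Category, BrandID, CategoryID, Amount are assumptions), and the data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Secondary dimension tables (hypothetical names and columns)
CREATE TABLE Product_Category (CategoryID INTEGER PRIMARY KEY, Category TEXT);
CREATE TABLE Product_Brand (
    BrandID    INTEGER PRIMARY KEY,
    Brand      TEXT,
    CategoryID INTEGER REFERENCES Product_Category(CategoryID)
);
-- Primary dimension table
CREATE TABLE Product_Dim (
    ProductKey  INTEGER PRIMARY KEY,
    ProductName TEXT,
    BrandID     INTEGER REFERENCES Product_Brand(BrandID)
);
CREATE TABLE Sales_Fact (ProductKey INTEGER, Amount REAL);

INSERT INTO Product_Category VALUES (1, 'Beverages'), (2, 'Snacks');
INSERT INTO Product_Brand    VALUES (1, 'BrandA', 1), (2, 'BrandB', 2);
INSERT INTO Product_Dim      VALUES (10, 'Cola', 1), (11, 'Tea', 1), (12, 'Chips', 2);
INSERT INTO Sales_Fact       VALUES (10, 4.0), (11, 6.0), (12, 2.5);
""")

# Reaching the category level takes three joins instead of one.
by_category = conn.execute("""
SELECT c.Category, SUM(f.Amount)
FROM Sales_Fact f
JOIN Product_Dim      p ON p.ProductKey = f.ProductKey
JOIN Product_Brand    b ON b.BrandID    = p.BrandID
JOIN Product_Category c ON c.CategoryID = b.CategoryID
GROUP BY c.Category
ORDER BY c.Category
""").fetchall()
print(by_category)
```

In a star schema the brand and category columns would sit directly in Product_Dim, so the same report would need a single join.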

11 Choosing a Schema
                          Star      Snowflake
Model Understandability   Easier    More Difficult
Number of Tables          Fewer     More
Query Complexity          Simpler   More Complex
Query Performance         Quicker   Slower

12 Choosing the Grain. Design considerations: Determining data requirements. Choosing the lowest level of detail: requires disk space; involves more process time; provides detailed data analysis capability. Conforming measures to the stated grain.

13 Establishing Dimensions: Defining Dimension Characteristics; Identifying Dimension Hierarchies; Defining Conventional Dimensions; Sharing Dimensions Among Other Data Marts; Defining Other Types of Dimensions

14 Defining Dimension Characteristics. Applying characteristics to dimension tables: define a primary key; include highly correlated and descriptive character columns. Designing for usability and extensibility: minimize or avoid using codes or abbreviations; create columns that are useful for levels of aggregation; avoid missing or null values; minimize the number of rows that change over time.

15 Identifying Dimension Hierarchies
Consolidated hierarchy: Store Location, with levels Continent, Country, Region, City, Store
Separate hierarchy: Store Location, with Continent, Country, Region, City, and Store each shown separately

16 Defining Conventional Dimensions. Time dimension: break time down into individual attributes; represent time as work days, weekends, holidays, seasons, or fiscal periods; is limited to the grain of the fact table. Geographic dimension. Product dimension. Customer dimension.
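
As an illustration of breaking time down into individual attributes, the sketch below generates rows for a day-grain time dimension. The column names (TimeKey, TheDate, Year, Quarter, Month, DayOfWeek, IsWeekend) and the function build_time_dim are assumptions made for the example; holidays, seasons, and fiscal periods would need business-specific calendars and are left out.

```python
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def build_time_dim(start, days):
    """Return one dict per calendar day, starting at `start`."""
    rows = []
    for offset in range(days):
        d = start + timedelta(days=offset)
        rows.append({
            "TimeKey": offset + 1,             # surrogate key, 1-based
            "TheDate": d.isoformat(),
            "Year": d.year,
            "Quarter": (d.month - 1) // 3 + 1,
            "Month": d.month,
            "DayOfWeek": DAY_NAMES[d.weekday()],
            "IsWeekend": d.weekday() >= 5,     # Saturday or Sunday
        })
    return rows

time_dim = build_time_dim(date(2024, 1, 1), 7)
for row in time_dim:
    print(row)
```

Because the rows are generated once and loaded into the dimension table, queries can group by Quarter or filter on IsWeekend without any date arithmetic.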

17 Sharing Dimensions Among Other Data Marts. Either one instance exists and is shared among data marts, or multiple instances exist in individual data marts. Diagram labels: Time; Sales; Production; Purchasing

18 Defining Other Types of Dimensions. Defining degenerate dimensions: useful for consolidated reporting at the business event level. Defining junk dimensions: useful for capturing important information without increasing the size of the fact table.

19 Establishing a Fact Table: Defining the Fact Table; Defining Precalculations; Minimizing Fact Table Size; Balancing Size and Performance

20 Defining the Fact Table: Applying the Grain; Ensuring Consistency Between Measures; Using Additive and Numeric Values; Summarizing Data

21 Defining Precalculations
Single row precalculations: values are derived from measures within that row.
  ((Price - (Price x Discount)) - Rebate) = Extended Price
  Fact table columns as shown on the slide (~ marks values not shown):
  Time Keys:      7 7 13 25 7 7
  Product Keys:   2 2 30 8 8 5 5
  Price:          20.00 ~ ~ ~ ~ 10.00
  Discount:       .10 ~ ~ ~ ~
  Rebate:         5.00 ~ ~ ~ ~
  Extended Price: 13.00 ~ ~ ~ ~ 4.00 ...
Multiple row precalculations: values are derived from multiple rows.
  SUM(Extended Price) = Year-to-date Sales
  Fact table columns as shown on the slide:
  Time Keys:          7 7 10 ~ ~ ...
  Product Keys:       2 2 2 2 ~ ~ ...
  Year-to-date sales: 20,000.00 25,000.00 ~ ~ ...
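
The two formulas on this slide can be checked with a few lines of Python. The first row uses the slide's values (Price 20.00, Discount .10, Rebate 5.00); the second row is invented so that the running total has something to add. Decimal is used here only to avoid floating-point rounding in currency arithmetic.

```python
from decimal import Decimal
from itertools import accumulate

def extended_price(price, discount, rebate):
    """Single row precalculation: ((Price - (Price x Discount)) - Rebate)."""
    return (price - price * discount) - rebate

# (TimeKey, Price, Discount, Rebate) for one product.
# Row 1 is from the slide; row 2 is a hypothetical example.
fact_rows = [
    (7,  Decimal("20.00"), Decimal("0.10"), Decimal("5.00")),
    (10, Decimal("30.00"), Decimal("0.20"), Decimal("4.00")),
]

extended = [extended_price(p, d, r) for _, p, d, r in fact_rows]

# Multiple row precalculation: a running SUM(Extended Price) in TimeKey order.
year_to_date = list(accumulate(extended))

print(extended)
print(year_to_date)
```

The single row value depends only on columns in the same row, while the year-to-date value for a row depends on every earlier row, which is why it has to be recomputed when earlier data changes.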

22 Minimizing Fact Table Size. Reducing the number of columns: remove columns whose data is redundant or is not required for analysis. Reducing the size of each column: use surrogate keys; ensure that character and binary data is variable length.

23 Balancing Size and Performance. Storing large fact tables. Designing star schema: fact tables are long and narrow; dimension tables are short and wide. Including precalculated data: improves query performance but increases the size of a fact table. Moving fact table columns to another table: reduces fact table size but may affect query performance.

24 Lab A: Designing a Star Schema

25 Implementing a Star Schema: Estimating Size of the Data Warehouse; Creating a Database; Creating Tables; Creating Constraints; Creating Indexes

26 Estimating Size of the Data Warehouse
Factors: size of fact table; grain; bytes per row
Variables: Years of data = 5; Customers = 10,000; Average number of transactions per customer per day = 4
Description                        Calculation Method                            Value
Number of rows in fact table       10,000 x 4 x 365 x 5                          73,000,000
Estimated row size of fact table   (7 IDs x 4 bytes) + (5 measures x 4 bytes)    ~48 bytes
Estimated data warehouse size      48 bytes x 73,000,000 rows                    ~3.5 GB
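
The estimate on this slide is plain arithmetic and can be reproduced as follows. The variable names are chosen for the example, and the result is reported in decimal gigabytes (10^9 bytes), which is an assumption about how the slide rounds to ~3.5 GB.

```python
# Inputs taken from the slide
years_of_data = 5
customers = 10_000
transactions_per_customer_per_day = 4
id_columns, measure_columns, bytes_per_column = 7, 5, 4

# Number of rows in the fact table
fact_rows = customers * transactions_per_customer_per_day * 365 * years_of_data

# Estimated row size: (7 IDs x 4 bytes) + (5 measures x 4 bytes)
row_bytes = id_columns * bytes_per_column + measure_columns * bytes_per_column

# Estimated size of the fact table, which dominates the warehouse size
total_bytes = fact_rows * row_bytes
total_gb = total_bytes / 1e9  # decimal gigabytes

print(f"{fact_rows:,} rows x {row_bytes} bytes = {total_gb:.2f} GB")
```

The estimate covers fact table data only; dimension tables, indexes, and storage overhead would add to it.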

27 Creating a Database. Using CREATE DATABASE options: SIZE; MAXSIZE; FILEGROWTH. Setting database options: trunc. log on chkpt.; select into/bulkcopy.

28 Creating Tables: Creating a Table; Specifying NULL or NOT NULL; Generating Column Values

29 Creating Constraints. Using PRIMARY KEY constraints: does not allow duplicate values; allows index to be created; does not allow null values. Using FOREIGN KEY constraints: is the multipart key stored in the fact table; defines a reference to a column with a PRIMARY KEY or UNIQUE constraint; specifies the data values that are acceptable to update.
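
The behavior of the two constraint types can be observed in a small sqlite3 sketch. SQLite stands in for the course's database engine here, so details differ (for instance, SQLite enforces foreign keys only after PRAGMA foreign_keys = ON). The Quantity column, the sample values, and the try_insert helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: off by default

conn.execute("""
CREATE TABLE Product_Dim (
    ProductKey INTEGER PRIMARY KEY,
    ProductID  TEXT NOT NULL
)""")
conn.execute("""
CREATE TABLE Sales_Fact (
    ProductKey INTEGER NOT NULL REFERENCES Product_Dim(ProductKey),
    Quantity   INTEGER NOT NULL
)""")
conn.execute("INSERT INTO Product_Dim VALUES (1, 'P-1')")

def try_insert(sql):
    """Run one INSERT and report whether a constraint rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

results = [
    try_insert("INSERT INTO Sales_Fact VALUES (1, 5)"),       # key exists in dimension
    try_insert("INSERT INTO Sales_Fact VALUES (99, 5)"),      # no such ProductKey
    try_insert("INSERT INTO Product_Dim VALUES (1, 'dup')"),  # duplicate primary key
]
print(results)
```

The second insert fails because the foreign key limits fact rows to key values that exist in the dimension table; the third fails because a primary key does not allow duplicates.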

30 Creating Indexes. Steps for creating data warehouse indexes: (1) define primary key in dimension tables; (2) declare foreign key relationships; (3) define primary key in fact table; (4) define indexes on each foreign key in fact table. Using surrogate keys. Using clustered indexes. Using nonclustered indexes. Creating composite indexes.
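
The four steps above can be sketched with sqlite3 as follows. SQLite has no clustered/nonclustered distinction, so this shows only the order of the steps; the index names (IX_...) and the Quantity column are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Step 1: primary keys in the dimension tables
CREATE TABLE Time_Dim    (TimeKey    INTEGER PRIMARY KEY, TheDate   TEXT);
CREATE TABLE Product_Dim (ProductKey INTEGER PRIMARY KEY, ProductID TEXT);

-- Steps 2 and 3: foreign key relationships and a composite (multipart)
-- primary key in the fact table
CREATE TABLE Sales_Fact (
    TimeKey    INTEGER NOT NULL REFERENCES Time_Dim(TimeKey),
    ProductKey INTEGER NOT NULL REFERENCES Product_Dim(ProductKey),
    Quantity   INTEGER,
    PRIMARY KEY (TimeKey, ProductKey)
);

-- Step 4: one index on each foreign key in the fact table
CREATE INDEX IX_Sales_Fact_TimeKey    ON Sales_Fact (TimeKey);
CREATE INDEX IX_Sales_Fact_ProductKey ON Sales_Fact (ProductKey);
""")

# PRAGMA index_list returns one row per index; column 1 is the index name.
user_indexes = sorted(
    row[1]
    for row in conn.execute("PRAGMA index_list('Sales_Fact')")
    if row[1].startswith("IX_")
)
print(user_indexes)
```

The composite primary key also creates an index of its own, which is filtered out of the listing so that only the explicitly created foreign key indexes are shown.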

31 Recommended Practices: Use Star Schema to Model Data Mart or Data Warehouse Database; Do Not Mix Grain in Individual Fact Table Attributes; Use Single Element Surrogate Keys When Defining Dimensions; Define Shared Dimensions; Use Facts That Are Both Numeric and Additive; Choose Grain

32 Lab B: Implementing a Star Schema

33 Review: Business Analysis Process; Data Warehousing System; Modeling a Data Warehouse; Choosing the Grain; Establishing Dimensions; Establishing a Fact Table; Implementing a Star Schema

