Published by Conrad Haynes. Modified over 9 years ago.
1
Designing a Data Warehousing System
2
Overview
  Business Analysis Process
  Data Warehousing System
  Modeling a Data Warehouse
  Choosing the Grain
  Establishing Dimensions
  Establishing a Fact Table
  Implementing a Star Schema
3
Business Analysis Process
  Identifying Business Drivers and Objectives
  Gathering and Analyzing Information
  Establishing a Conceptual Data Model
  Identifying a Business Process
  Identifying Sources and Performing Transformations
  Establishing Duration
4
Data Warehousing System
  [Diagram labels: Data from Operational Systems; OLTP; Enterprise Data; Data Warehouse; Data Marts; Data in OLAP Environment; Sales; Purchasing; Production; Accounting]
5
Comparing Database Modeling Environments
  Operational: OLTP
    Defines Entities That Are Fully Normalized
    Follows Third Normal Form or Greater
    Produces a Complex Database Design
    Stores Data at the Lowest Level of Transactional Detail
    Increases the Number of Joined Tables in Queries
    Is Typically Static
  Analytical: Data Warehouse
    Defines Entities That Are Denormalized
    Produces a Simple Database Design That Is More Easily Understood by Users
    Stores Data
      Transactional level
      Summarized level
    Decreases the Number of Joined Tables in Queries
    Is Dynamic
6
Modeling a Data Warehouse
  Data Warehouse Modeling Components
  Using a Star Schema
  Components of a Star Schema
  Using a Snowflake Schema
  Choosing a Schema
7
Data Warehouse Modeling Components
  Dimension Tables: Geographic, Product, Time
  Fact Table: Measures (Units, $)
  [Diagram labels: Facts; Dimension]
8
Using a Star Schema
  Fact Table
    Sales_Fact: TimeKey, EmployeeKey, ProductKey, CustomerKey, ShipperKey, RequiredDate, ...
  Dimension Tables
    Time_Dim: TimeKey, TheDate, ...
    Employee_Dim: EmployeeKey, EmployeeID, ...
    Product_Dim: ProductKey, ProductID, ...
    Customer_Dim: CustomerKey, CustomerID, ...
    Shipper_Dim: ShipperKey, ShipperID, ...
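A star schema query joins the fact table directly to each dimension it needs, one join per dimension. The sketch below illustrates this with Python's built-in sqlite3 module standing in for the server database the slides assume. Only Time_Dim, Product_Dim, and Sales_Fact are built, and SalesAmount is a hypothetical measure, because the slide elides the real measure columns.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Time_Dim    (TimeKey INTEGER PRIMARY KEY, TheDate TEXT NOT NULL);
CREATE TABLE Product_Dim (ProductKey INTEGER PRIMARY KEY, ProductID TEXT NOT NULL);
-- SalesAmount is a hypothetical measure; the slide does not name the measures.
CREATE TABLE Sales_Fact (
    TimeKey     INTEGER NOT NULL REFERENCES Time_Dim(TimeKey),
    ProductKey  INTEGER NOT NULL REFERENCES Product_Dim(ProductKey),
    SalesAmount REAL    NOT NULL
);
INSERT INTO Time_Dim    VALUES (1, '1998-01-01'), (2, '1998-01-02');
INSERT INTO Product_Dim VALUES (10, 'P-10'), (20, 'P-20');
INSERT INTO Sales_Fact  VALUES (1, 10, 5.0), (1, 20, 7.5), (2, 10, 2.5);
""")

# One join per dimension: filter on the time dimension, group by the product dimension.
star_rows = conn.execute("""
SELECT p.ProductID, SUM(f.SalesAmount)
FROM Sales_Fact AS f
JOIN Product_Dim AS p ON p.ProductKey = f.ProductKey
JOIN Time_Dim    AS t ON t.TimeKey    = f.TimeKey
WHERE t.TheDate = '1998-01-01'
GROUP BY p.ProductID
ORDER BY p.ProductID
""").fetchall()

print(star_rows)
```

Every dimension is exactly one join away from the fact table, which keeps queries short.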
9
Components of a Star Schema
  Dimension Tables (each joins to the fact table on its key)
    Employee_Dim: EmployeeKey, EmployeeID, ... (joins on EmployeeKey)
    Time_Dim: TimeKey, TheDate, ... (joins on TimeKey)
    Product_Dim: ProductKey, ProductID, ... (joins on ProductKey)
    Customer_Dim: CustomerKey, CustomerID, ... (joins on CustomerKey)
    Shipper_Dim: ShipperKey, ShipperID, ... (joins on ShipperKey)
  Fact Table
    Sales_Fact: TimeKey, EmployeeKey, ProductKey, CustomerKey, ShipperKey, RequiredDate, ...
    Multipart Key: TimeKey, CustomerKey, ShipperKey, ProductKey, EmployeeKey
    [Diagram labels on Sales_Fact: Dimensional Keys; Measures]
10
Using a Snowflake Schema
  Fact Table
    Sales_Fact: TimeKey, EmployeeKey, ProductKey, CustomerKey, ShipperKey, RequiredDate, ...
  Primary Dimension Table
    Product_Dim: ProductKey, Product Name, Product Size, Product Brand ID
  Secondary Dimension Tables
    Product_Brand_Id: Product Brand, Product Category ID
    Product_Category_Id: Product Category, Product Category ID
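In a snowflake schema, an attribute stored in a secondary dimension table is reachable only through a chain of joins. The sketch below shows a total-by-category query that has to pass through product and brand before it reaches category. It uses sqlite3 as a stand-in, and the names Brand_Dim, Category_Dim, and Units are simplified, hypothetical substitutes for the slide's tables and measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified names for the secondary dimension tables.
CREATE TABLE Category_Dim (CategoryID INTEGER PRIMARY KEY, CategoryName TEXT NOT NULL);
CREATE TABLE Brand_Dim (
    BrandID    INTEGER PRIMARY KEY,
    BrandName  TEXT    NOT NULL,
    CategoryID INTEGER NOT NULL REFERENCES Category_Dim(CategoryID)
);
-- Primary dimension table: points at the brand instead of storing brand/category text.
CREATE TABLE Product_Dim (
    ProductKey  INTEGER PRIMARY KEY,
    ProductName TEXT    NOT NULL,
    BrandID     INTEGER NOT NULL REFERENCES Brand_Dim(BrandID)
);
CREATE TABLE Sales_Fact (
    ProductKey INTEGER NOT NULL REFERENCES Product_Dim(ProductKey),
    Units      INTEGER NOT NULL  -- hypothetical measure
);
INSERT INTO Category_Dim VALUES (1, 'Beverages'), (2, 'Snacks');
INSERT INTO Brand_Dim    VALUES (1, 'BrandA', 1), (2, 'BrandB', 2);
INSERT INTO Product_Dim  VALUES (100, 'Cola', 1), (101, 'Tea', 1), (102, 'Chips', 2);
INSERT INTO Sales_Fact   VALUES (100, 3), (101, 4), (102, 5), (100, 1);
""")

# Three joins to reach the category; a star schema would need only one.
snowflake_rows = conn.execute("""
SELECT c.CategoryName, SUM(f.Units)
FROM Sales_Fact AS f
JOIN Product_Dim  AS p ON p.ProductKey = f.ProductKey
JOIN Brand_Dim    AS b ON b.BrandID    = p.BrandID
JOIN Category_Dim AS c ON c.CategoryID = b.CategoryID
GROUP BY c.CategoryName
ORDER BY c.CategoryName
""").fetchall()

print(snowflake_rows)
```

The extra joins are the "more complex query" cost that the schema comparison attributes to snowflake designs.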
11
Choosing a Schema
                           Star       Snowflake
  Model Understandability  Easier     More Difficult
  Number of Tables         Fewer      More
  Query Complexity         Simpler    More Complex
  Query Performance        Quicker    Slower
12
Choosing the Grain
  Determining Data Requirements
  Choosing the Lowest Level of Detail
    Requires more disk space
    Involves more processing time
    Provides detailed data analysis capability
  Conforming Measures to the Stated Grain
  Design Considerations
13
Establishing Dimensions
  Defining Dimension Characteristics
  Identifying Dimension Hierarchies
  Defining Conventional Dimensions
  Sharing Dimensions Among Other Data Marts
  Defining Other Types of Dimensions
14
Defining Dimension Characteristics
  Applying Characteristics to Dimension Tables
    Define a primary key
    Include highly correlated and descriptive character columns
  Designing for Usability and Extensibility
    Minimize or avoid using codes or abbreviations
    Create columns that are useful for levels of aggregation
    Avoid missing or null values
    Minimize the number of rows that change over time
15
Identifying Dimension Hierarchies
  Consolidated Hierarchy
    Store Location: Continent, Country, Region, City, Store
  Separate Hierarchy
    Store Location: Continent; Country; Region; City; Store
16
Defining Conventional Dimensions
  Time Dimension
    Break time down into individual attributes
    Represent time as work days, weekends, holidays, seasons, or fiscal periods
    Is limited to the grain of the fact table
  Geographic Dimension
  Product Dimension
  Customer Dimension
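As a rough illustration of breaking time down into individual attributes, the sketch below generates rows for a time dimension from calendar dates. It assumes a daily grain, and every attribute name other than TimeKey and TheDate (Year, Quarter, Month, DayOfWeek, IsWeekend) is hypothetical. Holidays, seasons, and fiscal periods depend on business rules the slides do not give, so they are left out.

```python
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def build_time_dim(start, days):
    """Return one time-dimension row per calendar day (assumed daily grain)."""
    rows = []
    for offset in range(days):
        d = start + timedelta(days=offset)
        rows.append({
            "TimeKey": offset + 1,              # surrogate key
            "TheDate": d.isoformat(),
            "Year": d.year,
            "Quarter": (d.month - 1) // 3 + 1,  # calendar quarter, not fiscal
            "Month": d.month,
            "DayOfWeek": DAY_NAMES[d.weekday()],
            "IsWeekend": d.weekday() >= 5,      # Saturday or Sunday
        })
    return rows

time_dim = build_time_dim(date(1998, 1, 1), 7)
for row in time_dim:
    print(row)
```

Storing these attributes as columns lets queries group or filter by quarter or weekend without recomputing them from the date.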
17
Sharing Dimensions Among Other Data Marts
  One instance exists and is shared among data marts
  Multiple instances exist in individual data marts
  [Diagram labels: Time; Sales; Production; Purchasing]
18
Defining Other Types of Dimensions
  Defining Degenerate Dimensions
    Useful for consolidated reporting at the business event level
  Defining Junk Dimensions
    Useful for capturing important information without increasing the size of the fact table
19
Establishing a Fact Table
  Defining the Fact Table
  Defining Precalculations
  Minimizing Fact Table Size
  Balancing Size and Performance
20
Defining the Fact Table
  Applying the Grain
  Ensuring Consistency Between Measures
  Using Additive and Numeric Values
  Summarizing Data
21
Defining Precalculations
  Single Row Precalculations
    Values are derived from measures within that row
    ((Price - (Price x Discount)) - Rebate) = Extended Price
    Fact Table
      Time Key  Product Key  Price  Discount  Rebate  Extended Price
      7         2            20.00  .10       5.00    13.00
      13        30           ~      ~         ~       ~
      25        8            ~      ~         ~       ~
      7         5            10.00  ~         ~       4.00
      ...
  Multiple Row Precalculations
    Values are derived from multiple rows
    SUM(Extended Price) = Year-to-date Sales
    Fact Table
      Time Key  Product Key  Year-to-date sales
      7         2            20,000.00
      10        2            25,000.00
      ~         ~            ~
      ...
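The two kinds of precalculation can be worked through in a few lines. The sketch below applies the slide's Extended Price formula to the one fully specified row (Price 20.00, Discount .10, Rebate 5.00), then builds a running year-to-date total. The list of extended prices used for the running total is hypothetical sample data, assumed to belong to one product and to be ordered by time key.

```python
from decimal import Decimal
from itertools import accumulate

def extended_price(price, discount, rebate):
    """Single-row precalculation: ((Price - (Price x Discount)) - Rebate)."""
    return (price - (price * discount)) - rebate

# The fully specified fact row from the slide.
slide_row = extended_price(Decimal("20.00"), Decimal("0.10"), Decimal("5.00"))

# Multiple-row precalculation: running SUM(Extended Price).
# Hypothetical values for one product, ordered by time key.
extended_prices = [Decimal("13.00"), Decimal("4.00"), Decimal("8.50")]
year_to_date = list(accumulate(extended_prices))

print(slide_row)      # equals 13.00, matching the slide
print(year_to_date)
```

A single-row value depends only on its own row, so it never needs revisiting. A year-to-date value has to be recomputed whenever an earlier row for the same product changes.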
22
Minimizing Fact Table Size
  Reducing the Number of Columns
    Data is redundant
    Data is not required for analysis
  Reducing the Size of Each Column
    Use surrogate keys
    Ensure that character and binary data is variable length
23
Balancing Size and Performance
  Storing Large Fact Tables
  Designing Star Schema
    Fact tables: long and narrow
    Dimension tables: short and wide
  Including Precalculated Data
    Improves query performance but increases the size of a fact table
  Moving Fact Table Columns to Another Table
    Reduces fact table size but may affect query performance
24
Lab A: Designing a Star Schema
25
Implementing a Star Schema
  Estimating Size of the Data Warehouse
  Creating a Database
  Creating Tables
  Creating Constraints
  Creating Indexes
26
Estimating Size of the Data Warehouse
  Variables:
    Years of data = 5
    Customers = 10,000
    Average number of transactions per customer per day = 4
  Description                       Calculation Method                           Value
  Number of rows in fact table      10,000 x 4 x 365 x 5                         73,000,000
  Estimated row size of fact table  (7 IDs x 4 bytes) + (5 measures x 4 bytes)   ~48 bytes
  Estimated data warehouse size     48 bytes x 73,000,000 rows                   ~3.5 GB
  [Diagram labels: Size of Fact Table; Grain; Bytes per Row]
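The estimate above can be reproduced as plain arithmetic. The sketch below uses the slide's variables and its assumption of 4 bytes per ID and per measure. The variable names are illustrative, and the result counts fact table rows only.

```python
# Variables from the slide.
YEARS_OF_DATA = 5
CUSTOMERS = 10_000
TRANSACTIONS_PER_CUSTOMER_PER_DAY = 4
DAYS_PER_YEAR = 365

# Row layout from the slide: 7 IDs and 5 measures at 4 bytes each.
ID_COLUMNS, MEASURE_COLUMNS, BYTES_PER_COLUMN = 7, 5, 4

fact_rows = (CUSTOMERS * TRANSACTIONS_PER_CUSTOMER_PER_DAY
             * DAYS_PER_YEAR * YEARS_OF_DATA)
row_bytes = (ID_COLUMNS + MEASURE_COLUMNS) * BYTES_PER_COLUMN
total_bytes = fact_rows * row_bytes
total_gb = total_bytes / 10**9  # decimal gigabytes

print(f"{fact_rows:,} rows x {row_bytes} bytes = {total_gb:.1f} GB")
```

The figure covers raw fact rows only. Dimension tables, indexes, and per-row storage overhead are not part of this calculation.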
27
Creating a Database
  Using CREATE DATABASE Options
    SIZE
    MAXSIZE
    FILEGROWTH
  Setting Database Options
    Trunc. log on chkpt.
    SELECT INTO/Bulkcopy
28
Creating Tables
  Creating a Table
  Specifying NULL or NOT NULL
  Generating Column Values
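The three topics on this slide can be sketched with a single dimension table. The example uses sqlite3 as a stand-in, so the key generation shown here (SQLite assigns the next integer to an INTEGER PRIMARY KEY column when no value is supplied) is SQLite's mechanism, not necessarily the one the course uses. The Region column is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Customer_Dim (
    CustomerKey INTEGER PRIMARY KEY,  -- surrogate key, generated by SQLite when omitted
    CustomerID  TEXT NOT NULL,        -- must always be supplied
    Region      TEXT                  -- hypothetical column; nullable by default
)
""")

# Generating column values: CustomerKey is not supplied, so SQLite assigns it.
cur = conn.execute(
    "INSERT INTO Customer_Dim (CustomerID, Region) VALUES (?, ?)",
    ("ALFKI", None),
)
generated_key = cur.lastrowid

# NOT NULL: a row without a CustomerID is rejected.
try:
    conn.execute("INSERT INTO Customer_Dim (CustomerID) VALUES (NULL)")
    not_null_rejected = False
except sqlite3.IntegrityError:
    not_null_rejected = True

print(generated_key, not_null_rejected)
```

Declaring NOT NULL on descriptive columns also supports the earlier guidance to avoid missing or null values in dimension tables.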
29
Creating Constraints
  Using PRIMARY KEY Constraints
    Does not allow duplicate values
    Allows index to be created
    Does not allow null values
  Using FOREIGN KEY Constraints
    Is the multipart key stored in the fact table
    Defines a reference to a column with a PRIMARY KEY or UNIQUE constraint
    Specifies the data values that are acceptable to update
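A small experiment shows both constraint types doing their job. The sketch below uses sqlite3 as a stand-in and a two-dimension subset of the star schema. Foreign key enforcement is off by default in SQLite, so it is switched on with a PRAGMA, which is specific to SQLite. Units is a hypothetical measure and try_insert is a helper written for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enforce FOREIGN KEY constraints
conn.executescript("""
CREATE TABLE Time_Dim    (TimeKey INTEGER PRIMARY KEY, TheDate TEXT NOT NULL);
CREATE TABLE Product_Dim (ProductKey INTEGER PRIMARY KEY, ProductID TEXT NOT NULL);
CREATE TABLE Sales_Fact (
    TimeKey    INTEGER NOT NULL REFERENCES Time_Dim(TimeKey),
    ProductKey INTEGER NOT NULL REFERENCES Product_Dim(ProductKey),
    Units      INTEGER NOT NULL,          -- hypothetical measure
    PRIMARY KEY (TimeKey, ProductKey)     -- multipart key built from the foreign keys
);
INSERT INTO Time_Dim    VALUES (1, '1998-01-01');
INSERT INTO Product_Dim VALUES (10, 'P-10');
""")

def try_insert(time_key, product_key, units):
    """Return True if the fact row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Sales_Fact VALUES (?, ?, ?)",
                     (time_key, product_key, units))
        return True
    except sqlite3.IntegrityError:
        return False

first_ok   = try_insert(1, 10, 3)  # valid keys, accepted
duplicate  = try_insert(1, 10, 9)  # same multipart key, rejected by PRIMARY KEY
orphan_row = try_insert(1, 99, 2)  # ProductKey 99 has no dimension row, rejected by FOREIGN KEY

print(first_ok, duplicate, orphan_row)
```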
30
Creating Indexes
  Steps for Creating Data Warehouse Indexes
    1. Define primary key in dimension tables
    2. Declare foreign key relationships
    3. Define primary key in fact table
    4. Define indexes on each foreign key in fact table
  Using Surrogate Keys
  Using Clustered Indexes
  Using Nonclustered Indexes
  Creating Composite Indexes
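The four steps can be followed in order on a small schema. The sketch below again uses sqlite3 as a stand-in. SQLite has no clustered or nonclustered index options, so only the ordering of the steps and the one-index-per-foreign-key idea carry over. The index naming pattern and the Units measure are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Step 1: primary key in each dimension table.
CREATE TABLE Time_Dim    (TimeKey INTEGER PRIMARY KEY, TheDate TEXT NOT NULL);
CREATE TABLE Product_Dim (ProductKey INTEGER PRIMARY KEY, ProductID TEXT NOT NULL);
CREATE TABLE Sales_Fact (
    -- Step 2: foreign key relationships to the dimensions.
    TimeKey    INTEGER NOT NULL REFERENCES Time_Dim(TimeKey),
    ProductKey INTEGER NOT NULL REFERENCES Product_Dim(ProductKey),
    Units      INTEGER NOT NULL,          -- hypothetical measure
    -- Step 3: primary key in the fact table (composite, over the foreign keys).
    PRIMARY KEY (TimeKey, ProductKey)
);
""")

# Step 4: one index on each foreign key column of the fact table.
fk_columns = ["TimeKey", "ProductKey"]
for col in fk_columns:
    conn.execute(f"CREATE INDEX IX_Sales_Fact_{col} ON Sales_Fact ({col})")

# List the explicitly created indexes; SQLite's automatic indexes have no SQL text.
index_names = sorted(
    name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'Sales_Fact' AND sql IS NOT NULL"
    )
)
print(index_names)
```

Indexing each foreign key separately helps queries that filter on a single dimension, which the composite primary key serves well only for its leading column.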
31
Recommended Practices
  Use Star Schema to Model Data Mart or Data Warehouse Database
  Do Not Mix Grain in Individual Fact Table Attributes
  Use Single Element Surrogate Keys When Defining Dimensions
  Define Shared Dimensions
  Use Facts That Are Both Numeric and Additive
  Choose Grain
32
Lab B: Implementing a Star Schema
33
Review
  Business Analysis Process
  Data Warehousing System
  Modeling a Data Warehouse
  Choosing the Grain
  Establishing Dimensions
  Establishing a Fact Table
  Implementing a Star Schema