Populating Data Warehouse Structures
Examining the Star Schema
  [Figure: Sales Star Schema, showing the fact table and its dimension tables]
Implementing the Star Schema
  1. Extract Data From Multiple Sources
  2. Integrate, Transform, and Restructure Data
  3. Load Data Into Dimension Tables and Fact Tables
The Star Schema Data Load
  [Figure: DTS data flow in three stages: Extracting Data From Heterogeneous Sources, Transforming Data, Loading the Star Schema]
  Heterogeneous Data Sources: NorthwindOLTP, External Files, Internal Files
  Staging Area
  Polaris Data Warehouse: InventoryStar, SalesStar, Financial
Verifying the Dimension Source Data
  Verifying Accuracy of Source Data
    Integrating data from multiple sources
    Applying business rules
    Checking structural requirements
  Managing Invalid Data
    Rejecting invalid data
    Saving invalid data to a log
  Correcting Invalid Data
    Transforming data
    Reassigning data values
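The checks above can be sketched in a few lines of Python. This is only an illustration, not how DTS implements validation: the function name `validate_buyers`, the business rule (a buyer must have a name and a known `reg_id`), and the sample region values are all assumptions made for the example.

```python
def validate_buyers(rows, valid_regions):
    """Split source rows into accepted rows and a reject log.

    Hypothetical rules: buyer_name must be non-empty and reg_id must be
    one of valid_regions. Whitespace around the name is corrected in place.
    """
    accepted, rejected = [], []
    for row in rows:
        # Correcting invalid data: trim stray whitespace from the name.
        name = (row.get("buyer_name") or "").strip()
        reg_id = row.get("reg_id")
        if not name:
            # Rejecting invalid data and saving it to a log with a reason.
            rejected.append((row, "missing buyer_name"))
            continue
        if reg_id not in valid_regions:
            rejected.append((row, "unknown reg_id"))
            continue
        accepted.append({"buyer_name": name, "reg_id": reg_id})
    return accepted, rejected
```

Keeping the rejected rows together with a reason makes it possible to review or reload them later instead of silently dropping them.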
Dimension Data Load Examples
  [Figure: three DTS transformations of buyer data]
  1. Source columns buyer_name (Barr, Adam; Chai, Sean; OMelia, Erin; ...) and reg_id are loaded as buyer_first (Adam, Sean, Erin, ...), buyer_last (Barr, Chai, OMelia, ...), and reg_id
  2. Source columns buyer_code (A123, B...), buyer_last (Barr, Chai, OMelia, ...), and reg_id are loaded with buyer_code values that include U999
  3. Source rows of buyer_name (Barr, Adam; Chai, Sean; Smith, Jane; Paper, Anne) and reg_id are divided between two destination tables, one holding Barr, Adam and Chai, Sean and the other holding Smith, Jane and Paper, Anne
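The first example, splitting buyer_name into buyer_first and buyer_last, could look like the following in Python. This is a minimal sketch that assumes every name uses the "Last, First" format shown on the slide; the function name `split_buyer_name` is hypothetical, and in DTS the same logic would live in a transformation script.

```python
def split_buyer_name(buyer_name):
    """Split a 'Last, First' string into (buyer_first, buyer_last)."""
    # partition() splits on the first comma only; if there is no comma,
    # the whole string is treated as the last name and first is empty.
    last, _, first = buyer_name.partition(",")
    return first.strip(), last.strip()


for name in ["Barr, Adam", "Chai, Sean", "OMelia, Erin"]:
    buyer_first, buyer_last = split_buyer_name(name)
    print(buyer_first, buyer_last)
```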
Maintaining Integrity of the Dimension
  Assigning a Surrogate Key to Each Record
    Defines the dimension's primary key
    Relates to the foreign key fields of the fact table
  Loading One Record Per Application Key
    Maintains uniqueness in the dimension
    Depends on how you manage changing dimension data
    Maintains integrity of the fact table
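As a rough illustration of both points, the sketch below assigns a sequential surrogate key the first time an application key is seen and refuses to load a second record for the same application key. The class name `DimensionLoader` and its fields are hypothetical; a real warehouse would typically let the database generate the key (for example, with an identity column).

```python
import itertools


class DimensionLoader:
    """Loads one dimension record per application key."""

    def __init__(self):
        self._next_key = itertools.count(1)   # surrogate key generator
        self._key_by_app_key = {}             # application key -> surrogate key
        self.rows = []                        # loaded dimension records

    def load(self, app_key, attributes):
        """Return the surrogate key for app_key, loading the record if new."""
        if app_key in self._key_by_app_key:
            # Already loaded: keep the dimension unique per application key.
            return self._key_by_app_key[app_key]
        surrogate_key = next(self._next_key)
        self._key_by_app_key[app_key] = surrogate_key
        self.rows.append(
            {"surrogate_key": surrogate_key, "app_key": app_key, **attributes}
        )
        return surrogate_key
```

Whether a repeated application key should be ignored, overwritten, or versioned depends on how changing dimension data is managed; this sketch simply ignores the repeat.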
Managing Changing Dimension Data
  Dimensions with Changing Column Values
    Inserts of new data
    Updates of existing data
  Slowly Changing Dimension Design Solutions
    Type 1: Overwrite the dimension record
    Type 2: Write another dimension record
    Type 3: Add attributes to the dimension record
Type 1: Overwriting the Dimension Record
  Existing record is changed
  Product Dimension
            product key  product name  product size  product package  product dept  product cat  product subcat  ...
    Before  001          Rice Puffs    10 oz.        Bag              Grocery       Dry Goods    Snacks          ...
    After   001          Rice Puffs    12 oz.        Bag              Grocery       Dry Goods    Snacks          ...
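A Type 1 change amounts to a plain UPDATE against the existing row. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server, with a reduced version of the product dimension; the helper names are hypothetical.

```python
import sqlite3


def make_product_dim():
    """Create an in-memory product dimension holding the 'Before' row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product_dim ("
        "product_key TEXT PRIMARY KEY, product_name TEXT, product_size TEXT)"
    )
    conn.execute("INSERT INTO product_dim VALUES ('001', 'Rice Puffs', '10 oz.')")
    return conn


def overwrite_size(conn, product_key, new_size):
    """Type 1: overwrite the attribute in place; no history is kept."""
    conn.execute(
        "UPDATE product_dim SET product_size = ? WHERE product_key = ?",
        (new_size, product_key),
    )


conn = make_product_dim()
overwrite_size(conn, "001", "12 oz.")
print(conn.execute("SELECT * FROM product_dim").fetchall())
# prints [('001', 'Rice Puffs', '12 oz.')]
```

The old size is gone after the update, so facts loaded earlier are now reported against the new size.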
Type 2: Writing Another Dimension Record
  Adds a new record
  Product Dimension
            product key  product name  product size  product package  product dept  product cat  product subcat  effective_date  ...
    Before  001          Rice Puffs    10 oz.        Bag              Grocery       Dry Goods    Snacks          ...
    After   001          Rice Puffs    10 oz.        Bag              Grocery       Dry Goods    Snacks          ...
            ...          Rice Puffs    12 oz.        Bag              Grocery       Dry Goods    Snacks          ...
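A Type 2 change can be sketched as an INSERT that leaves the old row untouched. The example uses sqlite3 in place of SQL Server and makes several assumptions that are not on the slide: an integer surrogate `product_key` generated by the database, a separate `product_id` column holding the application key, and made-up effective dates.

```python
import sqlite3


def make_type2_product_dim():
    """Create an in-memory product dimension holding the original row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product_dim ("
        "product_key INTEGER PRIMARY KEY, "   # surrogate key (assumed)
        "product_id TEXT NOT NULL, "          # application key (assumed)
        "product_name TEXT, product_size TEXT, effective_date TEXT)"
    )
    conn.execute(
        "INSERT INTO product_dim "
        "(product_id, product_name, product_size, effective_date) "
        "VALUES ('001', 'Rice Puffs', '10 oz.', '2000-01-01')"  # date is made up
    )
    return conn


def add_version(conn, product_id, new_size, effective_date):
    """Type 2: write another record; return its new surrogate key."""
    current = conn.execute(
        "SELECT product_name FROM product_dim WHERE product_id = ? "
        "ORDER BY product_key DESC LIMIT 1",
        (product_id,),
    ).fetchone()
    if current is None:
        raise KeyError(product_id)
    cursor = conn.execute(
        "INSERT INTO product_dim "
        "(product_id, product_name, product_size, effective_date) "
        "VALUES (?, ?, ?, ?)",
        (product_id, current[0], new_size, effective_date),
    )
    return cursor.lastrowid
```

Because each version has its own surrogate key, facts loaded before the change keep pointing at the 10 oz. row while new facts reference the 12 oz. row.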
Type 3: Adding Attributes in the Dimension Record
  Additional information is stored in an existing record
  Product Dimension columns: product key, product name, product size, product package, product dept, product cat, product subcat, current product size date, previous product size, previous product size date, 2nd previous product size, 2nd previous product size date, ...
            product key  product name  product size  previous product size  2nd previous product size
    Before  001          Rice Puffs    10 oz.        11 oz.                 (null)
    After   001          Rice Puffs    12 oz.        10 oz.                 11 oz.
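The column shuffle in a Type 3 change can be sketched as a function that shifts each size one slot back before writing the new value. The dictionary keys below mirror the slide's column names with underscores; the function name and the dates used when calling it are hypothetical.

```python
def apply_type3_change(record, new_size, change_date):
    """Type 3: keep prior values in extra columns of the same record."""
    updated = dict(record)  # leave the caller's record untouched
    # Shift the previous size into the 2nd-previous slot.
    updated["second_previous_product_size"] = record["previous_product_size"]
    updated["second_previous_product_size_date"] = record["previous_product_size_date"]
    # Shift the current size into the previous slot.
    updated["previous_product_size"] = record["product_size"]
    updated["previous_product_size_date"] = record["current_product_size_date"]
    # Write the new current value.
    updated["product_size"] = new_size
    updated["current_product_size_date"] = change_date
    return updated
```

Only as many prior values survive as there are columns to hold them; the oldest value falls off the end on each change.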
Verifying the Fact Table Source Data
  Verifying Accuracy of Source Data
    Integrating data from multiple sources
    Applying business rules
    Checking structural requirements
  Managing Invalid Data
    Rejecting invalid data
    Saving invalid data to a log
  Correcting Invalid Data
    Transforming data
    Reassigning data values
Assigning Foreign Keys
  [Figure: source data rows are matched against the dimension tables to produce sales fact rows]
  Dimension Tables: customer_dim (for example, key 201 for customer ALFI, Alfreds), product_dim (for example, Chai), time_dim
  Source Data columns: customer id, product id, order date, quantity_sales, amount_sales
  Sales Fact Data columns: cust_key, prod_key, time_key, quantity_sales, amount_sales
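The lookup shown in the figure can be sketched with three dictionaries that map application keys to surrogate keys. The function name, the dictionary-based lookups, and the choice to set unmatched rows aside are assumptions for illustration; in DTS the equivalent would usually be a lookup query or a join against the dimension tables.

```python
def assign_foreign_keys(source_rows, customer_keys, product_keys, time_keys):
    """Replace application keys with dimension surrogate keys.

    Rows whose keys are missing from any dimension are returned separately
    so they can be logged or corrected rather than loaded.
    """
    facts, unmatched = [], []
    for row in source_rows:
        try:
            facts.append(
                {
                    "cust_key": customer_keys[row["customer_id"]],
                    "prod_key": product_keys[row["product_id"]],
                    "time_key": time_keys[row["order_date"]],
                    "quantity_sales": row["quantity_sales"],
                    "amount_sales": row["amount_sales"],
                }
            )
        except KeyError:
            unmatched.append(row)
    return facts, unmatched
```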
Defining Measures
  Loading Measures from the Source System
  Calculating Additional Measures
  [Figure: source system data is converted to fact table data]
  Source System Data columns: customer_id (VINET, ALFI, HANAR, ...), product_id (9GZ, 1KJ, 0ZA, ...), price, qty
  Fact Table Data columns: customer_key, product_key, qty, total_sales
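In the figure, qty is loaded as is and total_sales is a calculated measure. The sketch below assumes total_sales is price multiplied by qty, which the slide implies but does not state; the function name and sample values are hypothetical. Decimal is used so that currency amounts are not subject to floating-point rounding.

```python
from decimal import Decimal


def build_measures(price, qty):
    """Return the fact table measures for one source row.

    qty is loaded directly from the source system; total_sales is
    calculated here as price * qty (an assumed formula).
    """
    qty = int(qty)
    total_sales = Decimal(str(price)) * qty
    return {"qty": qty, "total_sales": total_sales}
```

For example, `build_measures("18.00", 10)` yields a qty of 10 and a total_sales of `Decimal("180.00")`.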
Maintaining Data Integrity
  Adhering to the Fact Table Grain
    A fact table can only have one grain
    You must load a fact table with data at the same level of detail as defined by the grain
  Enforcing Column Constraints
    NOT NULL constraints
    FOREIGN KEY constraints
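One way to picture the grain rule: if the source data is more detailed than the fact table, it has to be rolled up to the grain before loading. The sketch below assumes a grain of one row per customer, product, and day, which is a hypothetical choice for illustration, and mimics a NOT NULL constraint by refusing rows with a missing key.

```python
from collections import defaultdict


def roll_up_to_grain(rows):
    """Aggregate rows to one fact per (cust_key, prod_key, time_key)."""
    totals = defaultdict(lambda: {"quantity_sales": 0, "amount_sales": 0})
    for row in rows:
        grain = (row["cust_key"], row["prod_key"], row["time_key"])
        if None in grain:
            # Stand-in for a NOT NULL constraint on the key columns.
            raise ValueError(f"NULL key column in row: {row}")
        totals[grain]["quantity_sales"] += row["quantity_sales"]
        totals[grain]["amount_sales"] += row["amount_sales"]
    return dict(totals)
```

In the database itself, NOT NULL and FOREIGN KEY constraints on the fact table provide the same protection at load time.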
Implementing Staging Tables
  Centralize and Integrate Source Data
  Break Up Complex Data Transformations
  Facilitate Error Recovery
  [Figure: Staging Area containing sales_stage, inventory_stage, market_stage, shipments_stage]
DTS Functionality
  Accessing Heterogeneous Data Sources
  Importing, Exporting, and Transforming Data
  Creating Reusable Transformations and Functions
  Automating Data Loads
  Managing Metadata
  Customizing and Extending Functionality
Defining DTS Packages
  Identifies Data Sources and Destinations
  Defines Tasks or Actions
  Implements Transformation Logic
  Defines Order of Operations
Identifying Package Components
  Connections Access Data Sources and Destinations
  Tasks Describe Data Transformations or Functions
  Steps Define the Order of Task Operations or Workflow
  Global Variables Store Data that Can Be Shared Across Tasks
Creating Packages
  Using the DTS Import/Export Wizard
    Perform ad hoc table and data transfers
    Develop a prototype package
  Using DTS Package Designer
    Edit packages created with the DTS Import/Export Wizard
    Create packages with a wide range of functionality
  Programming DTS Applications
    Directly access the functionality of the DTS Object Model
    Requires Microsoft Visual Basic or Microsoft Visual C++
Using DTS to Populate the Sales Star
  Populating the Sales Star Dimensions
  Populating the Sales Star Fact Table
Populating the Sales Star Dimensions
  [Figure: DTS loads the Sales Star dimension tables]
  Sources: Product Tab Delimited Files, NorthwindOLTP, SQL Server Stored Procedure
  Destinations: time_dim, customer_dim, product_dim
Populating the Sales Star Fact Table
  [Figure: DTS loads the Sales Data File into sales_stage, then loads sales_fact from sales_stage together with time_dim, customer_dim, and product_dim]
Designing Modular Packages
  Creating Modular Packages
    Simplify complex workflows
    Create more readable packages
    Produce smaller packages that are easier to debug
  Using Outer Packages
    Execute multiple packages within a single package
    Combine modular packages into logical workflows
    Reuse modular packages in different workflows
    Execute packages in parallel