Published by Arron Elliott. Modified over 9 years ago.
1
Transportation: Refreshing Warehouse Data
Chapter 13
2
Developing a Refresh Strategy for Capturing Changed Data
- Consider the load window
- Identify data volumes
- Identify the refresh cycle
- Know the technical infrastructure
- Plan a staging area
- Determine how to detect changes
[Figure: operational databases at times T1, T2, T3]
3
User Requirements and Assistance
- Users define the refresh cycle
- IT balances requirements against technical issues
- Document all tasks and processes
- Employ user skills
[Figure: operational databases at times T1, T2, T3]
4
Load Window
- Time available for the entire ETT process
- Plan, test, prove, monitor
[Figure: 24-hour timeline from midnight to midnight, with a load window before and after the user access period]
5
Load Window
- Plan and build processes according to a strategy.
- Consider volumes of data.
- Identify technical infrastructure.
- Ensure currency of data.
- Consider user access requirements first.
- High availability requirements may mean a small load window.
[Figure: user access period on a 24-hour timeline]
6
Scheduling the Load Window
Steps 1-2: Requirements and load cycle
- File names
- File types
- Number of files
- Number of loads
- First-time load or refresh
- Date of file
- Data range
- Records in file (counts)
- Totals (amounts)
Step 3: Receive the data files (File 1, File 2) and the control file by FTP
Step 4: Open and read the files to verify and analyze them (control process)
[Timeline: midnight to 3 am]
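The control check in step 4 can be sketched in Python as below. It assumes a hypothetical CSV control file that lists, for each data file, an expected record count and amount total; the column names (`file_name`, `record_count`, `amount_total`, `amount`) and the in-memory files are illustrative, not part of the original process.

```python
import csv
import io
from decimal import Decimal

# Hypothetical control file: one line per data file with expected counts and totals.
CONTROL = """file_name,record_count,amount_total
file1.csv,3,150.00
file2.csv,2,75.50
"""

# Hypothetical data files received by FTP (held in memory for this sketch).
DATA_FILES = {
    "file1.csv": "id,amount\n1,50.00\n2,60.00\n3,40.00\n",
    "file2.csv": "id,amount\n4,25.50\n5,50.00\n",
}

def verify_files(control_text, data_files):
    """Return a list of problems; an empty list means every file matches its control entry."""
    problems = []
    for entry in csv.DictReader(io.StringIO(control_text)):
        name = entry["file_name"]
        if name not in data_files:
            problems.append(f"{name}: missing")
            continue
        rows = list(csv.DictReader(io.StringIO(data_files[name])))
        count = len(rows)
        # Decimal avoids floating-point rounding when summing money amounts.
        total = sum((Decimal(r["amount"]) for r in rows), Decimal("0"))
        if count != int(entry["record_count"]):
            problems.append(f"{name}: expected {entry['record_count']} records, got {count}")
        if total != Decimal(entry["amount_total"]):
            problems.append(f"{name}: expected total {entry['amount_total']}, got {total}")
    return problems

print(verify_files(CONTROL, DATA_FILES))  # prints []
```

Only when the list is empty would the load in step 5 proceed; otherwise the files are rejected or re-requested.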
7
Scheduling the Load Window
Step 5: Load into warehouse (parallel load of File 1 and File 2)
Step 6: Verify, analyze, reapply
Step 7: Index data
Step 8: Create summaries
Step 9: Update metadata
[Timeline: 3 am to 9 am]
8
Scheduling the Load Window
Step 10: Back up warehouse
Step 11: Create views for specialized tools
Step 12: Users access summary data
Step 13: Publish
[Timeline: 6 am to 9 am, followed by user access]
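Whether the scheduled steps fit the load window is simple arithmetic. The sketch below chains hypothetical step durations back to back from midnight and checks the finish time against an assumed 9 am start of user access; the durations are placeholders to be replaced with timings measured in test loads.

```python
from datetime import datetime, timedelta

# Hypothetical step durations in minutes; real figures come from test loads.
STEPS = [
    ("Receive data by FTP", 60),
    ("Verify and analyze files", 45),
    ("Load into warehouse (parallel)", 120),
    ("Verify, analyze, reapply", 30),
    ("Index data", 40),
    ("Create summaries", 50),
    ("Update metadata", 10),
    ("Back up warehouse", 60),
    ("Create views", 10),
    ("Publish", 5),
]

def schedule(start, steps):
    """Run steps back to back from `start`; return (name, begin, end) for each step."""
    plan, t = [], start
    for name, minutes in steps:
        end = t + timedelta(minutes=minutes)
        plan.append((name, t, end))
        t = end
    return plan

window_start = datetime(2024, 1, 1, 0, 0)  # load window opens at midnight
user_access = datetime(2024, 1, 1, 9, 0)   # assumed start of the user access period
plan = schedule(window_start, STEPS)
finish = plan[-1][2]
print(finish.strftime("%H:%M"), "fits" if finish <= user_access else "overruns")
# prints 07:10 fits
```

Running steps 5 to 9 in parallel, as the load itself is, would shorten the chain; a sequential sum is the conservative estimate.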
9
Capturing Changed Data for Refresh
- Capture new fact data
- Capture changed dimension data
- Determine the capture method for each
Methods:
- Wholesale data replacement
- Comparison of database instances
- Time stamping
- Database triggers
- Database log
- Hybrid techniques
10
Wholesale Data Replacement
- Expensive
- Limited historical data, if any
- Used in data mart implementations
- Time period replacement
[Figure: operational databases at times T1, T2, T3]
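Time period replacement can be sketched with Python's built-in `sqlite3` module: every warehouse row for the chosen period is deleted and the whole period is reloaded from the source. The table names `src_sales` and `wh_sales` are hypothetical, and the source sits in the same database only to keep the example self-contained.

```python
import sqlite3

def replace_period(conn, period):
    """Time period replacement: discard the warehouse rows for `period`
    and reload that period wholesale from the source table."""
    with conn:  # one transaction: a failure rolls back to the old period data
        conn.execute("DELETE FROM wh_sales WHERE period = ?", (period,))
        conn.execute(
            "INSERT INTO wh_sales (period, product, amount) "
            "SELECT period, product, amount FROM src_sales WHERE period = ?",
            (period,),
        )

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_sales (period TEXT, product TEXT, amount REAL);
CREATE TABLE wh_sales  (period TEXT, product TEXT, amount REAL);
INSERT INTO src_sales VALUES ('2024-01','A',10), ('2024-01','B',20), ('2024-02','A',5);
INSERT INTO wh_sales  VALUES ('2024-01','A',999), ('2024-02','A',5);
""")
replace_period(conn, "2024-01")
print(conn.execute("SELECT * FROM wh_sales ORDER BY period, product").fetchall())
# prints [('2024-01', 'A', 10.0), ('2024-01', 'B', 20.0), ('2024-02', 'A', 5.0)]
```

No change detection is needed, which is the method's appeal; the cost is that the full period is reread on every refresh.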
11
Comparison of Database Instances
- Simple to perform, but expensive in time and processing
- Delta file:
  - Changes to operational data since the last refresh
  - Used by various techniques
[Figure: yesterday's and today's operational databases are compared; the delta file holds the changed data]
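A minimal sketch of comparing two database instances, assuming each snapshot has already been read into a dictionary keyed by primary key. The returned list plays the role of the delta file; the sample rows are illustrative.

```python
def diff_snapshots(yesterday, today):
    """Compare two snapshots keyed by primary key and return the delta
    as a list of (operation, key, row) tuples."""
    delta = []
    for key in sorted(today.keys() - yesterday.keys()):      # new rows
        delta.append(("INSERT", key, today[key]))
    for key in sorted(today.keys() & yesterday.keys()):      # rows present in both
        if today[key] != yesterday[key]:
            delta.append(("UPDATE", key, today[key]))
    for key in sorted(yesterday.keys() - today.keys()):      # rows that disappeared
        delta.append(("DELETE", key, yesterday[key]))
    return delta

yesterday = {1: ("John Doe", "Single"), 2: ("Ann Lee", "Married"), 3: ("Bo Park", "Single")}
today = {1: ("John Doe", "Married"), 2: ("Ann Lee", "Married"), 4: ("Cy Ruiz", "Single")}
for change in diff_snapshots(yesterday, today):
    print(change)
```

Because both full snapshots are read and compared, the cost grows with table size rather than with the number of changes, which is why the method is expensive in time and processing; unlike time stamping, it does detect deletions.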
12
Time and Date Stamping
- Fast scanning for records changed since the last extraction
- Relies on a Date Updated field
- No detection of deleted data
[Figure: operational data scanned by date stamp; the delta file holds the changed data]
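Time stamping amounts to a range query on the Date Updated column. This `sqlite3` sketch assumes a hypothetical `customer` table whose `date_updated` column holds ISO-8601 text, so string comparison matches chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, status TEXT,
                       date_updated TEXT);  -- ISO-8601 text sorts chronologically
INSERT INTO customer VALUES
  (1, 'John Doe', 'Married', '2024-03-02 10:15:00'),
  (2, 'Ann Lee',  'Single',  '2024-02-27 08:00:00'),
  (3, 'Bo Park',  'Single',  '2024-03-01 23:59:00');
""")

def extract_changes(conn, last_extract):
    """Return rows whose Date Updated stamp is later than the previous extraction."""
    return conn.execute(
        "SELECT cust_id, name, status FROM customer "
        "WHERE date_updated > ? ORDER BY cust_id",
        (last_extract,),
    ).fetchall()

print(extract_changes(conn, "2024-03-01 00:00:00"))
# prints [(1, 'John Doe', 'Married'), (3, 'Bo Park', 'Single')]
# A deleted row simply vanishes from the table, leaving no stamp for this query to find.
```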
13
Database Triggers
- Changed data intercepted at the server level
- Extra I/O required
- Maintenance overhead
[Figure: trigger on the operational server (DBMS)]
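SQLite supports row-level triggers, so it can illustrate the technique. In this sketch (table and trigger names are hypothetical), every insert, update, or delete on `customer` also writes a row into `customer_changes`, which a refresh process would later read and clear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, status TEXT);

-- Change table filled by the triggers: the delta file kept inside the database.
CREATE TABLE customer_changes (
    change_id INTEGER PRIMARY KEY AUTOINCREMENT,
    op TEXT, cust_id INTEGER, name TEXT, status TEXT);

CREATE TRIGGER customer_ins AFTER INSERT ON customer BEGIN
    INSERT INTO customer_changes (op, cust_id, name, status)
    VALUES ('I', NEW.cust_id, NEW.name, NEW.status);
END;
CREATE TRIGGER customer_upd AFTER UPDATE ON customer BEGIN
    INSERT INTO customer_changes (op, cust_id, name, status)
    VALUES ('U', NEW.cust_id, NEW.name, NEW.status);
END;
CREATE TRIGGER customer_del AFTER DELETE ON customer BEGIN
    INSERT INTO customer_changes (op, cust_id, name, status)
    VALUES ('D', OLD.cust_id, OLD.name, OLD.status);
END;
""")

# Ordinary operational transactions; each also writes a change row (the extra I/O).
conn.execute("INSERT INTO customer VALUES (1, 'John Doe', 'Single')")
conn.execute("UPDATE customer SET status = 'Married' WHERE cust_id = 1")
conn.execute("DELETE FROM customer WHERE cust_id = 1")
conn.commit()

print(conn.execute("SELECT op, cust_id, status FROM customer_changes ORDER BY change_id").fetchall())
# prints [('I', 1, 'Single'), ('U', 1, 'Married'), ('D', 1, 'Married')]
```

The three triggers must be kept in step with any change to the `customer` columns, which is the maintenance overhead the slide refers to.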
14
Using a Database Log
- Contains before and after images
- Requires a system checkpoint
- Common technique
[Figure: log analysis and data extraction from the operational server (DBMS) log; the delta file holds the changed data]
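Real log formats are DBMS-specific and are read with vendor tools, so the sketch below works on a hypothetical in-memory log instead. Each `LogRecord` (a made-up structure) carries a log sequence number, a transaction id, and before and after images; extraction resumes after the checkpoint recorded by the previous run and keeps only committed changes.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LogRecord:
    """Hypothetical log entry: operation plus before and after images."""
    lsn: int                       # log sequence number
    txn: int                       # transaction id
    op: str                        # 'INSERT', 'UPDATE', 'DELETE' or 'COMMIT'
    key: Optional[int] = None
    before: Optional[dict] = None
    after: Optional[dict] = None

def extract_delta(log, checkpoint_lsn):
    """Scan entries after the checkpoint; keep changes of committed transactions only."""
    entries = [r for r in log if r.lsn > checkpoint_lsn]
    committed = {r.txn for r in entries if r.op == "COMMIT"}
    return [(r.op, r.key, r.before, r.after)
            for r in entries
            if r.op != "COMMIT" and r.txn in committed]

log = [
    LogRecord(1, 10, "INSERT", 1, None, {"name": "John Doe", "status": "Single"}),
    LogRecord(2, 10, "COMMIT"),
    # The previous extraction stopped at LSN 2 (the checkpoint).
    LogRecord(3, 11, "UPDATE", 1, {"status": "Single"}, {"status": "Married"}),
    LogRecord(4, 11, "COMMIT"),
    LogRecord(5, 12, "DELETE", 1, {"status": "Married"}, None),  # txn 12 not committed yet
]
for change in extract_delta(log, checkpoint_lsn=2):
    print(change)
# prints ('UPDATE', 1, {'status': 'Single'}, {'status': 'Married'})
```

For brevity the sketch ignores transactions that begin before the checkpoint and commit after it; a real extractor has to track those.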
15
Verdict
- Consider each method on its merits.
- Consider a hybrid approach if no single approach is suitable.
- Consider current technical, operational, and application issues.
16
Applying the Changes to Data
You have a choice of techniques:
- Overwrite a record
- Add a record
- Add a field
- Maintain history
- Add version numbers
17
Overwriting a Record
- Easy to implement
- Loses all history
- Not recommended
[Figure: customer John Doe's status Single is overwritten in place with Married]
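Overwriting is a single UPDATE. In this `sqlite3` sketch (the table name `customer_dim` is hypothetical) the old value Single is replaced in place, so nothing records that it ever existed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_dim (cust_id INTEGER PRIMARY KEY, name TEXT, status TEXT)")
conn.execute("INSERT INTO customer_dim VALUES (1, 'John Doe', 'Single')")

# Overwrite: the changed attribute replaces the old value in the same row.
conn.execute("UPDATE customer_dim SET status = ? WHERE cust_id = ?", ("Married", 1))
print(conn.execute("SELECT * FROM customer_dim").fetchall())
# prints [(1, 'John Doe', 'Married')]
```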
18
Adding a New Record
- History is preserved; dimensions grow.
- Time constraints are not required.
- A generalized key is created.
- Metadata tracks usage of keys.
[Figure: record 1 (John Doe, Single) is kept and new record 1A (John Doe, Married) is added]
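A minimal Python sketch of adding a new record, assuming a hypothetical generalized-key scheme that follows the slide (1, then 1A, 1B, and so on) and a `key_map` dictionary standing in for the metadata that tracks which keys belong to each operational customer id.

```python
import string

def add_new_record(dimension, key_map, cust_id, **changes):
    """Preserve history by adding a new dimension row under a new generalized key.

    dimension: generalized key -> row dict
    key_map:   operational customer id -> generalized keys issued so far (metadata)
    """
    keys = key_map[cust_id]
    row = dict(dimension[keys[-1]])  # copy the latest version
    row.update(changes)
    # Hypothetical key scheme from the slide: 1, then 1A, 1B, ...
    new_key = f"{cust_id}{string.ascii_uppercase[len(keys) - 1]}"
    dimension[new_key] = row
    keys.append(new_key)
    return new_key

dimension = {"1": {"cust_id": "1", "name": "John Doe", "status": "Single"}}
key_map = {"1": ["1"]}
add_new_record(dimension, key_map, "1", status="Married")
print(key_map)                                             # {'1': ['1', '1A']}
print(dimension["1A"]["status"], dimension["1"]["status"])  # Married Single
```

The letter suffix limits this scheme to 26 versions per customer; an integer sequence for the generalized key avoids that limit.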
19
Adding a Current Field
- Maintains some history
- Loses intermediate values
- Is enhanced by adding an Effective Date field
[Figure: before, John Doe's record holds Single; after, it holds the original value Single plus a current value Married with effective date 01-JAN-96]
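The current-field technique keeps the original value, overwrites a single current value, and stamps it with an effective date. The field names below are hypothetical; the second change shows how an intermediate value is lost.

```python
from datetime import date

def apply_current_field(row, new_status, effective):
    """Overwrite only the current value and record when it took effect.
    The original value survives, but any earlier current value is lost."""
    row["current_status"] = new_status
    row["effective_date"] = effective
    return row

row = {"cust_id": 1, "name": "John Doe", "original_status": "Single",
       "current_status": "Single", "effective_date": None}
apply_current_field(row, "Married", date(1996, 1, 1))
apply_current_field(row, "Divorced", date(1999, 6, 1))
print(row["original_status"], row["current_status"], row["effective_date"])
# prints Single Divorced 1999-06-01 (the intermediate value Married is gone)
```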
20
Limitations of Methods for Applying Changes
- Complete history is impossible
- Dimensions may grow large
- Maintenance overhead
[Figure: customer 1234 (Comer) moves from 1 Main Street, 555-6789, to 200 First Ave, 222-3211, and later changes phone to 222-3212; the result is shown under each method, with versioned key 1234-01 and effective dates 01-Apr-93 and 01-Jun-97]
21
Maintaining History
- One-to-many relationship
- Always retain the current record
- Consistently able to refer to record history
[Figure: CUSTOMER has a one-to-many relationship to HIST_CUST, alongside the Sales facts and the Time and Product dimensions]
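The history table pattern can be sketched with `sqlite3`: before the current CUSTOMER row changes, its old values are copied into HIST_CUST together with the date range during which they applied. The column names and the `change_address` helper are hypothetical; the sample values reuse customer 1234 (Comer) from the previous slide for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, address TEXT,
                       valid_from TEXT);
-- One current customer row, many history rows.
CREATE TABLE hist_cust (cust_id INTEGER REFERENCES customer(cust_id),
                        name TEXT, address TEXT, valid_from TEXT, valid_to TEXT);
INSERT INTO customer VALUES (1234, 'Comer', '1 Main Street', '1993-04-01');
""")

def change_address(conn, cust_id, new_address, change_date):
    """Copy the current values into HIST_CUST, then update the current record."""
    with conn:  # both statements commit together or not at all
        conn.execute(
            "INSERT INTO hist_cust SELECT cust_id, name, address, valid_from, ? "
            "FROM customer WHERE cust_id = ?", (change_date, cust_id))
        conn.execute(
            "UPDATE customer SET address = ?, valid_from = ? WHERE cust_id = ?",
            (new_address, change_date, cust_id))

change_address(conn, 1234, "200 First Ave", "1997-06-01")
current = conn.execute("SELECT address, valid_from FROM customer").fetchall()
history = conn.execute("SELECT address, valid_from, valid_to FROM hist_cust").fetchall()
print(current)  # [('200 First Ave', '1997-06-01')]
print(history)  # [('1 Main Street', '1993-04-01', '1997-06-01')]
```

Queries that only need current values read CUSTOMER alone; historical analysis joins to HIST_CUST on `cust_id`.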
22
History Preserved
- History enables realistic analysis.
- History retains the context of data.
- History provides for realistic historical analysis:
  - Reflect business changes
  - Maintain context between fact and dimension data
  - Retain sufficient data to relate old to new
23
Version Numbering
- Avoid double counting
- Facts hold the version number

Customer
Customer.CustId  Version  Customer Name
1234             1        Comer
1234             2        Comer

Sales Facts
Customer.CustId  Version  Sales
1234             1        11,000
1234             2        12,000

[Figure: Sales facts linked to the Customer, Time, and Product dimensions]
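The double-counting problem and its fix can be reproduced with the slide's own figures in `sqlite3`; the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cust_id INTEGER, version INTEGER, name TEXT,
                       PRIMARY KEY (cust_id, version));
CREATE TABLE sales (cust_id INTEGER, version INTEGER, amount INTEGER);
INSERT INTO customer VALUES (1234, 1, 'Comer'), (1234, 2, 'Comer');
INSERT INTO sales    VALUES (1234, 1, 11000), (1234, 2, 12000);
""")

# Joining on the customer id alone matches each fact to both versions: double counting.
wrong = conn.execute("""
    SELECT SUM(s.amount) FROM sales s JOIN customer c ON c.cust_id = s.cust_id
""").fetchone()[0]

# Because each fact carries its version number, the join picks exactly one dimension row.
right = conn.execute("""
    SELECT SUM(s.amount) FROM sales s
    JOIN customer c ON c.cust_id = s.cust_id AND c.version = s.version
""").fetchone()[0]

print(wrong, right)  # prints 46000 23000
```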
24
Purging and Archiving Data
- As data ages, its value depreciates.
- Remove old data from the warehouse:
  - Archive for later use
  - Purge without copy
25
Techniques for Purging Data
- TRUNCATE: generates no rollback data, so it cannot be rolled back
- DELETE: generates redo and rollback data
- ALTER TABLE: drops a partition
- PL/SQL: uses database triggers
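SQLite has neither TRUNCATE nor table partitions, so this `sqlite3` sketch illustrates only the DELETE technique: facts older than a cutoff date are removed inside a transaction. The table, its columns, and the cutoff are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_date TEXT, amount INTEGER);
INSERT INTO sales VALUES ('1995-12-31', 10), ('1996-06-30', 20), ('1999-01-15', 30);
""")

def purge_before(conn, cutoff):
    """Purge without copy: delete facts dated before `cutoff` (ISO date string)."""
    with conn:  # DELETE is transactional; an error here rolls the purge back
        cur = conn.execute("DELETE FROM sales WHERE sale_date < ?", (cutoff,))
    return cur.rowcount

print(purge_before(conn, "1997-01-01"))                # prints 2
print(conn.execute("SELECT * FROM sales").fetchall())  # prints [('1999-01-15', 30)]
```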
26
Techniques for Archiving Data
- Export to a dump file from tables
- Import to tables from a dump file
- ALTER TABLE ... EXCHANGE PARTITION
[Figure: the database is exported with EXP to a .dmp file and imported again with IMP]
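As a stand-in for exporting to a dump file, this sketch copies old rows to a CSV file before deleting them and can reload them later. The CSV format, file name, and helper names are assumptions; Oracle's EXP and IMP utilities write their own dump format.

```python
import csv
import os
import sqlite3
import tempfile

def archive(conn, cutoff, path):
    """Copy facts dated before `cutoff` to a CSV file, then remove them from the table."""
    rows = conn.execute(
        "SELECT sale_date, amount FROM sales WHERE sale_date < ? ORDER BY sale_date",
        (cutoff,)).fetchall()
    with open(path, "w", newline="") as f:  # write the archive first ...
        csv.writer(f).writerows(rows)
    with conn:                              # ... and delete only after it succeeded
        conn.execute("DELETE FROM sales WHERE sale_date < ?", (cutoff,))
    return len(rows)

def restore(conn, path):
    """Reload archived rows when they are needed again."""
    with open(path, newline="") as f:
        rows = [(d, int(a)) for d, a in csv.reader(f)]
    with conn:
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return len(rows)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_date TEXT, amount INTEGER);
INSERT INTO sales VALUES ('1995-12-31', 10), ('1996-06-30', 20), ('1999-01-15', 30);
""")
path = os.path.join(tempfile.mkdtemp(), "sales_archive.csv")
print(archive(conn, "1997-01-01", path))                         # prints 2
print(conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0])  # prints 1
print(restore(conn, path))                                       # prints 2
```

Writing the archive before deleting means a failed write leaves the data in the warehouse rather than losing it.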
27
Verdict
- Defined by business requirements
- Must be managed
28
Final Tasks
- Update metadata:
  - ETT
  - User
- Publish data:
  - Availability
  - Changes
  - Subject area basis
- Use database roles to prevent and allow access
29
Publishing Data
- Control access using database roles
- 24-hour operation may be requested
- Compromise between load and access
- Consider:
  - Staggering updates
  - Using temporary tables
  - Using separate tables
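The separate-tables approach can be sketched in `sqlite3`: the refresh is loaded into a hypothetical `sales_new` table while users query `sales`, and the two are swapped by renaming inside one transaction. SQLite DDL is transactional, so the connection is opened in autocommit mode and the transaction is begun explicitly.

```python
import sqlite3

# isolation_level=None: no implicit transactions; BEGIN/COMMIT are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
CREATE TABLE sales (region TEXT, amount INTEGER);
INSERT INTO sales VALUES ('East', 100);
""")

# 1. Load the refreshed data into a separate table; users keep querying `sales`.
conn.executescript("""
CREATE TABLE sales_new (region TEXT, amount INTEGER);
INSERT INTO sales_new VALUES ('East', 100), ('West', 250);
""")

def publish(conn):
    """Swap the new table in for the old one as a single atomic step."""
    conn.execute("BEGIN")
    try:
        conn.execute("ALTER TABLE sales RENAME TO sales_old")
        conn.execute("ALTER TABLE sales_new RENAME TO sales")
        conn.execute("DROP TABLE sales_old")
        conn.execute("COMMIT")
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        raise

# 2. Publish: readers see either the old table or the new one, never a partial load.
publish(conn)
print(conn.execute("SELECT * FROM sales ORDER BY region").fetchall())
# prints [('East', 100), ('West', 250)]
```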
30
ETT Tool Selection Criteria
- Overlap with existing tools
- Availability of meta model
- Supported data sources
- Ease of modification and maintenance
- Required fine tuning of code
- Ease of change control
- Power of transformation logic
- Level of modularization
- Power of error, exception, and resubmission features
- Intuitive documentation
- Performance of code
31
ETT Tool Selection Criteria
- Activity scheduling and sophistication
- Metadata generation
- Learning curve
- Flexibility
- Supported operating systems
- Cost
32
Transportation Tools
- Information Co.: OpenBridge
- Oracle: SQL*Loader, Gateways, PL/SQL, Precompilers
- Platinum Technology: InfoPump, Platinum Info Transport
33
Replication
- Server utilities
- Oracle Symmetric and Heterogeneous Replication
34
Gateways and Middleware
- Brio Technology: DataPrism
- Information Co.: OpenBridge
- Information Builders: EDA/SQL
- Oracle: Gateways
- Platinum Technology: InfoHub
- Prism: Prism Manager
- Software AG: Entire Transaction Propagator
35
Summary
This lesson discussed the following topics:
- Capturing changed data
- Applying the changes
- Purging and archiving data
- Publishing the data, controlling access, and automating processes
- Identifying tools for transporting data into the warehouse