ETL Extract. Design Logical before Physical. Have a plan. Identify data source candidates. Analyze source systems with data-profiling tools. Receive walk-through.


1 ETL Extract

2 Design Logical before Physical
Have a plan
Identify data source candidates
Analyze source systems with data-profiling tools
Receive walk-through of data lineage and business rules
Receive walk-through of data warehouse model
Validate calculations and formulas

3 Logical Data Map
Used to collect and document source systems to be used for DW
Should contain the following:
–Target table name
–Target column name
–Table type
–SCD type
–Source db
–Source table name
–Source column name
–Transformation
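One way to keep a logical data map machine-readable is a small record type with one field per column listed above. This is only a sketch: the class name, the helper `sources_for`, and every sample value are hypothetical and not taken from the presentation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class LogicalDataMapRow:
    """One row of the logical data map: a target column and where it comes from."""
    target_table: str
    target_column: str
    table_type: str            # e.g. "dimension" or "fact"
    scd_type: Optional[int]    # slowly changing dimension type; None for facts
    source_db: str
    source_table: str
    source_column: str
    transformation: str


# Hypothetical entries, for illustration only.
data_map = [
    LogicalDataMapRow("dim_product", "status_code", "dimension", 1,
                      "erp", "prod_master", "stat_cd", "pad 3-digit codes to 4 digits"),
    LogicalDataMapRow("dim_product", "product_name", "dimension", 2,
                      "erp", "prod_master", "prod_nm", "trim whitespace"),
    LogicalDataMapRow("fact_sales", "amount", "fact", None,
                      "pos", "sales_line", "amt", "none"),
]


def sources_for(rows: List[LogicalDataMapRow], target_table: str) -> List[Tuple[str, str, str]]:
    """Return the distinct (db, table, column) sources feeding one target table."""
    return sorted({(r.source_db, r.source_table, r.source_column)
                   for r in rows if r.target_table == target_table})
```

Calling `sources_for(data_map, "dim_product")` lists the source columns that must be extracted to populate that target table.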

4 Track Volume
Volume worksheet
–Staging table name
–Update Strategy
–Load frequency
–ETL jobs/programs
–Initial row count
–Avg. Row length
–Grows with
–Expected monthly rows
–Expected monthly bytes
–Initial table size bytes
–Table size 6 mo.
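The byte-based columns of the volume worksheet can be derived from the row counts and the average row length. The sketch below assumes simple linear growth and ignores index and storage overhead; the function names and sample figures are illustrative, not from the presentation.

```python
def initial_table_size_bytes(initial_row_count: int, avg_row_length: int) -> int:
    """Initial table size = initial rows x average row length (bytes)."""
    return initial_row_count * avg_row_length


def expected_monthly_bytes(expected_monthly_rows: int, avg_row_length: int) -> int:
    """Monthly growth in bytes = expected monthly rows x average row length."""
    return expected_monthly_rows * avg_row_length


def table_size_after_months(initial_row_count: int, expected_monthly_rows: int,
                            avg_row_length: int, months: int = 6) -> int:
    """Projected size after `months`, assuming the same number of rows arrives each month."""
    return (initial_table_size_bytes(initial_row_count, avg_row_length)
            + months * expected_monthly_bytes(expected_monthly_rows, avg_row_length))


# Hypothetical staging table: 1,000,000 initial rows, 120-byte rows, 50,000 new rows per month.
size_now = initial_table_size_bytes(1_000_000, 120)           # 120,000,000 bytes
growth = expected_monthly_bytes(50_000, 120)                  # 6,000,000 bytes per month
size_6_mo = table_size_after_months(1_000_000, 50_000, 120)   # 156,000,000 bytes
```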

5 Source System Tracking
Used to document source systems and who is responsible for them
Should be maintained, not a one-time effort
May also serve as reference for future phases of the DW

6 Source System Tracking
Should contain the following:
–Subject area
–Interface name
–Business name
–Priority
–Department/business use
–Business owner
–Technical owner
–DBMS
–Production server/OS
–# of daily users
–DB size
–DB complexity
–# transactions per day
–Comments

7 System of Record
Originating source of data
As much as possible, extract only from the system-of-record
The farther away from the system-of-record, the higher the risk that the data is corrupted

8 Data Profiling: Source System Analysis
Reverse-engineer the ERD of the source system
Focus on the following:
–Primary keys
–Data types
–Relationships
–Cardinalities
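Much of this information can be read from the source database's own catalog. The sketch below uses SQLite's `PRAGMA table_info` and `PRAGMA foreign_key_list` against a hypothetical two-table schema; other DBMSs expose the same facts through their own catalog views, and the table names here are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),
    amount REAL
);
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO orders VALUES (10, 1, 5.0), (11, 1, 7.5), (12, 2, 3.0);
""")


def describe_table(conn, table):
    """Collect primary keys, data types and relationships for one table.

    `table` is interpolated into the PRAGMA, so it must be a trusted name.
    """
    # table_info rows: (cid, name, type, notnull, dflt_value, pk)
    columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
    fks = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    return {
        "primary_key": [c[1] for c in columns if c[5] > 0],
        "data_types": {c[1]: c[2] for c in columns},
        "relationships": [(fk[3], fk[2], fk[4]) for fk in fks],
    }


profile = describe_table(conn, "orders")

# Observed cardinality: the largest number of orders attached to one customer.
max_orders_per_customer = conn.execute(
    "SELECT MAX(n) FROM (SELECT COUNT(*) AS n FROM orders GROUP BY customer_id)"
).fetchone()[0]
```

A maximum above 1 is consistent with a one-to-many relationship from `customer` to `orders`; it reflects the data actually present, not a declared constraint.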

9 Data Profiling: Data Content Analysis
Nulls
–Null is not equal to Null
–Data loss
Dates
–Different formatting
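The two points about nulls can be seen in a small SQLite session: comparing NULL with NULL does not yield true, so an inner join on a nullable column silently drops rows. The tables and values below are hypothetical, and the outer join with a default is just one common remedy, not necessarily the one the presenter had in mind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, region_code TEXT);
CREATE TABLE regions (region_code TEXT, region_name TEXT);
INSERT INTO orders VALUES (1, 'N'), (2, NULL), (3, 'S');
INSERT INTO regions VALUES ('N', 'North'), ('S', 'South'), (NULL, 'Unknown');
""")

# NULL = NULL evaluates to NULL (unknown), which Python receives as None.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# Inner join: order 2 has a NULL region_code and never matches, so it is lost.
inner_rows = conn.execute("""
    SELECT o.order_id
    FROM orders o JOIN regions r ON o.region_code = r.region_code
    ORDER BY o.order_id
""").fetchall()

# Outer join with a default keeps every order.
outer_rows = conn.execute("""
    SELECT o.order_id, COALESCE(r.region_name, 'Unknown')
    FROM orders o LEFT JOIN regions r ON o.region_code = r.region_code
    ORDER BY o.order_id
""").fetchall()
```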

10 Business Rules
Dimensional model
–STATUS CODE: 4-digit code that uniquely identifies the status of the product. It has a short description (usually 1 word) and a long description (usually 1 sentence)
ETL
–STATUS CODE: 4-digit code; however, some existing legacy codes that are still in use have 3 digits. These have to be converted to 4-digit codes. If the name of the code contains “OBSOLETE”, that word needs to be removed and the obsolete flag set to “Y”
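The ETL version of the STATUS CODE rule might be sketched as below. The slide does not say how a 3-digit code maps to 4 digits, so left-padding with a zero is an assumption here, as are the function name, the output field names, and the "N" value for non-obsolete codes.

```python
import re


def clean_status(code: str, name: str) -> dict:
    """Apply the STATUS CODE business rule to one source row."""
    code = code.strip()
    if len(code) == 3 and code.isdigit():
        # ASSUMPTION: legacy 3-digit codes become 4-digit by left-padding with "0".
        code = code.zfill(4)
    elif not (len(code) == 4 and code.isdigit()):
        raise ValueError(f"unexpected status code: {code!r}")

    obsolete = "OBSOLETE" in name.upper()
    if obsolete:
        # Remove the word and tidy up the whitespace left behind.
        name = re.sub("OBSOLETE", "", name, flags=re.IGNORECASE)
        name = " ".join(name.split())

    return {"status_code": code, "name": name, "obsolete_flag": "Y" if obsolete else "N"}


legacy = clean_status("123", "Backordered OBSOLETE")
current = clean_status("4001", "Shipped")
```

Here `legacy` comes back with code `"0123"`, name `"Backordered"` and flag `"Y"`, while `current` passes through unchanged with flag `"N"`.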

11 Heterogeneous Sources
Sources may be of the following formats/platforms:
–ODBC
–Mainframes (EBCDIC, ASCII)
–Flat files (delimited, fixed length)
–XML
–Web logs
–ERP systems
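As an illustration of two of these cases together, the sketch below decodes a fixed-length mainframe record from EBCDIC and slices it into fields. Code page `cp037` is only one of several EBCDIC code pages, and the record layout is hypothetical; a real layout would come from the source system's copybook, and packed-decimal fields would need separate handling.

```python
# Hypothetical layout: (field name, start offset, end offset) within a 20-character record.
FIELDS = [
    ("customer_id", 0, 6),
    ("name", 6, 16),
    ("status", 16, 20),
]


def parse_fixed_record(raw: bytes, encoding: str = "cp037") -> dict:
    """Decode one fixed-length record and split it into stripped text fields.

    cp037 is a single-byte encoding, so character offsets equal byte offsets.
    """
    text = raw.decode(encoding)
    return {name: text[start:end].strip() for name, start, end in FIELDS}


# Build a sample EBCDIC record so the example is self-contained.
sample = "000042ACME CORP 0123".encode("cp037")
record = parse_fixed_record(sample)
```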

12 Extracting Changed Data
Initial vs. Incremental
–Initial: loading all data from a predetermined point in time
–Incremental: loading changes to data
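The difference between the two load types can be sketched with a source table that carries a last-modified date. That column, the table, and the function names are assumptions for illustration; the slide does not prescribe how the cut-off is applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, last_modified TEXT);
INSERT INTO customer VALUES
    (1, 'Ann', '2023-12-15'),
    (2, 'Bob', '2024-01-10'),
    (3, 'Cy',  '2024-02-05');
""")


def initial_extract(conn, start_date):
    """Initial load: everything from the predetermined starting point onward."""
    return conn.execute(
        "SELECT id, name, last_modified FROM customer "
        "WHERE last_modified >= ? ORDER BY id", (start_date,)
    ).fetchall()


def incremental_extract(conn, last_extract_date):
    """Incremental load: only rows changed since the previous extract."""
    return conn.execute(
        "SELECT id, name, last_modified FROM customer "
        "WHERE last_modified > ? ORDER BY id", (last_extract_date,)
    ).fetchall()


# ISO-8601 date strings compare correctly as text.
first_load = initial_extract(conn, "2024-01-01")
next_load = incremental_extract(conn, "2024-01-31")
```

With this sample data the initial load picks up Bob and Cy, and the later incremental run picks up only Cy.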

13 Detecting Changes
DB audit columns and tables
DB log scraping or sniffing
Timed Extracts
Elimination
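The last technique, elimination, is usually understood as keeping a copy of the previous extract and comparing it with the current one to find what changed. A minimal sketch of that reading, assuming each extract is held in memory as a dict keyed by primary key (the function name and sample rows are hypothetical):

```python
def diff_snapshots(previous: dict, current: dict):
    """Compare two extracts keyed by primary key.

    Returns (inserted, updated, deleted) as dicts keyed by primary key.
    """
    inserted = {k: v for k, v in current.items() if k not in previous}
    deleted = {k: v for k, v in previous.items() if k not in current}
    updated = {k: v for k, v in current.items()
               if k in previous and previous[k] != v}
    return inserted, updated, deleted


# Hypothetical extracts: primary key -> (name, status code)
yesterday = {1: ("Ann", "0123"), 2: ("Bob", "4001"), 3: ("Cy", "4002")}
today = {1: ("Ann", "0123"), 2: ("Bob", "4003"), 4: ("Dee", "4001")}

inserted, updated, deleted = diff_snapshots(yesterday, today)
```

This approach needs no support from the source system and also detects deletions, at the cost of storing and scanning a full copy of the previous extract.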

