Performance Tuning ETL Process Mark McNeely
Test Yourself: “Matching Game”
Component Matching Answers
Source Systems E-Business Suite R12, PeopleSoft Enterprise, Siebel CRM, JD Edwards. Data flow: Source Systems, Extract, Staging, Transformation, Delivery, End-User
DAC ETL Scheduler
Source System Stats What – gathers important information such as read times for single-block and multiblock reads, CPU speed, and other system throughput figures. Why – before a query is executed, the optimizer calculates its cost. Without stats, full-table scans and index scans are evaluated as equivalent. Remember to gather stats while the system is busy, so that the figures reflect a realistic workload.
SQL Trace Files SQL trace files record: parse, execute, and fetch counts; CPU and elapsed times; physical reads and logical reads; number of rows processed; misses on the library cache; the username under which each parse occurred; each commit and rollback.
TKPROF You can run the TKPROF program to format the contents of the trace file and place the output into a readable output file.
Explain Plan Explain Plan shows the sequence of operations performed in a SQL query. It tells you how tables are joined and which indexes are used.
SDE vs. SIL tasks
DAC Details
Informatica Workflow Manager
ETL Run
Informatica Workflow Monitor
Informatica Session Log
Session Log Usage Busy % = (Total Run Time – Total Idle Time) / Total Run Time × 100. If Busy % is high (above 70 to 80%) for the READER thread, review the Source Qualifier. If Busy % is high (above 60 to 70%) for the TRANSF thread, review the transformation. If Busy % is high for the WRITER thread, review the bulk mode setting.
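The Busy % check can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not anything Informatica provides: the function names, the thread labels used as dictionary keys, and the WRITER threshold (the slide says only "high") are assumptions, and the run and idle times would have to be read from the session log by hand.

```python
# Lower bound of each range quoted on the slide. The WRITER value is an
# assumption, because the slide gives no number for it.
THRESHOLDS = {"READER": 70.0, "TRANSF": 60.0, "WRITER": 70.0}

ADVICE = {
    "READER": "review the Source Qualifier",
    "TRANSF": "review the transformation",
    "WRITER": "review the bulk mode setting",
}


def busy_percent(total_run_time: float, total_idle_time: float) -> float:
    """Busy % = (Total Run Time - Total Idle Time) / Total Run Time * 100."""
    if total_run_time <= 0:
        raise ValueError("total_run_time must be positive")
    return (total_run_time - total_idle_time) / total_run_time * 100.0


def review_hint(thread: str, total_run_time: float, total_idle_time: float):
    """Return the slide's advice if the thread is busier than its threshold."""
    busy = busy_percent(total_run_time, total_idle_time)
    if busy > THRESHOLDS[thread]:
        return ADVICE[thread]
    return None


if __name__ == "__main__":
    # Hypothetical numbers: a reader thread that ran 100 s and idled 25 s.
    print(busy_percent(100.0, 25.0))
    print(review_hint("READER", 100.0, 25.0))
```

A thread that is idle most of the time is waiting on another stage, so the thread with the highest Busy % is the one worth tuning first.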
Hash Joins vs. Nested Loops The optimizer often chooses nested loops because it estimates a lower cost for them. Nested loops do bring the first rows back quicker, but for large volumes (over 10 million rows) use a USE_HASH hint to make the optimizer use a hash join. I’ve shaved a couple of hours off a poor performer this way.
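To see why a hash join wins on large volumes, here is a minimal Python sketch of the two join strategies. It is a toy model, not how the database implements them: the function names and the row format (dictionaries sharing a join key) are assumptions, and a real nested loop would normally probe an index on the inner table rather than scan it.

```python
from collections import defaultdict


def nested_loop_join(outer, inner, key):
    """For every outer row, scan the inner rows for matches.

    The first match can be returned almost immediately, but the total work
    grows with len(outer) * len(inner).
    """
    for o in outer:
        for i in inner:
            if o[key] == i[key]:
                yield {**o, **i}


def hash_join(build, probe, key):
    """Build a hash table on one input, then probe it once per row of the other.

    Nothing is returned until the build phase finishes, but the total work
    grows with len(build) + len(probe).
    """
    table = defaultdict(list)
    for b in build:
        table[b[key]].append(b)
    for p in probe:
        for b in table.get(p[key], []):
            yield {**b, **p}


if __name__ == "__main__":
    # Hypothetical sample rows.
    orders = [{"cust_id": 1, "amount": 10}, {"cust_id": 2, "amount": 20}]
    customers = [{"cust_id": 1, "name": "A"}, {"cust_id": 2, "name": "B"}]
    print(list(nested_loop_join(orders, customers, "cust_id")))
    print(list(hash_join(orders, customers, "cust_id")))
```

Both functions return the same rows. The difference is only in how much work is done and when the first row comes back, which is the trade-off the optimizer is weighing.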
Partitioning Guidelines for Large Tables Consider partitioning tables with more than 20 million rows. Find a reasonable partition key, for example year. A couple of advantages: improved query performance and quicker ETL loads.
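The query benefit comes from reading only the partitions a query needs. The Python sketch below models that idea with an in-memory dictionary keyed by year. It is an analogy rather than database code, and the function names and the date field are hypothetical.

```python
from collections import defaultdict
from datetime import date


def partition_by_year(rows, date_field):
    """Group rows into one bucket per calendar year."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[date_field].year].append(row)
    return partitions


def rows_for_year(partitions, year):
    """Read a single partition instead of scanning every row."""
    return partitions.get(year, [])


if __name__ == "__main__":
    # Hypothetical fact rows.
    facts = [
        {"order_date": date(2010, 3, 1), "amount": 10},
        {"order_date": date(2011, 7, 15), "amount": 20},
        {"order_date": date(2011, 9, 30), "amount": 30},
    ]
    parts = partition_by_year(facts, "order_date")
    print(rows_for_year(parts, 2011))
```

The same layout helps ETL loads: a load for the current year touches only that year's bucket and leaves the older ones alone.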
Source System Extract Staging Transformation Delivery End-User