Oracle 8i/9i Features That Support Data Warehousing
Author: Krasen Paskalev, Certified Oracle DBA
Semantec GmbH, D Herrenberg
Agenda
– ETL Features
– Data Warehouse Management
– Data Warehouse Querying
– Parallel Operations
Agenda
– ETL (Extraction, Transformation, Transportation and Loading)
  – Transportable Tablespaces
  – External Tables
  – Table Functions
  – MERGE Statement
– Data Warehouse Management
– Data Warehouse Querying
– Parallel Operations
Transportable Tablespaces
– The fastest method for moving large volumes of data between databases
– Tablespaces, with all their data, are copied and plugged into the data warehouse database
[Diagram: a tablespace is copied via ftp from the Production database to the Data Warehouse]
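External Tables
– Read-only virtual tables defined over external files
– Can be queried and joined directly in SQL, PL/SQL and Java
– Avoid data staging
– One-step loading and transformation
– Save database space
[Diagram: external files, such as an ASCII file or an Excel sheet exported as delimited text, appear to the database as read-only virtual tables]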
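As a rough illustration (a Python model, not how Oracle implements external tables), the sketch below treats a CSV file as a table that is parsed only when it is scanned, and transforms rows while loading them, so no staging table is needed. The columns `region` and `amount`, the filtering rule and both function names are hypothetical. In Oracle itself the equivalent is an INSERT ... SELECT from a table created with ORGANIZATION EXTERNAL.

```python
import csv
import io
from typing import Dict, Iterator, List, TextIO


def external_table(f: TextIO) -> Iterator[Dict[str, str]]:
    """Yield the rows of a delimited text file each time it is scanned.

    Nothing is staged in the database: the file itself acts as a
    read-only table that is parsed on demand.
    """
    yield from csv.DictReader(f)


def load_with_transform(f: TextIO, target: List[dict]) -> int:
    """One-step load: parse, filter and convert rows while inserting them."""
    loaded = 0
    for row in external_table(f):
        amount = float(row["amount"])
        if amount <= 0:  # hypothetical rule: drop non-positive amounts
            continue
        target.append({"region": row["region"].strip().upper(), "amount": amount})
        loaded += 1
    return loaded


if __name__ == "__main__":
    data = io.StringIO("region,amount\nwest,10.5\neast,-1\ncentral,4\n")
    sales: List[dict] = []
    print(load_with_transform(data, sales), sales)
```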
Table Functions
– Can take a set of rows as input
– Can return a set of rows as output
– Can be used in the FROM clause
– Can be parallelized
– Can be pipelined
– User-defined in PL/SQL, Java or C
[Diagram: a table function reads the Sales table and returns the revenue share (Region, %) of West, Central and East]
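A minimal Python analogue of the diagram, assuming a Sales table of (region, amount) rows: like a table function, it consumes a set of rows and returns a new set of rows. Because it aggregates, it has to read all of its input before emitting anything. The function name and row layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def region_share(sales: Iterable[Tuple[str, float]]) -> Iterator[Tuple[str, float]]:
    """Table-function-like transform: (region, amount) rows in,
    (region, percent of total) rows out."""
    totals: Dict[str, float] = defaultdict(float)
    for region, amount in sales:
        totals[region] += amount
    grand_total = sum(totals.values())
    if grand_total == 0:
        return  # no sales: return an empty row set
    for region, total in totals.items():
        yield region, round(100.0 * total / grand_total, 1)


if __name__ == "__main__":
    rows = [("West", 30.0), ("Central", 30.0), ("East", 20.0), ("West", 20.0)]
    for region, pct in region_share(rows):
        print(region, pct)
```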
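Table Functions: Pipelining
[Diagram: data transformation pipeline Source → Table Function (Step 1) → Table Function (Step 2) → Target, with a log table]
– Each step passes rows on to the next as soon as they are produced, so no intermediate staging tables are needed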
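Python generators behave much like pipelined table functions (which in PL/SQL hand rows on with PIPE ROW), so the two-step pipeline can be sketched as chained generators. This is a conceptual model only; the validation rule, the conversion `rate` and the log messages are hypothetical. The interleaved log shows that row 1 reaches Step 2 before Step 1 has even looked at row 2.

```python
from typing import Iterable, Iterator, List


def step1_validate(rows: Iterable[dict], log: List[str]) -> Iterator[dict]:
    """Step 1: pass valid rows on one at a time, logging rejected ones."""
    for row in rows:
        if row.get("amount") is None:
            log.append(f"step1: rejected id={row['id']}")
            continue
        yield row


def step2_convert(rows: Iterable[dict], rate: float, log: List[str]) -> Iterator[dict]:
    """Step 2: transform each row as soon as Step 1 produces it."""
    for row in rows:
        log.append(f"step2: converted id={row['id']}")
        yield {**row, "amount": round(row["amount"] * rate, 2)}


if __name__ == "__main__":
    source = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": None}, {"id": 3, "amount": 5.0}]
    log: List[str] = []
    target = list(step2_convert(step1_validate(source, log), 2.0, log))
    print(target)
    print(log)
```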
MERGE Statement
[Diagram: each row of new_sales (id, amount) either UPDATEs the matching row of sales (id, amount) or is INSERTed into it]
MERGE INTO sales s
USING new_sales n
ON (s.id = n.id)
WHEN MATCHED THEN
  UPDATE SET s.amount = s.amount + n.amount
WHEN NOT MATCHED THEN
  INSERT (s.id, s.amount) VALUES (n.id, n.amount);
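The semantics of the MERGE above can be modelled in a few lines of Python, with `sales` as a dictionary keyed by id. This is only a sketch of the logic, not of Oracle's execution; rejecting duplicate source ids is a simplifying assumption of the model, so that every source row is matched against the original target.

```python
from typing import Dict, Iterable, Tuple


def merge_into(sales: Dict[int, float],
               new_sales: Iterable[Tuple[int, float]]) -> Tuple[int, int]:
    """Apply MERGE semantics to an in-memory sales table keyed by id.

    Returns (rows updated, rows inserted).
    """
    rows = list(new_sales)
    ids = [id_ for id_, _ in rows]
    if len(ids) != len(set(ids)):
        # Simplifying assumption: each source id appears at most once.
        raise ValueError("duplicate ids in new_sales")
    updated = inserted = 0
    for id_, amount in rows:
        if id_ in sales:   # WHEN MATCHED THEN UPDATE SET amount = amount + n.amount
            sales[id_] += amount
            updated += 1
        else:              # WHEN NOT MATCHED THEN INSERT (id, amount)
            sales[id_] = amount
            inserted += 1
    return updated, inserted


if __name__ == "__main__":
    sales = {1: 100.0, 2: 50.0}
    print(merge_into(sales, [(2, 25.0), (3, 10.0)]), sales)
```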
MERGE Advantages
– Single, simple SQL statement
– Can be parallelized
– Can use bulk DML
– Fewer scans of the base table
More ETL Features
– Direct-path interface:
  – SQL*Loader
  – CREATE TABLE ... AS SELECT
  – INSERT
  – Oracle Call Interface (OCI)
– Multi-table INSERTs
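Oracle 9i multi-table inserts come in two forms: INSERT ALL inserts a row into every target whose WHEN condition holds, while INSERT FIRST inserts it only into the first such target, with an optional ELSE branch. The Python sketch below models INSERT FIRST routing in a single pass over the source; the table names and amount thresholds are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Rule = Tuple[Callable[[dict], bool], str]


def insert_first(rows: Iterable[dict], rules: List[Rule],
                 else_table: str) -> Dict[str, List[dict]]:
    """Route each source row to the first table whose condition holds,
    reading the source only once (like INSERT FIRST ... ELSE ...)."""
    tables: Dict[str, List[dict]] = {name: [] for _, name in rules}
    tables.setdefault(else_table, [])
    for row in rows:
        for condition, name in rules:
            if condition(row):
                tables[name].append(row)
                break
        else:  # no WHEN condition matched
            tables[else_table].append(row)
    return tables


if __name__ == "__main__":
    rules = [(lambda r: r["amount"] >= 100, "large_orders"),
             (lambda r: r["amount"] >= 10, "medium_orders")]
    print(insert_first([{"amount": 5}, {"amount": 500}, {"amount": 50}], rules, "small_orders"))
```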
Agenda
– ETL Features
– Data Warehouse Management
  – Partitioning
  – Materialized Views
  – DBMS_STATS
– Data Warehouse Querying
– Parallel Operations
Partitioning
[Diagram: table Sales partitioned by month, each partition in its own tablespace: Jan 2002 in tablespace 0102, Feb 2002 in tablespace 0202, ..., Dec 2002]
Advantages of Partitioning
– Partition independence
  – LOAD, MOVE, purge and DROP partitions
  – MERGE, SPLIT, EXCHANGE partitions
  – BACKUP, RESTORE, SET READ ONLY
– Partition elimination
  – SELECT or JOIN only the partitions needed
– Parallel operations
  – SELECT, UPDATE, DELETE, MERGE
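Partitioning Methods
– Hash partitioning
  – Even row distribution by a hash function
– Range partitioning
  – Each partition holds the values below its upper bound: < bound 1 | < bound 2 | ... | < bound n
– List partitioning
  – An explicit list of values per partition, e.g. Stuttgart, Munich | Mannheim, Frankfurt | ...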
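To make the three methods and partition elimination concrete, here is a small Python model of how a row could be assigned to a partition and which range partitions a query would have to touch. It is a sketch under stated assumptions: months are encoded as hypothetical YYYYMM integers, and CRC32 stands in for Oracle's internal hash function, which is not reproduced here.

```python
import bisect
import zlib
from typing import List


def range_partition(value: int, upper_bounds: List[int]) -> int:
    """Range: partition i holds values < upper_bounds[i] (and >= the previous bound)."""
    i = bisect.bisect_right(upper_bounds, value)
    if i == len(upper_bounds):
        raise ValueError("value is not below the highest partition bound")
    return i


def list_partition(value: str, value_lists: List[List[str]]) -> int:
    """List: partition i holds exactly the values in value_lists[i]."""
    for i, values in enumerate(value_lists):
        if value in values:
            return i
    raise ValueError(f"no partition for {value!r}")


def hash_partition(key: str, partitions: int) -> int:
    """Hash: CRC32 is a stand-in for Oracle's internal hash function."""
    return zlib.crc32(key.encode("utf-8")) % partitions


def prune_range(low: int, high: int, upper_bounds: List[int]) -> List[int]:
    """Partition elimination: only partitions that can hold values in [low, high)."""
    if high <= low:
        return []
    first = bisect.bisect_right(upper_bounds, low)
    last = min(bisect.bisect_left(upper_bounds, high), len(upper_bounds) - 1)
    return list(range(first, last + 1))


if __name__ == "__main__":
    bounds = [200202, 200203, 200204]  # partitions for Jan, Feb, Mar 2002
    print(range_partition(200203, bounds), prune_range(200202, 200204, bounds))
```

A query on February and March 2002 thus scans partitions 1 and 2 only; the January partition is never read.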
Table Compression
– Stores tables or partitions in compressed format
– Reduces disk space requirements
– Reduces memory requirements
– Can speed up query execution, since fewer blocks have to be read
– Speeds up backup and recovery
– Very efficient for highly redundant data, such as the fact table
– Compression ratios of 2:1 to 4:1 are typical
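Oracle 9i Release 2 table compression works, roughly, by storing each repeated value once per database block in a symbol table and replacing the occurrences with references. The Python sketch below models that idea for a single block; it ignores block headers and real storage formats, and the 1-byte reference size and the sample data are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, ...]


def compress_block(rows: List[Row]) -> Tuple[List[str], List[Tuple[int, ...]]]:
    """Replace repeated values by references into a per-block symbol table."""
    symbols: List[str] = []
    position: Dict[str, int] = {}
    encoded: List[Tuple[int, ...]] = []
    for row in rows:
        refs = []
        for value in row:
            if value not in position:
                position[value] = len(symbols)
                symbols.append(value)
            refs.append(position[value])
        encoded.append(tuple(refs))
    return symbols, encoded


def decompress_block(symbols: List[str], encoded: List[Tuple[int, ...]]) -> List[Row]:
    """Rebuild the original rows from the symbol table and the references."""
    return [tuple(symbols[i] for i in refs) for refs in encoded]


def raw_size(rows: List[Row]) -> int:
    return sum(len(v) for row in rows for v in row)


def compressed_size(symbols: List[str], encoded: List[Tuple[int, ...]],
                    ref_bytes: int = 1) -> int:
    """Rough size: the symbol table plus one fixed-size reference per value."""
    return sum(len(s) for s in symbols) + ref_bytes * sum(len(r) for r in encoded)


if __name__ == "__main__":
    rows = [(m, c) for m in ["2002-01", "2002-02"] for c in ["Stuttgart", "Munich"]] * 25
    symbols, encoded = compress_block(rows)
    print(raw_size(rows), compressed_size(symbols, encoded))
```

The more often values repeat within a block, as in a fact table, the smaller the symbol table is relative to the data it replaces.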
Materialized Views
[Diagram: materialized view revenue_sum (region, month, revenue) computed from table sales (region, month, invc_sum, ...)]
SELECT region, month, SUM(invc_sum) revenue
FROM sales
GROUP BY region, month
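Conceptually, a materialized view stores the result of its defining query so that later queries can be answered from the small summary instead of the large base table. The Python sketch below precomputes revenue_sum and then answers a coarser per-region question from it; in Oracle this redirection happens transparently through query rewrite, whereas here it is done by hand. Function names and sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def build_revenue_sum(sales: Iterable[dict]) -> Dict[Tuple[str, str], float]:
    """Precompute the defining query: SUM(invc_sum) GROUP BY region, month."""
    mv: Dict[Tuple[str, str], float] = defaultdict(float)
    for row in sales:
        mv[(row["region"], row["month"])] += row["invc_sum"]
    return dict(mv)


def revenue_by_region(mv: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Answer a coarser query from the summary instead of scanning sales."""
    totals: Dict[str, float] = defaultdict(float)
    for (region, _month), revenue in mv.items():
        totals[region] += revenue
    return dict(totals)


if __name__ == "__main__":
    sales = [
        {"region": "West", "month": "2002-01", "invc_sum": 10.0},
        {"region": "West", "month": "2002-01", "invc_sum": 5.0},
        {"region": "West", "month": "2002-02", "invc_sum": 1.0},
        {"region": "East", "month": "2002-01", "invc_sum": 2.0},
    ]
    mv = build_revenue_sum(sales)
    print(mv, revenue_by_region(mv))
```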
Advantages of Materialized Views
– Improved query and reporting performance for:
  – Summaries
  – Aggregates
  – Joins
– Fast refresh
  – Data change tracking
  – Partition change tracking
– No application changes needed: queries are transparently rewritten to use them (query rewrite)
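The idea behind fast refresh with data change tracking is to apply only the logged changes to the summary instead of recomputing it from the base table. The Python sketch below keeps a (revenue, row count) pair per (region, month) so that a group can be dropped once its last row is deleted. The change-log format ("I" for insert, "D" for delete) is a hypothetical simplification, not Oracle's materialized view log layout.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]          # (region, month)
Change = Tuple[str, dict]      # ("I" or "D", sales row): hypothetical log format


def fast_refresh(mv: Dict[Key, Tuple[float, int]], change_log: List[Change]) -> None:
    """Incrementally apply logged inserts and deletes to a summary holding
    (revenue, row_count) per (region, month), without rescanning sales."""
    for op, row in change_log:
        key = (row["region"], row["month"])
        sign = 1 if op == "I" else -1
        revenue, count = mv.get(key, (0.0, 0))
        revenue += sign * row["invc_sum"]
        count += sign
        if count == 0:
            mv.pop(key, None)  # last row of the group deleted: drop the group
        else:
            mv[key] = (revenue, count)


if __name__ == "__main__":
    mv = {("West", "2002-01"): (100.0, 2)}
    fast_refresh(mv, [("I", {"region": "West", "month": "2002-01", "invc_sum": 50.0})])
    print(mv)
```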
DBMS_STATS
– New package for gathering table and index statistics
– Gathers statistics in parallel
– Can export and import statistics
[Diagram: statistics exported from the Production Data Warehouse and imported into the Development Data Warehouse]
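Why exporting statistics is useful: a development database with little data can be given the production statistics, so the optimizer chooses the same plans as in production. In Oracle this is done with DBMS_STATS.GATHER_TABLE_STATS, EXPORT_TABLE_STATS and IMPORT_TABLE_STATS, using a statistics table created by CREATE_STAT_TABLE. The Python sketch below only models the idea: the statistics shown (row count, distinct values, nulls) and the JSON transfer format are illustrative assumptions.

```python
import json
from typing import Any, Dict, List


def gather_table_stats(rows: List[dict], columns: List[str]) -> Dict[str, Any]:
    """Collect a few optimizer-style statistics: row count, distinct values, nulls."""
    stats: Dict[str, Any] = {"num_rows": len(rows), "columns": {}}
    for col in columns:
        values = [row.get(col) for row in rows]
        non_null = [v for v in values if v is not None]
        stats["columns"][col] = {
            "num_distinct": len(set(non_null)),
            "num_nulls": len(values) - len(non_null),
        }
    return stats


def export_stats(stats: Dict[str, Any]) -> str:
    """Serialize statistics so they can be moved to another database."""
    return json.dumps(stats, sort_keys=True)


def import_stats(text: str) -> Dict[str, Any]:
    """Load statistics exported from another database."""
    return json.loads(text)


if __name__ == "__main__":
    production_rows = [{"region": "West"}, {"region": "East"}, {"region": "West"}, {"region": None}]
    exported = export_stats(gather_table_stats(production_rows, ["region"]))
    print(import_stats(exported))
```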
More Data Warehouse Management Features
– Index-organized tables
– Online index rebuild
– Online table rebuild
Agenda
– ETL Features
– Data Warehouse Management
– Data Warehouse Querying
  – Bitmap Indexing
  – Star Query Transformation
  – Aggregation: ROLLUP, CUBE, Grouping Sets
  – Analytic Functions
– Parallel Operations
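Bitmap Indexes
[Diagram: bitmap index on Region with one bitmap per value (east, central, west, NULL) and one bit per rowid; conditions are answered by combining the bitmaps with AND, OR and NOT]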
Advantages of Bitmap Indexes
– Reduced response time for ad hoc queries
– Use much less space than a B-tree index on low-cardinality columns
– Dramatic performance gains for a large class of queries:
  – Multiple AND, OR and NOT conditions
  – IS NULL conditions
  – COUNT
  – NOT IN (bitmap MINUS)
  – BETWEEN (bitmap UNION)
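A bitmap index keeps one bit string per distinct value, and a WHERE clause becomes cheap bitwise arithmetic. The Python sketch below uses integers as bitmaps (bit i stands for row i) to model this; it is not Oracle's compressed bitmap format. Like Oracle's bitmap indexes, it also records NULLs, which is why IS NULL can be answered from the index. The sample columns are hypothetical.

```python
from typing import Dict, List, Optional


def build_bitmap_index(column: List[Optional[str]]) -> Dict[Optional[str], int]:
    """One bitmap per distinct value (NULL included); bit i stands for row i."""
    bitmaps: Dict[Optional[str], int] = {}
    for rownum, value in enumerate(column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << rownum)
    return bitmaps


def rows_of(bitmap: int) -> List[int]:
    """Convert a bitmap back into row numbers (the rowid lookup)."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


def count_rows(bitmap: int) -> int:
    """COUNT(*) answered from the bitmap alone, without touching the table."""
    return bin(bitmap).count("1")


if __name__ == "__main__":
    region = ["east", "west", None, "central", "east", "west"]
    status = ["open", "closed", "open", "open", "closed", "open"]
    r, s = build_bitmap_index(region), build_bitmap_index(status)
    all_rows = (1 << len(region)) - 1
    # WHERE region IN ('east', 'west') AND NOT status = 'closed'
    # (status has no NULLs here, so a plain bitwise NOT is safe)
    hits = (r["east"] | r["west"]) & (~s["closed"] & all_rows)
    print(rows_of(hits), count_rows(hits), rows_of(r[None]))
```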
Star Query Transformation
– The query is rewritten for efficient execution
[Diagram: fact table sales (cust_id, prod_id, q_id, amount) with dimension tables customers (cust_id, name), products (prod_id, name) and quarters (q_id, name)]
Steps:
1. Filter all dimensions
2. Combine the bitmap indexes on the fact table's foreign keys
3. Retrieve the matching fact rows and join them to the dimension rows
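The three steps can be sketched in Python as follows. This is a conceptual model, not Oracle's optimizer: dimension tables are dictionaries keyed by id, bitmaps are integers, and the index is built on the fly, whereas a real database would use the existing bitmap indexes on the fact table's foreign keys. The naming of the joined columns (`cust_name`, etc.) is a hypothetical convention.

```python
from typing import Callable, Dict, List

Dim = Dict[int, dict]


def bitmap_index(fact: List[dict], column: str) -> Dict[int, int]:
    """Bitmap index on a fact-table foreign key: key -> bitmap of row numbers."""
    index: Dict[int, int] = {}
    for rownum, row in enumerate(fact):
        index[row[column]] = index.get(row[column], 0) | (1 << rownum)
    return index


def star_query(fact: List[dict], dims: Dict[str, Dim],
               filters: Dict[str, Callable[[dict], bool]]) -> List[dict]:
    candidates = (1 << len(fact)) - 1  # start with all fact rows
    for fk, predicate in filters.items():
        # Step 1: filter the dimension table to get the qualifying keys.
        keys = [key for key, row in dims[fk].items() if predicate(row)]
        # Step 2: OR the bitmaps of those keys, then AND across dimensions.
        index = bitmap_index(fact, fk)
        matches = 0
        for key in keys:
            matches |= index.get(key, 0)
        candidates &= matches
    # Step 3: fetch only the surviving fact rows and join the dimension names.
    result = []
    for rownum, row in enumerate(fact):
        if (candidates >> rownum) & 1:
            joined = dict(row)
            for fk, dim in dims.items():
                joined[fk.replace("_id", "_name")] = dim[row[fk]]["name"]
            result.append(joined)
    return result
```

For example, filtering customers on country and products on name touches only the fact rows whose bitmaps survive both filters; no full join of the fact table with every dimension is needed.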
Aggregation Operators
– Oracle extends the GROUP BY clause with:
  – ROLLUP
  – CUBE
  – Grouping sets
SELECT country, quarter, SUM(amount) FROM sales GROUP BY country, quarter
[Diagram: the sums laid out as a grid of countries (UK, US) by quarters (Q1, Q2)]
ROLLUP and CUBE
ROLLUP(country, department, quarter) groups by:
– (country, department, quarter)
– (country, department)
– (country)
– (): grand total
CUBE(country, department, quarter) groups by:
– (country, department, quarter)
– (country, department)
– (country, quarter)
– (department, quarter)
– (country)
– (department)
– (quarter)
– (): grand total
ROLLUP: subtotals at increasing levels of aggregation, from right to left; n columns give n+1 groupings
CUBE: subtotals for all combinations; n columns give 2^n groupings
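The lists above follow mechanically from the column list, which the Python sketch below makes explicit: ROLLUP drops columns from the right (n+1 grouping sets), CUBE takes every subset (2^n grouping sets), and each grouping set is then aggregated separately. This models the result logic only; real SQL returns all groupings in one result set with NULLs in the rolled-up columns, while the sketch keeps them in separate dictionaries.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

GroupingSet = Tuple[str, ...]


def rollup_sets(columns: Sequence[str]) -> List[GroupingSet]:
    """ROLLUP: drop columns from the right, ending with the grand total ()."""
    return [tuple(columns[:i]) for i in range(len(columns), -1, -1)]


def cube_sets(columns: Sequence[str]) -> List[GroupingSet]:
    """CUBE: every subset of the columns, from the full set down to ()."""
    return [combo for size in range(len(columns), -1, -1)
            for combo in combinations(columns, size)]


def group_by_sets(rows: List[dict], sets: List[GroupingSet],
                  measure: str = "amount") -> Dict[GroupingSet, Dict[tuple, float]]:
    """SUM(measure) for each grouping set, like GROUP BY GROUPING SETS (...)."""
    result: Dict[GroupingSet, Dict[tuple, float]] = {}
    for gset in sets:
        sums: Dict[tuple, float] = defaultdict(float)
        for row in rows:
            sums[tuple(row[col] for col in gset)] += row[measure]
        result[gset] = dict(sums)
    return result


if __name__ == "__main__":
    cols = ["country", "department", "quarter"]
    print(len(rollup_sets(cols)), len(cube_sets(cols)))
```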
Aggregation Operators: Advantages
– Applicable to many aggregate functions:
  – SUM, AVG, COUNT
  – MIN, MAX
  – STDDEV, VARIANCE
– Flexible aggregation groups and levels
– Run in parallel
Analytic Functions
Significantly improved performance for complex reports such as:
– Ranking: find the top 10 sales in each region
– Moving aggregates: what is the 90-day moving sales average?
– Period-over-period comparison: how do the revenues for January 2002 compare with January 2001?
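The ranking report corresponds to filtering on RANK() OVER (PARTITION BY region ORDER BY amount DESC). The Python sketch below models those semantics: ties share a rank and the following rank is skipped, so a "top n" may return more than n rows when there are ties at the boundary. The (region, sales rep, amount) row layout is a hypothetical example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def top_n_per_region(sales: List[Tuple[str, str, float]],
                     n: int) -> Dict[str, List[Tuple[str, float, int]]]:
    """Rows with RANK() OVER (PARTITION BY region ORDER BY amount DESC) <= n."""
    by_region: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for region, rep, amount in sales:
        by_region[region].append((rep, amount))
    result: Dict[str, List[Tuple[str, float, int]]] = {}
    for region, rows in by_region.items():
        rows.sort(key=lambda r: r[1], reverse=True)
        ranked: List[Tuple[str, float, int]] = []
        for position, (rep, amount) in enumerate(rows, start=1):
            # A tie with the previous row keeps its rank; otherwise rank = position.
            if ranked and ranked[-1][1] == amount:
                rank = ranked[-1][2]
            else:
                rank = position
            if rank > n:
                break
            ranked.append((rep, amount, rank))
        result[region] = ranked
    return result
```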
Example: Moving Window
SELECT c.cust_id, t.month, SUM(amount_sold) SALES,
       AVG(SUM(amount_sold)) OVER (ORDER BY c.cust_id, t.month
                                   ROWS 2 PRECEDING) MOV_3_MONTH
FROM sales s, times t, customers c
WHERE s.time_id = t.time_id
  AND s.cust_id = c.cust_id
  AND t.year = 1999
  AND c.cust_id IN (6380)
GROUP BY c.cust_id, t.month
ORDER BY c.cust_id, t.month;
[Output table with columns CUST_ID, MONTH, SALES, MOV_3_MONTH]
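The window ROWS 2 PRECEDING covers the current row plus up to two earlier rows, so the first two months average over fewer values. The Python sketch below models just that window arithmetic on made-up monthly values (not the data of the slide). Note that the query above orders by cust_id only because it selects a single customer; for several customers one would add PARTITION BY c.cust_id so that windows do not cross customer boundaries.

```python
from typing import List


def moving_avg(values: List[float], preceding: int = 2) -> List[float]:
    """AVG(x) OVER (ORDER BY ... ROWS <preceding> PRECEDING).

    The window is the current row plus up to `preceding` earlier rows,
    so the first rows average over fewer values.
    """
    result = []
    for i in range(len(values)):
        window = values[max(0, i - preceding): i + 1]
        result.append(sum(window) / len(window))
    return result


if __name__ == "__main__":
    print(moving_avg([3.0, 6.0, 9.0, 12.0]))  # [3.0, 4.5, 6.0, 9.0]
```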
More Data Warehouse Querying Features
– Function-based indexes
– Optimizer plan stability
– Statistics for long-running operations
– Resumable statements
– Full outer join
– WITH clause
– Oracle Text: see "Advanced Searching with Oracle Text", 2nd conference day, 11:50-12:30, ground-floor conference room (Konferenzraum EG)
Parallel Operations
Dramatically reduce the execution time of data-intensive operations:
– Loading
  – Direct-path load
– DDL statements
  – CREATE TABLE ... AS SELECT, CREATE INDEX
  – REBUILD INDEX, REBUILD INDEX PARTITION
  – MOVE, SPLIT, COALESCE PARTITION
– DML statements
  – INSERT ... SELECT
  – UPDATE, DELETE and MERGE
Parallel Operations
– Access methods
  – Table and index range scans and full scans
– Join methods
  – Nested loops, sort merge, hash, star transformation
– SQL operations
  – GROUP BY, ROLLUP, CUBE
  – DISTINCT, UNION, UNION ALL
  – Aggregate functions
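The basic pattern behind a parallel GROUP BY is: split the rows into pieces, let each worker aggregate its piece, then merge the partial results. The Python sketch below shows that structure with a thread pool; `degree` loosely mirrors the degree of parallelism. It is not how Oracle's parallel execution servers work internally, and because of Python's global interpreter lock, threads do not speed up CPU-bound work (a process pool would be needed for that).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Row = Tuple[str, float]


def partial_sums(chunk: List[Row]) -> Dict[str, float]:
    """Work done by one worker: aggregate its share of the rows."""
    sums: Dict[str, float] = {}
    for region, amount in chunk:
        sums[region] = sums.get(region, 0.0) + amount
    return sums


def parallel_group_by(rows: List[Row], degree: int = 4) -> Dict[str, float]:
    """SUM(amount) GROUP BY region with `degree` workers plus a final merge."""
    size = max(1, -(-len(rows) // degree))  # ceiling division
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    totals: Dict[str, float] = {}
    with ThreadPoolExecutor(max_workers=degree) as pool:
        for part in pool.map(partial_sums, chunks):
            for region, amount in part.items():  # coordinator merges partial results
                totals[region] = totals.get(region, 0.0) + amount
    return totals


if __name__ == "__main__":
    print(parallel_group_by([("West", 1.0), ("East", 2.0)] * 50, degree=4))
```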
Parallel System Requirements
– Symmetric multiprocessing (SMP) systems, clusters or massively parallel processing (MPP) systems
– Sufficient I/O bandwidth
– Sufficient (underutilized) CPUs
– Sufficient memory
Summary
– Effective handling of multi-terabyte data warehouses
– Rich feature set for all data warehouse operations
– Flexible aggregation and analytic features for high-performance queries
– Effective parallelism
Want to know more?
Company: Semantec GmbH.
Names: Krasen Paskalev, Armin Singer, Peter Kopecki
Address: Benzstr. 32, D Herrenberg, Germany
Telephone, Fax: +49 (7032)
Meet us here: booth 2C on the ground floor