Published by Mitchell Flynn. Modified over 9 years ago.
1
www.semantec.de
Oracle 8i/9i features which support Data Warehousing
Author: Krasen Paskalev, Certified Oracle DBA
Semantec GmbH. D-71083 Herrenberg
2
Agenda
  ETL Features
  Data Warehouse Management
  Data Warehouse Querying
  Parallel Operations
3
Agenda
  ETL (Extraction, Transformation, Transportation and Loading)
    – Transportable Tablespaces
    – External Tables
    – Table Functions
    – MERGE Statement
  Data Warehouse Management
  Data Warehouse Querying
  Parallel Operations
4
Transportable tablespaces
  The fastest method for moving data between databases
  The tablespaces, with all their data, are plugged into the data warehouse database
  [Diagram: a tablespace is transferred by ftp from Production to the Data Warehouse]
5
External Tables
  Read-only virtual tables
  Can be directly queried and joined in SQL, PL/SQL and Java
  Avoid data staging
  One step loading and transformation
  Save DB space
  [Diagram: external files (ASCII file, Excel sheet)]
6
Table Functions
  Can take a set of rows as input
  Can return a set of rows as output
  Can be used in the FROM clause
  Can be parallelized
  Can be pipelined
  User defined in PL/SQL, Java or C
  [Diagram: the Sales table feeds a table function that returns Region / %: West 30, Central 50, East 20]
7
Table Functions Pipelining
  [Diagram: data transformation from Source to Target through two table functions (Step 1, Step 2), with a Log table]
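As a rough analogy for the two table function slides, the following Python sketch (not Oracle syntax) models pipelined table functions as generators: each step takes a set of rows and yields rows to the next step without an intermediate staging table. The row layout, the function names `clean_rows` and `region_share`, and the sample amounts are assumptions made for illustration; only the West/Central/East percentages come from the slide.

```python
from collections import defaultdict

def clean_rows(rows, log):
    """Step 1: pass valid rows on one at a time; rows without a region go to the log."""
    for row in rows:
        if row["region"] is None:
            log.append(row)
            continue
        yield row

def region_share(rows):
    """Step 2: consume a set of rows, return one row per region with its % of sales.

    Being an aggregate, this step has to read all of its input before it can
    yield anything; only step 1 streams row by row.
    """
    totals = defaultdict(int)
    for row in rows:
        totals[row["region"]] += row["amount"]
    grand_total = sum(totals.values())
    for region, amount in totals.items():
        yield {"region": region, "pct": 100 * amount // grand_total}

# Hypothetical source rows; the third one is rejected and logged.
source = [
    {"region": "West", "amount": 300},
    {"region": "Central", "amount": 500},
    {"region": None, "amount": 999},
    {"region": "East", "amount": 200},
]
log_table = []

# Source -> step 1 -> step 2 -> target, chained like table functions in a FROM clause.
target = list(region_share(clean_rows(source, log_table)))
print(target)
```

Chaining the generators mirrors nesting one table function inside another: step 2 pulls rows from step 1 on demand, and the rejected row ends up in `log_table`.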
8
MERGE statement
  new_sales (source):
    id  amount
    4   3000
    8   1000
    9   2000
  sales (before MERGE):
    id  amount
    4   2000
    7   3000
    8   5000
  sales (after MERGE):
    id  amount
    4   5000   (UPDATE)
    7   3000
    8   6000   (UPDATE)
    9   2000   (INSERT)

  MERGE INTO sales s
  USING new_sales n
  ON (s.id = n.id)
  WHEN MATCHED THEN
    UPDATE SET s.amount = s.amount + n.amount
  WHEN NOT MATCHED THEN
    INSERT (s.id, s.amount) VALUES (n.id, n.amount)
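To make the effect of the MERGE statement concrete, here is a minimal Python sketch of its semantics (an illustration only, not how Oracle executes it). It uses the id/amount values from the slide; the function name `merge_sales` and the use of dictionaries keyed by id are assumptions.

```python
def merge_sales(sales, new_sales):
    """Apply the slide's MERGE logic: add to matching ids, insert the others."""
    merged = dict(sales)  # work on a copy so the input stays unchanged
    for row_id, amount in new_sales.items():
        if row_id in merged:
            # WHEN MATCHED THEN UPDATE SET s.amount = s.amount + n.amount
            merged[row_id] += amount
        else:
            # WHEN NOT MATCHED THEN INSERT (s.id, s.amount) VALUES (n.id, n.amount)
            merged[row_id] = amount
    return merged

sales = {4: 2000, 7: 3000, 8: 5000}      # target table before the merge
new_sales = {4: 3000, 8: 1000, 9: 2000}  # source table

result = merge_sales(sales, new_sales)
print(result)  # {4: 5000, 7: 3000, 8: 6000, 9: 2000}
```

Ids 4 and 8 are updated, id 9 is inserted, and id 7 is left untouched, all in one pass over the source rows.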
9
MERGE Advantages
  Single simple SQL statement
  Can be parallelized
  Can use Bulk DML
  Fewer scans of the base table
10
More ETL Features
  Direct-path Interface
    – SQL*Loader
    – CREATE TABLE AS SELECT
    – INSERT
    – Oracle Call Interface
  Multi-table INSERTs
11
Agenda
  ETL Features
  Data Warehouse Management
    – Partitioning
    – Materialized Views
    – DBMS_STATS
  Data Warehouse Querying
  Parallel Operations
12
Partitioning
  Table Sales, one partition per month, each in its own tablespace:
    Jan'2002   Tablespace 0102
    Feb'2002   Tablespace 0202
    ...
    Dec'2002   Tablespace 1202
13
Advantages of Partitioning
  Partition independence
    – LOAD, MOVE, PURGE and DROP partitions
    – MERGE, SPLIT, EXCHANGE partitions
    – BACKUP, RESTORE, SET READ ONLY
  Partition elimination
    – SELECT or JOIN only the partition needed
  Parallel Operations
    – SELECT, UPDATE, DELETE, MERGE
14
Partitioning Methods
  Hash Partitioning
    – Even row distribution by hash function
  Range Partitioning
    – <01.01.2002 | <01.02.2002 |... | <01.01.2003
  List Partitioning
    – Stuttgart, Munich | Mannheim, Frankfurt |...
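The three methods differ only in how a row's partitioning key is mapped to a partition. The Python sketch below illustrates that mapping under stated assumptions: the range bounds and city lists follow the slide, partitions are identified by a zero-based index, and `zlib.crc32` is merely a stand-in for Oracle's internal hash function, which is not described in the presentation.

```python
from bisect import bisect_right
from datetime import date
import zlib

# Range partitioning: exclusive upper bounds as on the slide
# (<01.01.2002 | <01.02.2002 | ... | <01.01.2003), i.e. 13 partitions.
RANGE_BOUNDS = [date(2002, month, 1) for month in range(1, 13)] + [date(2003, 1, 1)]

def range_partition(sale_date):
    """Return the index of the first partition whose upper bound is above the date."""
    index = bisect_right(RANGE_BOUNDS, sale_date)
    if index == len(RANGE_BOUNDS):
        raise ValueError(f"{sale_date} is not below any partition bound")
    return index

# List partitioning: each partition holds an explicit list of key values.
LIST_PARTITIONS = {"Stuttgart": 0, "Munich": 0, "Mannheim": 1, "Frankfurt": 1}

def list_partition(city):
    return LIST_PARTITIONS[city]

# Hash partitioning: a hash of the key, modulo the partition count,
# spreads rows evenly without any business meaning.
def hash_partition(key, partition_count):
    return zlib.crc32(str(key).encode("utf-8")) % partition_count

print(range_partition(date(2002, 1, 15)))  # 1, the "<01.02.2002" partition
print(list_partition("Frankfurt"))         # 1
print(hash_partition(6380, 4))             # some value in 0..3
```

Partition elimination follows directly from the range and list mappings: a query restricted to January 2002 only needs the partition returned by `range_partition` for those dates.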
15
Table Compression
  Stores tables or partitions in compressed format
  Reduces disk space requirements
  Reduces memory requirements
  Speeds up query execution
  Speeds up backup and recovery
  Very efficient for highly redundant data such as the FACT table
  2 to 4 times compression is usual
16
Materialized Views
  Base table: sales (region, month, invc_sum, ...)
  Materialized view: revenue_sum (region, month, revenue)

  SELECT region, month, sum(invc_sum) revenue
  FROM sales
  GROUP BY region, month
17
Advantages of Materialized Views
  Improved query/reporting performance for:
    – Summaries
    – Aggregates
    – Joins
  Fast Refresh
    – Data change tracking
    – Partition change tracking
  No application change needed: their usage is automatic
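The idea behind fast refresh with data change tracking can be sketched in a few lines of Python: instead of recomputing the summary from the whole base table, only the logged changes are applied to it. This is a conceptual model, not Oracle's implementation; the sample rows, the `change_log` list and both function names are assumptions, and the sketch handles inserted rows only.

```python
from collections import defaultdict

def full_refresh(sales):
    """Recompute the summary from scratch: SUM(invc_sum) GROUP BY region, month."""
    summary = defaultdict(int)
    for region, month, invc_sum in sales:
        summary[(region, month)] += invc_sum
    return summary

def fast_refresh(summary, change_log):
    """Apply only the logged new rows to the existing summary, then empty the log."""
    for region, month, invc_sum in change_log:
        summary[(region, month)] += invc_sum
    change_log.clear()

# Hypothetical base table rows: (region, month, invc_sum)
sales = [("West", "2002-01", 100), ("West", "2002-01", 50), ("East", "2002-01", 70)]
revenue_sum = full_refresh(sales)

# New rows arrive; they are written to the base table and to the change log.
new_rows = [("East", "2002-01", 30), ("East", "2002-02", 10)]
sales.extend(new_rows)
change_log = list(new_rows)

fast_refresh(revenue_sum, change_log)
print(dict(revenue_sum))
```

After the fast refresh the summary matches what a full recomputation over all of `sales` would produce, while touching only the two new rows.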
18
DBMS_STATS
  New package for gathering table and index statistics
  Gathers statistics in parallel
  Can export and import statistics
  [Diagram: statistics are transferred between the Production Data Warehouse and the Development Data Warehouse]
19
More Data Warehouse Management Features
  Index-organized tables
  Online index rebuild
  Online table rebuild
20
Agenda
  ETL Features
  Data Warehouse Management
  Data Warehouse Querying
    – Bitmap Indexing
    – Star Query Transformation
    – Aggregation: ROLLUP, CUBE, Grouping Sets
    – Analytic functions
  Parallel Operations
21
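Bitmap Indexes
  [Diagram: for the Region column, one bitmap per value (east, central, west, NULL) with one bit per rowid; each row sets a bit in exactly one bitmap: 1 0 0 0, 0 1 0 0, 0 0 1 0, ..., 0 0 0 1]
  Bitmaps are combined with bitwise operators:
    1 0 0 0 OR 0 1 0 0 AND NOT (0 0 1 0) = 1 1 0 0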
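A bitmap index can be modelled with plain integers, one per distinct column value, where bit i is set when row i holds that value. The Python sketch below is an illustration of the principle only; the sample `regions` list and the helper names are assumptions, and real bitmap indexes are additionally compressed.

```python
def build_bitmaps(values):
    """Return {value: bitmap}; bit i of a bitmap is set if row i holds that value."""
    bitmaps = {}
    for row_number, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row_number)
    return bitmaps

def matching_rows(bitmap, row_count):
    """Translate a bitmap back into the list of row numbers it selects."""
    return [i for i in range(row_count) if (bitmap >> i) & 1]

# Hypothetical Region column of a six-row table; None stands for NULL.
regions = ["east", "central", "west", None, "east", "west"]
bitmaps = build_bitmaps(regions)
all_rows = (1 << len(regions)) - 1

# WHERE region = 'east' OR region = 'central'  -> bitwise OR
east_or_central = bitmaps["east"] | bitmaps["central"]

# WHERE region IS NULL  -> the NULL bitmap answers this directly
is_null = bitmaps[None]

# WHERE region NOT IN ('west')  -> subtract the west bitmap (and, as in SQL, the NULL rows)
not_west = all_rows & ~bitmaps["west"] & ~bitmaps[None]

# COUNT(*) for a predicate is just the number of set bits.
east_count = bin(bitmaps["east"]).count("1")

print(matching_rows(east_or_central, len(regions)))  # [0, 1, 4]
print(matching_rows(is_null, len(regions)))          # [3]
print(matching_rows(not_west, len(regions)))         # [0, 1, 4]
print(east_count)                                    # 2
```

Every predicate is evaluated on the bitmaps alone; table rows are fetched only for the bits that survive, which is why combinations of AND, OR and NOT conditions benefit so much.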
22
Advantages of Bitmap Indexes
  Reduced response time for ad hoc queries
  Use much less space than a B-tree index
  Dramatic performance gains for a large class of queries:
    – Multiple AND, OR and NOT conditions
    – IS NULL conditions
    – COUNT
    – NOT IN - Bitmap MINUS
    – BETWEEN - Bitmap UNION
23
Star Query Transformation
  The query is re-written for efficient execution
  [Diagram: fact table sales (cust_id, prod_id, q_id, amount) with dimension tables customers (cust_id, name), products (prod_id, name) and quarters (q_id, name)]
  Steps:
    1. Filter all dimensions
    2. Combine the bitmap indexes of the fact table's foreign keys
    3. Retrieve the fact rows and the other dimension rows
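The three steps can be walked through with a small Python model of the star schema from the slide. This is a conceptual sketch under assumptions: the sample rows and names are invented, Python sets of row numbers stand in for the bitmap indexes on the foreign keys, and nothing here reflects how the Oracle optimizer actually implements the rewrite.

```python
# Dimension tables: key -> name (hypothetical data)
customers = {1: "Smith", 2: "Jones", 3: "Meyer"}
products = {10: "Laptop", 11: "Phone"}
quarters = {1: "Q1", 2: "Q2"}

# Fact table rows: (cust_id, prod_id, q_id, amount)
sales = [
    (1, 10, 1, 500),
    (2, 10, 1, 700),
    (1, 11, 2, 300),
    (3, 10, 2, 900),
    (1, 10, 2, 400),
]

def fk_index(column):
    """Stand-in for a bitmap index: foreign-key value -> set of fact row numbers."""
    index = {}
    for row_number, row in enumerate(sales):
        index.setdefault(row[column], set()).add(row_number)
    return index

def rows_for(index, keys):
    """OR together the entries of all keys selected within one dimension."""
    result = set()
    for key in keys:
        result |= index.get(key, set())
    return result

cust_index, prod_index, quarter_index = fk_index(0), fk_index(1), fk_index(2)

# Step 1: filter each dimension down to the keys that satisfy its predicate.
cust_keys = [key for key, name in customers.items() if name == "Smith"]
prod_keys = [key for key, name in products.items() if name == "Laptop"]
quarter_keys = [key for key, name in quarters.items() if name == "Q2"]

# Step 2: combine the foreign-key indexes, AND across the dimensions.
candidate_rows = (
    rows_for(cust_index, cust_keys)
    & rows_for(prod_index, prod_keys)
    & rows_for(quarter_index, quarter_keys)
)

# Step 3: fetch only the surviving fact rows and join back to the dimensions.
result = [
    (customers[c], products[p], quarters[q], amount)
    for c, p, q, amount in (sales[i] for i in sorted(candidate_rows))
]
print(result)  # [('Smith', 'Laptop', 'Q2', 400)]
```

Only one of the five fact rows is ever read in step 3; the filtering happened entirely on the small dimension tables and the indexes.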
24
Aggregation Operators
  Oracle extends the GROUP BY clause by:
    – ROLLUP
    – CUBE
    – Grouping Sets

  SELECT SUM(amount) FROM sales GROUP BY country, quarter

             UK     US  Total
    Q1     1000   3000   4000
    Q2     1500   5000   6500
    Total  2500   8000  10500
25
ROLLUP and CUBE
  ROLLUP: subtotals at increasing levels of aggregation, from right to left (n+1 groupings)
  CUBE: subtotals on all combinations (2^n groupings)
  ROLLUP(country, department, quarter)
    (country, department, quarter)
    (country, department)
    (country)
    () - Grand Total
  CUBE(country, department, quarter)
    (country, department, quarter)
    (country, department)
    (country, quarter)
    (department, quarter)
    (country)
    (department)
    (quarter)
    () - Grand Total
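The grouping sets listed above, and the n+1 versus 2^n counts, can be reproduced with a short Python sketch. It is an illustration of the semantics, not Oracle code: the function names are assumptions, and the UK/US by Q1/Q2 figures are taken from the preceding aggregation slide to show the subtotals that CUBE adds.

```python
from itertools import combinations

def rollup_sets(columns):
    """ROLLUP: drop columns one at a time from the right, n+1 grouping sets."""
    return [tuple(columns[:size]) for size in range(len(columns), -1, -1)]

def cube_sets(columns):
    """CUBE: every combination of the columns, 2^n grouping sets."""
    return [
        combo
        for size in range(len(columns), -1, -1)
        for combo in combinations(columns, size)
    ]

def aggregate(rows, columns, grouping_sets, measure):
    """Sum the measure per grouping set; columns outside the set appear as None."""
    totals = {}
    for grouping in grouping_sets:
        for row in rows:
            key = tuple(row[c] if c in grouping else None for c in columns)
            totals[key] = totals.get(key, 0) + row[measure]
    return totals

dims = ["country", "department", "quarter"]
print(len(rollup_sets(dims)))  # 4 = n + 1
print(len(cube_sets(dims)))    # 8 = 2 ** n

# Figures from the aggregation slide (country x quarter).
sales = [
    {"country": "UK", "quarter": "Q1", "amount": 1000},
    {"country": "US", "quarter": "Q1", "amount": 3000},
    {"country": "UK", "quarter": "Q2", "amount": 1500},
    {"country": "US", "quarter": "Q2", "amount": 5000},
]
cols = ["country", "quarter"]
cube_totals = aggregate(sales, cols, cube_sets(cols), "amount")
print(cube_totals[("UK", None)])   # 2500, subtotal for UK
print(cube_totals[(None, "Q2")])   # 6500, subtotal for Q2
print(cube_totals[(None, None)])   # 10500, grand total
```

With ROLLUP on the same two columns only the per-country subtotals and the grand total would appear; the per-quarter subtotals are what CUBE adds.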
26
Aggregation Operators Advantages
  Applicable to many aggregation functions:
    – SUM, AVG, COUNT
    – MIN, MAX
    – STDDEV, VARIANCE
  Flexible aggregation groups and levels
  Run in parallel
27
Analytic functions
  Significantly improved performance for complex reports such as:
    – Ranking: find the top 10 sales in each region
    – Moving aggregates: what is the 90-day moving sales average?
    – Period-over-period comparison: what are the revenues from January 2002 compared to January 2001?
28
Example: Moving Window
  SELECT c.cust_id, t.month, SUM(amount_sold) SALES,
         AVG(SUM(amount_sold))
           OVER (ORDER BY c.cust_id, t.month ROWS 2 PRECEDING) MOV_3_MONTH
  FROM sales s, times t, customers c
  WHERE s.time_id = t.time_id
    AND s.cust_id = c.cust_id
    AND t.year = 1999
    AND c.cust_id IN (6380)
  GROUP BY c.cust_id, t.month
  ORDER BY c.cust_id, t.month;

  CUST_ID MONTH     SALES MOV_3_MONTH
  ------- ------- ------- -----------
     6380 1999-01  19,642      19,642
     6380 1999-02  19,324      19,483
     6380 1999-03  21,655      20,207
     6380 1999-04  27,091      22,690
     6380 1999-05  16,367      21,704
     6380 1999-06  24,755      22,738
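The MOV_3_MONTH column can be checked by hand: ROWS 2 PRECEDING means each average covers the current row and up to two rows before it. The Python sketch below recomputes the column from the SALES figures on the slide; the function name `moving_average` is an assumption, and the input is taken to be already sorted by month.

```python
def moving_average(values, preceding):
    """Average over the current row and up to `preceding` earlier rows."""
    result = []
    for i in range(len(values)):
        window = values[max(0, i - preceding): i + 1]  # shorter at the start
        result.append(sum(window) / len(window))
    return result

# SALES column from the slide, ordered by month (1999-01 to 1999-06).
monthly_sales = [19642, 19324, 21655, 27091, 16367, 24755]

mov_3_month = [round(avg) for avg in moving_average(monthly_sales, 2)]
print(mov_3_month)  # [19642, 19483, 20207, 22690, 21704, 22738]
```

The first two values average over one and two rows respectively, because the window cannot reach back before the first row; from the third month on, it is a true three-month average.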
29
More Data Warehouse Querying Features
  Function-based Indexes
  Optimizer Plan Stability
  Statistics for Long Running Operations
  Resumable Statements
  Full Outer Join
  WITH Operator
  Oracle Text
    – "Advanced Searching with Oracle Text", 14.11.2002, 2nd Conference day, 11:50-12:30, Konferenzraum EG (conference room, ground floor)
30
Agenda
  ETL Features
  Data Warehouse Management
  Data Warehouse Querying
  Parallel Operations
31
Parallel Operations
  Dramatically reduce execution time of data intensive operations
  Loading
    – Direct Path Load
  DDL Statements
    – CREATE TABLE AS SELECT, CREATE INDEX
    – REBUILD INDEX, REBUILD INDEX PARTITION
    – MOVE, SPLIT, COALESCE PARTITION
  DML Statements
    – INSERT AS SELECT
    – UPDATE, DELETE and MERGE
32
Parallel Operations
  Access methods
    – Table and index range and full scans
  Join methods
    – Nested loops, Sort merge, Hash, Star transformation
  SQL operations
    – GROUP BY, ROLLUP, CUBE
    – DISTINCT, UNION, UNION ALL
    – Aggregate functions
33
Parallel System Requirements
  Symmetric Multiprocessor Systems, Clusters or Massively Parallel Systems
  Sufficient I/O Bandwidth
  Sufficient (Underutilized) CPUs
  Sufficient Memory
34
Summary
  Effective handling of multi-terabyte Data Warehouses
  Rich feature set for all Data Warehouse operations
  Flexible aggregation and analytical features for high performance queries
  Effective parallelism
35
Want to know more?
  Meet us here -> booth 2C at the ground floor
  Company: Semantec GmbH.
  Name: Krasen Paskalev, Armin Singer, Peter Kopecki
  Address: Benzstr. 32, D-71083 Herrenberg, Germany
  Telephone / Fax: +49(7032)9130-0, +49(7032)9130-12, +49(7032)9130-22
  E-Mail: krasen.paskalev@semantec.bg, singer@semantec.de
  Internet: www.semantec.de