Download presentation
Presentation is loading. Please wait.
Published byRandolf Benson Modified over 9 years ago
1
Experiences with Real-Time Data Warehousing Using Oracle Database 10G Mike Schmitz High Performance Data Warehousing mike.schmitz@databaseperformance.com Michael Brey Principal Member Technical Staff ST/NEDC Oracle Engineering Oracle Corporation
2
Mike Schmitz High Performance Data Warehousing 2 Agenda The meaning of Real-Time in Data Warehousing Customer Business Scenario Customer Environment “Real-Time” Requirement Our Real-Time Solution Real-Time data architecture Incremental Operational Source Change Capture Transformation and Population into DW Target Simplified Functional Demonstration Asynchronous Change Data Capture (Oracle) Performance Characteristics and Considerations
3
Mike Schmitz High Performance Data Warehousing 3 My Background An independent data warehousing consultant specializing in the dimensional approach to data warehouse / data mart design and implementation with in-depth experience utilizing efficient, scalable techniques whether dealing with large-scale data warehouses or small- scale, platform constrained data mart implementations. I deliver dimensional design and implementation as well as ETL workshops in the U.S. and Europe. I have helped implement data warehouses using Redbrick, Oracle, Teradata, DB2, Informix, and SQL Server on mainframe, UNIX, and NT platforms, working with small and large businesses across a variety of industries including such customers as Hewlett Packard, American Express, General Mills, AT&T, Bell South, MCI, Oracle Slovakia, J.D. Power and Associates, Mobil Oil, The Health Alliance of Greater Cincinnati, and the French Railroad SNCF.
4
Mike Schmitz High Performance Data Warehousing 4 Real-Time in Data Warehousing Data Warehousing Systems are complex environments Business rules Various data process flows and dependencies Almost never pure Real-Time Some latency is a given What do you need? Real Time Near Real-Time Just in Time for the business
5
Mike Schmitz High Performance Data Warehousing 5 Customer Business Scenario Client provides software solutions for utility companies Utility companies have plants generating energy supply Recommended maximum output capacity Reserve Capacity Buy supplemental energy as needed Peak demand periods are somewhat predictable Each day is pre-planned on historical behavior Cheaper to buy energy ahead Expensive to have unused capacity Existing data warehouse supports the planning function Reduced option expenses Cut down of supplemental energy costs
6
Mike Schmitz High Performance Data Warehousing 6 Customer “Real-Time” Requirement Getting more in-time accuracy enhances operational business Compare today's plant output volumes to yesterdays or last week’s average Know when to purchase additional options or supplies Customer Target Actual data within a 5 minute lag Use a single query Use a single tool
7
Mike Schmitz High Performance Data Warehousing 7 Sample Analysis Graph
8
Mike Schmitz High Performance Data Warehousing 8 Our Real-Time Solution Overview Three-Step Approach: 1. Implement a real-time DW data architecture 2. Near real-time incremental change capture from operational system 3. Transformation and Propagation (population) of change data to DW
9
Mike Schmitz High Performance Data Warehousing 9 Our Real-Time Solution Real-Time DW Data Architecture Add a Real-Time “Partition” to our Plant Output Fact Table for current day activity Separate physical table No indexes or RFI constraints (data coming in will have RFI enforced) during daily activity UNION ALL viewed to the Plant Output Fact Table
10
Mike Schmitz High Performance Data Warehousing 10 Our Real-Time Solution Change Capture and Population 1. Incremental change capture from operational site Synchronous or Asynchronous 2. Transformation and Propagation (population) of change data to the DW Continuous trickle feed or periodic batch Operations StagingDW Asynch CDC Trigger Batch Synch CDC
11
Mike Schmitz High Performance Data Warehousing 11 Our Real-Time Solution Incremental Change Capture Done with Oracle’s Change Data Capture (CDC) functionality Synchronous CDC available with Oracle9 i Asynchronous CDC with Oracle10 g Asynchronous CDC is the preferred mechanism Decoupling of change capture from the operational transaction
12
Mike Schmitz High Performance Data Warehousing Asynchronous CDC SQL interface to change data Publish/subscribe paradigm Parallel access to log files, leveraging Oracle Streams Parallel transformation of data OLTP DB Redo log files Logical Change Data Based on Log Miner Oracle10g DW Tables SQL, PL/SQL, Java Transform
13
Mike Schmitz High Performance Data Warehousing 13 Our Real-Time Solution Population of Change Data into DW Continuous Change table owner creates trigger to populate warehouse real-time partition Periodic Batch Utilize the Subscribe Interface Subscribe to specific table and column changes through view Sets a window and extracts the changes at required period Purges view and moves window
14
Mike Schmitz High Performance Data Warehousing 14 Integrate daily changes into historical fact table At the end of the day index the current day table and apply constraints (no validate) Create new fact table partition Exchange current day table with new partition Create next days “Real-Time Partition” table Our Real-Time Solution The Daily Process
15
Mike Schmitz High Performance Data Warehousing 15 Simplified Functional Demo Schema Owners AO_CDC_OP Owns the operational schema AO_CDC Owns the CDC change sets and change tables (needs special cdc privileges) ? CDC Publish Role AO_CDC_DW Owns the data warehouse schema (also needs special cdc privileges) ? CDC Subscribe Role
16
Simplified Functional Demo Operational Schema
17
Simplified Functional Demo Data Warehouse Schema
18
Mike Schmitz High Performance Data Warehousing 18 What do we have? Operational transaction table AO_CDC_OP.PLANT_OUTPUT DW historical partitioned fact table AO_CDC_DW.F_PLANT_OUTPUT DW current day table (“Real-Time Partition”) AO_CDC_DW.F_CURRENT_DAY_PLANT_OUTPUT Data Warehouse UNION ALL view AO_CDC_DW.V_PLANT_OUTPUT
19
Mike Schmitz High Performance Data Warehousing 19 First The CDC user publishes Create a Change Set (CDC_DW) Add supplemental logging for the operational table Create a change table for the operational table (CT_PLANT_OUTPUT) Force database logging on the tablespace to catch any bulk insert /*+ APPEND */ (non- logged) activity
20
Mike Schmitz High Performance Data Warehousing 20 Next – Transform and Populate One of two ways Continuous Feed Logged Insert activity Permits nearer real-time Constant system load Periodic Batch Feed Permits non-logged bulk operations You set the lag time – how often do you run the batch process? Hourly Every five minutes Less system load overall
21
Mike Schmitz High Performance Data Warehousing 21 The Continuous Feed Put an insert trigger on the change table which joins to the dimension tables picking up the dimension keys and does any necessary transformations
22
Mike Schmitz High Performance Data Warehousing 22 The Batch Feed The CDC schema owner Authorizes AO_CDC_DW to select from the change table (the select will be accomplished via a generated view) The DW schema owner Subscribes to the change table and the columns he needs (with a centralized EDW approach this would usually be the whole change table) with a subscription and view name Activates the subscription Extract Extend the window Extracts changed data via the view (same code as trigger) Purges the window (logical Delete – physical deletion is handled by the CDC schema owner)
23
Mike Schmitz High Performance Data Warehousing 23 Extraction from Change Table View insert /*+ APPEND*/ into ao_cdc_dw.F_CURRENT_DAY_PLANT_OUTPUT (generating_plant_key, output_day_key, output_minute_key, output_actual_qty_in_kwh) select p.generating_plant_key,d.output_day_key,m.output_minute_key,new.output_in_kwh from ao_cdc_dw.PO_ACTIVITY_VIEW new inner join ao_cdc_dw.d_generating_plant p on new.plant_id = p.plant_id inner join ao_cdc_dw.d_output_day d on trunc(new.output_ts) = d.output_day inner join ao_cdc_dw.d_output_minute m on to_number(substr(to_char(new.output_ts,'YYYYMMDD HH:II:SS'),10,2)||substr(to_char(new.output_ts,'YYYYMMDD HH:II:SS'),13,2)) = m.output_time_24hr_nbr;
24
Mike Schmitz High Performance Data Warehousing 24 Next Step Add the current days activity (the contents of the current day fact table) to the historical fact table as a new partition Index and apply constraints to the current day fact table Add a new empty partition to the fact table Exchange the current day fact table with the partition Create the new current day fact table
25
Mike Schmitz High Performance Data Warehousing 25 Let’s step thru this live
26
Mike Schmitz High Performance Data Warehousing 26 Summary We created a real-time partition for current day activity We put CDC on the operational table and created a change table populated by an asynchronous process (reads redo log) We demonstrated continuous feed to the DW by using a trigger based approach We demonstrated a batch DW feed by using the CDC subscribe process We showed how to add the current day table to the fact table and set up the next days table An electronic copy of the SQL used to build this prototype is available by emailing mike.schmitz@databaseperormance.com
27
Michael Brey Principal Member Technical Staff ST/NEDC Oracle Engineering Oracle Corporation
28
Overview Benchmark Description System Description Database Parameters Performance Data
29
The Benchmark Customer OLTP benchmark run internally at Oracle Insurance application handling customer inquires and quotes over the phone N users perform M quotes Quote = actual work performed during a call with a customer Mixture of Inserts, Updates, Deletes, Singleton Selects, Cursor Fetches, Rollbacks/commits, savepoints Compute average time for all quotes across users
30
System Info SunFire 4800 A standard Shared Memory Processor (SMP) 8 900-Mhz CPUs 16 GB physical memory Solaris 5.8 Database storage: striped across 8 Sun StorEdge T3 arrays (9X36.4MB each)
31
Database Parameters Parallel_max_servers 20 Streams_pool_size 400M (default 10% shared pool) Shared_pool_size 600M Buffer cache 128M Redo buffers 4M Processes 600
32
Change Data Capture (CDC) SyncAsync HotLog Async AutoLog AvailableOracle 9iOracle 10 g source system cost System resources Minimal Part of txnYESNO Changes seen Real timeNear real time Variable Systems112
33
Tests Conducted tests with Asynchronous Hotlog CDC enabled and disabled and with Sync CDC. Asynchronous Hotlog CDC tests conducted at different log usage levels Appr. 10, 50, and 100% of all OLTP tables with DML operations were included in CDC Tests run with: 250 concurrent users Continuous peak workload after ramp-up 175 transactions per second
34
Impact on Transaction Time
35
CPU Consumption Supplemental Logging
36
CPU Consumption 10% DML Change tracking
37
CPU Consumption 50% DML Change tracking
38
CPU Consumption 10%,100% DML Change tracking
39
Latency of Change Tracking Latency is defined as the time between the actual change and its reflection in the Change Capture Table Latency = time[change record insert] – time[redo log insert] Latency measurement were made for the 100% Asynchronous Hotlog CDC run 99.7% of records arrived in less than 2 secs 53.5% of records arrived in less than 1 sec Remaining records arrived in less than 3 sec Asynchronous CDC kept up with the constant high OLTP workload all the time
40
Summary Change Data Capture enables enterprise-ready near real-time capturing of change data No fallback for constant high-load OLTP environments Minimal impact on origin OLTP transactions Predictable additional resource requirements, solely driven by the amount of change tracking Oracle provides the flexibility to meet your “on- time” business needs
41
A Q &
42
Next Steps…. Data Warehousing DB Sessions 11:00 AM #40153, Room 304 Oracle Warehouse Builder: New Oracle Database 10 g Release 3:30 PM #40176, Room 303 Security and the Data Warehouse 4:00 PM #40166, Room 130 Oracle Database 10 g SQL Model Clause Monday 8:30 AM #40125, Room 130 Oracle Database 10 g : A Spatial VLDB Case Study 3:30 PM #40177, Room 303 Building a Terabyte Data Warehouse, Using Linux and RAC 5:00 PM #40043, Room 104 Data Pump in Oracle Database 10 g : Foundation for Ultrahigh-Speed Data Movement Tuesday For More Info On Oracle BI/DW Go To http://otn.oracle.com/products/bi/db/dbbi.html
43
Next Steps…. Data Warehousing DB Sessions 8:30 AM #40179, Room 304 Oracle Database 10 g Data Warehouse Backup and Recovery 11:00 AM #36782, Room 304 Experiences with Real-Time Data Warehousing using Oracle 10 g 1:00PM #40150, Room 102 Turbocharge your Database, Using the Oracle Database 10 g SQLAccess Advisor Thursday Oracle Database 10 g Oracle OLAP Oracle Data Mining Oracle Warehouse Builder Oracle Application Server 10 Business Intelligence and Data Warehousing Demos All Four Days In The Oracle Demo Campground For More Info On Oracle BI/DW Go To http://otn.oracle.com/products/bi/db/dbbi.html
44
Reminder – please complete the OracleWorld online session survey Thank you.
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.