Experiences with Real-Time Data Warehousing Using Oracle Database 10G Mike Schmitz High Performance Data Warehousing

Slides:



Advertisements
Similar presentations
Chapter 9. Performance Management Enterprise wide endeavor Research and ascertain all performance problems – not just DBMS Five factors influence DB performance.
Advertisements

Supervisor : Prof . Abbdolahzadeh
BY LECTURER/ AISHA DAWOOD DW Lab # 3 Overview of Extraction, Transformation, and Loading.
Database Architectures and the Web
INTRODUCTION TO ORACLE DATABASE ADMINISTRATION Lynnwood Brown System Managers LLC Introduction – Lecture 1 Copyright System Managers LLC 2007 all rights.
Visibility Information Exchange Web System. Source Data Import Source Data Validation Database Rules Program Logic Storage RetrievalPresentation AnalysisInterpretation.
Components and Architecture CS 543 – Data Warehousing.
Beyond data modeling Model must be normalised – purpose ? Outcome is a set of tables = logical design Then, design can be warped until it meets the realistic.
Harvard University Oracle Database Administration Session 2 System Level.
Data Warehousing - 3 ISYS 650. Snowflake Schema one or more dimension tables do not join directly to the fact table but must join through other dimension.
IBM Mainframe-Integration Mainframe Change Data Capture
Copyright © 2006 by The McGraw-Hill Companies, Inc. All rights reserved. McGraw-Hill Technology Education Copyright © 2006 by The McGraw-Hill Companies,
Designing a Data Warehouse
Module 14: Scalability and High Availability. Overview Key high availability features available in Oracle and SQL Server Key scalability features available.
Passage Three Introduction to Microsoft SQL Server 2000.
Director Product Management
Data Conversion to a Data warehouse Presented By Sanjay Gunasekaran.
ETL By Dr. Gabriel.
An Introduction to Infrastructure Ch 11. Issues Performance drain on the operating environment Technical skills of the data warehouse implementers Operational.
Oracle Recovery Manager (RMAN) 10g : Reloaded Tammy Bednar Sr. Product Manager Oracle Corporation Session id:
Oracle10g for Data Warehousing Jiangang Luo
Designing a Data Warehouse Issues in DW design. Three Fundamental Processes Data Acquisition Data Storage Data a Access.
Performance Tuning SSIS. HR Departments are no fun. Don’t mention the stalking incident with Clay Aiken What happened in Vegas My prom date with a puppet.
SQL Server Integration Services (SSIS) Presented by Tarek Ghazali IT Technical Specialist Microsoft SQL Server (MVP) Microsoft Certified Technology Specialist.
1 Oracle Corporation Oracle Change Data Capture Jack Raitto, Development Manager Oracle NEDC NYOUG Long Island SIG October 7, 2004.
Best Practices for Data Warehousing. 2 Agenda – Best Practices for DW-BI Best Practices in Data Modeling Best Practices in ETL Best Practices in Reporting.
Data Warehousing Seminar Chapter 5. Data Warehouse Design Methodology Data Warehousing Lab. HyeYoung Cho.
Chapter Oracle Server An Oracle Server consists of an Oracle database (stored data, control and log files.) The Server will support SQL to define.
ISV Innovation Presented by ISV Innovation Presented by Business Intelligence Fundamentals: Data Loading Ola Ekdahl IT Mentors 9/12/08.
Ch 5. The Evolution of Analytic Processes
1 Oracle Database 11g – Flashback Data Archive. 2 Data History and Retention Data retention and change control requirements are growing Regulatory oversight.
5 Copyright © 2009, Oracle. All rights reserved. Right-Time Data Warehousing with OWB.
ORACLE
Oracle Advanced Compression – Reduce Storage, Reduce Costs, Increase Performance Session: S Gregg Christman -- Senior Product Manager Vineet Marwah.
1.
1 Data Warehouses BUAD/American University Data Warehouses.
Right In Time Presented By: Maria Baron Written By: Rajesh Gadodia
The Oracle9i Multi-Terabyte Data Warehouse Jeff Parker Manager Data Warehouse Development Amazon.com Session id:
Building a Terabyte Data Warehouse, Using Linux and RAC George Lumpkin Director Product Management Oracle Corporation Session id:
Data Warehouse Design Xintao Wu University of North Carolina at Charlotte Nov 10, 2008.
Carey Probst Technical Director Technology Business Unit - OLAP Oracle Corporation.
Datawarehouse A sneak preview. 2 Data Warehouse Approach An old idea with a new interest: Cheap Computing Power Special Purpose Hardware New Data Structures.
1 Categories of data Operational and very short-term decision making data Current, short-term decision making, related to financial transactions, detailed.
Page 1. Data Integration Using Oracle Streams A Case Study Session #:
INNOV-10 Progress® Event Engine™ Technical Overview Prashant Thumma Principal Software Engineer.
Transportation: Refreshing Warehouse Data Chapter 13.
© 2009 IBM Corporation Maximize Cost Savings While Improving Visibility Into Lines of Business Wendy Tam, CDC Product Marketing Manager
INTRODUCTION TO ORACLE DATABASE ADMINISTRATION Lynnwood Brown President System Managers LLC Introduction – Lecture 1 Copyright System Managers LLC 2003.
7 Strategies for Extracting, Transforming, and Loading.
The Self-managing Database: Proactive Space and Schema Object Management Amit Ganesh Director, Data, Space and Transaction Processing Oracle Corporation.
Introduction to the new mainframe © Copyright IBM Corp., All rights reserved. 1 Main Frame Computing Objectives Explain why data resides on mainframe.
Session id: Darrell Hilliard Senior Delivery Manager Oracle University Oracle Corporation.
Copyright © 2006, Oracle. All rights reserved. Czinkóczki László oktató Using the Oracle Warehouse Builder.
Carlos Bossy Quanta Intelligence SQL Server MCTS, MCITP BI CBIP, Data Mining Real-time Data Warehouse and Reporting Solutions.
11 Copyright © 2009, Oracle. All rights reserved. Enhancing ETL Performance.
Supervisor : Prof . Abbdolahzadeh
Data, Space and Transaction Processing
Database Architectures and the Web
Antonio Abalos Castillo
Chapter 13 The Data Warehouse
Database Architectures and the Web
Data Warehouse.
Presented by: Warren Sifre
Introduction of Week 3 Assignment Discussion
An Introduction to Data Warehousing
Cloud Data Replication with SQL Data Sync
Data Warehousing Concepts
McGraw-Hill Technology Education
Big DATA.
Presentation transcript:

Experiences with Real-Time Data Warehousing Using Oracle Database 10G Mike Schmitz High Performance Data Warehousing Michael Brey Principal Member Technical Staff ST/NEDC Oracle Engineering Oracle Corporation

Mike Schmitz High Performance Data Warehousing 2 Agenda The meaning of Real-Time in Data Warehousing Customer Business Scenario Customer Environment “Real-Time” Requirement Our Real-Time Solution Real-Time data architecture Incremental Operational Source Change Capture Transformation and Population into DW Target Simplified Functional Demonstration Asynchronous Change Data Capture (Oracle) Performance Characteristics and Considerations

Mike Schmitz High Performance Data Warehousing 3 My Background An independent data warehousing consultant specializing in the dimensional approach to data warehouse / data mart design and implementation with in-depth experience utilizing efficient, scalable techniques whether dealing with large-scale data warehouses or small- scale, platform constrained data mart implementations. I deliver dimensional design and implementation as well as ETL workshops in the U.S. and Europe. I have helped implement data warehouses using Redbrick, Oracle, Teradata, DB2, Informix, and SQL Server on mainframe, UNIX, and NT platforms, working with small and large businesses across a variety of industries including such customers as Hewlett Packard, American Express, General Mills, AT&T, Bell South, MCI, Oracle Slovakia, J.D. Power and Associates, Mobil Oil, The Health Alliance of Greater Cincinnati, and the French Railroad SNCF.

Mike Schmitz High Performance Data Warehousing 4 Real-Time in Data Warehousing Data Warehousing Systems are complex environments Business rules Various data process flows and dependencies Almost never pure Real-Time Some latency is a given What do you need? Real Time Near Real-Time Just in Time for the business

Mike Schmitz High Performance Data Warehousing 5 Customer Business Scenario Client provides software solutions for utility companies Utility companies have plants generating energy supply Recommended maximum output capacity Reserve Capacity Buy supplemental energy as needed Peak demand periods are somewhat predictable Each day is pre-planned on historical behavior Cheaper to buy energy ahead Expensive to have unused capacity Existing data warehouse supports the planning function Reduced option expenses Cut down of supplemental energy costs

Mike Schmitz High Performance Data Warehousing 6 Customer “Real-Time” Requirement Getting more in-time accuracy enhances operational business Compare today's plant output volumes to yesterdays or last week’s average Know when to purchase additional options or supplies Customer Target Actual data within a 5 minute lag Use a single query Use a single tool

Mike Schmitz High Performance Data Warehousing 7 Sample Analysis Graph

Mike Schmitz High Performance Data Warehousing 8 Our Real-Time Solution Overview Three-Step Approach: 1. Implement a real-time DW data architecture 2. Near real-time incremental change capture from operational system 3. Transformation and Propagation (population) of change data to DW

Mike Schmitz High Performance Data Warehousing 9 Our Real-Time Solution Real-Time DW Data Architecture Add a Real-Time “Partition” to our Plant Output Fact Table for current day activity Separate physical table No indexes or RFI constraints (data coming in will have RFI enforced) during daily activity UNION ALL viewed to the Plant Output Fact Table

Mike Schmitz High Performance Data Warehousing 10 Our Real-Time Solution Change Capture and Population 1. Incremental change capture from operational site Synchronous or Asynchronous 2. Transformation and Propagation (population) of change data to the DW Continuous trickle feed or periodic batch Operations StagingDW Asynch CDC Trigger Batch Synch CDC

Mike Schmitz High Performance Data Warehousing 11 Our Real-Time Solution Incremental Change Capture Done with Oracle’s Change Data Capture (CDC) functionality Synchronous CDC available with Oracle9 i Asynchronous CDC with Oracle10 g Asynchronous CDC is the preferred mechanism Decoupling of change capture from the operational transaction

Mike Schmitz High Performance Data Warehousing Asynchronous CDC SQL interface to change data Publish/subscribe paradigm Parallel access to log files, leveraging Oracle Streams Parallel transformation of data OLTP DB Redo log files Logical Change Data Based on Log Miner Oracle10g DW Tables SQL, PL/SQL, Java Transform

Mike Schmitz High Performance Data Warehousing 13 Our Real-Time Solution Population of Change Data into DW Continuous Change table owner creates trigger to populate warehouse real-time partition Periodic Batch Utilize the Subscribe Interface Subscribe to specific table and column changes through view Sets a window and extracts the changes at required period Purges view and moves window

Mike Schmitz High Performance Data Warehousing 14 Integrate daily changes into historical fact table At the end of the day index the current day table and apply constraints (no validate) Create new fact table partition Exchange current day table with new partition Create next days “Real-Time Partition” table Our Real-Time Solution The Daily Process

Mike Schmitz High Performance Data Warehousing 15 Simplified Functional Demo Schema Owners AO_CDC_OP Owns the operational schema AO_CDC Owns the CDC change sets and change tables (needs special cdc privileges) ? CDC Publish Role AO_CDC_DW Owns the data warehouse schema (also needs special cdc privileges) ? CDC Subscribe Role

Simplified Functional Demo Operational Schema

Simplified Functional Demo Data Warehouse Schema

Mike Schmitz High Performance Data Warehousing 18 What do we have? Operational transaction table AO_CDC_OP.PLANT_OUTPUT DW historical partitioned fact table AO_CDC_DW.F_PLANT_OUTPUT DW current day table (“Real-Time Partition”) AO_CDC_DW.F_CURRENT_DAY_PLANT_OUTPUT Data Warehouse UNION ALL view AO_CDC_DW.V_PLANT_OUTPUT

Mike Schmitz High Performance Data Warehousing 19 First The CDC user publishes Create a Change Set (CDC_DW) Add supplemental logging for the operational table Create a change table for the operational table (CT_PLANT_OUTPUT) Force database logging on the tablespace to catch any bulk insert /*+ APPEND */ (non- logged) activity

Mike Schmitz High Performance Data Warehousing 20 Next – Transform and Populate One of two ways Continuous Feed Logged Insert activity Permits nearer real-time Constant system load Periodic Batch Feed Permits non-logged bulk operations You set the lag time – how often do you run the batch process? Hourly Every five minutes Less system load overall

Mike Schmitz High Performance Data Warehousing 21 The Continuous Feed Put an insert trigger on the change table which joins to the dimension tables picking up the dimension keys and does any necessary transformations

Mike Schmitz High Performance Data Warehousing 22 The Batch Feed The CDC schema owner Authorizes AO_CDC_DW to select from the change table (the select will be accomplished via a generated view) The DW schema owner Subscribes to the change table and the columns he needs (with a centralized EDW approach this would usually be the whole change table) with a subscription and view name Activates the subscription Extract Extend the window Extracts changed data via the view (same code as trigger) Purges the window (logical Delete – physical deletion is handled by the CDC schema owner)

Mike Schmitz High Performance Data Warehousing 23 Extraction from Change Table View insert /*+ APPEND*/ into ao_cdc_dw.F_CURRENT_DAY_PLANT_OUTPUT (generating_plant_key, output_day_key, output_minute_key, output_actual_qty_in_kwh) select p.generating_plant_key,d.output_day_key,m.output_minute_key,new.output_in_kwh from ao_cdc_dw.PO_ACTIVITY_VIEW new inner join ao_cdc_dw.d_generating_plant p on new.plant_id = p.plant_id inner join ao_cdc_dw.d_output_day d on trunc(new.output_ts) = d.output_day inner join ao_cdc_dw.d_output_minute m on to_number(substr(to_char(new.output_ts,'YYYYMMDD HH:II:SS'),10,2)||substr(to_char(new.output_ts,'YYYYMMDD HH:II:SS'),13,2)) = m.output_time_24hr_nbr;

Mike Schmitz High Performance Data Warehousing 24 Next Step Add the current days activity (the contents of the current day fact table) to the historical fact table as a new partition Index and apply constraints to the current day fact table Add a new empty partition to the fact table Exchange the current day fact table with the partition Create the new current day fact table

Mike Schmitz High Performance Data Warehousing 25 Let’s step thru this live

Mike Schmitz High Performance Data Warehousing 26 Summary We created a real-time partition for current day activity We put CDC on the operational table and created a change table populated by an asynchronous process (reads redo log) We demonstrated continuous feed to the DW by using a trigger based approach We demonstrated a batch DW feed by using the CDC subscribe process We showed how to add the current day table to the fact table and set up the next days table An electronic copy of the SQL used to build this prototype is available by ing

Michael Brey Principal Member Technical Staff ST/NEDC Oracle Engineering Oracle Corporation

Overview Benchmark Description System Description Database Parameters Performance Data

The Benchmark Customer OLTP benchmark run internally at Oracle Insurance application handling customer inquires and quotes over the phone N users perform M quotes Quote = actual work performed during a call with a customer Mixture of Inserts, Updates, Deletes, Singleton Selects, Cursor Fetches, Rollbacks/commits, savepoints Compute average time for all quotes across users

System Info SunFire 4800 A standard Shared Memory Processor (SMP) Mhz CPUs 16 GB physical memory Solaris 5.8 Database storage: striped across 8 Sun StorEdge T3 arrays (9X36.4MB each)

Database Parameters Parallel_max_servers 20 Streams_pool_size 400M (default 10% shared pool) Shared_pool_size 600M Buffer cache 128M Redo buffers 4M Processes 600

Change Data Capture (CDC) SyncAsync HotLog Async AutoLog AvailableOracle 9iOracle 10 g source system cost System resources Minimal Part of txnYESNO Changes seen Real timeNear real time Variable Systems112

Tests Conducted tests with Asynchronous Hotlog CDC enabled and disabled and with Sync CDC. Asynchronous Hotlog CDC tests conducted at different log usage levels Appr. 10, 50, and 100% of all OLTP tables with DML operations were included in CDC Tests run with: 250 concurrent users Continuous peak workload after ramp-up 175 transactions per second

Impact on Transaction Time

CPU Consumption Supplemental Logging

CPU Consumption 10% DML Change tracking

CPU Consumption 50% DML Change tracking

CPU Consumption 10%,100% DML Change tracking

Latency of Change Tracking Latency is defined as the time between the actual change and its reflection in the Change Capture Table Latency = time[change record insert] – time[redo log insert] Latency measurement were made for the 100% Asynchronous Hotlog CDC run 99.7% of records arrived in less than 2 secs 53.5% of records arrived in less than 1 sec Remaining records arrived in less than 3 sec Asynchronous CDC kept up with the constant high OLTP workload all the time

Summary Change Data Capture enables enterprise-ready near real-time capturing of change data No fallback for constant high-load OLTP environments Minimal impact on origin OLTP transactions Predictable additional resource requirements, solely driven by the amount of change tracking Oracle provides the flexibility to meet your “on- time” business needs

A Q &

Next Steps…. Data Warehousing DB Sessions 11:00 AM #40153, Room 304 Oracle Warehouse Builder: New Oracle Database 10 g Release 3:30 PM #40176, Room 303 Security and the Data Warehouse 4:00 PM #40166, Room 130 Oracle Database 10 g SQL Model Clause Monday 8:30 AM #40125, Room 130 Oracle Database 10 g : A Spatial VLDB Case Study 3:30 PM #40177, Room 303 Building a Terabyte Data Warehouse, Using Linux and RAC 5:00 PM #40043, Room 104 Data Pump in Oracle Database 10 g : Foundation for Ultrahigh-Speed Data Movement Tuesday For More Info On Oracle BI/DW Go To

Next Steps…. Data Warehousing DB Sessions 8:30 AM #40179, Room 304 Oracle Database 10 g Data Warehouse Backup and Recovery 11:00 AM #36782, Room 304 Experiences with Real-Time Data Warehousing using Oracle 10 g 1:00PM #40150, Room 102 Turbocharge your Database, Using the Oracle Database 10 g SQLAccess Advisor Thursday Oracle Database 10 g Oracle OLAP Oracle Data Mining Oracle Warehouse Builder Oracle Application Server 10 Business Intelligence and Data Warehousing Demos All Four Days In The Oracle Demo Campground For More Info On Oracle BI/DW Go To

Reminder – please complete the OracleWorld online session survey Thank you.