Optimizing Data Warehouse Loads via Parallel Pro-C and Parallel/Direct-Mode SQL Bert Scalzo, Ph.D.

About the Author: Oracle DBA from version 4 through 10g; worked for Oracle Education and Oracle Consulting; holds several Oracle Masters certifications; BS, MS, and PhD in Computer Science, plus an MBA and insurance industry designations; articles in Oracle Magazine, Oracle Informant, and PC Week (now eWeek).

About Quest Software

Star Schema Design. Dimensions: smaller, de-normalized tables containing the business-descriptive columns that users query by. Facts: very large tables whose primary keys are formed from the concatenation of the related dimension tables' foreign-key columns, and which also contain numerically additive, non-key columns used for calculations during user queries. The “star schema” approach to dimensional data modeling was pioneered by Ralph Kimball.

(Diagram: dimension tables and a central fact table in a star schema layout.)

(Typical table sizes: fact tables on the order of 10^8 rows and up; dimension tables on the order of 10^3 to 10^5 rows.)

The Loading Challenge. How much data would a data loader load, if a data loader could load data? Dimensions: often reloaded in their entirety, since they only have tens to hundreds of thousands of rows. Facts: must be cumulatively loaded, since they generally have hundreds of millions to billions of rows, with daily data loading requirements of millions of rows or more.

Hardware Won't Compensate. People often have the unrealistic expectation that expensive hardware is the only way to obtain optimal application performance. CPU: SMP, MPP. Disk IO: 15,000 RPM, RAID (EMC). OS: UNIX, 64-bit. Oracle: OPS / PQO, 64-bit.

Hardware Tuning Example. Problem: data load runtime 4+ hours; MHz 64-bit CPUs; 4 gigabytes UNIX RAM; 2 gigabytes EMC cache; RAID 5 (slower on writes). Attempt #1: bought more hardware; MHz 64-bit CPUs; 8 gigabytes UNIX RAM; 4 gigabytes EMC cache; RAID 1 (faster on writes). Runtime still 4+ hours !!!

Application Tuning Example. Attempt #2: redesigned the application: converted PL/SQL to Pro-C, ran 16 streams in parallel, better utilized UNIX capabilities. Run time = 20 minutes !!! Attempt #3: tuned the SQL code: tuned individual SQL statements, used Dynamic SQL Method 2, prepared the SQL outside the loop, executed the prepared SQL inside the loop. Run time = 15 minutes !!!

Lesson Learned. Hardware: cost approximately $1,000,000; system downtime for upgrades; zero runtime improvement; loss of credibility with the customer. Redesign: 4 hours of DBA time at $150/hour = $600, plus 20 hours of developer time at $100/hour = $2,000; total cost = $2,600, or roughly 385 times less. Golden Rule #1: application redesign is much cheaper than hardware!!!

Program Design Paramount. In reality, the loading program's design is the key factor in achieving the fastest possible data loads into any large-scale data warehouse. Data loading programs must be designed to utilize SMP/MPP architectures; otherwise CPU usage may not exceed 1 / (number of CPUs). Golden Rule #2: minimize inter-process waits and maximize total concurrent CPU usage.
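
As a hedged sketch of that principle (not code from the presentation), a loader can fork one worker process per stream so every CPU has work and no step waits on another; process_stream() here is a hypothetical stand-in for one stream's load logic.

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NUM_STREAMS 16

static void process_stream(int stream_no)
{
    /* placeholder for one stream's load work */
    printf("worker %d handling stream %d\n", (int)getpid(), stream_no);
}

int main(void)
{
    for (int i = 0; i < NUM_STREAMS; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            exit(EXIT_FAILURE);
        }
        if (pid == 0) {               /* child: do one stream's work, then exit */
            process_stream(i);
            _exit(EXIT_SUCCESS);
        }
        /* parent keeps spawning without waiting, so all streams run at once */
    }
    while (wait(NULL) > 0)            /* reap every worker before finishing */
        ;
    return 0;
}

The shell-script "divide and conquer" approach shown later in the presentation achieves the same effect with background processes instead of explicit fork() calls.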

Example Loading Problem. Hardware: HP-9000 V2200, 16 CPUs, 8 GB RAM; EMC 3700 RAID-1 with 4 GB cache. Database: Oracle (32-bit); tables partitioned by month; indexes partitioned by month. Nightly load: 6000 files with 20 million detail rows; summarize details across 3 aggregates.

Original, Bad Design

Original’s Physical Problems IO Intensive: 5 IO’s from source to destination Wasted IO’s to copy files twice Large Waits: Each step dependent on predecessor No overlapping of any load operations Single Threaded: No aspect of processing is parallel Overall CPU usage less than 7%

Original’s Logical Problems Brute Force: Simple for programmers to visualize Does not leverage UNIX’s strengths Record Oriented: Simple for programmers to code (cursors) Does not leverage SQL’s strengths (sets) Stupid Aggregation: Process record #1, create aggregate record Process record #2, update aggregate record Repeat last step for each record being input

Original Version's Timings (table on the original slide): start time and duration in minutes for each step (cat, sort, SQL*Loader, PL/SQL) and the overall total of roughly 270 minutes (per the later Results slide), plus an HP-UX GlancePlus display of CPU utilization.

Parallel Design Options Parallel/Direct SQL Loader: Use Parallel, Direct option for speed Cannot do data lookups and data scrubbing without complex pre-insert/update triggers Multi-Threaded Pro-C: Hard to monitor via UNIX commands Difficult to program and hard to debug “Divide & Conquer”: Leverages UNIX’s key strengths Easy to monitor via UNIX commands Simple UNIX shell scripting exercise Simple Pro-C programming exercise

What Are Threads? Multithreaded applications have multiple threads, where each thread: is a "lightweight" sub-process; executes within the main process; shares code and data segments (i.e. the address space); has its own program counter, registers, and stack. Global and static variables are common to all threads and require a mutual exclusivity mechanism to manage access from multiple threads within the application.
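
A minimal C sketch of that last point (illustrative only, not from the slides): two threads incrementing a shared global counter, with a pthread mutex providing the mutual exclusivity so no updates are lost.

/* compile with: cc -pthread ... */
#include <pthread.h>
#include <stdio.h>

static long rows_loaded = 0;                    /* shared global variable */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *loader(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);              /* mutual exclusion */
        rows_loaded++;                          /* safe update of shared data */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, loader, NULL);    /* threads share the address space */
    pthread_create(&t2, NULL, loader, NULL);    /* but each has its own stack */
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("rows_loaded = %ld\n", rows_loaded); /* 200000 with the mutex in place */
    return 0;
}

The non-mutex Pro-C design on the next slides avoids this locking entirely by giving each thread its own runtime context and no shared state.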

Non-Mutex Architecture

Non-Mutex Code

main()
{
    sql_context ctx1, ctx2;              /* declare runtime contexts */
    EXEC SQL ENABLE THREADS;
    EXEC SQL CONTEXT ALLOCATE :ctx1;
    EXEC SQL CONTEXT ALLOCATE :ctx2;
    ...
    /* spawn thread, execute function1 (in the thread) passing ctx1 */
    thread_create(..., function1, ctx1);
    /* spawn thread, execute function2 (in the thread) passing ctx2 */
    thread_create(..., function2, ctx2);
    ...
    EXEC SQL CONTEXT FREE :ctx1;
    EXEC SQL CONTEXT FREE :ctx2;
    ...
}

void function1(sql_context ctx)
{
    EXEC SQL CONTEXT USE :ctx;
    /* execute executable SQL statements on runtime context ctx1!!! */
    ...
}

void function2(sql_context ctx)
{
    EXEC SQL CONTEXT USE :ctx;
    /* execute executable SQL statements on runtime context ctx2!!! */
    ...
}

“Divide & Conquer” Design

Step #1: Form Streams

degree=16
file_name=ras.dltx.postrn
file_count=`ls ${file_name}.* | wc -l`
if [ $file_count -gt 0 ]
then
  if [ -f file_list ]
  then
    rm -f file_list file_list_*
  fi
  ls ${file_name}.* > file_list
  split_count=`expr \( $file_count + $file_count % $degree \) / $degree`
  split -$split_count file_list file_list_
  ### Step #2's code goes here ###
fi

Unix shell script to form N streams (i.e. groups) of M/N data sets from M input files

Example for Step #1 files: ras.dltx.postrn.1 ras.dltx.postrn.2 ras.dltx.postrn.3 ras.dltx.postrn.4 ras.dltx.postrn.5 ras.dltx.postrn.6 ras.dltx.postrn.7 ras.dltx.postrn.8 ras.dltx.postrn.9 ras.dltx.postrn.10 ras.dltx.postrn.11 ras.dltx.postrn.12 file_list_aa: ras.dltx.postrn.1 ras.dltx.postrn.2 ras.dltx.postrn.3 ras.dltx.postrn.4 file_list_ab: ras.dltx.postrn.5 ras.dltx.postrn.6 ras.dltx.postrn.7 ras.dltx.postrn.8 … Data Set 1 Data Set 2

Step #2: Process Streams

for file in `ls file_list_*`
do
  (
    cat $file | while read line
    do
      if [ -s $line ]
      then
        cat $line | pro_c_program
      fi
    done
  ) &
done
wait

Unix shell script to create N concurrent background processes, each handling one of the streams' data sets

Example for Step #2 file_list_aa: ras.dltx.postrn.1 ras.dltx.postrn.2 ras.dltx.postrn.3 ras.dltx.postrn.4 for each file skip if empty grep file sort file run Pro-C inserts data file_list_ab: ras.dltx.postrn.5 ras.dltx.postrn.6 ras.dltx.postrn.7 ras.dltx.postrn.8 for each file skip if empty grep file sort file run Pro-C inserts data All running concurrently, with no wait states

Step #3: Calc Aggregations

alter session enable parallel dml;

insert /*+ parallel(aggregate_table, 16) append */
  into aggregate_table
       (period_id, location_id, product_id, vendor_id, … )
select /*+ full(detail_table) parallel(detail_table, 16) */
       period_id, location_id, product_id, vendor_id, …,
       sum(nvl(column_x, 0))
  from detail_table
 where period_id between $BEG_ID and $END_ID
 group by period_id, location_id, product_id, vendor_id, …;

Pro-C Program Algorithm: Read records from Standard IO until EOF Perform record lookups and data scrubbing Insert processed record into detail table If record already exists, update instead Commit every 1000 inserts or updates Special Techniques: Dynamic SQL Method #2 (15% improvement) Prepare SQL outside record processing loop Execute SQL inside record processing loop

Dynamic SQL
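
The code from this slide is not preserved in the transcript. Below is a minimal, hedged Pro*C sketch of Dynamic SQL Method 2 as described on the Pro-C Program slide above: the INSERT is PREPAREd once outside the record-processing loop and EXECUTEd with host variables inside it, committing every 1000 rows. The table, columns, and error handling are hypothetical, not the author's actual code.

#include <stdio.h>
EXEC SQL INCLUDE sqlca;

EXEC SQL BEGIN DECLARE SECTION;
static char ins_text[] = "INSERT INTO sales_fact (period_id, location_id, product_id, qty) VALUES (:b1, :b2, :b3, :b4)";
static int    period_id, location_id, product_id;
static double qty;
EXEC SQL END DECLARE SECTION;

int main()
{
    long count = 0;
    char line[512];

    /* connect to the database here (omitted) */

    /* Dynamic SQL Method 2: PREPARE once, outside the record-processing loop */
    EXEC SQL PREPARE ins_stmt FROM :ins_text;

    while (fgets(line, sizeof line, stdin) != NULL) {   /* read from standard IO until EOF */
        /* parse the record, perform lookups and data scrubbing here (omitted) */

        /* EXECUTE the already-prepared statement inside the loop */
        EXEC SQL EXECUTE ins_stmt USING :period_id, :location_id,
                                        :product_id, :qty;
        if (sqlca.sqlcode < 0) {
            /* e.g. duplicate key: run a similarly prepared UPDATE instead (omitted) */
        }
        if (++count % 1000 == 0)
            EXEC SQL COMMIT;                            /* commit every 1000 rows */
    }
    EXEC SQL COMMIT;
    return 0;
}

Preparing once and executing many times avoids re-parsing the statement for every record, which is the source of the roughly 15% improvement cited on the previous slide.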

"Divide & Conquer" Version's Timings (table on the original slide): start time and duration in minutes for streams #1 through #16 and the aggregation step, with an overall total of roughly 25 minutes (per the Results slide), plus an HP-UX GlancePlus display showing utilization across all of the HP-UX CPUs.

Results. Old run time = 270 minutes; new run time = 25 minutes; runtime improvement = 1080%. Customer's reaction: took the team to a Rangers baseball game, gave the team a pizza party and a half day off, and gave the entire team plaques of appreciation.

Other Possible Improvements. Shell script: split the files based upon size to better balance the load. Pro-C: use Pro-C host arrays for inserts/updates (see the sketch below); read the lookup tables into process memory (PGA). Fact tables: partition tables and indexes by both time and a parallel separation criterion (e.g. time zone).
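
As a hedged illustration of the host-array suggestion (again with hypothetical table and column names, not the author's code), a Pro*C fragment might buffer rows and insert them in batches:

#define BATCH 100

EXEC SQL BEGIN DECLARE SECTION;
static int    n = 0;                  /* rows currently buffered */
static int    period_id[BATCH];
static int    product_id[BATCH];
static double qty[BATCH];
EXEC SQL END DECLARE SECTION;

static void flush_batch(void)
{
    if (n == 0)
        return;
    /* one array insert sends all n buffered rows in a single execution */
    EXEC SQL FOR :n
        INSERT INTO sales_fact (period_id, product_id, qty)
        VALUES (:period_id, :product_id, :qty);
    EXEC SQL COMMIT;
    n = 0;
}

static void buffer_row(int per, int prod, double q)
{
    period_id[n]  = per;
    product_id[n] = prod;
    qty[n]        = q;
    if (++n == BATCH)
        flush_batch();                /* insert every BATCH rows */
}

/* the record-processing loop calls buffer_row() per record and
   flush_batch() once more after EOF for any remaining rows */

Rows that fail with a duplicate-key error would still need an update path, for example by retrying the failed rows individually, so the host-array approach pairs most naturally with loads that are mostly inserts.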