Advanced PL/SQL and Oracle ETL Doug Cosman Senior Oracle DBA SageLogix, Inc. Open World 2003.

Agenda
  Overview of Oracle 9i ETL
    Provides Fast Transformations Using Only the 9i DB
  Advanced PL/SQL Features
    Necessary for Understanding Oracle 9i ETL
  PL/SQL Performance Techniques for Data Warehouse Environments

What Is ETL?
  Extract
    Pull the Data From the Source
  Transform
    Convert the Input Format to the Target Format
    Encode any Values
  Load
    Insert the Transformed Data to the Target Tables

Oracle 9i ETL
  Extract
    Oracle 9i External Tables
  Transform
    PL/SQL Pipelined Table Functions
    Oracle Warehouse Builder Can Also be Used to Build Pipelined Table Functions
      Maps Source Data Layout and Target Schema and Builds PL/SQL and SQL Code
  Load
    Direct Path Inserts

Performance Factors
  SQL Execution Time
    Efficiency of Execution Plan
    Hardware Resource Waits
  Code Logic Execution Time
    Speed of Host Language
  Variable Binding
    Time to Bind Values Back to Host Language

PL/SQL Binding
  Types of Binds
    IN-Binds
      Bind Values From Host Language to SQL Engine
    OUT-Binds
      Values are Returned from SQL Objects to Host Variables
  Bind Options
    Single Row Binds
    Bulk Binds

Single Row Binds
  Cursor FOR-LOOP
    DECLARE
      CURSOR cust_cur (p_customer_id NUMBER) IS
        SELECT *
          FROM f_sales_detail
         WHERE customer_id = p_customer_id;
      v_customer_id NUMBER := 1234;
    BEGIN
      FOR rec IN cust_cur (v_customer_id) LOOP
        INSERT INTO sales_hist (customer_id, detail_id, process_date)
        VALUES (v_customer_id, rec.sales_id, sysdate);
      END LOOP;
    END;

Context Switching
  [Diagram: DB containing the PL/SQL Engine and the SQL Engine; labels: IN-BIND, OUT-BIND]

Single Row Binds
  The Most Expensive Operation by Far is the Binding
  Single Row Binding is SLOW for Large Result Sets

Bulk Binding
  PL/SQL Bulk Bind Support Added in 8i
  IN-Binds
    An Array of Values is Passed to the SQL Engine
  OUT-Binds
    SQL Engine Populates a PL/SQL Bind-Array
  Context Switch Once per Batch Instead of Once per Row
  Performance Increase of Up to 15 Times

Bulk Operators
  BULK COLLECT
    Specifies that Bulk Fetches Should be Used
    Be Careful to Handle Last Batch
  LIMIT
    Defines the Batch Size for Bulk Collections
  FORALL
    Bulk DML Operator
    Not a Looping Construct like a Cursor-For-Loop
    PL/SQL Table is Referenced in the Statement

    DECLARE
      TYPE sales_t IS TABLE OF f_sales_detail.sales_id%TYPE
        INDEX BY BINARY_INTEGER;
      sales_ids     sales_t;
      v_customer_id NUMBER := 1234;
      max_rows      CONSTANT NUMBER := 10000;
      CURSOR sales (p_customer_id NUMBER) IS
        SELECT sales_id
          FROM f_sales_detail
         WHERE customer_id = p_customer_id;
    BEGIN
      OPEN sales(v_customer_id);
      LOOP
        -- Checked before the FETCH so the final, partial batch is still processed.
        EXIT WHEN sales%NOTFOUND;
        FETCH sales BULK COLLECT INTO sales_ids LIMIT max_rows;
        FORALL i IN 1..sales_ids.COUNT
          INSERT INTO sales_hist (customer_id, detail_id, process_date)
          VALUES (v_customer_id, sales_ids(i), sysdate);
      END LOOP;
      CLOSE sales;
    END;
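
The "Be Careful to Handle Last Batch" warning can be illustrated with a second common idiom. This is a minimal sketch, not the presenter's code: it reuses the f_sales_detail table from the slides, and the batch size of 1000 and the counter v_total are assumptions made for illustration. The loop exits when a fetch returns an empty batch, so a final batch smaller than the LIMIT is still processed.

```sql
DECLARE
  TYPE sales_t IS TABLE OF f_sales_detail.sales_id%TYPE
    INDEX BY BINARY_INTEGER;
  sales_ids sales_t;
  v_total   NUMBER := 0;   -- illustrative counter, not in the original slides
  CURSOR sales IS
    SELECT sales_id FROM f_sales_detail;
BEGIN
  OPEN sales;
  LOOP
    FETCH sales BULK COLLECT INTO sales_ids LIMIT 1000;
    -- Testing sales%NOTFOUND at this point would skip a final partial batch,
    -- because %NOTFOUND is TRUE whenever fewer rows than the LIMIT come back.
    EXIT WHEN sales_ids.COUNT = 0;
    v_total := v_total + sales_ids.COUNT;
  END LOOP;
  CLOSE sales;
  dbms_output.put_line('Rows fetched: ' || v_total);
END;
/
```

Either placement works as long as the exit test cannot fire between a successful partial fetch and the code that processes it.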

Native Compilation
  Allows PL/SQL to be Executed as a Compiled C Program
    Requires Native C Compiler on Host
  Enabling
    Set init.ora PLSQL_* Parameters
  Compile as Native Code
    PL/SQL is First Compiled Down to P-Code
    C Source Code is Generated from the P-Code
    Native Compiler is Invoked, Creating a ‘C’ Shared Object Library
    Subsequent Calls to the PL/SQL Object are Run by the ‘C’ Library
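
As a rough illustration of the "Enabling" step, the sketch below switches one session to native compilation and recompiles a procedure. It assumes a 9i database where the DBA has already configured the remaining PLSQL_NATIVE_* parameters (library directory, make utility, and so on). The procedure name my_proc is hypothetical. Later Oracle releases use a different parameter for this.

```sql
-- 9i-style switch for the current session (assumes the other
-- PLSQL_NATIVE_* parameters are already set at the instance level).
ALTER SESSION SET plsql_compiler_flags = 'NATIVE';

-- my_proc is a hypothetical procedure used only for illustration.
ALTER PROCEDURE my_proc COMPILE;

-- Check which mode the object was compiled with.
SELECT object_name, param_name, param_value
  FROM user_stored_settings
 WHERE param_name = 'plsql_compiler_flags'
   AND object_name = 'MY_PROC';
```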

Native Compilation
  Performance
    Language Execution Speed is About Five Times Faster when not Interacting with the Database
    In Typical Code Interacting with Larger Data Volumes, Execution Speed is Very Similar to Interpreted Code
    Remember that Variable Binding can be a Bigger Factor than Code Execution Speed
  Mixing Native and Interpreted PL/SQL
    Oracle Recommends an All or None Approach for Production
    Including User-Defined and Supplied Packages

Collection Types
  Associative Arrays (PL/SQL Tables)
    PL/SQL Type Only
  Nested Tables
    Shared Type
  Varrays
    Shared Type

Associative Arrays
  PL/SQL Type Only
    Not a SQL Type
  Easy to Use
    Automatic Element Allocation
    No Need to Initialize
  Two Kinds in 9i Release 2
    INDEX BY BINARY_INTEGER
    INDEX BY VARCHAR2
  Similar to:
    Java Hashtables
    Perl and Awk Associative Arrays

Associative Arrays
    DECLARE
      TYPE hash_table_t IS TABLE OF NUMBER INDEX BY VARCHAR2(30);
      _map hash_table_t;
      CURSOR users IS
        SELECT username, user_id FROM dba_users;
    BEGIN
      FOR user IN users LOOP
        _map(user.username) := user.user_id;
      END LOOP;
    END;

Multi-Dimensional Arrays
  New in 9i Release 1
  Implemented as Collection of Collections
    DECLARE
      TYPE element IS TABLE OF NUMBER INDEX BY BINARY_INTEGER;
      TYPE twoDimensional IS TABLE OF element INDEX BY BINARY_INTEGER;
      twoD twoDimensional;
    BEGIN
      twoD(1)(1) := 123;
      twoD(1)(2) := 456;
    END;

Nested Tables
  No Maximum Size
  Harder to Use than Associative Arrays
    Need to be Initialized
    Code Must Explicitly Allocate New Elements
  Shared Type with SQL
  Two Options for Type Definition
    Local PL/SQL Definition
    Global SQL Type Declared in the Database
      Allows Variables to be Shared Between Both Environments

Nested Tables
  PL/SQL Scoped Type
    DECLARE
      TYPE nest_tab_t IS TABLE OF NUMBER;
      nt nest_tab_t := nest_tab_t();
    BEGIN
      FOR i IN LOOP
        nt.EXTEND;
        nt(i) := i;
      END LOOP;
    END;

Nested Tables
  Globally Defined in SQL
    CREATE OR REPLACE TYPE _demo_obj_t AS OBJECT (
      _id        NUMBER,
      demo_code  NUMBER,
      value      VARCHAR2(30)
    );
    /
    CREATE OR REPLACE TYPE _demo_nt_t AS TABLE OF _demo_obj_t;
    /

Nested Tables
  SQL-Defined Nested Tables
    PL/SQL Variables can be Manipulated by the SQL Engine
  Local PL/SQL Variables Can Be:
    Sorted
    Aggregated
    Used for Dynamic In-Lists
    Joined With SQL Tables
    Joined with Other PL/SQL Nested Tables

Table Functions
  Nested Tables Enable Table Functions
    SELECT * FROM TABLE( CAST(eml_dmo_nt AS _demo_nt_t) )
  TABLE Operator
    Tells Oracle to Treat the Variable as a SQL Table
  CAST Operator
    Explicitly Tells Oracle the Data Type to be Used to Handle the Operation

Table Function Example
    DECLARE
      eml_dmo_nt _demo_nt_t := _demo_nt_t();
    BEGIN
      -- Some logic that populates the nested table …
      eml_dmo_nt.EXTEND(3);
      eml_dmo_nt(1) := _demo_obj_t(45, 3, '23');
      eml_dmo_nt(2) := _demo_obj_t(22, 3, '41');
      eml_dmo_nt(3) := _demo_obj_t(18, 7, 'over_100k');
      -- Process the data in ascending order of id.
      FOR r IN (SELECT *
                  FROM TABLE(CAST(eml_dmo_nt AS _demo_nt_t))
                 ORDER BY 1)
      LOOP
        -- demo_code is the attribute name declared on the object type.
        dbms_output.put_line(r. _id || ' ' || r.demo_code);
      END LOOP;
    END;
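
The "dynamic in-list" use of a SQL-defined nested table can be sketched in the same way. This is an illustrative example under assumptions: the type num_nt_t and the three customer ids are hypothetical, and f_sales_detail is the table used in the earlier bulk-binding slides. For a collection of scalars, each element is exposed through the COLUMN_VALUE pseudo-column.

```sql
-- Hypothetical SQL-level type; it has to exist in the schema, not just in PL/SQL.
CREATE OR REPLACE TYPE num_nt_t AS TABLE OF NUMBER;
/

DECLARE
  ids     num_nt_t := num_nt_t(1234, 5678, 9012);  -- illustrative values
  v_count NUMBER;
BEGIN
  -- The list of ids is decided at run time, yet the statement text never changes.
  SELECT COUNT(*)
    INTO v_count
    FROM f_sales_detail
   WHERE customer_id IN (SELECT column_value
                           FROM TABLE(CAST(ids AS num_nt_t)));
  dbms_output.put_line('Matching rows: ' || v_count);
END;
/
```

Because the SQL text stays constant, the statement can be shared in the cursor cache regardless of how many values the list holds.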

Returning Result Sets
  Returning Collections Directly
    Return the Data Structure Itself
  Returning Reference Cursors
    Returns an Open Cursor to an Application
    Doesn’t Return Data from PL/SQL Directly
  Calling a Table Function from the SQL Context
    Convert Function Return Value into a Cursor
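
The reference-cursor option has no code on the slides, so here is a minimal sketch of what it might look like. The function name get_sales_cur is hypothetical, and the query reuses the f_sales_detail table from the earlier slides. The function only opens the cursor; the caller fetches the rows and is responsible for closing it.

```sql
-- Hypothetical function: returns an open cursor rather than the data itself.
CREATE OR REPLACE FUNCTION get_sales_cur (p_customer_id NUMBER)
  RETURN SYS_REFCURSOR
IS
  rc SYS_REFCURSOR;
BEGIN
  OPEN rc FOR
    SELECT sales_id
      FROM f_sales_detail
     WHERE customer_id = p_customer_id;
  RETURN rc;
END;
/

-- A PL/SQL caller; a host-language client would fetch from the cursor the same way.
DECLARE
  rc         SYS_REFCURSOR;
  v_sales_id f_sales_detail.sales_id%TYPE;
BEGIN
  rc := get_sales_cur(1234);
  LOOP
    FETCH rc INTO v_sales_id;
    EXIT WHEN rc%NOTFOUND;
    dbms_output.put_line(v_sales_id);
  END LOOP;
  CLOSE rc;
END;
/
```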

Returning Collections
  Return a Collection Type Explicitly
  Best Suited for PL/SQL Calling Programs
    FUNCTION get_ _demo(p_ _id NUMBER) RETURN _demo_nt_t IS
      eml_dmo _demo_nt_t;
    BEGIN
      SELECT _demo_obj_t( _id, demo_id, value)
        BULK COLLECT INTO eml_dmo
        FROM _demographic
       WHERE _id = p_ _id;
      -- Apply some business logic on the nested table here.
      RETURN eml_dmo;
    END;

Table Functions
  Can be Used in a SQL Context Too
    A Table Function Takes a Collection Type as an Argument
    A Function that Returns a Collection Works Too
  Allows us to Pass Out PL/SQL Collections as a Cursor to any Host Language
    SELECT * FROM TABLE( CAST( get_ _demo(45) AS _demo_nt_t));

Table Functions
  Data is Buffered in the Local Variable During Function Execution
    Cursor Returns Rows after Function Completes
    Private Memory Issues if the Result Set is Large
    Need a Way to Stream Results
  9i Pipelined Table Functions
    Provides a Streaming Interface
      Rows are Returned as they are Produced
      Rows are Actually Buffered in Small Batches
        Remember Bulk Binding Issue?
    Can be Run in Parallel
    PIPELINED Keyword
    PIPE ROW Operator

Pipelined Table Function
    FUNCTION get_ _demo RETURN _demo_nt_t PIPELINED IS
      CURSOR _demo_cur IS
        SELECT _demo_obj_t( _id, demo_id, value)
          FROM _demographic;
      eml_dmo_nt _demo_nt_t;
    BEGIN
      OPEN _demo_cur;
      LOOP
        EXIT WHEN _demo_cur%NOTFOUND;
        FETCH _demo_cur BULK COLLECT INTO eml_dmo_nt LIMIT 1000;
        FOR i IN 1..eml_dmo_nt.COUNT LOOP
          /* Apply some business logic on the object here, and return a row. */
          PIPE ROW (eml_dmo_nt(i));
        END LOOP;
      END LOOP;
      RETURN;
    END;

External Tables
  One Last Piece of Background Information
  Oracle 9i External Tables
    Provides a Way for Oracle to Read Directly from Flat Files on the Database Server
    File can be Queried as if it is a Real Database Table
      Can Sort, Aggregate, Filter Rows, etc.
    External File Can be Queried in Parallel
  Only Table Definition is Stored in the Database
    Data is ‘External’
    Table Definition is Similar to SQL*Loader Control File

External Tables
    CREATE TABLE ext_tab (
      VARCHAR2(50),
      age    NUMBER,
      income VARCHAR2(20))
    ORGANIZATION EXTERNAL (
      TYPE oracle_loader
      DEFAULT DIRECTORY data_dir
      ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        LOGFILE data_dir: 'ext_tab.log'
        BADFILE data_dir: 'ext_tab.bad'
        FIELDS TERMINATED BY ','
        MISSING FIELD VALUES ARE NULL (
          CHAR(50),
          age    INTEGER EXTERNAL(2),
          income CHAR(20)
        )
      )
      LOCATION ('ext_tab.dat')
    )
    REJECT LIMIT UNLIMITED;
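
To show what "queried as if it is a real database table" might look like in practice, here is a small sketch. The server path in CREATE DIRECTORY is a hypothetical placeholder. The directory object name data_dir and the age and income columns come from the table definition above. The directory object has to exist before the external table is created.

```sql
-- Hypothetical server-side path; data_dir is the directory name used by ext_tab.
CREATE OR REPLACE DIRECTORY data_dir AS '/u01/etl/incoming';

-- The flat file is filtered, aggregated and sorted like any other table.
SELECT income,
       COUNT(*) AS row_count,
       AVG(age) AS avg_age
  FROM ext_tab
 WHERE age IS NOT NULL
 GROUP BY income
 ORDER BY row_count DESC;
```

No data is loaded by this query: each execution reads ext_tab.dat directly, and rejected records go to the bad file named in the access parameters.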

ETL Example
  Normalize, Encode and Pivot
  [Figure: Input Record with columns AGE, INCOME mapped to output rows with columns _ID, DEMO_CODE, VALUE; sample value: over_100k]

    PACKAGE BODY etl IS
      TYPE hash_table_t IS TABLE OF NUMBER INDEX BY VARCHAR2(30);
      _map hash_table_t;

      FUNCTION transform (new_data SYS_REFCURSOR)
        RETURN _demo_nt_t
        PIPELINED
        PARALLEL_ENABLE ( PARTITION new_data BY ANY )
      IS
        TYPE ext_tab_array IS TABLE OF ext_tab%ROWTYPE
          INDEX BY BINARY_INTEGER;
        indata    ext_tab_array;
        _demo_obj _demo_obj_t := _demo_obj_t(null,null,null);
        demo_map  hash_table_t;
      BEGIN
        LOOP
          EXIT WHEN new_data%NOTFOUND;
          FETCH new_data BULK COLLECT INTO indata LIMIT 1000;
          FOR i IN 1..indata.COUNT LOOP
            -- Encode the key through the cached lookup map.
            _demo_obj. _id := _map(indata(i). );
            -- Pivot: one output row per input attribute.
            _demo_obj.demo_code := 3;
            _demo_obj.value := indata(i).age;
            PIPE ROW ( _demo_obj);
            _demo_obj.demo_code := 7;
            _demo_obj.value := indata(i).income;
            PIPE ROW ( _demo_obj);
          END LOOP;
        END LOOP;
        RETURN;
      END;

    -- Package initialization: load the lookup map once per session.
    BEGIN
      FOR IN (SELECT _id, FROM ) LOOP
        _map( . ) := . _id;
      END LOOP;
    END;

Oracle 9i ETL
  Transformation is Just a Simple INSERT as SELECT
  Elegant Solution to Parallel, Transactional Co-processing
    INSERT /*+ append nologging */ INTO _demographic
    (SELECT /*+ parallel( a, 4 ) */ *
       FROM TABLE( CAST( etl.transform( CURSOR(SELECT * FROM ext_tab ))
                         AS _demo_nt_t)) a);

Parallel Co-processing
  [Diagram: Input File, PQ Slave, PL/SQL and INSERT inside the DB; stages labeled Extract, Transform, Load]

Performance Issues
  Speed is Respectable but There is a Performance Bottleneck with the Table Function Mechanism
    Possibly an Issue Binding Data Back from the SQL Engine
  Throughput is About Three Times Slower than Coding with BULK COLLECT and FORALL Operators
    However, These Don’t Support Parallel Operations
  Oracle Expects to Have it Fixed in Next Release

ETL Alternatives
  The Multi-Table INSERT Statement
    New in 9i
    Each Sub-Query Input Row Can be INSERT’ed to a Different Table
      … or the Same Table Multiple Times
  Faster than Using PL/SQL
    It’s Always Faster to do Something in Pure SQL than Using Any Host Language
    Binding is Avoided

Multi-Table Insert
    INSERT /*+ append nologging */ ALL
      INTO _demographic ( _id, demo_id, value)
        VALUES ( _id, 3, age)
      INTO _demographic ( _id, demo_id, value)
        VALUES ( _id, 7, income)
    (SELECT /*+ ordered index( b ) */
            b. _id, a.income, a.age
       FROM ext_tab a, b
      WHERE a. = b. );

SQL-Only Processing
  [Diagram: Input File, PQ Slave and INSERT inside the DB; stages labeled Extract, Transform, Load]

Performance Solutions
  Minimize SQL Execution Time
    Exploiting Caching to Eliminate Some SQL Look-ups and Joins
    Direct Path Inserts
  Code Logic Execution Time
    Replacing Interpreted PL/SQL with Native Compilation
    Eliminating Host Language Using Multi-Table INSERTS
  Variable Binding
    Replace Single Row Binds with Bulk Binds

Conclusion
  Oracle 9i ETL is a High Performance ETL Solution
    Especially Once the Table Function Issue is Resolved
  Already Included in the Cost of the RDBMS