Presentation is loading. Please wait.

Presentation is loading. Please wait.

Advanced PL/SQL and Oracle ETL Doug Cosman Senior Oracle DBA SageLogix, Inc. Open World 2003.

Similar presentations


Presentation on theme: "Advanced PL/SQL and Oracle ETL Doug Cosman Senior Oracle DBA SageLogix, Inc. Open World 2003."— Presentation transcript:

1 www.SageLogix.com Advanced PL/SQL and Oracle ETL Doug Cosman Senior Oracle DBA SageLogix, Inc. doug_cosman@yahoo.com Open World 2003

2 www.SageLogix.com Agenda Overview of Oracle 9i ETL Provides Fast Transformations Using Only the 9i DB Advanced PL/SQL Features Necessary for Understanding Oracle 9i ETL PL/SQL Performance Techniques for Data Warehouse Environments

3 www.SageLogix.com What Is ETL? Extract Pull the Data From the Source Transform Convert the Input Format to the Target Format Encode any Values Load Insert the Transformed Data to the Target Tables

4 www.SageLogix.com Oracle 9i ETL Extract Oracle 9i External Tables Transform PL/SQL Pipelined Table Functions Oracle Warehouse Builder Can also be Used to Build Pipelined Table Functions Maps Source Data Layout and Target Schema and Builds PL/SQL and SQL Code Load Direct Path Inserts

5 www.SageLogix.com Performance Factors SQL Execution Time Efficiency of Execution Plan Hardware Resource Waits Code Logic Execution Time Speed of Host Language Variable Binding Time to Bind Values Back to Host Language

6 www.SageLogix.com PL/SQL Binding Types of Binds IN-Binds Bind Values From Host Language to SQL Engine OUT-Binds Values are Returned from SQL Objects to Host Variables Bind Options Single Row Binds Bulk Binds

7 www.SageLogix.com Single Row Binds Cursor FOR-LOOP DECLARE CURSOR cust_cur (p_customer_id NUMBER) IS SELECT * FROM f_sales_detail WHERE customer_id = p_customer_id; v_customer_idNUMBER := 1234; BEGIN FOR rec IN cust_cur (v_customer_id) LOOP INSERT INTO sales_hist (customer_id, detail_id, process_date) VALUES (v_customer_id, rec.sales_id, sysdate); END LOOP; END;

8 www.SageLogix.com Context Switching PL/SQL Engine DB SQL Engine OUT-BIND IN-BIND

9 www.SageLogix.com Single Row Binds The Most Expensive Operation by Far is the Binding Single Row Binding is SLOW for Large Result Sets

10 www.SageLogix.com Bulk Binding PL/SQL Bulk Bind Support added in 8i IN-Binds An Array of Values is Passed to the SQL Engine OUT-Binds SQL Engine Populates a PL/SQL Bind-Array Context Switch Once per Batch Instead of Once per Row Performance Increase of Up to 15 Times

11 www.SageLogix.com Bulk Operators BULK COLLECT Specifies that Bulk Fetches Should be Used Be Careful to Handle Last Batch LIMIT Defines the Batch Size for Bulk Collections FORALL Bulk DML Operator Not a Looping Construct like a Cursor-For-Loop PL/SQL Table is Referenced in the Statement

12 DECLARE TYPE sales_t IS TABLE OF f_sales_detail.sales_id%TYPE INDEX BY BINARY_INTEGER; sales_idssales_t; v_customer_idNUMBER := 1234; max_rowsCONSTANT NUMBER := 10000; CURSOR sales(p_customer_id NUMBER) IS SELECT sales_id FROM f_sales_detail WHERE customer_id = p_customer_id; BEGIN OPEN sales(v_customer_id); LOOP EXIT WHEN sales%NOTFOUND; FETCH sales BULK COLLECT INTO sales_ids LIMIT max_rows; FORALL i IN 1..sales_ids.COUNT INSERT INTO sales_hist (customer_id, detail_id, process_date) VALUES (v_customer_id, sales_ids(i), sysdate); END LOOP; CLOSE sales; END;

13 www.SageLogix.com Native Compilation Allows PL/SQL to be Executed as a Compiled C Program Requires Native C Compiler on Host Enabling Set init.ora PLSQL_* Parameters Compile as Native Code PL/SQL is First Compiled Down to P-Code C Source Code is Generated from the P-Code Native Compiler is Invoked Creating a ‘C’ Shared Object Library Subsequent Calls to PL/SQL Object are Run by the ‘C’ Library

14 www.SageLogix.com Native Compilation Performance Language Execution Speed is About Five Times Faster when not Interacting with the Database In Typical Code Interacting with Larger Data Volumes Execution Speed is Very Similar to Interpreted Code Remember that Variable Binding can be a Bigger Factor than Code Execution Speed Mixing Native and Interpreted PL/SQL Oracle Recommends an All or None Approach for Production Including User-Defined and Supplied Packages

15 www.SageLogix.com Collection Types Associative Arrays (PL/SQL Tables) PL/SQL Type Only Nested Tables Shared Type Varrays Shared Type

16 www.SageLogix.com Associative Arrays PL/SQL Type Only Not a SQL Type Easy to Use Automatic Element Allocation No Need to Initialize Two Kinds in 9i Release 2 INDEX BY BINARY_INTEGER INDEX BY VARCHAR2 Similar to: Java Hashtables Perl and Awk Associative Arrays

17 www.SageLogix.com Associative Arrays DECLARE TYPE hash_table_t IS TABLE OF NUMBER INDEX BY VARCHAR2(30); email_map hash_table_t; CURSOR users IS SELECT username, user_id FROM dba_users; BEGIN FOR user IN users LOOP email_map(user.username) := user.user_id; END LOOP; END;

18 www.SageLogix.com Multi-Dimensional Arrays New in 9i Release 1 Implemented as Collection of Collections DECLARE TYPE element IS TABLE OF NUMBER INDEX BY BINARY_INTEGER; TYPE twoDimensional IS TABLE OF element INDEX BY BINARY_INTEGER; twoD twoDimensional; BEGIN twoD(1)(1) := 123; twoD(1)(2) := 456; END;

19 www.SageLogix.com Nested Tables No Maximum Size Harder to Use than Associative Arrays Need to be Initialized Code Must Explicitly Allocate New Elements Shared Type with SQL Two Options for Type Definition Local PL/SQL Definition Global SQL Type Declared in the Database Allows Variables to be Shared Between Both Environments

20 www.SageLogix.com Nested Tables PL/SQL Scoped Type DECLARE TYPE nest_tab_t IS TABLE OF NUMBER; nt nest_tab_t := nest_tab_t(); BEGIN FOR i IN 1..100 LOOP nt.EXTEND; nt(i) := i; END LOOP; END;

21 www.SageLogix.com Nested Tables Globally Defined in SQL CREATE OR REPLACE TYPE email_demo_obj_t AS OBJECT ( email_idNUMBER, demo_codeNUMBER, valueVARCHAR2(30) ); / CREATE OR REPLACE TYPE email_demo_nt_t AS TABLE OF email_demo_obj_t; /

22 www.SageLogix.com Nested Tables SQL-Defined Nested Tables PL/SQL Variables can be Manipulated by the SQL Engine Local PL/SQL Variables Can Be: Sorted Aggregated Used for Dynamic In-Lists Joined With SQL Tables Joined with Other PL/SQL Nested Tables

23 www.SageLogix.com Table Functions Nested Tables Enable Table Functions SELECT * FROM TABLE( CAST(eml_dmo_nt AS email_demo_nt_t) ) TABLE Operator Tells Oracle to Treat the Variable as a SQL Table CAST Operator Explicitly Tells Oracle the Data Type to be Used to Handle the Operation

24 www.SageLogix.com Table Function Example DECLARE eml_dmo_nt email_demo_nt_t := email_demo_nt_t(); BEGIN -- Some logic that populates the nested table … eml_dmo_nt.EXTEND(3); eml_dmo_nt(1) := email_demo_obj_t(45, 3, '23'); eml_dmo_nt(2) := email_demo_obj_t(22, 3, '41'); eml_dmo_nt(3) := email_demo_obj_t(18, 7, 'over_100k'); -- Process the data in assending order of email id. FOR r IN (SELECT * FROM TABLE(CAST(eml_dmo_nt AS email_demo_nt_t)) ORDER BY 1) LOOP dbms_output.put_line(r.email_id || ' ' || r.demo_id); END LOOP; END;

25 www.SageLogix.com Returning Result Sets Returning Collections Directly Return the Data Structure Itself Returning Reference Cursors Returns and Open Cursor to an Application Doesn’t Return Data from PL/SQL Directly Calling a Table Function from the SQL Context Convert Function Return Value into a Cursor

26 www.SageLogix.com Returning Collections Return a Collection Type Explicitly Best Suited for PL/SQL Calling Programs FUNCTION get_email_demo(p_email_id NUMBER) RETURN email_demo_nt_t IS eml_dmo email_demo_nt_t; BEGIN SELECT email_demo_obj_t(email_id, demo_id, value) BULK COLLECT INTO eml_dmo FROM email_demographic WHERE email_id = p_email_id; -- Apply some business logic on the nested table here. RETURN eml_dmo; END;

27 www.SageLogix.com Table Functions Can be Used in a SQL Context Too A Table Function Takes a Collection Type as an Argument A Function that Returns a Collection Works Too Allows us to Pass Out PL/SQL Collections as a Cursor to any Host Language SELECT * FROM TABLE( CAST( get_email_demo(45) AS email_demo_nt_t));

28 www.SageLogix.com Table Functions Data is Buffered in the Local Variable During Function Execution Cursor Returns Rows after Function Completes Private Memory Issues if the Result Set is Large Need a Way to Stream Results 9i Pipelined Table Functions Provides a Streaming Interface Rows are Returned as they are Produced Rows are Actually Buffered in Small Batches Remember Bulk Binding Issue? Can be Run in Parallel PIPELINED Keyword PIPE ROW Operator

29 www.SageLogix.com Pipelined Table Function FUNCTION get_email_demo RETURN email_demo_nt_t PIPELINED IS CURSOR email_demo_cur IS SELECT email_demo_obj_t(email_id, demo_id, value) FROM email_demographic; eml_dmo_nt email_demo_nt_t; BEGIN OPEN email_demo_cur; LOOP EXIT WHEN email_demo_cur%NOTFOUND; FETCH email_demo_cur BULK COLLECT INTO eml_dmo_nt LIMIT 1000; FOR i IN 1..eml_dmo_nt.COUNT LOOP /* Apply some business logic on the object here, and return a row. */ PIPE ROW (eml_dmo_nt(i)); END LOOP; END LOOP; RETURN; END;

30 www.SageLogix.com External Tables One Last Piece of Background Information Oracle 9i External Tables Provides a Way for Oracle to Read Directly from Flat Files on the Database Server File can be Queried as if it is a Real Database Table Can Sort, Aggregate, Filter Rows, etc. External File Can be Queried in Parallel Only Table Definition is Stored in the Database Data is ‘External’ Table Definition is Similar to SQL*Loader Control File

31 www.SageLogix.com External Tables CREATE TABLE ext_tab (email VARCHAR2(50), age NUMBER, income VARCHAR2(20)) ORGANIZATION EXTERNAL ( TYPE oracle_loader DEFAULT DIRECTORY data_dir ACCESS PARAMETERS (RECORDS DELIMITED BY NEWLINE LOGFILE data_dir: 'ext_tab.log' BADFILE data_dir: 'ext_tab.bad' FIELDS TERMINATED BY ',' MISSING FIELD VALUES ARE NULL (email CHAR(50), age INTEGER EXTERNAL(2), income CHAR(20) ) ) LOCATION ('ext_tab.dat') ) REJECT LIMIT UNLIMITED;

32 www.SageLogix.com ETL Example EMAILAGEINCOME John.Doe@excite.com58over_100k EMAIL_IDDEMO_CODEVALUE 2345358 23457over_100k Normalize, Encode and Pivot Input Record

33 PACKAGE BODY etl IS TYPE hash_table_t IS TABLE OF NUMBER INDEX BY VARCHAR2(30); email_map hash_table_t; FUNCTION transform (new_data SYS_REFCURSOR) RETURN email_demo_nt_t PIPELINED PARALLEL_ENABLE ( PARTITION new_data BY ANY ) IS TYPE ext_tab_array IS TABLE OF ext_tab%ROWTYPE INDEX BY BINARY_INTEGER; indata ext_tab_array; email_demo_obj email_demo_obj_t := email_demo_obj_t(null,null,null); demo_map hash_table_t; BEGIN LOOP EXIT WHEN new_data%NOTFOUND; FETCH new_data BULK COLLECT INTO indata LIMIT 1000; FOR i IN 1..indata.COUNT LOOP email_demo_obj.email_id := email_map(indata(i).email); email_demo_obj.demo_code := 3; email_demo_obj.value := indata(i).age; PIPE ROW (email_demo_obj); email_demo_obj.demo_code := 7; email_demo_obj.value := indata(i).income; PIPE ROW (email_demo_obj); END LOOP; RETURN; END; BEGIN FOR email IN (SELECT email_id, email FROM email) LOOP email_map(email.email) := email.email_id; END LOOP; END;

34 www.SageLogix.com Oracle 9i ETL Transformation is Just a Simple INSERT as SELECT Elegant Solution to Parallel, Transactional Co-processing INSERT /*+ append nologging */ INTO email_demographic (SELECT /*+ parallel( a, 4 ) */ * FROM TABLE( CAST( etl.transform( CURSOR(SELECT * FROM ext_tab )) AS email_demo_nt_t)) a);

35 www.SageLogix.com Parallel Co-processing DB PL/SQL PQ Slave Input File INSERT Extract Transform Load

36 www.SageLogix.com Performance Issues Speed is Respectable but There is a Performance Bottleneck with the Table Function Mechanism Possibly an Issue Binding Data Back from the SQL Engine Throughput is about Three Times Slower than Coding with BULK COLLECT and FORALL Operators However These Don’t Support Parallel Operations Oracle Expects to have it Fixed in Next Release

37 www.SageLogix.com ETL Alternatives The Multi-Table INSERT Statement New in 9i Each Sub-Query Input Row Can be INSERT’ed to a Different Table … or the Same Table Multiple Times Faster than Using PL/SQL It’s Always Faster to do Something in Pure SQL than Using Any Host Language Binding is Avoided

38 www.SageLogix.com Multi-Table Insert INSERT /*+ append nologging */ ALL INTO email_demographic (email_id, demo_id, value) VALUES (email_id, 3, age) INTO email_demographic (email_id, demo_id, value) VALUES (email_id, 7, income) (SELECT /*+ ordered index( b ) */ b.email_id, a.income, a.age FROM ext_tab a, email b WHERE a.email = b.email);

39 www.SageLogix.com SQL-Only Processing DB PQ Slave Input File INSERT Extract Transform Load

40 www.SageLogix.com Performance Solutions Minimize SQL Execution Time Exploiting Caching to Eliminate Some SQL Look-ups and Joins Direct Path Inserts Code Logic Execution Time Replacing Interpreted PL/SQL with Native Compilation Eliminating Host Language Using Multi-Table INSERTS Variable Binding Replace Single Row Binds with Bulk Binds

41 www.SageLogix.com Conclusion Oracle 9i ETL is a High Performance ETL Solution Especially Once the Table Function Issue is Resolved Already Included in the Cost of the RDBMS


Download ppt "Advanced PL/SQL and Oracle ETL Doug Cosman Senior Oracle DBA SageLogix, Inc. Open World 2003."

Similar presentations


Ads by Google