6 Extraction, Transformation, and Loading (ETL) Transformation
Copyright © 2005, Oracle. All rights reserved. 6-2 Objectives After completing this lesson, you should be able to do the following: Explain transformation using SQL – CREATE TABLE AS SELECT (CTAS) – UPDATE – MERGE –Multitable INSERT Describe transformation using table functions and PL/SQL Implement DML Error Logging
Copyright © 2005, Oracle. All rights reserved. 6-3 Data Transformation Data transformations are often the most complex and costly part of the ETL process. Transformations can range from simple data conversion to complex scrubbing techniques. Many, if not all, transformations can occur within the Oracle database. External transformations outside the database are supported (flat files, for instance). Data can be transformed in two ways: –Multistage data transformation –Pipelined data transformation
Copyright © 2005, Oracle. All rights reserved. 6-4 Multistage Data Transformation Load into staging table. NEW_SALES_STEP1 NEW_SALES_STEP2 Convert source product keys to warehouse product keys. SALES Flat files Validate customer keys (look up in customer dimension table). NEW_SALES_STEP3 Insert into SALES warehouse table.
Copyright © 2005, Oracle. All rights reserved. 6-5 Pipelined Data Transformation SALES Flat files Validate customer keys (look up in customer dimension table). External table Convert source product keys to warehouse product keys. Insert into SALES warehouse table.
Copyright © 2005, Oracle. All rights reserved. 6-6 Transformation Mechanisms You have the following choices for transforming data inside the database: SQL PL/SQL Table functions
Copyright © 2005, Oracle. All rights reserved. 6-7 Transformation Using SQL After data is loaded into the database, transformations can be executed using SQL operations: CREATE TABLE... AS SELECT INSERT /*+APPEND*/ AS SELECT UPDATE MERGE Multitable INSERT
Copyright © 2005, Oracle. All rights reserved. 6-8 CREATE TABLE … AS SELECT and UPDATE DESC sales_activity_direct Name Null? Type SALES_DATEDATE PRODUCT_ID NUMBER CUSTOMER_ID NUMBER PROMOTION_ID NUMBER AMOUNT NUMBER QUANTITY NUMBER INSERT /*+ APPEND NOLOGGING PARALLEL(sales) */ INTO sales as SELECT product_id, customer_id, TRUNC(sales_date), 3, promotion_id, quantity, amount FROM sales_activity_direct;
Copyright © 2005, Oracle. All rights reserved. 6-9
Copyright © 2005, Oracle. All rights reserved MERGE Statement: Overview MERGE statements provide the ability to conditionally UPDATE/INSERT into the database. The DELETE clause for UPDATE branch is provided for implicit data maintenance. Conditions are specified in the ON clause. MERGE statements do an UPDATE if the row exists and an INSERT if it is a new row. Using MERGE statements avoids multiple updates. – MERGE statements use a single SQL statement to complete an UPDATE or INSERT or both.
Copyright © 2005, Oracle. All rights reserved Data Warehousing MERGE : Example MERGE INTO customer C USING cust_src S ON (c.customer_id = s.src_customer_id) WHEN MATCHED THEN UPDATE SET c.cust_address = s.cust_address WHEN NOT MATCHED THEN INSERT ( Customer_id, cust_first_name,…) VALUES (src_customer_id, src_first_name,…);
Copyright © 2005, Oracle. All rights reserved Data Maintenance with MERGE/DELETE MERGE USING product_changes S INTO products D ON (d.prod_id = s.prod_id) WHEN MATCHED THEN UPDATE SET d.prod_list_price = s.prod_new_price, d.prod_status = s.prod_newstatus DELETE WHERE (d.prod_status = "OBSOLETE") WHEN NOT MATCHED THEN INSERT (prod_id, prod_list_price, prod_status) VALUES (s.prod_id, s.prod_new_price, s.prod_new_status);
Copyright © 2005, Oracle. All rights reserved Overview of Multitable INSERT Statements Allows the INSERT … AS SELECT statement to insert rows into multiple tables as part of a single DML statement Can be used in data warehousing systems to transfer data from one or more operational sources to a set of target tables Types of multitable INSERT statements: –Unconditional INSERT –Pivoting INSERT –Conditional ALL INSERT –Conditional FIRST INSERT
Copyright © 2005, Oracle. All rights reserved Example of Unconditional INSERT INSERT ALL INTO sales VALUES (product_id, customer_id, today, 3, promotion_id, quantity_per_day, amount_per_day) INTO costs VALUES (product_id, today, promotion_id, 3, product_cost, product_price) SELECT TRUNC(s.sales_date) AS today, s.product_id, s.customer_id, s.promotion_id, SUM(s.amount) AS amount_per_day, SUM(s.quantity) quantity_per_day, p.prod_min_price*0.8 AS product_cost, p.prod_list_price AS product_price FROM sales_activity_direct s, products p WHERE s.product_id = p.prod_id AND TRUNC(sales_date) = TRUNC(SYSDATE) GROUP BY TRUNC(sales_date), s.product_id, s.customer_id, s.promotion_id, p.prod_min_price*0.8, p.prod_list_price;
Copyright © 2005, Oracle. All rights reserved Example of Conditional ALL INSERT INSERT ALL WHEN product_id IN (SELECT product_id FROM promotional_items) THEN INTO promotional_sales VALUES(product_id,list_price) WHEN order_mode = 'online' THEN INTO web_orders VALUES(product_id, order_total) SELECT product_id, list_price, order_total, order_mode FROM orders;
Copyright © 2005, Oracle. All rights reserved Example of Pivoting INSERT INSERT ALL INTO sales (product_id, customer_id, week, amount) VALUES (product_id, customer_id, weekly_start_date, sales_sun) INTO sales (product_id, customer_id, week, amount) VALUES (product_id, customer_id, weekly_start_date+1, sales_mon) INTO sales (product_id, customer_id, week, amount) VALUES (product_id, customer_id, weekly_start_date+2, sales_tue) INTO sales (product_id, customer_id, week, amount) VALUES (product_id, customer_id, weekly_start_date+3, sales_wed)... SELECT product_id, customer_id, weekly_start_date, sales_sun, sales_mon, sales_tue, sales_wed, sales_thu, sales_fri, sales_sat FROM sales_summary;
Copyright © 2005, Oracle. All rights reserved Example of Conditional FIRST INSERT INSERT FIRST WHEN order_total > THEN INTO priority_handling VALUES (id) WHEN order_total > 5000 THEN INTO special_handling VALUES (id) WHEN total > 3000 THEN INTO privilege_handling VALUES (id) ELSE INTO regular_handling VALUES (id) SELECT order_total, order_id id FROM orders;
Copyright © 2005, Oracle. All rights reserved Overview of Table Functions A table function is a function that can produce a set of rows as output. –Input can be a set of rows. Table functions support pipelined and parallel execution of transformations using: –PL/SQL –C programming language –Java Table functions are used in the FROM clause of a SELECT statement.
Copyright © 2005, Oracle. All rights reserved Creating Table Functions CREATE OR REPLACE FUNCTION transform(p ref_cur_type) RETURN table_order_items_type PIPELINED PARALLEL_ENABLE( PARTITION p BY ANY) IS BEGIN FOR rec IN p LOOP … -- Transform this record PIPE ROW (rec); END LOOP; RETURN; END;
Copyright © 2005, Oracle. All rights reserved. 6-20
Copyright © 2005, Oracle. All rights reserved Using Table Functions SELECT * FROM TABLE(transform(cursor(SELECT * FROM order_items_ext))); INSERT /*+ APPEND, PARALLEL(order_items) */ INTO order_items SELECT * FROM TABLE(transform(cursor(SELECT * FROM order_items_ext)));
Copyright © 2005, Oracle. All rights reserved DML Error Logging: Overview Aborting long-running bulk DML operations is wasteful of time and system resources. DML Error Logging allows bulk DML operations to continue processing after a DML error occurs. –Errors are logged to an error-logging table. DML Error Logging combines the power and speed of bulk processing with the functionality of row-level error handling.
Copyright © 2005, Oracle. All rights reserved DML Error Logging Concepts An error-logging table is created and associated with an existing table by using the DBMS_ERRLOG package. The LOG ERRORS INTO... clause is added to the bulk DML load statement providing: –Error-logging table name –Statement tag –Reject limit INSERT INTO bonuses SELECT employee_id, salary*1.1 FROM employees WHERE commission_pct >.2 LOG ERRORS INTO errlog ('my_bad') REJECT LIMIT 10; EXECUTE DBMS_ERRLOG.CREATE_ERROR_LOG('bonuses', 'errlog');
Copyright © 2005, Oracle. All rights reserved. 6-24
Copyright © 2005, Oracle. All rights reserved Error-Logging Table The error-logging table logically consists of two parts: –Fixed: Columns that describe the error –Variable: Columns that contain data values corresponding to the row in error Column names for the variable part determine the DML table columns logged when an error occurs. The columns of the DML table to be logged is the intersection of the column names of the DML table and the error-logging table.
Copyright © 2005, Oracle. All rights reserved. 6-26
Copyright © 2005, Oracle. All rights reserved Error-Logging Table Format Fixed Portion: Error information columns User-supplied tag value via DML statement varchar2(4000) ORA_ERR_TAG$ Type of operation: (I)nsert, (U)pdate, (D)elete varchar2(2) ORA_ERR_OPTYP$ Row ID of row in error ( UPDATE / DELETE ) row ID ORA_ERR_ROWID$ Oracle error message text varchar2(4000) ORA_ERR_MESG$ Oracle error number varchar2(10) ORA_ERR_NUMBER$ Information Data type Column name
Copyright © 2005, Oracle. All rights reserved Inserting into a Table with Error Logging Example: CREATE TABLE bonuses (emp_id NUMBER, sal NUMBER CHECK(sal > 8000)); EXECUTE DBMS_ERRLOG.CREATE_ERROR_LOG('bonuses', 'errlog'); INSERT INTO bonuses SELECT employee_id, salary*1.1 FROM employees WHERE commission_pct >.2 LOG ERRORS INTO errlog (’my_bad’) REJECT LIMIT 10; 1 2 3
Copyright © 2005, Oracle. All rights reserved Summary In this lesson, you should have learned how to: Explain transformation using SQL – CREATE TABLE AS SELECT (CTAS) – UPDATE – MERGE –Multitable INSERT Describe transformation using table functions Implement DML Error Logging
Copyright © 2005, Oracle. All rights reserved Practice 6: Overview This practice covers the following topics: Loading data using multitable inserts Loading data using multitable conditional insert Loading data using MERGE Logging DML load errors