6 Extraction, Transformation, and Loading (ETL) Transformation.

Slides:



Advertisements
Similar presentations
Oracle 10g & 11g for Dev Virtual Columns DML error logging
Advertisements

BY LECTURER/ AISHA DAWOOD DW Lab # 4 Overview of Extraction, Transformation, and Loading.
BY LECTURER/ AISHA DAWOOD DW Lab # 3 Overview of Extraction, Transformation, and Loading.
Manipulating Data Schedule: Timing Topic 60 minutes Lecture
Loading & organising data. Objectives Loading data using direct-load insert Loading data into oracle tables using SQL*Loader conventional and direct paths.
5 Copyright © 2005, Oracle. All rights reserved. Extraction, Transformation, and Loading (ETL) Loading.
ETL - Oracle Database Features and PL/SQL Techniques Boyan Boev CNsys BGOUG
8 Copyright © Oracle Corporation, All rights reserved. Manipulating Data.
9 Copyright © 2009, Oracle. All rights reserved. Managing Data Concurrency.
9 Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Using the Set Operators Assist. Prof. Pongpisit Wuttidittachotti, Ph.D. Faculty.
2 Copyright © 2004, Oracle. All rights reserved. Creating Stored Functions.
4 Copyright © 2004, Oracle. All rights reserved. Database Interfaces.
Advanced PL/SQL and Oracle ETL Doug Cosman Senior Oracle DBA SageLogix, Inc. Open World 2003.
10 Copyright © 2009, Oracle. All rights reserved. Using DDL Statements to Create and Manage Tables.
1 Copyright © 2004, Oracle. All rights reserved. Introduction to PL/SQL.
1 Copyright © 2006, Oracle. All rights reserved. Using DDL Statements to Create and Manage Tables.
1 Chapter 14 DML Tuning. 2 DML Performance Fundamentals DML Performance is affected by: – Efficiency of WHERE clause – Amount of index maintenance – Referential.
1 Copyright © 2004, Oracle. All rights reserved. Introduction.
The Oracle9i Multi-Terabyte Data Warehouse Jeff Parker Manager Data Warehouse Development Amazon.com Session id:
9 Copyright © 2007, Oracle. All rights reserved. Managing Data and Concurrency.
20 Copyright © Oracle Corporation, All rights reserved. Oracle9 i Extensions to DML and DDL Statements.
11 Copyright © Oracle Corporation, All rights reserved. Creating Views.
Advanced SQL: Cursors & Stored Procedures
1 Copyright © 2004, Oracle. All rights reserved. Introduction to PL/SQL.
Manipulating Data in PL/SQL. 2 home back first prev next last What Will I Learn? Construct and execute PL/SQL statements that manipulate data with DML.
PL/SQLPL/SQL Oracle10g Developer: PL/SQL Programming Chapter 3 Handling Data in PL/SQL Blocks.
8 Copyright © 2005, Oracle. All rights reserved. Managing Data.
7 Copyright © 2005, Oracle. All rights reserved. Managing Undo Data.
Manipulating Large Data Sets. Lesson Agenda ◦ Manipulating data using subqueries ◦ Specifying explicit default values in the INSERT and UPDATE statements.
4 Copyright © 2007, Oracle. All rights reserved. Manipulating Large Data Sets.
Copyright © 2004, Oracle. All rights reserved. MANIPULATING LARGE DATA SETS Oracle Lecture 10.
3 Copyright © 2006, Oracle. All rights reserved. Manipulating Large Data Sets.
Objectives Database triggers and syntax
PL/SQLPL/SQL Oracle10g Developer: PL/SQL Programming Chapter 9 Database Triggers.
D Copyright © Oracle Corporation, All rights reserved. Loading Data into a Database.
6 Copyright © 2009, Oracle. All rights reserved. Using the Data Transformation Operators.
Oracle10g Developer: PL/SQL Programming1 Objectives SQL queries within PL/SQL Host or bind variables The %TYPE attribute Include queries and control structures.
Introduction to Explicit Cursors. 2 home back first prev next last What Will I Learn? Distinguish between an implicit and an explicit cursor Describe.
PL/SQLPL/SQL Oracle11g: PL/SQL Programming Chapter 9 Database Triggers.
PL/SQLPL/SQL Oracle10g Developer: PL/SQL Programming Chapter 9 Database Triggers.
9 Copyright © 2009, Oracle. All rights reserved. Deploying and Reporting on ETL Jobs.
Retrieving Data in PL/SQL. 2 home back first prev next last What Will I Learn? In this lesson, you will learn to: –Recognize the SQL statements that can.
Oracle11g: PL/SQL Programming Chapter 3 Handling Data in PL/SQL Blocks.
DML Part 1 Yogiek Indra Kurniawan. All Files Can Be Downloaded at : Menu : “Perkuliahan”
1 Do You Need an ETL Tool? Ben Bor NZ Ministry of Health Ben Bor NZ Ministry of Health.
6 Copyright © 2004, Oracle. All rights reserved. Adding Custom Validation.
IMS 4212: Application Architecture and Intro to Stored Procedures 1 Dr. Lawrence West, Management Dept., University of Central Florida
Manipulating Data Lesson 3. Objectives Queries The SELECT query to retrieve or extract data from one table, how to retrieve or extract data by using.
4 Copyright © 2011, Oracle and/or its affiliates. All rights reserved. Using Data Model Editor to Create Data Models Based on a SQL Query Data Set.
Simple Queries DBS301 – Week 1. Objectives Basic SELECT statement Computed columns Aliases Concatenation operator Use of DISTINCT to eliminate duplicates.
5 Copyright © 2008, Oracle. All rights reserved. Testing and Validating a Repository.
6 Copyright © 2006, Oracle. All rights reserved. The ETL Process: Transforming Data.
8 Copyright © 2005, Oracle. All rights reserved. Managing Schema Objects.
11 Copyright © 2004, Oracle. All rights reserved. Managing XML Data in an Oracle 10g Database.
1 Copyright © 2007, Oracle. All rights reserved. Introducing the Oracle Database 11g SQL and PL/SQL New Features.
13 Copyright © 2004, Oracle. All rights reserved. Migrating SQL Statements.
6 Copyright © 2009, Oracle. All rights reserved. Using Dynamic SQL.
1 Copyright © 2005, Oracle. All rights reserved. Oracle Database Administration: Overview.
11 Copyright © 2009, Oracle. All rights reserved. Enhancing ETL Performance.
D Copyright © 2009, Oracle. All rights reserved. Using SQL*Plus.
Tim Hall Oracle ACE Director
Interacting with the Oracle Server
Using the Set Operators
Manipulating Data.
Contents Preface I Introduction Lesson Objectives I-2
SQL2-ch3 操控大型資料集.
Handling Data in PL/SQL Blocks
Updating Databases With Open SQL
Manipulating Data Lesson 3.
Implementing ETL solution for Incremental Data Load in Microsoft SQL Server Ganesh Lohani SR. Data Analyst Lockheed Martin
Presentation transcript:

6 Extraction, Transformation, and Loading (ETL) Transformation

Copyright © 2005, Oracle. All rights reserved. 6-2 Objectives After completing this lesson, you should be able to do the following: Explain transformation using SQL – CREATE TABLE AS SELECT (CTAS) – UPDATE – MERGE –Multitable INSERT Describe transformation using table functions and PL/SQL Implement DML Error Logging

Copyright © 2005, Oracle. All rights reserved. 6-3 Data Transformation Data transformations are often the most complex and costly part of the ETL process. Transformations can range from simple data conversion to complex scrubbing techniques. Many, if not all, transformations can occur within the Oracle database. External transformations outside the database are supported (flat files, for instance). Data can be transformed in two ways: –Multistage data transformation –Pipelined data transformation

Copyright © 2005, Oracle. All rights reserved. 6-4 Multistage Data Transformation Load into staging table. NEW_SALES_STEP1 NEW_SALES_STEP2 Convert source product keys to warehouse product keys. SALES Flat files Validate customer keys (look up in customer dimension table). NEW_SALES_STEP3 Insert into SALES warehouse table.

Copyright © 2005, Oracle. All rights reserved. 6-5 Pipelined Data Transformation SALES Flat files Validate customer keys (look up in customer dimension table). External table Convert source product keys to warehouse product keys. Insert into SALES warehouse table.

Copyright © 2005, Oracle. All rights reserved. 6-6 Transformation Mechanisms You have the following choices for transforming data inside the database: SQL PL/SQL Table functions

Copyright © 2005, Oracle. All rights reserved. 6-7 Transformation Using SQL After data is loaded into the database, transformations can be executed using SQL operations: CREATE TABLE... AS SELECT INSERT /*+APPEND*/ AS SELECT UPDATE MERGE Multitable INSERT

Copyright © 2005, Oracle. All rights reserved. 6-8 CREATE TABLE … AS SELECT and UPDATE DESC sales_activity_direct Name Null? Type SALES_DATEDATE PRODUCT_ID NUMBER CUSTOMER_ID NUMBER PROMOTION_ID NUMBER AMOUNT NUMBER QUANTITY NUMBER INSERT /*+ APPEND NOLOGGING PARALLEL(sales) */ INTO sales as SELECT product_id, customer_id, TRUNC(sales_date), 3, promotion_id, quantity, amount FROM sales_activity_direct;

Copyright © 2005, Oracle. All rights reserved. 6-9

Copyright © 2005, Oracle. All rights reserved MERGE Statement: Overview MERGE statements provide the ability to conditionally UPDATE/INSERT into the database. The DELETE clause for UPDATE branch is provided for implicit data maintenance. Conditions are specified in the ON clause. MERGE statements do an UPDATE if the row exists and an INSERT if it is a new row. Using MERGE statements avoids multiple updates. – MERGE statements use a single SQL statement to complete an UPDATE or INSERT or both.

Copyright © 2005, Oracle. All rights reserved Data Warehousing MERGE : Example MERGE INTO customer C USING cust_src S ON (c.customer_id = s.src_customer_id) WHEN MATCHED THEN UPDATE SET c.cust_address = s.cust_address WHEN NOT MATCHED THEN INSERT ( Customer_id, cust_first_name,…) VALUES (src_customer_id, src_first_name,…);

Copyright © 2005, Oracle. All rights reserved Data Maintenance with MERGE/DELETE MERGE USING product_changes S INTO products D ON (d.prod_id = s.prod_id) WHEN MATCHED THEN UPDATE SET d.prod_list_price = s.prod_new_price, d.prod_status = s.prod_newstatus DELETE WHERE (d.prod_status = "OBSOLETE") WHEN NOT MATCHED THEN INSERT (prod_id, prod_list_price, prod_status) VALUES (s.prod_id, s.prod_new_price, s.prod_new_status);

Copyright © 2005, Oracle. All rights reserved Overview of Multitable INSERT Statements Allows the INSERT … AS SELECT statement to insert rows into multiple tables as part of a single DML statement Can be used in data warehousing systems to transfer data from one or more operational sources to a set of target tables Types of multitable INSERT statements: –Unconditional INSERT –Pivoting INSERT –Conditional ALL INSERT –Conditional FIRST INSERT

Copyright © 2005, Oracle. All rights reserved Example of Unconditional INSERT INSERT ALL INTO sales VALUES (product_id, customer_id, today, 3, promotion_id, quantity_per_day, amount_per_day) INTO costs VALUES (product_id, today, promotion_id, 3, product_cost, product_price) SELECT TRUNC(s.sales_date) AS today, s.product_id, s.customer_id, s.promotion_id, SUM(s.amount) AS amount_per_day, SUM(s.quantity) quantity_per_day, p.prod_min_price*0.8 AS product_cost, p.prod_list_price AS product_price FROM sales_activity_direct s, products p WHERE s.product_id = p.prod_id AND TRUNC(sales_date) = TRUNC(SYSDATE) GROUP BY TRUNC(sales_date), s.product_id, s.customer_id, s.promotion_id, p.prod_min_price*0.8, p.prod_list_price;

Copyright © 2005, Oracle. All rights reserved Example of Conditional ALL INSERT INSERT ALL WHEN product_id IN (SELECT product_id FROM promotional_items) THEN INTO promotional_sales VALUES(product_id,list_price) WHEN order_mode = 'online' THEN INTO web_orders VALUES(product_id, order_total) SELECT product_id, list_price, order_total, order_mode FROM orders;

Copyright © 2005, Oracle. All rights reserved Example of Pivoting INSERT INSERT ALL INTO sales (product_id, customer_id, week, amount) VALUES (product_id, customer_id, weekly_start_date, sales_sun) INTO sales (product_id, customer_id, week, amount) VALUES (product_id, customer_id, weekly_start_date+1, sales_mon) INTO sales (product_id, customer_id, week, amount) VALUES (product_id, customer_id, weekly_start_date+2, sales_tue) INTO sales (product_id, customer_id, week, amount) VALUES (product_id, customer_id, weekly_start_date+3, sales_wed)... SELECT product_id, customer_id, weekly_start_date, sales_sun, sales_mon, sales_tue, sales_wed, sales_thu, sales_fri, sales_sat FROM sales_summary;

Copyright © 2005, Oracle. All rights reserved Example of Conditional FIRST INSERT INSERT FIRST WHEN order_total > THEN INTO priority_handling VALUES (id) WHEN order_total > 5000 THEN INTO special_handling VALUES (id) WHEN total > 3000 THEN INTO privilege_handling VALUES (id) ELSE INTO regular_handling VALUES (id) SELECT order_total, order_id id FROM orders;

Copyright © 2005, Oracle. All rights reserved Overview of Table Functions A table function is a function that can produce a set of rows as output. –Input can be a set of rows. Table functions support pipelined and parallel execution of transformations using: –PL/SQL –C programming language –Java Table functions are used in the FROM clause of a SELECT statement.

Copyright © 2005, Oracle. All rights reserved Creating Table Functions CREATE OR REPLACE FUNCTION transform(p ref_cur_type) RETURN table_order_items_type PIPELINED PARALLEL_ENABLE( PARTITION p BY ANY) IS BEGIN FOR rec IN p LOOP … -- Transform this record PIPE ROW (rec); END LOOP; RETURN; END;

Copyright © 2005, Oracle. All rights reserved. 6-20

Copyright © 2005, Oracle. All rights reserved Using Table Functions SELECT * FROM TABLE(transform(cursor(SELECT * FROM order_items_ext))); INSERT /*+ APPEND, PARALLEL(order_items) */ INTO order_items SELECT * FROM TABLE(transform(cursor(SELECT * FROM order_items_ext)));

Copyright © 2005, Oracle. All rights reserved DML Error Logging: Overview Aborting long-running bulk DML operations is wasteful of time and system resources. DML Error Logging allows bulk DML operations to continue processing after a DML error occurs. –Errors are logged to an error-logging table. DML Error Logging combines the power and speed of bulk processing with the functionality of row-level error handling.

Copyright © 2005, Oracle. All rights reserved DML Error Logging Concepts An error-logging table is created and associated with an existing table by using the DBMS_ERRLOG package. The LOG ERRORS INTO... clause is added to the bulk DML load statement providing: –Error-logging table name –Statement tag –Reject limit INSERT INTO bonuses SELECT employee_id, salary*1.1 FROM employees WHERE commission_pct >.2 LOG ERRORS INTO errlog ('my_bad') REJECT LIMIT 10; EXECUTE DBMS_ERRLOG.CREATE_ERROR_LOG('bonuses', 'errlog');

Copyright © 2005, Oracle. All rights reserved. 6-24

Copyright © 2005, Oracle. All rights reserved Error-Logging Table The error-logging table logically consists of two parts: –Fixed: Columns that describe the error –Variable: Columns that contain data values corresponding to the row in error Column names for the variable part determine the DML table columns logged when an error occurs. The columns of the DML table to be logged is the intersection of the column names of the DML table and the error-logging table.

Copyright © 2005, Oracle. All rights reserved. 6-26

Copyright © 2005, Oracle. All rights reserved Error-Logging Table Format Fixed Portion: Error information columns User-supplied tag value via DML statement varchar2(4000) ORA_ERR_TAG$ Type of operation: (I)nsert, (U)pdate, (D)elete varchar2(2) ORA_ERR_OPTYP$ Row ID of row in error ( UPDATE / DELETE ) row ID ORA_ERR_ROWID$ Oracle error message text varchar2(4000) ORA_ERR_MESG$ Oracle error number varchar2(10) ORA_ERR_NUMBER$ Information Data type Column name

Copyright © 2005, Oracle. All rights reserved Inserting into a Table with Error Logging Example: CREATE TABLE bonuses (emp_id NUMBER, sal NUMBER CHECK(sal > 8000)); EXECUTE DBMS_ERRLOG.CREATE_ERROR_LOG('bonuses', 'errlog'); INSERT INTO bonuses SELECT employee_id, salary*1.1 FROM employees WHERE commission_pct >.2 LOG ERRORS INTO errlog (’my_bad’) REJECT LIMIT 10; 1 2 3

Copyright © 2005, Oracle. All rights reserved Summary In this lesson, you should have learned how to: Explain transformation using SQL – CREATE TABLE AS SELECT (CTAS) – UPDATE – MERGE –Multitable INSERT Describe transformation using table functions Implement DML Error Logging

Copyright © 2005, Oracle. All rights reserved Practice 6: Overview This practice covers the following topics: Loading data using multitable inserts Loading data using multitable conditional insert Loading data using MERGE Logging DML load errors