BY LECTURER/ AISHA DAWOOD DW Lab # 4 Overview of Extraction, Transformation, and Loading.

Presentation transcript:

BY LECTURER/ AISHA DAWOOD DW Lab # 4 Overview of Extraction, Transformation, and Loading

Transformation Flow
From an architectural perspective, you can transform your data in two ways:
■ Multistage Data Transformation
■ Pipelined Data Transformation
LAB EXERCISE #4 Oracle Data Warehousing

Multistage Data Transformation
The data transformation logic for most data warehouses consists of multiple steps. For example, in transforming new records to be inserted into a sales table, there may be separate logical transformation steps to validate each dimension key.
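A multistage flow of this kind might be sketched as follows. This is only an illustration under stated assumptions: the staging tables new_sales_step1 and new_sales_step2 are hypothetical names, the source table sales_activity_direct is borrowed from the later examples in this lab, and the dimension key columns customers.cust_id and products.prod_id are assumed to match the sample schema.

```sql
-- Stage 1 (hypothetical staging table): keep only rows whose customer key
-- exists in the customers dimension.
CREATE TABLE new_sales_step1 NOLOGGING AS
SELECT s.*
FROM   sales_activity_direct s
WHERE  EXISTS (SELECT 1 FROM customers c WHERE c.cust_id = s.customer_id);

-- Stage 2 (hypothetical staging table): keep only rows whose product key
-- exists in the products dimension.
CREATE TABLE new_sales_step2 NOLOGGING AS
SELECT s.*
FROM   new_sales_step1 s
WHERE  EXISTS (SELECT 1 FROM products p WHERE p.prod_id = s.product_id);

-- Final stage: append the validated rows to the sales fact table.
INSERT /*+ APPEND */ INTO sales
SELECT product_id, customer_id, TRUNC(sales_date), 3,
       promotion_id, quantity, amount
FROM   new_sales_step2;

COMMIT;
```

Each step materializes its result in its own table, so a failed validation can be inspected and rerun without repeating the earlier steps.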

Pipelined Data Transformation

Loading Mechanisms
You can use the following mechanisms for loading a data warehouse:
■ Loading a Data Warehouse with SQL*Loader
■ Loading a Data Warehouse with External Tables
■ Loading a Data Warehouse with OCI and Direct-Path APIs
■ Loading a Data Warehouse with Export/Import

■ Loading a Data Warehouse with SQL*Loader

Transformation Mechanisms
You have the following choices for transforming data inside the database:
■ Transforming Data Using SQL
■ Transforming Data Using PL/SQL
■ Transforming Data Using Table Functions

Transforming Data Using SQL
Once data is loaded into the database, data transformations can be executed using SQL operations. There are four basic techniques for implementing SQL data transformations:
■ CREATE TABLE ... AS SELECT and INSERT /*+APPEND*/ AS SELECT (data substitution)
■ Transforming Data Using UPDATE (data substitution)
■ Transforming Data Using MERGE
■ Transforming Data Using Multitable INSERT

CREATE TABLE ... AS SELECT and INSERT /*+APPEND*/ AS SELECT
The CREATE TABLE ... AS SELECT statement (CTAS) is a powerful tool for efficiently executing a SQL query and storing the results of that query in a new database table. The INSERT /*+APPEND*/ ... AS SELECT statement offers the same capabilities with existing database tables.
The following SQL statement inserts data from sales_activity_direct into the sales table of the sample schema, using a SQL function to truncate the sales date values to midnight and assigning a fixed channel ID of 3.

  INSERT /*+ APPEND NOLOGGING PARALLEL */ INTO sales
  SELECT product_id, customer_id, TRUNC(sales_date), 3,
         promotion_id, quantity, amount
  FROM sales_activity_direct;

Note: data substitution of this kind is common when receiving data from multiple source systems for your data warehouse.
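For comparison, the CTAS form of the same substitution might look like the sketch below. The target table name sales_direct_stage and the column aliases sales_date and channel_id are assumptions made for illustration; only sales_activity_direct and its columns come from the example above.

```sql
-- CTAS sketch: create a new table (hypothetical name) holding the transformed rows.
CREATE TABLE sales_direct_stage
NOLOGGING
PARALLEL
AS
SELECT product_id,
       customer_id,
       TRUNC(sales_date) AS sales_date,   -- strip the time-of-day portion
       3                 AS channel_id,   -- fixed channel ID, as in the INSERT example
       promotion_id,
       quantity,
       amount
FROM   sales_activity_direct;
```

Unlike the INSERT form, CTAS creates the target table itself. Expressions in the select list therefore need aliases so that the new columns get valid names.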

Transforming Data Using UPDATE
Another technique for implementing a data substitution is to use an UPDATE statement to modify the sales.channel_id column. An UPDATE will provide the correct result, although when a very large share of the rows must be modified, a CTAS statement may be more efficient than an UPDATE.
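A minimal sketch of such an UPDATE follows. It assumes that the rows needing substitution can be recognized by a missing channel_id. That predicate is an assumption for illustration and is not stated in the slides.

```sql
-- Assumption: rows loaded from the direct-sales feed arrive without a channel_id.
UPDATE sales
SET    channel_id = 3
WHERE  channel_id IS NULL;

COMMIT;
```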

Transforming Data Using MERGE
Oracle Database's merge functionality extends SQL by introducing the keyword MERGE, which provides the ability to update or insert a row conditionally into a table or an out-of-line single-table view.
Example: assume that new data for the dimension table products is propagated to the data warehouse and has to be either inserted or updated. The table products_delta has the same structure as products.
Merge Operation Using SQL
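The slide's statement did not survive in the transcript, so the following is a hedged sketch of what such a merge could look like, not the original example. The prod_id, prod_list_price, and prod_min_price columns appear in the later examples of this lab. The prod_name column is an assumption about the products table.

```sql
-- Upsert sketch: update prices for known products, insert unknown ones.
MERGE INTO products t
USING products_delta s
ON (t.prod_id = s.prod_id)
WHEN MATCHED THEN
  UPDATE SET t.prod_list_price = s.prod_list_price,
             t.prod_min_price  = s.prod_min_price
WHEN NOT MATCHED THEN
  -- Assumption: a real products table may have further mandatory columns
  -- that would also have to be listed here.
  INSERT (prod_id, prod_name, prod_list_price, prod_min_price)
  VALUES (s.prod_id, s.prod_name, s.prod_list_price, s.prod_min_price);
```

A single MERGE replaces a separate UPDATE followed by an INSERT of the unmatched rows.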

Transforming Data Using Multitable INSERT
Many times, external data sources have to be segregated based on logical attributes for insertion into different target objects. The multitable INSERT statement offers the benefits of the INSERT ... SELECT statement when multiple tables are involved as targets.

Example (Unconditional Insert)
The following statement aggregates the transactional sales information, stored in sales_activity_direct, on a daily basis and inserts it into both the sales and the costs fact tables for the current day.

  INSERT ALL
    INTO sales VALUES (product_id, customer_id, today, 3, promotion_id,
                       quantity_per_day, amount_per_day)
    INTO costs VALUES (product_id, today, promotion_id, 3,
                       product_cost, product_price)
  SELECT TRUNC(s.sales_date) AS today, s.product_id, s.customer_id, s.promotion_id,
         SUM(s.amount) AS amount_per_day, SUM(s.quantity) quantity_per_day,
         p.prod_min_price*0.8 AS product_cost, p.prod_list_price AS product_price
  FROM sales_activity_direct s, products p
  WHERE s.product_id = p.prod_id
    AND TRUNC(sales_date) = TRUNC(SYSDATE)
  GROUP BY TRUNC(sales_date), s.product_id, s.customer_id, s.promotion_id,
           p.prod_min_price*0.8, p.prod_list_price;

Example (Conditional ALL Insert)
The following statement inserts a row into the sales and costs tables for all sales transactions with a valid promotion and stores the information about multiple identical orders of a customer in a separate table cum_sales_activity. It is possible two rows will be inserted for some sales transactions, and none for others.

  INSERT ALL
    WHEN promotion_id IN (SELECT promo_id FROM promotions) THEN
      INTO sales VALUES (product_id, customer_id, today, 3, promotion_id,
                         quantity_per_day, amount_per_day)
      INTO costs VALUES (product_id, today, promotion_id, 3,
                         product_cost, product_price)
    WHEN num_of_orders > 1 THEN
      INTO cum_sales_activity VALUES (today, product_id, customer_id, promotion_id,
                                      quantity_per_day, amount_per_day, num_of_orders)
  SELECT TRUNC(s.sales_date) AS today, s.product_id, s.customer_id, s.promotion_id,
         SUM(s.amount) AS amount_per_day, SUM(s.quantity) quantity_per_day,
         COUNT(*) num_of_orders,
         p.prod_min_price*0.8 AS product_cost, p.prod_list_price AS product_price
  FROM sales_activity_direct s, products p
  WHERE s.product_id = p.prod_id
    AND TRUNC(sales_date) = TRUNC(SYSDATE)
  GROUP BY TRUNC(sales_date), s.product_id, s.customer_id, s.promotion_id,
           p.prod_min_price*0.8, p.prod_list_price;

Example (Conditional FIRST Insert)
The following statement inserts into an appropriate shipping manifest according to the total quantity and the weight of a product order. An exception is made for high-value orders, which are sent by express unless their weight classification is too high. All incorrect orders, in this simple example represented as orders without a quantity, are stored in a separate table. It assumes the existence of appropriate tables large_freight_shipping, express_shipping, default_shipping, and incorrect_sales_order.

  INSERT FIRST
    WHEN ((sum_quantity_sold > 10 AND prod_weight_class < 5) AND sum_quantity_sold >= 1)
         OR (sum_quantity_sold > 5 AND prod_weight_class > 5) THEN
      INTO large_freight_shipping VALUES
        (time_id, cust_id, prod_id, prod_weight_class, sum_quantity_sold)
    WHEN sum_amount_sold > 1000 AND sum_quantity_sold >= 1 THEN
      INTO express_shipping VALUES
        (time_id, cust_id, prod_id, prod_weight_class, sum_amount_sold, sum_quantity_sold)
    WHEN (sum_quantity_sold >= 1) THEN
      INTO default_shipping VALUES (time_id, cust_id, prod_id, sum_quantity_sold)
    ELSE
      INTO incorrect_sales_order VALUES (time_id, cust_id, prod_id)
  SELECT s.time_id, s.cust_id, s.prod_id, p.prod_weight_class,
         SUM(amount_sold) AS sum_amount_sold,
         SUM(quantity_sold) AS sum_quantity_sold
  FROM sales s, products p
  WHERE s.prod_id = p.prod_id
    AND s.time_id = TRUNC(SYSDATE)
  GROUP BY s.time_id, s.cust_id, s.prod_id, p.prod_weight_class;

Example (Mixed Conditional and Unconditional Insert)
The following example inserts new customers into the customers table and stores all new customers with a cust_credit_limit of 4500 or higher in an additional, separate table for further promotions.

  INSERT FIRST
    WHEN cust_credit_limit >= 4500 THEN
      INTO customers
      INTO customers_special VALUES (cust_id, cust_credit_limit)
    ELSE
      INTO customers
  SELECT * FROM customers_new;