Java/XML ETL Engine
By Bob Timlin



Outline
Data Extraction, Transformation, and Loading (ETL)
Java & XML
Meta-Data
Mapping Data from Source to Target
Proposed XML Usage
XML for Meta-Data
Challenges/Issues
Sample XML Data File
Sample XML Meta-Data File

Extract/Transform/Load (ETL)
Getting data from the source system(s) into the data-warehouse is easily 80% of the effort of an entire data-warehouse project. This is because of the complexity of the source systems, the cleansing or transformation process, and all of the prep work needed to turn detailed operational data into summarized data-warehouse data. The more source systems you have, the harder this process becomes, and the difficulty increases exponentially. Cleaning and transforming the data is probably the most complicated part of the process. Transformations can be done on either the source system or the target system.

Java & XML
Currently, ETL processes are mostly written in COBOL and C with embedded SQL. There are many GUI tools that streamline this process. These tools mostly generate proprietary code that is then executed by a scheduling program. All of the big vendors in this field are pushing XML as the language for storing transformation meta-data, and all of the big players, sans Microsoft, are backing Java as the language for implementing transformations. For some reason Microsoft doesn't seem to like Java. The major vendors include IBM, Oracle, and Microsoft.

Meta-Data
Data about data. In a data-warehouse, it stores information about the structures of both source and destination data and about how to extract, transform, and load the data. It may also hold network configuration information such as IP addresses and ports. The Meta Data Coalition recently merged with the Object Management Group (OMG). They are backed by many heavy hitters, including Oracle, IBM, and Microsoft. The industry seems to be moving towards using XML for storing meta-data, which makes the meta-data standardized and portable.

Mapping Data from Source to Target
Target:
  Name: The name of the logical table in the data-warehouse.
  Source: The table name in the XML data file.
  Driver: The JDBC driver name.
  Url: The path to the data-warehouse.
  Username: The username used to connect to the data-warehouse.
  Password: The password used to connect to the data-warehouse.

Mapping (continued)
Column:
  Name: The name of the logical column in the data-warehouse.
  Type: The data type of the logical column in the data-warehouse.
  Key: Whether this column is a primary key; if so, the engine will use it in the WHERE clause.
  Source: The name of the column in the XML data file.
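The column mapping above can be sketched in Java as follows. This is only an illustration under stated assumptions: the slides say key columns go into the WHERE clause but do not show the engine's SQL generation, so the class names (ColumnMapping, MappingSqlSketch), the choice of UPDATE and INSERT statements, and the sample column names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical holder for one Column entry of the mapping (not from the slides). */
final class ColumnMapping {
    final String name;    // logical column name in the data-warehouse
    final String type;    // logical data type, kept as plain text here
    final boolean key;    // true if the column is a primary key
    final String source;  // column name in the XML data file

    ColumnMapping(String name, String type, boolean key, String source) {
        this.name = name;
        this.type = type;
        this.key = key;
        this.source = source;
    }
}

/** Assumed SQL generation: key columns go to WHERE, the rest to SET. */
final class MappingSqlSketch {

    /** Builds a parameterized UPDATE; values are bound later via PreparedStatement. */
    static String buildUpdate(String table, List<ColumnMapping> columns) {
        List<String> sets = new ArrayList<>();
        List<String> wheres = new ArrayList<>();
        for (ColumnMapping c : columns) {
            if (c.key) {
                wheres.add(c.name + " = ?");
            } else {
                sets.add(c.name + " = ?");
            }
        }
        if (sets.isEmpty() || wheres.isEmpty()) {
            throw new IllegalArgumentException("need at least one key and one non-key column");
        }
        return "UPDATE " + table + " SET " + String.join(", ", sets)
                + " WHERE " + String.join(" AND ", wheres);
    }

    /** Builds a parameterized INSERT covering every mapped column. */
    static String buildInsert(String table, List<ColumnMapping> columns) {
        List<String> names = new ArrayList<>();
        List<String> marks = new ArrayList<>();
        for (ColumnMapping c : columns) {
            names.add(c.name);
            marks.add("?");
        }
        return "INSERT INTO " + table + " (" + String.join(", ", names)
                + ") VALUES (" + String.join(", ", marks) + ")";
    }

    public static void main(String[] args) {
        // Column names below are made up for illustration.
        List<ColumnMapping> cols = Arrays.asList(
                new ColumnMapping("patient_id", "NUMBER", true, "id"),
                new ColumnMapping("admit_date", "DATE", false, "admitted"));
        System.out.println(buildUpdate("admits", cols));
        System.out.println(buildInsert("admits", cols));
    }
}
```

Using placeholders rather than concatenated values keeps the generated statements safe to run through a JDBC PreparedStatement, whatever the source data contains.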

Proposed XML Usage
For meta-data about the ETL processing. This will contain all information about mapping source to target, including transformation rules.
As a data file to store data from databases.

XML for Meta-Data
The specification is designed to be flexible enough to support many protocols; however, for our project we will implement only two:
1. XML Data File
2. JDBC
The protocol will be part of the url attribute of the target or source node. Every transformation will have a source and a target.
…
<target driver="oracle.jdbc.driver.OracleDriver" username="scott" password="tiger" name="srctest">
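One way to picture "the protocol is part of the url attribute" is a small dispatcher like the Java sketch below. The slides do not define how an XML data file is written in the url, so treating a "file:" prefix or a ".xml" suffix as the XML Data File protocol is purely an assumption, as are the names ProtocolSketch and protocolOf.

```java
import java.util.Locale;

/** Hypothetical helper that decides which of the two protocols a url attribute names. */
final class ProtocolSketch {

    enum Protocol { JDBC, XML_FILE }

    static Protocol protocolOf(String url) {
        if (url == null || url.isEmpty()) {
            throw new IllegalArgumentException("url attribute is missing");
        }
        String lower = url.toLowerCase(Locale.ROOT);
        if (lower.startsWith("jdbc:")) {
            return Protocol.JDBC;            // e.g. jdbc:mysql://localhost:3306/test
        }
        // Assumption: XML data files are given as file: URLs or plain *.xml paths.
        if (lower.startsWith("file:") || lower.endsWith(".xml")) {
            return Protocol.XML_FILE;
        }
        throw new IllegalArgumentException("unsupported protocol in url: " + url);
    }

    public static void main(String[] args) {
        System.out.println(protocolOf("jdbc:mysql://localhost:3306/test"));
        System.out.println(protocolOf("file:patient.xml"));
    }
}
```

Rejecting unknown prefixes keeps the two-protocol limit explicit, and a further protocol would be one more branch.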

The basic construct of an XML meta-data file is: [ ] [ [ ] ] [ ] [ [ ] ]

Challenges/Issues
Mapping multiple sources to multiple targets.
Transformations can involve very complex coding, especially eliminating duplicates, merging, and purging of data. These transformations usually involve "fuzzy" logic.

<target driver="oracle.jdbc.driver.OracleDriver" username="scott" password="tiger" name="srctest">
<!-- As the target, connect to the database using JDBC and insert the data from the source XML file and the rules that follow -->
source.replace("'", "")
INITCAP(SUBSTR(source, 1, INSTR(source, ',') -1))

source.replace("'", "")
INITCAP(SUBSTR(source, INSTR(source, ',') +1))
TO_DATE(source, 'DD/MM/YYYY')

<source driver="oracle.jdbc.driver.OracleDriver" username="scott" password="tiger" name="targetTest">
source.replace("'", "")
INITCAP(SUBSTR(source, 1, INSTR(source, ',') -1))

source.replace("'", "")
INITCAP(SUBSTR(source, INSTR(source, ',') +1))
TO_DATE(source, 'DD/MM/YYYY')
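The transformation rules on these slides mix a Java expression (source.replace) with Oracle SQL functions (INITCAP, SUBSTR, INSTR, TO_DATE). As a rough illustration of what they compute, here is a pure-Java sketch of equivalents. The class and method names are hypothetical, the order in which the engine chains the rules is not shown in the slides, and the handling of a value with no comma is an assumption modelled on Oracle's behaviour.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Hypothetical Java equivalents of the transformation rules shown on the slides. */
final class TransformSketch {

    private static final DateTimeFormatter DMY = DateTimeFormatter.ofPattern("dd/MM/uuuu");

    /** Same as the slide's source.replace("'", ""): drops every apostrophe. */
    static String stripApostrophes(String source) {
        return source.replace("'", "");
    }

    /** Approximates INITCAP: upper-case the first letter of each word, lower-case the rest. */
    static String initCap(String s) {
        StringBuilder out = new StringBuilder(s.length());
        boolean startOfWord = true;
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (Character.isLetterOrDigit(ch)) {
                out.append(startOfWord ? Character.toUpperCase(ch) : Character.toLowerCase(ch));
                startOfWord = false;
            } else {
                out.append(ch);          // separators are copied and start a new word
                startOfWord = true;
            }
        }
        return out.toString();
    }

    /** INITCAP(SUBSTR(source, 1, INSTR(source, ',') - 1)): text before the first comma. */
    static String beforeComma(String source) {
        int comma = source.indexOf(',');
        // Assumption: with no comma Oracle yields NULL; an empty string stands in here.
        return comma < 0 ? "" : initCap(source.substring(0, comma));
    }

    /** INITCAP(SUBSTR(source, INSTR(source, ',') + 1)): text after the first comma. */
    static String afterComma(String source) {
        // indexOf returns -1 without a comma, so the whole string is kept in that case.
        return initCap(source.substring(source.indexOf(',') + 1));
    }

    /** TO_DATE(source, 'DD/MM/YYYY') expressed with java.time. */
    static LocalDate toDate(String source) {
        return LocalDate.parse(source, DMY);
    }

    public static void main(String[] args) {
        String name = stripApostrophes("o'BRIEN, mary");
        System.out.println(beforeComma(name));
        System.out.println(afterComma(name));
        System.out.println(toDate("25/12/2001"));
    }
}
```

Like the SQL original, afterComma does not trim, so a space following the comma is preserved in the result.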

Sample XML Data File
data for column 1    data for column 2
data for column 1    data for column 2
data for column 1    data for column 2

FRFSFEM1E

1 S 17 EXCEPT FOR TRAUMA]]>

Sample XML Meta-Data
<target name="admits" source="patient" driver="org.gjt.mm.mysql.Driver" url="jdbc:mysql://localhost:3306/test" username="test" password="">
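A target node like the one above could be read with the JDK's built-in DOM parser, roughly as in the Java sketch below. The slide shows only the opening tag, so the example closes the element itself to get a well-formed input, which is an assumption; the names TargetReaderSketch and readTargetAttributes are hypothetical and not part of the engine.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

/** Hypothetical reader that pulls the attributes off a target (or source) node. */
final class TargetReaderSketch {

    /** Parses the XML text and returns the root element's attributes as a map. */
    static Map<String, String> readTargetAttributes(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            Map<String, String> attrs = new LinkedHashMap<>();
            NamedNodeMap raw = root.getAttributes();
            for (int i = 0; i < raw.getLength(); i++) {
                Node attr = raw.item(i);
                attrs.put(attr.getNodeName(), attr.getNodeValue());
            }
            return attrs;
        } catch (Exception e) {
            // Wrap parser and I/O failures so callers get one unchecked exception type.
            throw new IllegalArgumentException("could not parse meta-data XML", e);
        }
    }

    public static void main(String[] args) {
        // The element is self-closed here only to make the fragment well-formed.
        String xml = "<target name=\"admits\" source=\"patient\""
                + " driver=\"org.gjt.mm.mysql.Driver\""
                + " url=\"jdbc:mysql://localhost:3306/test\""
                + " username=\"test\" password=\"\"/>";
        Map<String, String> attrs = readTargetAttributes(xml);
        System.out.println(attrs.get("name") + " <- " + attrs.get("source"));
        System.out.println(attrs.get("driver") + " @ " + attrs.get("url"));
    }
}
```

The resulting map holds everything needed to load the JDBC driver class and call DriverManager.getConnection(url, username, password), which is left out here because it requires a live database.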

Thank You