My experience building a custom ETL system: problems, solutions and Oracle quirks, or how scary Oracle can look to a Java developer



Agenda
- WHY do we need an ETL?
- HOW it works
- Experience:
  - task
  - existing solutions?
  - problem or Oracle quirk
  - my solution

WHY do we need an ETL?
- On OLTP data, the calculation is non-trivial in most cases:
  - SQL statements grow in size and complexity
  - increased maintenance effort
  - poor SQL performance
- Business value calculation is implemented independently in most reports, with no code reuse:
  - maintenance effort is multiplied by the number of reports
  - copy-paste-driven development

WHY do we need an ETL?
Balance / UPL calculation:
- OLTP-like RC: 46 lines, 7 joins, 3 levels
- after RADIO: 13 lines, 3 tables, 1 level

O! RADIO: Data flow
- Streams
  - propagate changes to RC (replica tables)
- Triggers
  - collect transaction info
  - collect changes (change log tables)
- Processing routines
  - read changes
  - analyze transaction info
  - denormalize, calculate additional fields
  - insert into denormalized tables
Routines:
- FLAG: extract rows/entities for each event in the transaction, sort
- MAIN: given event rows, run actions
- SNAP: when we have all TX from an OLTP snap, calculate the appropriate DN snap

RADIO: Data flow
[Diagram labels: CLOG tables, FLAG job, TX info, MAIN job, DN tables, Action, CLDao, Dao, DNDao]

PL/SQL code generator
- >600 kB of PL/SQL code:
  - TX element row -> create type as object
  - resulting DN row -> create type as object
  - action for each event type -> create type as object
- Code maintenance is a pain -> use a higher-level language!
- JPLSQL = Java + PL/SQL: a JSP-like parser for producing PL/SQL
  - XML-based DB structure
  - XML-based flag/action mapping
  - power of Java
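To make the generator idea concrete, here is a minimal sketch of how Java code might emit one of the "create type as object" statements from a described row structure. It is not the JPLSQL tool itself: the class `ObjectTypeGenerator`, the type name `t_dn_row` and the attribute names are hypothetical, and a plain map stands in for the XML-based DB structure.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: renders PL/SQL DDL text from a row description.
final class ObjectTypeGenerator {

    /** Renders a "create type ... as object" statement from attribute name -> SQL type. */
    static String createType(String typeName, Map<String, String> attributes) {
        StringJoiner body = new StringJoiner(",\n", "(\n", "\n)");
        for (Map.Entry<String, String> attribute : attributes.entrySet()) {
            body.add("  " + attribute.getKey() + " " + attribute.getValue());
        }
        return "create or replace type " + typeName + " as object " + body + ";";
    }

    /** Example input; the names are made up for illustration. */
    static String example() {
        Map<String, String> attributes = new LinkedHashMap<>(); // keeps declaration order
        attributes.put("tx_id", "number");
        attributes.put("account", "varchar2(30)");
        return createType("t_dn_row", attributes);
    }
}
```

Generating the repetitive type declarations from one description keeps the row structure in a single place, which is the maintenance argument the slide makes.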

Streams, Triggers & CLOGs
- after trigger
- my equals
- duplicate SCN

After trigger
- we keep the TX apply state in package variables
- a before trigger is invoked
- SUDDENLY! the transaction is rolled back, but the package variables stay altered!
- Use only after triggers

My equals
- We need to filter out changes that happened only in columns we don’t collect.
- But what do we do with Oracle’s null?
  nvl(:new.val = :old.val, :new.val is null and :old.val is null)
- A simple inline in JPLSQL.
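The nvl expression works because a comparison with null yields null (unknown) rather than false, so the second argument decides the result whenever either side is null. A small Java sketch of the same logic follows; `NullSafeEquals`, `sqlEquals` and `myEquals` are hypothetical names used only for illustration.

```java
// Hypothetical sketch of the three-valued comparison behind the nvl expression.
final class NullSafeEquals {

    /** SQL-style "=": the result is null (unknown) when either operand is null. */
    static Boolean sqlEquals(Object a, Object b) {
        if (a == null || b == null) {
            return null;
        }
        return a.equals(b);
    }

    /** Mirrors nvl(a = b, a is null and b is null). */
    static boolean myEquals(Object a, Object b) {
        Boolean eq = sqlEquals(a, b);
        // Fall back to the "both null" check only when the comparison is unknown.
        return eq != null ? eq : (a == null && b == null);
    }
}
```

With this definition two nulls count as "unchanged", while a null on only one side counts as a change.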

Streams duplicate SCN
Ingredients:
- several sessions
- several tables
- Streams replication
The apply process can produce messages with the same SCN.
Oracle BUG ID: ???

Processing
- Job control
- FLAG -> MAIN communication
- MAIN

Processing: JOB control
- identification: how do we know a job is running?
- communication: how do we communicate with a job?
  - dbms_alert has implicit commits
  - dbms_pipe is not compatible with RAC
- sleep: conditioned wait

Processing: FLAG
- 250 lines of SQL
- 300 lines of explain plan
- 1 kTX per second

Processing: FLAG
FLAG computes a 3-dimensional table:
- table of (event types)
  - table of (event occasions)
    - table of (table index for this event)
      - number (TX row id)
Problems with a 3-dimensional table:
- ordering: no order by in the collect statement in 10g
- storing: a nested table doesn’t preserve ordering
[Diagram labels: FLAG job, TX info, MAIN job]

Processing: MAIN
What is different from Java:
- objects have default constructors, which is very useful for bulk creation
- encapsulation is bad: package method access is slower than variable access
- reading from a package variable is much, much faster than reading from tables -> cache everything

Processing: MAIN
What is very different from Java:
- object/record assignment works by value, not by reference
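For a Java developer the contrast can be shown from the Java side. In the sketch below, plain assignment creates an alias to the same object, while an explicit copy behaves like the by-value assignment the slide describes for PL/SQL. The `Row` class and its `amount` field are hypothetical.

```java
// Hypothetical sketch: Java reference assignment versus an explicit copy.
final class AssignmentSemantics {

    static final class Row {
        int amount;

        Row(int amount) {
            this.amount = amount;
        }

        /** Explicit copy: the closest Java analogue of by-value assignment. */
        Row copy() {
            return new Row(amount);
        }
    }

    /** Returns { value seen through the alias, value seen through the copy }. */
    static int[] demo() {
        Row original = new Row(10);
        Row alias = original;           // Java: both names refer to one object
        Row snapshot = original.copy(); // independent value, as after a by-value assignment
        original.amount = 99;
        return new int[] { alias.amount, snapshot.amount };
    }
}
```

Here `demo()` returns 99 for the alias and 10 for the copy. Code ported from Java that relies on shared references therefore needs explicit write-back when objects are assigned by value.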

Java-like toString:
- get all object fields using the user_source view
- execute immediate …
- very useful for debugging

Processing: MAIN
An exception to Tom Kyte’s “when others” rule:
- we really want to catch all kinds of errors:
  - infrastructure logic
  - business logic constraints
  - Oracle internal errors
- we really want to stop after any error

Post-processing: Deployer
Relieves system engineers of deployment pain:
- read the installation bundle
- read the DB objects
- compute the difference
- build a patch
Each object type has its own:
- create / change statement syntax
- system view structure
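The "compute difference" step can be sketched as a comparison of two maps from object name to definition. This is only an illustration under assumptions: the class `SchemaDiff`, the `Patch` holder and the idea of comparing definitions as plain strings are hypothetical, and the real deployer has to account for per-object-type syntax and system views as listed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: which bundle objects must be created or changed in the database.
final class SchemaDiff {

    static final class Patch {
        final List<String> toCreate = new ArrayList<>();
        final List<String> toChange = new ArrayList<>();
    }

    /** Compares wanted definitions (bundle) with existing ones (database), keyed by object name. */
    static Patch diff(Map<String, String> bundle, Map<String, String> database) {
        Patch patch = new Patch();
        for (Map.Entry<String, String> wanted : bundle.entrySet()) {
            String current = database.get(wanted.getKey());
            if (current == null) {
                patch.toCreate.add(wanted.getKey());   // missing in the database
            } else if (!current.equals(wanted.getValue())) {
                patch.toChange.add(wanted.getKey());   // present but different
            }
        }
        return patch;
    }
}
```

The resulting lists would then be turned into create and change statements for each object type.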

Post-processing: Deployer
Oracle has object dependencies:
- PL/SQL depends on tables
- tables depend on user types
- user types depend on their parent types
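One common way to respect such dependencies when building a patch is to order the objects topologically, so that each object is created after everything it depends on. The slides do not say how the deployer orders its statements, so the sketch below is an assumption, and the names `DeployOrder`, `pkg_main`, `dn_balance`, `t_child` and `t_parent` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: depth-first topological ordering of database objects.
final class DeployOrder {

    /** Returns the objects ordered so that each one comes after everything it depends on. */
    static List<String> order(Map<String, List<String>> dependsOn) {
        List<String> result = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String name : dependsOn.keySet()) {
            visit(name, dependsOn, done, inProgress, result);
        }
        return result;
    }

    private static void visit(String name, Map<String, List<String>> dependsOn,
                              Set<String> done, Set<String> inProgress, List<String> result) {
        if (done.contains(name)) {
            return;
        }
        if (!inProgress.add(name)) {
            throw new IllegalStateException("Dependency cycle at " + name);
        }
        for (String dependency : dependsOn.getOrDefault(name, List.of())) {
            visit(dependency, dependsOn, done, inProgress, result);
        }
        inProgress.remove(name);
        done.add(name);
        result.add(name); // emitted only after all of its dependencies
    }

    /** A package depends on a table, the table on a user type, the type on its parent type. */
    static List<String> example() {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("pkg_main", List.of("dn_balance"));
        deps.put("dn_balance", List.of("t_child"));
        deps.put("t_child", List.of("t_parent"));
        deps.put("t_parent", List.of());
        return order(deps);
    }
}
```

For the example chain the order is t_parent, t_child, dn_balance, pkg_main: parent type first, package last.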

Misc
- SNAP: Trade/RC SCN mapping
- datatest: xmlforest, s
- very slow dbms_output retrieval

Questions?