Presentation is loading. Please wait.

Presentation is loading. Please wait.

My experience building a custom ETL system Problems, solutions and Oracle quirks or How scary Oracle can look for a Java developer.

Similar presentations


Presentation on theme: "My experience building a custom ETL system Problems, solutions and Oracle quirks or How scary Oracle can look for a Java developer."— Presentation transcript:

1 My experience building a custom ETL system Problems, solutions and Oracle quirks or How scary Oracle can look for a Java developer

2 Agenda WHY do we need an ETL? HOW it works Experience: – task – existing solutions? – problem or Oracle quirk – my solution

3 WHY do we need an ETL? OLTP - in most cases calculation is non-trivial: – SQLs grow in size & complexity – increased maintenance effort – poor SQL performance business values calculation is implemented in most of reports independently - no code reuse: – maintenance effort is multiplied by number of reports – copy-paste-driven development

4 WHY do we need an ETL? OLTP-like RC 46 lines 7 joins 3 levels after RADIO 13 lines 3 tables 1 level Balance / UPL calculation:

5 Streams Propagate changes to RC - replica tables Triggers Collect transaction info Collect changes – change log tables Processing routines Read changes Analyze transaction info Denormalize, calculate additional fields Insert into denormalized tables O! RADIO: Data flow Routines: FLAG – extract rows/entities for each event in transaction, sort MAIN – given event rows, run actions SNAP – when we have all TX from OLTP snap – calculate appropriate DN snap

6 Radio: Data flow CLOG tables FLAG job MAIN job DN tables TX info Action CLDao Dao DNDao

7 PLSQL code generator >600kB of pl/sql code: TX element row  create type as object resulting DN row  create type as object action for each event type  create type as object Code maintenance is pain  use higher level language !! JPLSQL = java+pl/sql: jsp-like parser for producing pl/sql. XML-based DB structure XML-based flag/action mapping power of Java

8 Streams, Triggers & CLOGs after trigger my equals duplicate scn

9 After trigger we keep TX apply state in package variables before trigger is invoked SUDDENLY!, transaction is rolled back – package variables stay altered! Use only after triggers

10 My equals We need to filter changes, that happened in columns we don’t collect. But what we do with Oracle’s null ? nvl(:new.val = :old.val, :new.val is null and :old.val is null) Simple inline in JPLSQL.

11 Streams duplicate SCN Ingredients: several sessions several tables Streams replication Apply process can produce message with same SCN. Oracle BUG ID: ???

12 Processing Job control FLAG  MAIN communication MAIN

13 Processing: JOB control identification – how do we know a job is running ? communication – how do we communicate a job ? – dbms_alert has implicit commits – dbms_pipe is not compatible with RAC sleep – conditioned wait

14 Processing: FLAG 250 lines of SQL 300 lines of explain plan 1 kTX p/second

15 Processing: FLAG FLAG computes: – table of table of – table of » number – event types event occasions – table index for this event » TX row id 3-dimensional table problems: ordering (no order by in collect statement in 10g) storing – nested table doesn’t preserve ordering FLAG job MAIN job TX info

16 Processing: MAIN What is different from Java object has default constructors – very useful for bulk creation encapsulation is bad - package method access is slower, than variable access reading from package variable is much, much faster, than reading from tables  cache everything

17 What is very different from Java object/record assignment works by value, not by reference Processing: MAIN

18 Java-like toString: get all object fields using user_source view execute immediate … very useful for debugging

19 Processing: MAIN Tom Kyte’s “when others” rule exception: we really want to catch all kind of errors: – infrastructure logic – business logic constraints – Oracle internal errors we really want to stop after any error

20 Post-processing: Deployer Relieves system engineers from deployment paint Read installation bundle Read DB objects Compute difference Build patch Each object type has it’s own: create / change statement syntax system view structure

21 Post-processing: Deployer Oracle has object dependencies: pl/sql depends on tables tables depend on user types user types depend on their parent types

22 Misc SNAP – Trade/RC scn mapping datatest – xmlforest, emails very slow dbms_output retrieval

23 Questions ?


Download ppt "My experience building a custom ETL system Problems, solutions and Oracle quirks or How scary Oracle can look for a Java developer."

Similar presentations


Ads by Google