Published by Dale Griffin. Modified over 9 years ago.
1
My experience building a custom ETL system
Problems, solutions and Oracle quirks
or: How scary Oracle can look to a Java developer
2
Agenda
WHY do we need an ETL?
HOW it works
Experience:
– task
– existing solutions?
– problem or Oracle quirk
– my solution
3
WHY do we need an ETL?
On OLTP, the calculation is non-trivial in most cases:
– SQL statements grow in size and complexity
– maintenance effort increases
– SQL performance is poor
Business-value calculation is implemented independently in most reports, with no code reuse:
– maintenance effort is multiplied by the number of reports
– copy-paste-driven development
4
WHY do we need an ETL?
Balance / UPL calculation:
– OLTP-like RC: 46 lines, 7 joins, 3 levels
– after RADIO: 13 lines, 3 tables, 1 level
5
O! RADIO: Data flow
Streams
– propagate changes to RC (replica tables)
Triggers
– collect transaction info
– collect changes (change log tables)
Processing routines
– read changes
– analyze transaction info
– denormalize, calculate additional fields
– insert into denormalized tables
Routines:
– FLAG: extract rows/entities for each event in the transaction, sort
– MAIN: given event rows, run actions
– SNAP: when we have all TX from an OLTP snap, calculate the appropriate DN snap
6
Radio: Data flow
[Diagram: CLOG tables, FLAG job, MAIN job, DN tables; labels: TX info, Action, CLDao, Dao, DNDao]
7
PL/SQL code generator
>600 kB of PL/SQL code:
– TX element row: create type as object
– resulting DN row: create type as object
– action for each event type: create type as object
Code maintenance is a pain: use a higher-level language!
JPLSQL = Java + PL/SQL: a JSP-like parser for producing PL/SQL.
– XML-based DB structure
– XML-based flag/action mapping
– power of Java
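The slides do not show JPLSQL itself, so the following is only a minimal sketch of the general idea: a JSP-like template with `<%= name %>` placeholders rendered into PL/SQL text from Java. The class `PlsqlTemplate`, the method `render`, and the type and field names in `main` are all hypothetical and are not taken from the original tool.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a JSP-like template renderer; not the actual JPLSQL code.
public class PlsqlTemplate {
    // Matches JSP-style expression placeholders such as <%= typeName %>.
    private static final Pattern PLACEHOLDER = Pattern.compile("<%=\\s*(\\w+)\\s*%>");

    /** Replaces every placeholder with its value; fails fast on an unknown name. */
    public static String render(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("No value for placeholder: " + m.group(1));
            }
            // quoteReplacement keeps '$' and '\' in the value literal.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed template and values, for illustration only.
        String template =
            "create or replace type <%= typeName %> as object (\n  <%= fields %>\n);";
        Map<String, String> values = new LinkedHashMap<>();
        values.put("typeName", "t_dn_row");
        values.put("fields", "id number,\n  amount number");
        System.out.println(render(template, values));
    }
}
```

In a generator of this kind the values would presumably come from the XML-based DB structure rather than from a hand-built map.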
8
Streams, Triggers & CLOGs
– after trigger
– my equals
– duplicate SCN
9
After trigger
– we keep TX apply state in package variables
– a before trigger is invoked
– SUDDENLY! the transaction is rolled back, but the package variables stay altered!
Use only after triggers.
10
My equals
We need to filter out changes that happened in columns we don't collect. But what do we do with Oracle's null?
nvl(:new.val = :old.val, :new.val is null and :old.val is null)
Simple inline in JPLSQL.
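To see why the `nvl` wrapper is needed, the three-valued comparison can be modelled in Java. This is only an illustrative sketch: `sqlEquals` mimics a comparison that yields "unknown" (here a `null` `Boolean`) when either side is null, and `myEquals` mirrors the expression on the slide. The class and method names are hypothetical.

```java
// Illustrative model of the slide's null-safe comparison; names are hypothetical.
public class NullSafeEquals {
    /** SQL-style equality: the result is unknown (null) when either side is null. */
    public static Boolean sqlEquals(Object a, Object b) {
        if (a == null || b == null) {
            return null;
        }
        return a.equals(b);
    }

    /** Returns the value unless it is unknown, in which case the fallback is used. */
    public static boolean nvl(Boolean value, boolean fallback) {
        return value != null ? value : fallback;
    }

    /** Mirrors nvl(:new.val = :old.val, :new.val is null and :old.val is null). */
    public static boolean myEquals(Object newVal, Object oldVal) {
        return nvl(sqlEquals(newVal, oldVal), newVal == null && oldVal == null);
    }

    public static void main(String[] args) {
        System.out.println(myEquals(null, null)); // both null: treated as unchanged
        System.out.println(myEquals(1, null));    // one side null: treated as changed
        System.out.println(myEquals(1, 1));       // equal values: unchanged
    }
}
```

With a plain comparison, a column that goes from null to null would evaluate to unknown rather than true, so the change filter would misclassify it.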
11
Streams: duplicate SCN
Ingredients:
– several sessions
– several tables
– Streams replication
The apply process can produce messages with the same SCN.
Oracle BUG ID: ???
12
Processing Job control FLAG MAIN communication MAIN
13
Processing: JOB control
identification: how do we know a job is running?
communication: how do we communicate with a job?
– dbms_alert has implicit commits
– dbms_pipe is not compatible with RAC
sleep: conditioned wait
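The slide only names "conditioned wait" without showing it. One common reading, assumed here, is a loop that sleeps in short intervals and re-checks a condition until it holds or a timeout expires. The Java sketch below illustrates that pattern only; `ConditionedWait` and `waitUntil` are hypothetical names, and the original job was presumably written in PL/SQL.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a polling "conditioned wait"; not the original job code.
public class ConditionedWait {
    /**
     * Polls the condition every pollMillis until it is true or timeoutMillis has passed.
     * Returns true if the condition became true, false on timeout or interruption.
     */
    public static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Assumed condition: "new work is available" after roughly 50 ms.
        boolean ready = waitUntil(() -> System.currentTimeMillis() - start >= 50, 1000, 10);
        System.out.println("condition met: " + ready);
    }
}
```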
14
Processing: FLAG
– 250 lines of SQL
– 300 lines of explain plan
– 1 kTX per second
15
Processing: FLAG
FLAG computes a 3-dimensional table (table of table of table of number), with levels labelled:
– event types
– event occasions
– table index for this event
» TX row id
Problems:
– ordering (no order by in the collect statement in 10g)
– storing: a nested table doesn't preserve ordering
[Diagram: FLAG job, MAIN job, TX info]
16
Processing: MAIN
What is different from Java
– objects have default constructors: very useful for bulk creation
– encapsulation is bad: package method access is slower than variable access
– reading from a package variable is much, much faster than reading from tables: cache everything
17
Processing: MAIN
What is very different from Java
– object/record assignment works by value, not by reference
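For contrast, this is the Java behaviour a Java developer carries over by habit: assigning an object copies the reference, so both names see the same mutation. To get the by-value behaviour the slide describes for PL/SQL objects and records, Java needs an explicit copy. The `Row` class and both methods are hypothetical illustrations.

```java
// Hypothetical illustration of reference assignment versus an explicit copy in Java.
public class AssignmentSemantics {
    /** A small mutable, record-like class. */
    public static class Row {
        public int amount;

        public Row(int amount) {
            this.amount = amount;
        }

        /** Copy constructor: the Java way to get a by-value duplicate. */
        public Row(Row other) {
            this.amount = other.amount;
        }
    }

    /** Plain assignment: alias and original are the same object, so the original changes. */
    public static int mutateAlias(Row original) {
        Row alias = original;
        alias.amount = 99;
        return original.amount;
    }

    /** Explicit copy: the original is untouched, as with by-value assignment. */
    public static int mutateCopy(Row original) {
        Row copy = new Row(original);
        copy.amount = 99;
        return original.amount;
    }

    public static void main(String[] args) {
        System.out.println(mutateAlias(new Row(1))); // original was modified through the alias
        System.out.println(mutateCopy(new Row(1)));  // original keeps its value
    }
}
```

The practical consequence in by-value code is that a modified copy has to be written back to wherever it came from, otherwise the change is lost.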
18
Java-like toString:
– get all object fields using the user_source view
– execute immediate …
– very useful for debugging
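For reference, the Java-side idea being imitated can be sketched with reflection: enumerate an object's fields at run time and print name/value pairs. This is not the PL/SQL implementation from the slide (which reads field names from `user_source` and uses dynamic SQL); `ReflectiveToString`, `describe` and `DnRow` are hypothetical names.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical Java analogue of a generic toString; the slide's version is PL/SQL.
public class ReflectiveToString {
    /** Builds "ClassName{field=value, ...}" from the object's declared instance fields. */
    public static String describe(Object obj) {
        StringBuilder sb = new StringBuilder(obj.getClass().getSimpleName()).append('{');
        String separator = "";
        for (Field field : obj.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue; // skip constants and compiler-generated fields
            }
            field.setAccessible(true);
            sb.append(separator).append(field.getName()).append('=');
            try {
                sb.append(field.get(obj));
            } catch (IllegalAccessException e) {
                sb.append("<inaccessible>");
            }
            separator = ", ";
        }
        return sb.append('}').toString();
    }

    /** Example row type, assumed for illustration. */
    public static class DnRow {
        int id = 7;
        String currency = "USD";
    }

    public static void main(String[] args) {
        System.out.println(describe(new DnRow()));
    }
}
```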
19
Processing: MAIN
An exception to Tom Kyte's “when others” rule:
we really want to catch all kinds of errors:
– infrastructure logic
– business logic constraints
– Oracle internal errors
we really want to stop after any error
20
Post-processing: Deployer
Relieves system engineers of deployment pain:
– read installation bundle
– read DB objects
– compute difference
– build patch
Each object type has its own:
– create / change statement syntax
– system view structure
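The "compute difference, build patch" steps can be sketched as a comparison of two name-to-definition maps. This is a simplified illustration under stated assumptions: objects are keyed by name, definitions are compared as plain strings, and the patch is a list of `CREATE`/`CHANGE` markers rather than real per-type DDL. `DeployDiff` and `buildPatch` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the diff step; the real deployer handles per-type syntax.
public class DeployDiff {
    /**
     * Compares the installation bundle against the current database objects.
     * Emits "CREATE name" for missing objects and "CHANGE name" for differing ones,
     * in alphabetical order of object name.
     */
    public static List<String> buildPatch(Map<String, String> bundle, Map<String, String> database) {
        List<String> patch = new ArrayList<>();
        for (Map.Entry<String, String> entry : new TreeMap<>(bundle).entrySet()) {
            String current = database.get(entry.getKey());
            if (current == null) {
                patch.add("CREATE " + entry.getKey());
            } else if (!current.equals(entry.getValue())) {
                patch.add("CHANGE " + entry.getKey());
            }
        }
        return patch;
    }

    public static void main(String[] args) {
        // Assumed object names and definitions, for illustration only.
        Map<String, String> bundle = new LinkedHashMap<>();
        bundle.put("t_dn_row", "id number, amount number");
        bundle.put("t_tx_row", "id number");
        Map<String, String> database = new LinkedHashMap<>();
        database.put("t_dn_row", "id number");
        System.out.println(buildPatch(bundle, database));
    }
}
```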
21
Post-processing: Deployer
Oracle has object dependencies:
– PL/SQL depends on tables
– tables depend on user types
– user types depend on their parent types
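Dependencies like these imply that a patch has to be applied in dependency order. The slides do not say how the deployer orders objects; a depth-first topological sort is one standard way to do it and is sketched below as an assumption. `DependencyOrder`, `deployOrder` and the object names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: depth-first topological sort of object dependencies.
public class DependencyOrder {
    /** Returns the objects so that every object appears after everything it depends on. */
    public static List<String> deployOrder(Map<String, List<String>> dependsOn) {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        // Sorted keys make the result deterministic.
        for (String name : new TreeSet<>(dependsOn.keySet())) {
            visit(name, dependsOn, done, inProgress, order);
        }
        return order;
    }

    private static void visit(String name, Map<String, List<String>> dependsOn,
                              Set<String> done, Set<String> inProgress, List<String> order) {
        if (done.contains(name)) {
            return;
        }
        if (!inProgress.add(name)) {
            throw new IllegalStateException("Dependency cycle at " + name);
        }
        for (String dependency : dependsOn.getOrDefault(name, Collections.emptyList())) {
            visit(dependency, dependsOn, done, inProgress, order);
        }
        inProgress.remove(name);
        done.add(name);
        order.add(name); // emitted only after all of its dependencies
    }

    public static void main(String[] args) {
        Map<String, List<String>> dependsOn = new LinkedHashMap<>();
        dependsOn.put("pkg_main", Collections.singletonList("dn_table"));
        dependsOn.put("dn_table", Collections.singletonList("t_dn_row"));
        dependsOn.put("t_dn_row", Collections.singletonList("t_base_row"));
        System.out.println(deployOrder(dependsOn));
    }
}
```

Failing on a cycle rather than guessing an order keeps a broken bundle from being half-applied.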
22
Misc
– SNAP: Trade/RC SCN mapping
– datatest: xmlforest, emails
– very slow dbms_output retrieval
23
Questions?