Download presentation
Presentation is loading. Please wait.
Published byVirgil Armstrong Modified over 8 years ago
1
1 Do You Need an ETL Tool? Ben Bor NZ Ministry of Health Ben Bor NZ Ministry of Health
2
2 Ben Bor Over 20 years in IT, most of it in Information Management Oracle specialist since version 5 Involved in Business Intelligence for over 10 years Consulted the world’s largest corporations Presents regularly on Information Management Was annual Guest Lecturer at Sussex University
3
3 Contents What is ETL ETL tools vs. ‘handcraft’ code PL/SQL techniques
4
4 What is ETL ETL = Extract, Transform and Load: Any source, target ; Built-in complex transformations Point-to-point vs. hub-and-spoke
5
5 Traditional ETL
6
6 Our Own ETL Requirements Flat Files SQL Loader PL/SQL Data Quality
7
7 Travel Company Example
8
8 Tools or Handcraft? ETL Advantages: Graphic User Interface Automatic documentation Off-the-shelf set of ready-to- use transformations Built-in scheduler Database Agnostic Handcrafting Advantages: No limitation reuse existing code & non ETL No specific methodology No license cost No impact on infrastructure Transportable Release & Code- Management by script
9
9 Oracle ETL Facilities External Tables Merge SQL Loader PL/SQL Database links
10
10 Why Use PL/SQL Integrated environment (no installation required) Available resources Reuse code ‘snippets’ Good performance Integration with and control of the database
11
11 PL/SQL Tips and Techniques 1.Quality 2.Techniques 3.Tricks
12
12 Quality
13
13 What is Quality? [1] “Totality of characteristics of an entity that bears on its ability to satisfy stated and implied needs.“ [The ISO 8204 definition for quality]
14
14 Quality 2 [2] Quality is a collection of “ilities”: Reliability - operate error free Modifiability- have enhancement changes made easily Understandability - understand the software readily Efficiency - the speed of the software Usability - use the software easily Testability - construct and execute test cases easily Portability - transport the software easily
15
15 Quality 3 [3] “All the things you do today in your software development, in order to bear fruit in the future.”
16
16 Standards & Conventions Use meaningful names V_Number_Of_Items_In_Array vs. i or no_itms Distinguish between types: V_Variable a_Parameter C_Constant G_Global constant
17
17 Using Packages Central package with utilities and all output All error messages and numbers All common constants (date format etc’) Global variables Statistics data Other packages encapsulate related logic Within package: Procedures & functions have: Meaningful name A99_ prefix. A is the level (A highest). 99 unique ID
18
18 Example: procedure and variable naming XXX_Write_Flat_File.U03_Write_Record_To_CSV( a_File_Handle, C_Field_Delim, C_Field_Separ, C_Record_Separ, RM_REFERENCE_rec.REFTYPE, RM_REFERENCE_rec.CODE, RM_REFERENCE_rec.DESCRIPTION, To_Char(RM_REFERENCE_rec.ISDEFAULT, '9') ) ;
19
19 Techniques Error logging Autonomous Transaction Run statistics Release mechanism Overloading
20
20 Error Logging Technique Global variables keep key information: Record ID Run ID Location in code Local error trapping decides severity and error code. All error trapping passed up.
21
21 Error Logging Structure TABLE ERROR_LOG(ERR_TIMEDATE, ERR_NUMINTEGER, SOURCE_URNVARCHAR2(20), SOURCE_SYSTEM_IDVARCHAR2(5), PLACE_IN_CODEVARCHAR2(64), ERR_LOCATIONVARCHAR2(255), ERR_DESCRIPTIONVARCHAR2(512), SEVERITY NUMBER(6) ) ERR_TIME18-OCT-02 10:04:52 ERR_NUM1001 SOURCE_URN223010913 SOURCE_SYSTEMCRS PLACE_IN_CODEIn FLIP_PKG B06 ; 6(utils A08) ERR_LOCATIONA08_Lookup_Type ERR_DESCRIPTIONNo match found for [Plan_Code] value [C3] SEVERITY10
22
22 --=================== PROCEDURE E00_write_error_log( --=================== a_err_numINinteger, a_SeverityINInteger, a_err_locationINVarChar, a_err_descriptionINVarChar) IS PRAGMA AUTONOMOUS_TRANSACTION; V_Place_In_CodeDW_Process.Error_Log.Place_In_Code%Type; BEGIN V_Place_In_Code := G_Place_In_Code || '(utils ' || G_Place_In_UTILS_Code || ')' ; INSERT INTO DW_Process.Error_Log ( err_time,err_num, Severity, BOROUGH_ID,SOURCE_URN,SOURCE_SYSTEM_ID, Place_In_Code,err_location,err_description ) VALUES ( sysdate,a_err_num,a_Severity, G_BOROUGH_ID,G_SOURCE_URN,G_SOURCE_SYSTEM_ID, V_Place_In_Code,a_err_location,a_err_description ) ; COMMIT ; -- commit the autonomous transaction, outside transaction is unaffected. G_Stats_Rec.TOTAL_NO_OF_ERRORS := G_Stats_Rec.TOTAL_NO_OF_ERRORS + 1 ; --=================== ENDE00_Write_Error_Log ; --=================== Autonomous Transaction
23
23 Run Statistics G_Stats_Rec is a record with all the statistics fields Defined in the central package ( therefore resident in memory ) It is updated by the writing procedures (all central) It is written out at the end of the run
24
24 Release Mechanism Table of ‘release notes’ Each package has C_Version constant updated each release ‘Show_Version’ scripts display versions and notes Results shipped with each release
25
25 Remove Spaces --=================== FUNCTIONA04_Remove_Spaces( --=================== a_InstringIN Varchar ) Return Varchar IS /* ** Removes all the spaces from a string, leaving the rest of the printable characters */ BEGIN G_place_in_UTILS_code := 'A04' ; -- For use by the error trapping routine RETURN TRANSLATE( a_Instring, 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890’ || '\|, /?#~@;:[{]}=+-_`¬!"£$%^&*() ', 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890’ || '\|, /?#~@;:[{]}=+-_`¬!"£$%^&*()' ) ; --=================== ENDA04_Remove_Spaces ; --===================
26
26 Strip Leading non-numerics --============================ FUNCTIONF09_Strip_Leading_non_digits( --============================ a_StringIN VARCHAR2 ) RETURN VARCHAR2 IS /* ** Remove leading non-digits from the input. ** Example: Input string: 'abcde12345edcba' ** Output string: '12345edcba' */ v_string Varchar2(4000) ; v_first_digit_posInteger ; BEGIN -- Replace all digits by 0 v_string := Translate(a_String, '1234567890', '0000000000') ; v_first_digit_pos := instr(v_string,'0') ; RETURN F01_Right(a_String, v_first_digit_pos ) ; --============================ ENDF09_Strip_Leading_non_digits; --============================
27
27 Overloading --======================= PROCEDUREU03_Write_Record_To_CSV( --======================= a_File_HandleINutl_file.file_type, a_Field_DelimINVarChar, -- the quotes, for CSV a_Field_SeparINVarChar, -- the comma, for CSV a_Record_SeparINVarChar, -- the Carriage Return + Line feed, for CSV a_String1INVarChar := G_default_Value, a_String2INVarChar := G_default_Value, a_String3INVarChar := G_default_Value,. ) IS BEGIN IF a_String1 = G_default_Value THENGOTO End_Of_Record ; END IF ; U02_Write(a_File_Handle, a_Field_Delim || a_String1 || a_Field_Delim) ; IF a_String2 = G_default_Value THENGOTO End_Of_Record ; END IF ; U02_Write(a_File_Handle, a_Field_Separ || a_Field_Delim || a_String2 || a_Field_Delim ) ; IF a_String3 = G_default_Value THENGOTO End_Of_Record ; END IF ; U02_Write(a_File_Handle, a_Field_Separ || a_Field_Delim || a_String3 || a_Field_Delim ) ;. > U01_Write_Line(a_File_Handle, a_Record_Separ) ; --======================= ENDU03_Write_Record_To_CSV ; --------------------------------------------------------------------------------------------------------------------------------------------------------------- --=======================
28
28 Summary ETL or PL/SQL? Your choice. Consider: Overall cost ‘Politics’ Convenience Portability Speed of development Reusability IF PL/SQL : ensure Quality
29
29 Thank you !
30
30
31
31 Thank you ! I can be contacted at ben_bor@moh.govt.nz
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.