ETL: Extract, Transform and Load


1
- ETL: Extract, Transform and Load
- The term describes a data migration or data conversion process.
- ETL may be part of a business process that is repeated regularly.
- The amount and complexity of data have grown dramatically over the past decade, so ETL processes have become more complex and demanding.

2
1. Requirements
2. Analysis
3. Design
4. Proof of Concept
5. Development
6. Testing
7. Execution
8. Verification

3 Requirements
- Scope of the data migration: what data is required in the target system?
- Execution requirements: the migration may have to meet constraints on timeframe, sequence, geographic location, repeatability, acceptable system downtime, etc.
- Source data retention period, backup and restore requirements
- Requirements should be made with this in mind:
  - Data is a company's most valuable asset.
  - The consequences of corrupted data are usually very costly.

4 Analysis
- Understanding the source data
- A Data Dictionary usable for designing the ETL process has to be created
- Mission-critical task
- Frequently underestimated (both its importance and the time it takes)
- All available resources should be used to do the analysis properly:
  - Available system documentation, including the Data Model and Data Dictionary
  - People
  - Reverse engineering

5 Design
- Choice of methodology
- Choice of technology
- Design the target database
- Design the ETL process
- Data Mapping Document
  - Maps source data to the target database
  - Specifies transformation rules
  - Specifies generated data (not from source)
- Design the ETL verification process
- Ensure that all requirements are addressed
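The three roles of the Data Mapping Document (mapping source to target, transformation rules, generated data) can be illustrated with a minimal Java sketch. Everything named here is hypothetical: the class `DataMappingSketch`, the column names (`cust_nm`, `cntry`, `bal`), the rules and the `load_batch_id` field are assumptions made for illustration and do not come from the presentation.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Illustrative only: every column name and rule below is hypothetical.
class DataMappingSketch {

    // One row of the mapping document: source column, target column, transformation rule.
    static final class FieldMapping {
        final String sourceColumn;
        final String targetColumn;
        final Function<String, Object> rule;

        FieldMapping(String sourceColumn, String targetColumn, Function<String, Object> rule) {
            this.sourceColumn = sourceColumn;
            this.targetColumn = targetColumn;
            this.rule = rule;
        }
    }

    static final List<FieldMapping> MAPPINGS = Arrays.asList(
        new FieldMapping("cust_nm", "customer_name", v -> v.trim()),
        new FieldMapping("cntry", "country_code", v -> v.trim().toUpperCase(Locale.ROOT)),
        // Store money as whole cents; throws if the source value has sub-cent precision.
        new FieldMapping("bal", "balance_cents",
            v -> new BigDecimal(v.trim()).movePointRight(2).longValueExact())
    );

    // Applies every mapping to one source row and adds the generated (non-source) data.
    static Map<String, Object> transform(Map<String, String> sourceRow, String batchId) {
        Map<String, Object> targetRow = new LinkedHashMap<>();
        for (FieldMapping m : MAPPINGS) {
            String value = sourceRow.get(m.sourceColumn);
            if (value == null) {
                throw new IllegalArgumentException("Missing source column: " + m.sourceColumn);
            }
            targetRow.put(m.targetColumn, m.rule.apply(value));
        }
        targetRow.put("load_batch_id", batchId); // generated data, not from source
        return targetRow;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("cust_nm", "  Ada Lovelace ");
        row.put("cntry", "ca");
        row.put("bal", "12.50");
        // prints {customer_name=Ada Lovelace, country_code=CA, balance_cents=1250, load_batch_id=batch-001}
        System.out.println(transform(row, "batch-001"));
    }
}
```

Keeping the mapping as data rather than scattering it through the code means the same table can be reviewed against the written mapping document and reused by verification scripts.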

6 Proof of Concept
- Helps to determine or estimate:
  - Feasibility of the concept
  - Development time
  - Performance, capacity and execution time
  - Whether the requirements can be met
- Gain knowledge about the technology
- Code produced in this phase can usually be reused during the development phase

7 Development
- Includes:
  - Producing code and processes as per the Design and Data Mapping Document
  - Data verification scripts or programs as per the Test Plan
  - Execution scripts as per the Execution Plan
  - Unit testing, performed and documented by developers
- Typical challenges:
  - Inadequate requirements and design documents
  - Developers unfamiliar with the technology

8 Testing
- Ensures that requirements are met
- A Test Plan is highly recommended
- Types of testing:
  - Functional, stress, load, integration, connectivity, regression
- Challenges:
  - Automation and repeatability (testing and verification scripts)
  - Creation of the test data
  - Extracting small data sets from large data volumes
  - Confidential data may not be made available for testing

9 Execution
- The execution plan should include:
  - Sequence of tasks
  - Time of execution and expected duration of execution
  - Checkpoints and success criteria
  - Back-out plan and continuation of business
  - Resources involved
- For mission-critical systems, downtime could be limited or even entirely unacceptable
- Execution should be controlled and verified
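One way to picture an execution plan with a task sequence, checkpoints and a back-out plan is the following Java sketch. It is an assumption-laden illustration, not the presenters' tooling: the class `ExecutionPlanSketch`, the `Step` type and the log message format are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: runs steps in order, checks a success criterion after each,
// and backs out started steps in reverse order as soon as one fails.
class ExecutionPlanSketch {

    static final class Step {
        final String name;
        final Runnable action;            // the task itself
        final BooleanSupplier checkpoint; // success criterion evaluated after the task
        final Runnable backOut;           // how to undo the task

        Step(String name, Runnable action, BooleanSupplier checkpoint, Runnable backOut) {
            this.name = name;
            this.action = action;
            this.checkpoint = checkpoint;
            this.backOut = backOut;
        }
    }

    // Returns a log of what happened, suitable for controlled, verifiable execution.
    static List<String> run(List<Step> plan) {
        List<String> log = new ArrayList<>();
        Deque<Step> started = new ArrayDeque<>();
        for (Step step : plan) {
            started.push(step);
            boolean ok;
            try {
                step.action.run();
                ok = step.checkpoint.getAsBoolean();
            } catch (RuntimeException e) {
                ok = false; // an exception counts as a failed checkpoint
            }
            if (ok) {
                log.add("OK " + step.name);
                continue;
            }
            log.add("FAILED " + step.name);
            while (!started.isEmpty()) {   // most recently started step first
                Step undo = started.pop();
                undo.backOut.run();
                log.add("BACKED OUT " + undo.name);
            }
            return log;
        }
        return log;
    }

    public static void main(String[] args) {
        List<Step> plan = Arrays.asList(
            new Step("extract", () -> {}, () -> true, () -> {}),
            new Step("load", () -> {}, () -> false, () -> {}) // simulated failure
        );
        // prints [OK extract, FAILED load, BACKED OUT load, BACKED OUT extract]
        System.out.println(run(plan));
    }
}
```

A real plan would also record start times and expected durations per step; those are left out here to keep the sketch short.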

10 Verification
- Confirms that the data migration was successful
- Determined during the design phase
- Various methodologies and technologies could be used
- Automated verification is highly recommended (driven by requirements)
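As one possible form of automated verification, the sketch below compares a row count and an order-independent checksum between source and target. It is a simplified assumption, not the method used in the presentation: rows are modelled as already-canonicalized strings (that is, source rows with the expected transformations applied), and the class name `MigrationVerifierSketch` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical sketch: compares row counts and an order-independent checksum.
class MigrationVerifierSketch {

    // Sum of per-row CRC32 values, so the result does not depend on row order.
    static long checksum(List<String> rows) {
        long sum = 0;
        for (String row : rows) {
            CRC32 crc = new CRC32();
            crc.update(row.getBytes(StandardCharsets.UTF_8));
            sum += crc.getValue();
        }
        return sum;
    }

    // True when both sides have the same number of rows and the same checksum.
    static boolean verify(List<String> sourceRows, List<String> targetRows) {
        return sourceRows.size() == targetRows.size()
            && checksum(sourceRows) == checksum(targetRows);
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("1|Ada|CA", "2|Bob|US");
        List<String> target = Arrays.asList("2|Bob|US", "1|Ada|CA"); // same rows, different order
        System.out.println(verify(source, target)); // prints true
    }
}
```

A matching checksum is strong evidence rather than proof, since collisions are possible; where the requirements demand certainty, a field-by-field comparison would be the stricter check.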

11
- Underestimated complexity of the project
- Overlooked or neglected phases of the project
- Wrong choice of technology
- Common misconceptions, such as:
  - Expensive ETL tools will solve all problems
  - No or very little programming will be required
  - "We don't need plans, or don't have time for them, but we know exactly what we need to do"

12
- Maintaining licenses and consultants is very expensive
- Significant time is required to learn
- Usually require dedicated hardware
- Cannot take advantage of database vendors' proprietary technologies optimized for the fastest data migration
- Complex tasks very often require integration with other technologies
- Very limited performance
- Only a small amount of the provided functionality is actually required for an ETL project
- Very limited application for data analysis
- Huge discrepancy between marketing promises and actual performance

13
- Over 15 years in the IT consulting business, a proprietary ETL methodology and technology has been developed
- Consists of two major modules:
  - Database Analyzer
  - G-DAO Framework
- Major advantages:
  - Inexpensive, easier to learn, and performs better than mainstream ETL software
  - Any Java developer can master it and start using it within several days
  - It is proven and it works

14
- Produces ETL Data Analysis Reports in various formats
- Major uses:
  - Analyze and understand source data and the database attributes
  - Create data mapping and transformation documents
  - Create a data dictionary
- Suggests ways to improve database design
- Valuable source of information for Business Analysts, Data Architects, Developers and Database Administrators

15
- Intuitive, descriptive and easy to read
- In HTML format
- Can be imported and edited in major document editors such as MS Word

16
- User-friendly GUI
- Can also run in batch mode for lengthy analysis (large data sources)

17
- Java code generator
- Eliminates the huge amount of legwork needed to develop the code required for the ETL process
- Uses the analysis performed by Database Analyzer to produce code optimized for a particular database
- Code may be used for purposes other than ETL (any kind of database access and data manipulation)
- Takes advantage of the almost unlimited world of Java libraries (no proprietary languages or interfaces)

18
- Data can flow directly from source to target (no need for intermediate storage in files)
- XA transactions are supported for all major databases
- Functionality is limited only by the limitations of the JDBC driver and the Java language
- Easy to learn and implement
- No dedicated hardware required
- Provides a platform for any kind of business application that requires data access

19
- Incorporated in 1993 in Toronto, Canada
- Provided IT consulting to:
  - Oracle Corporation
  - General Electric
  - Citibank
  - Royal Bank
  - Bank of Montreal
  - The Prudential
  - Standard Life
  - and many more
- Most recent implementations of Database Analyzer and G-DAO Framework:
  - Citibank
  - Royal Bank of Canada
- http://www.mallocinc.com

