ETL: Extract, Transform, and Load


- ETL stands for Extract, Transform, and Load.
- The term describes a data migration or data conversion process.
- ETL may be part of a regularly repeated business process.
- The amount and complexity of data have grown dramatically over the past decade, so ETL processes have become more complex and demanding.

1. Requirements
2. Analysis
3. Design
4. Proof of Concept
5. Development
6. Testing
7. Execution
8. Verification

- Scope of the data migration: what data is required in the target system?
- Execution requirements: time frame, sequence, geographic location, repeatability, acceptable system downtime, etc.
- Source data retention period, and backup and restore requirements.
- Requirements should be written with the following in mind:
  - Data is a company's most valuable asset.
  - The consequences of corrupted data are usually very costly.

- Analysis means understanding the source data.
- A Data Dictionary usable for designing the ETL process has to be created.
- This is a mission-critical task.
- Its importance and the time it takes are frequently underestimated.
- All available resources should be used to do the analysis properly:
  - Available system documentation, including the Data Model and Data Dictionary
  - People
  - Reverse engineering
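The reverse-engineering route to a data dictionary can be sketched as follows. This is a minimal illustration, assuming the source is a SQLite database reachable through Python's standard `sqlite3` module; the `customer` table and the helper `build_data_dictionary` are hypothetical and are not part of the tools described in this presentation.

```python
import sqlite3


def build_data_dictionary(conn, table):
    """Describe each column of `table`: declared type, nullability, basic profile."""
    cur = conn.cursor()
    total = cur.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    columns = cur.execute(f'PRAGMA table_info("{table}")').fetchall()
    dictionary = []
    for _cid, name, col_type, notnull, _default, pk in columns:
        nulls, distinct = cur.execute(
            f'SELECT SUM("{name}" IS NULL), COUNT(DISTINCT "{name}") FROM "{table}"'
        ).fetchone()
        dictionary.append({
            "column": name,
            "type": col_type,
            "nullable": not notnull,   # as declared in the schema
            "primary_key": bool(pk),
            "rows": total,
            "nulls": nulls or 0,       # SUM is NULL for an empty table
            "distinct": distinct,      # COUNT(DISTINCT) ignores NULLs
        })
    return dictionary


# Hypothetical legacy table, used only for illustration.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL, phone TEXT)"
)
source.executemany(
    "INSERT INTO customer (id, name, phone) VALUES (?, ?, ?)",
    [(1, "Ann", "555-0100"), (2, "Bob", None), (3, "Cy", "555-0100")],
)
data_dictionary = build_data_dictionary(source, "customer")
for entry in data_dictionary:
    print(entry)
```

Null and distinct counts per column are the kind of facts that documentation and people often cannot supply, which is why profiling the data itself complements the other two sources.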

- Choice of methodology
- Choice of technology
- Design the target database
- Design the ETL process
- Data Mapping Document:
  - Maps source data to the target database
  - Specifies transformation rules
  - Specifies generated data (data that does not come from the source)
- Design the ETL verification process
- Ensure that all requirements are addressed
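One way to picture a Data Mapping Document is as a table that, for every target column, names either a source column plus a transformation rule or a generator for data that has no source. The sketch below expresses that idea in Python. All column names, rules, and the fixed migration date are hypothetical and chosen only for illustration.

```python
from datetime import date

COUNTRY_NAMES = {"CA": "Canada", "US": "United States"}

# Target column -> source column and transformation rule,
# or a generator for data that does not exist in the source.
DATA_MAPPING = {
    "customer_id": {"source": "CUST_NO", "transform": int},
    "full_name": {"source": "CUST_NAME", "transform": lambda v: v.strip().title()},
    "country": {"source": "CTRY", "transform": lambda v: COUNTRY_NAMES.get(v, "Unknown")},
    # Generated data: a fixed placeholder date keeps the example deterministic.
    "migrated_on": {"generate": lambda: date(2024, 1, 1).isoformat()},
}


def apply_mapping(source_row, mapping):
    """Build one target row from one source row according to the mapping."""
    target_row = {}
    for target_col, rule in mapping.items():
        if "generate" in rule:
            target_row[target_col] = rule["generate"]()
        else:
            target_row[target_col] = rule["transform"](source_row[rule["source"]])
    return target_row


legacy_row = {"CUST_NO": "0042", "CUST_NAME": "  ann LEE ", "CTRY": "CA"}
migrated_row = apply_mapping(legacy_row, DATA_MAPPING)
print(migrated_row)
```

Keeping the mapping as data rather than scattering it through code makes it easier to review against the requirements and to reuse when designing verification.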

- A proof of concept helps to determine or estimate:
  - Feasibility of the concept
  - Development time
  - Performance, capacity, and execution time
  - Whether the requirements can be met
- It is an opportunity to gain knowledge about the technology.
- Code produced in this phase can usually be reused during the development phase.

- Development includes:
  - Producing code and processes as per the Design and Data Mapping Document
  - Data verification scripts or programs as per the Test Plan
  - Execution scripts as per the Execution Plan
  - Unit testing, performed and documented by developers
- Typical challenges:
  - Inadequate requirements and design documents
  - Developers unfamiliar with the technology

- Testing ensures that requirements are met.
- A Test Plan is highly recommended.
- Types of testing: functional, stress, load, integration, connectivity, regression.
- Challenges:
  - Automation and repeatability (testing and verification scripts)
  - Creation of the test data:
    - Extracting small data sets from large data volumes
    - Confidential data may not be made available for testing
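The two test-data challenges, small samples from large volumes and confidential fields, can be approached together. The following is a rough sketch, assuming the rows fit in memory and that replacing a confidential value with a hash-derived token is acceptable in the test environment. The field names and the `mask` helper are hypothetical, and a salted hash of a low-entropy value is not a vetted anonymization scheme.

```python
import hashlib
import random


def mask(value, salt="test-env"):
    """Replace a confidential value with a stable token derived from a hash."""
    digest = hashlib.sha256(f"{salt}:{value}".encode("utf-8")).hexdigest()
    return f"MASKED-{digest[:10]}"


def build_test_data(rows, sample_size, confidential_fields, seed=0):
    """Pick a reproducible random sample and mask the confidential fields."""
    rng = random.Random(seed)  # fixed seed keeps test runs repeatable
    sample = rng.sample(rows, min(sample_size, len(rows)))
    return [
        {k: (mask(v) if k in confidential_fields else v) for k, v in row.items()}
        for row in sample
    ]


# Hypothetical production-like data.
production = [
    {"id": i, "ssn": f"000-00-{i:04d}", "balance": i * 10} for i in range(1000)
]
test_rows = build_test_data(production, sample_size=5, confidential_fields={"ssn"})
for row in test_rows:
    print(row)
```

Because the same input always yields the same token, masked keys still join consistently across tables, which matters for integration and regression tests.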

- The execution plan should include:
  - Sequence of tasks
  - Time of execution and expected duration
  - Checkpoints and success criteria
  - Back-out plan and continuation of business
  - Resources involved
- For a mission-critical system, downtime could be limited or even entirely unacceptable.
- Execution should be controlled and verified.
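A plan with a task sequence, checkpoints, and a back-out path can be modelled as a list of steps, each with an action, a success criterion, and an undo action. The sketch below is one possible shape, not a prescribed one. The step names, the in-memory `state` dictionary, and the simulated defect are all hypothetical.

```python
def run_plan(steps, state):
    """Run steps in order; if a checkpoint fails, back out in reverse order."""
    completed = []
    for step in steps:
        step["run"](state)
        if not step["check"](state):
            # The failing step may have partially changed state, so undo it too.
            for done in reversed(completed + [step]):
                done["back_out"](state)
            return False, step["name"]
        completed.append(step)
    return True, None


def extract(state):
    state["staged"] = list(state["source"])


def faulty_load(state):
    # Simulated defect: the last staged row is silently dropped.
    state["target"].extend(state["staged"][:-1])


plan = [
    {
        "name": "extract",
        "run": extract,
        "check": lambda s: len(s["staged"]) == len(s["source"]),
        "back_out": lambda s: s["staged"].clear(),
    },
    {
        "name": "load",
        "run": faulty_load,
        "check": lambda s: len(s["target"]) == len(s["source"]),
        "back_out": lambda s: s["target"].clear(),
    },
]

state = {"source": [1, 2, 3], "staged": [], "target": []}
ok, failed_step = run_plan(plan, state)
print(ok, failed_step, state)
```

Here the load checkpoint detects the missing row, and both steps are backed out so the target is left as it was before execution started.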

- Verification confirms that the data migration was successful.
- The verification approach is determined during the design phase.
- Various methodologies and technologies can be used.
- Automated verification is highly recommended (driven by requirements).
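One common form of automated verification compares row counts and a content checksum between source and target. The sketch below assumes both sides are SQLite databases holding a table with the same name and column layout; the `account` table and the helper names are hypothetical. When the migration transforms data, the comparison would have to be made on the transformed form instead.

```python
import hashlib
import sqlite3


def table_fingerprint(conn, table, key):
    """Return (row_count, checksum) for a table, reading rows in key order."""
    digest = hashlib.sha256()
    count = 0
    for row in conn.execute(f'SELECT * FROM "{table}" ORDER BY "{key}"'):
        digest.update(repr(row).encode("utf-8"))
        count += 1
    return count, digest.hexdigest()


def verify_migration(source, target, table, key):
    """Compare row counts and checksums of the same table on both sides."""
    src_count, src_sum = table_fingerprint(source, table, key)
    tgt_count, tgt_sum = table_fingerprint(target, table, key)
    return {
        "row_count_match": src_count == tgt_count,
        "checksum_match": src_sum == tgt_sum,
    }


def make_db(rows):
    """Create a hypothetical in-memory database for the demonstration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT, balance INTEGER)"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?, ?)", rows)
    return conn


rows = [(1, "Ann", 100), (2, "Bob", 250)]
source_db, target_db = make_db(rows), make_db(rows)
clean_report = verify_migration(source_db, target_db, "account", "id")

# Simulate corruption in the target: counts still match, content does not.
target_db.execute("UPDATE account SET balance = 999 WHERE id = 2")
corrupt_report = verify_migration(source_db, target_db, "account", "id")
print(clean_report, corrupt_report)
```

The second report shows why a row count alone is not sufficient: the corrupted target has the right number of rows but a different checksum.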

- Underestimated complexity of the project
- Overlooked or neglected phases of the project
- Wrong choice of technology
- Common misconceptions, such as:
  - Expensive ETL tools will solve all problems.
  - No or very little programming will be required.
  - "We don't need plans, or don't have time for them, but we know exactly what we need to do."

- Maintaining licenses and consultants is very expensive.
- Significant time is required to learn.
- Usually require dedicated hardware.
- Cannot take advantage of database vendors' proprietary technologies optimized for the fastest data migration.
- Complex tasks very often require integration with other technologies.
- Very limited performance.
- Only a small amount of the provided functionality is actually required for an ETL project.
- Very limited application for data analysis.
- Huge discrepancy between marketing promises and actual performance.

- Over 15 years in the IT consulting business, a proprietary ETL methodology and technology has been developed.
- It consists of two major modules:
  - Database Analyzer
  - G-DAO Framework
- Major advantages:
  - Inexpensive, easier to learn, and performs better than mainstream ETL software
  - Any Java developer can master it and start using it within several days
  - It is proven and it works

- Produces ETL data analysis reports in various formats.
- Major uses:
  - Analyze and understand source data and the database attributes
  - Create data mapping and transformation documents
  - Create a data dictionary
- Suggests ways to improve database design.
- Valuable source of information for business analysts, data architects, developers, and database administrators.

- Intuitive, descriptive, and easy to read
- In HTML format
- Can be imported and edited in major document editors such as MS Word

- User-friendly GUI
- Can also run in batch mode for lengthy analyses (large data sources)

- Java code generator
- Eliminates much of the legwork of developing the code required for the ETL process
- Uses the analysis performed by Database Analyzer to produce code optimized for the particular database
- The code may be used for purposes other than ETL (any kind of database access and data manipulation)
- Takes advantage of the almost unlimited world of Java libraries (no proprietary languages or interfaces)

- Data can flow directly from source to target (no need for intermediate storage in files)
- XA transactions are supported for all major databases
- Functionality is limited only by the limitations of the JDBC driver and the Java language
- Easy to learn and implement
- No dedicated hardware required
- Provides a platform for any kind of business application that requires data access
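The idea of data flowing directly from source to target, without intermediate files, can be illustrated independently of the framework described here. The sketch below is a Python analogue using the standard `sqlite3` module, not G-DAO or JDBC code. The table names, the `stream_copy` helper, and the cents conversion are hypothetical. It uses a single local transaction on the target, which is simpler than the XA (distributed) transactions mentioned above.

```python
import sqlite3


def stream_copy(source, target, select_sql, insert_sql, batch_size=500):
    """Copy rows from source to target in batches, with no intermediate file."""
    cursor = source.execute(select_sql)
    copied = 0
    with target:  # commits on success, rolls back if an exception is raised
        while True:
            batch = cursor.fetchmany(batch_size)
            if not batch:
                break
            target.executemany(insert_sql, batch)
            copied += len(batch)
    return copied


# Hypothetical source and target databases.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE legacy_orders (id INTEGER, amount INTEGER)")
source.executemany(
    "INSERT INTO legacy_orders VALUES (?, ?)", [(i, i * 2) for i in range(1200)]
)

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE orders (order_id INTEGER, amount_cents INTEGER)")

copied = stream_copy(
    source,
    target,
    # The transformation (units to cents) happens in the extraction query.
    "SELECT id, amount * 100 FROM legacy_orders ORDER BY id",
    "INSERT INTO orders (order_id, amount_cents) VALUES (?, ?)",
)
print(copied)
```

Reading in fixed-size batches keeps memory use bounded regardless of table size, while the enclosing transaction ensures the target never ends up with a partial load.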

- Incorporated in 1993 in Toronto, Canada
- Provided IT consulting to:
  - Oracle Corporation
  - General Electric
  - Citibank
  - Royal Bank
  - Bank of Montreal
  - The Prudential
  - Standard Life
  - and many more
- Most recent implementations of Database Analyzer and G-DAO Framework:
  - Citibank
  - Royal Bank of Canada