Download presentation
Presentation is loading. Please wait.
Published byUlysses Carlton Modified over 9 years ago
1
Demonstration 10 EDW Implementation Strategy and Process 1/10/2012 www.InstantBI.com
2
Commercial in Confidence. Copyright 2012 – Instant Business Intelligence. 2 Agenda Standard Implementation Strategy Standard Implementation Process Prototyping Setting up the Staging Area
3
Standard Implementation Strategy
4
Commercial in Confidence. Copyright 2012 – Instant Business Intelligence. 4 Prototyping “Plan to build it twice. You will anyway” Frederick P Brooks – The Mythical Man Month The BEST IT development book ever written DWs have become FAR more sophisticated Early 90s (90-94) 10-20 dimension tables and 5-10 fact tables All about ‘sales’ and campaigns These were the high value applications 300+ work days to build the ETL for these in cobol Today with models like BI4ALL or even custom development 100+ dimension tables and 30+ fact tables (easily) 200-300 work days to build the ETL Much more of the ‘ETL time’ is ‘understanding data’
5
Commercial in Confidence. Copyright 2012 – Instant Business Intelligence. 5 Prototyping Prototyping allows you to Find data understanding errors before writing ETL Quickly tune database model for performance Start developing reports earlier Report development tools have become more ‘complex’ Ensure data integrity earlier Data integrity is the #1 killer of ETL productivity We used to only find errors when we wrote the ETL Find ‘assumptions’ that do not hold up for this EDW More easily communicate the ‘end result’ to business users If you can’t build the prototype, you can’t build the real thing Bottom Line: Strongly recommended
6
1 System 1 System 2 System 3 EDW Staging Area Trans form And Load EDW Validate and Clean Source Systems Direct Connect App 1 Commercial Specific Apps Data Marts Data Mart 1 BI Apps 1 BI Apps 2 BI Apps 3 Extracts Reporting Systems System 4 System 5 System 6 System 7 System 8 System 9 System 10 ODS Trans form And Load Extracts Data Mart 2 Data Mart 3 Data Mart 4 Data Mart 5 Data Mart 6 App 2App 3App 4 App 5 App 6 App 7 App 8 App 9App 10 App 11 App 12 App 13 App 14
7
Commercial in Confidence. Copyright 2012 – Instant Business Intelligence. 7 Standard Implementation Strategy Data is ‘somehow’ extracted from source systems Must be careful to detect deletes Can be a very difficult problem to solve Data, as raw as possible, sent to Staging Area Usually as files, sometimes as ODBC links The OLTP system usually controls extraction schedule Extract one field -> extract whole table Data is profiled to determine real data type Staging area tables use ‘real’ data types Start developing understanding of raw data Understanding data is a huge and difficult job usually
8
Commercial in Confidence. Copyright 2012 – Instant Business Intelligence. 8 Standard Implementation Strategy Data loaded into ‘staging area’ Today we can afford RDBMS staging areas (used to be files) As data goes into the staging area fields are converted from possible native types to RDBMS types Numeric/string dates, codes, flags, numerics in chars Errors must be caught so both source and target columns are kept so the source value is visible Calculations within a row are performed Durations/elapsed times, ages etc Three Flags are set Row deleted from source – if it was deleted Row valid – valid by default – set to ‘N’ if found invalid Row sent to EDW = ‘N’ because it has not It is then possible to run ‘cross table validations’ on the data BEFORE sending it into the EDW Always beware sending invalid data into a DW
9
Commercial in Confidence. Copyright 2012 – Instant Business Intelligence. 9 Example Staging Table CREATE TABLE UNBILLED_CALLS ( CUM_START_DATE_TIME DATE NOT NULL, CUM_START_DATE_TIME DATE NOT NULL, CALL_DIALLED_DIGITS VARCHAR2(18 BYTE), CALL_DIALLED_DIGITS VARCHAR2(18 BYTE), CALL_DURATION NUMBER(10,2) NOT NULL, CALL_DURATION NUMBER(10,2) NOT NULL, CALL_RETAIL_PRICE NUMBER(14,3) NOT NULL, CALL_RETAIL_PRICE NUMBER(14,3) NOT NULL, CALL_BREAKDOWN_CODE VARCHAR2(5 BYTE) NOT NULL, CALL_BREAKDOWN_CODE VARCHAR2(5 BYTE) NOT NULL, CALL_DISCOUNT NUMBER(4,1) NOT NULL, CALL_DISCOUNT NUMBER(4,1) NOT NULL, CALL_UNITS NUMBER(4) NOT NULL, CALL_UNITS NUMBER(4) NOT NULL, CALL_PP_ALLOWANCE NUMBER(5,1) NOT NULL, CALL_PP_ALLOWANCE NUMBER(5,1) NOT NULL, CALL_CLASS VARCHAR2(5 BYTE), CALL_CLASS VARCHAR2(5 BYTE), CALL_CATEGORY VARCHAR2(5 BYTE), CALL_CATEGORY VARCHAR2(5 BYTE), CALL_ORIGINATION VARCHAR2(2 BYTE), CALL_ORIGINATION VARCHAR2(2 BYTE), SERVICE_CODE VARCHAR2(2 BYTE) NOT NULL, SERVICE_CODE VARCHAR2(2 BYTE) NOT NULL, CUM_CUSTOMER NUMBER(10) NOT NULL, CUM_CUSTOMER NUMBER(10) NOT NULL, CUM_SUBSCRIBER VARCHAR2(18 BYTE) NOT NULL, CUM_SUBSCRIBER VARCHAR2(18 BYTE) NOT NULL, CALL_DIRECTION VARCHAR2(2 BYTE), CALL_DIRECTION VARCHAR2(2 BYTE), CALL_LOCATION VARCHAR2(13 BYTE), CALL_LOCATION VARCHAR2(13 BYTE), CALL_DESTINATION VARCHAR2(5 BYTE), CALL_DESTINATION VARCHAR2(5 BYTE), CALL_RECORD_TYPE VARCHAR2(3 BYTE) NOT NULL, CALL_RECORD_TYPE VARCHAR2(3 BYTE) NOT NULL, SERVICE_TYPE VARCHAR2(2 BYTE) NOT NULL, SERVICE_TYPE VARCHAR2(2 BYTE) NOT NULL, BUCKET_TYPE VARCHAR2(5 BYTE), BUCKET_TYPE VARCHAR2(5 BYTE), ROW_DEL_FRM_SRC_IND VARCHAR2(1 BYTE) DEFAULT 'N' NOT NULL, ROW_DEL_FRM_SRC_IND VARCHAR2(1 BYTE) DEFAULT 'N' NOT NULL, ROW_VALID_IND VARCHAR2(1 BYTE) DEFAULT 'Y' NOT NULL, ROW_VALID_IND VARCHAR2(1 BYTE) DEFAULT 'Y' NOT NULL, ROW_SENT_TO_IWS VARCHAR2(1 BYTE) DEFAULT 'N' NOT NULL ROW_SENT_TO_IWS VARCHAR2(1 BYTE) DEFAULT 'N' NOT NULL)
10
Commercial in Confidence. Copyright 2012 – Instant Business Intelligence. 10 Getting Data into Staging Area Get data into a Staging Area asap Helps in learning to understand the data Can query/browse much more easily than in files We now use (free) utilities to load staging area pttype Using utilities is faster and less costly than using Infa for pptype development Read a wide variety of files and load the data ‘as is’ Defaults the flags Get ALL data for this release into staging area before starting mapping Or at least as much as possible Late arriving data can only confuse the issue
11
Commercial in Confidence. Copyright 2012 – Instant Business Intelligence. 11 Cross Table Validations Cross table validations might include Checking customer/account exists for a sales record Checking address is a valid address Checking details provided by retailer match other systems such as sell through capture Checking codes entered on tables exist The list of ‘possible’ things to check is endless EA must decided what validations will stop data from flowing into the EDW These are more likely to be business than technology based We have only built to capability to do so, not the rules themselves Can be implemented in Infa or Stored Procedures
12
Commercial in Confidence. Copyright 2012 – Instant Business Intelligence. 12 Data Quality Data Quality Measures/Correction Can be implemented in ETL tools if acquired or ‘home grown’ Can send data back to source systems if needed Is a whole ‘other topic’ But is will be done at some point in the life of EDW
13
Commercial in Confidence. Copyright 2012 – Instant Business Intelligence. 13 The ‘Mapping Spreadsheet’ So, Staging Area has data in it ‘What to do next?’ Build the left hand side of the ‘mapping spreadsheet’ Source to Target Mapping The right hand side starts blank!! Once you have developed your understanding of the data, built/loaded the staging area with all the data you want in the ‘current release’ you are ready to perform ‘data mapping’ and Data Modelling. To do this, you need to understand current models and have some ideas about BI4ALL model… Now begins the modeling portion of the training…
14
Commercial in Confidence. Copyright 2012 – Instant Business Intelligence. 14 Data Mapping Filling in the right hand side of mapping spreadsheet Defining ‘real keys’ from source data Defining tables to be joined/split Defining how to present a ‘view’ of the staging area such that it can be sent into the EDW Defining changes to EDW model Two columns in SS - Table Exists/Column Exists EDW Modeller sets them for the DBA We will see the ‘in progress’ version later Each source field required in the EDW is mapped Notes and comments are included… Database level transformations are included in SS Key role in any EDW implementation
15
Commercial in Confidence. Copyright 2012 – Instant Business Intelligence. 15 Prototype Building DBA implements changes to ODW/EDW Physical Model Tables, indexes, naming standards Key role is to ‘catch mistakes’ by the modeller With 000’s of fields to map modeller will make mistakes From the SS we can now generate pptype ETL Tools available from IBI One of which was just written (SeETL) Includes all elements such as Generated views Updating control tables Running and testing on windows 2008+ All prior to building Informatica/DataStage ETL
16
Commercial in Confidence. Copyright 2012 – Instant Business Intelligence. 16 Prototype Building Once Prototype is ‘relatively stable’ Sizable amounts of test data can be loaded Prototype can be moved to deployment platform Presentation Views can be created Cognos Catalogs/Business Objects started Early reports started More learning about real data volumes can happen When errors are found (and they will be found) Prototype can be changed easily Generated ETL can be changed easily When everyone is happy with the pptype EDW is ‘made real’ Real database, ETL, reports
17
Commercial in Confidence. Copyright 2012 – Instant Business Intelligence. 17 Conclusion We have talked about the overall process of EDW Implementation So, once you have developed your understanding of the data, built/loaded the staging area with all the data you want in the ‘current release’ you are ready to perform ‘data mapping’ and BI4ALL Modelling. To do this, you need to understand the EDW model… Now begins the modeling portion of the training…
18
Thank You for Your Time
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.