Data Transformation for Analysis Purposes
  Presented By: Gregg Ravenscroft
  Khulisa Management Services
  Tel: (011)
Scenario
  Organisation faced with disparate data sets that are:
    - From decommissioned systems
    - Spread across multiple systems
    - In a format not conducive to analysis
  But…
    - Information in the data sets is needed for analysis
    - Data structure does not allow analysis
    - Information is available but inaccessible
Data Transformation Goals
  Overcome the challenge of variable underlying data set structures through:
    - Creating a uniform, integrated data set that allows for timely and easily accessible reports
    - Integrating data needs according to a central schema
Typical Ways Data Sets Vary
  - Non-standardised table and field names where information or content is similar
  - Differences in the way data is stored within data sets (some fields entered as text, others as numeric values or codes)
  - Similar naming conventions in data sets for different information
Step One Solution
  - Evaluate different data sets
  - Focus on database structure to respond to the organisation’s reporting requirements
  - Collaborate on an ideal data structure designed for ease of analysis
  - Involve stakeholders and ensure buy-in
  - Design new business processes
Step One Solution (cont)
  - Acquire the data
  - Maintain the integrity of the data sets
  - Ensure the transfer process maintains reliability and validity of data
Step Two Data Extraction
  Uncomplicated extraction:
    - Importing a spreadsheet from an MS Excel file
    - Converting Word documents to Excel and then exporting the spreadsheet
    - Importing a CSV file
  Complicated extraction:
    - Setting up relationships to external data systems such as Oracle, MS SQL and PostgreSQL
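The uncomplicated case can be illustrated with a minimal sketch using Python's standard csv module. The function name extract_rows, the column names and the sample values are hypothetical and are not taken from the presentation.

```python
import csv
import io


def extract_rows(csv_text):
    """Read CSV text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


# Hypothetical sample standing in for the contents of an exported CSV file.
SAMPLE = "site_name,learners\nAlpha,120\nBeta,95\n"

rows = extract_rows(SAMPLE)
for row in rows:
    print(row)
```

Every value arrives as text at this stage; converting types and codes is left to the transformation step.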
Step Three Transformation
  - Utilise a third party system
  - Follow schematic outline agreed with stakeholders
  - Investigate the process of converting data formats through use of a data dictionary
  - Use dimension and mapping system
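One way to picture a data dictionary is as a lookup from each source's way of writing a value to a single agreed code. The sketch below is an illustration only: the field name, the codes and the function convert_value are assumptions, not the third party system referred to above.

```python
# Hypothetical data dictionary: each source spelling maps to one agreed code.
DATA_DICTIONARY = {
    "gender": {
        "male": 1, "m": 1, "1": 1,
        "female": 2, "f": 2, "2": 2,
    },
}


def convert_value(field, raw):
    """Return the agreed code for a raw value, or None if it is not listed."""
    key = str(raw).strip().lower()
    return DATA_DICTIONARY.get(field, {}).get(key)


print(convert_value("gender", " Male "))   # text entry from one source
print(convert_value("gender", 2))          # numeric code from another source
print(convert_value("gender", "unknown"))  # not in the dictionary
```

Returning None for unlisted values keeps unknown entries visible so they can be reviewed rather than silently guessed.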
Step Three Transformation (cont)
  - Map information in data sets
  - Take account of inherent dimensions
  - Specify how the data will fit into the refined output data set/s
  - Design internal checks to:
      - Minimise mapping errors
      - Reject incorrect mappings
  - Transformation system does not store data; it “translates” data from source to destination
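The mapping and its internal checks could look something like the following sketch. The source names, field names, FIELD_MAP and map_record are all hypothetical; the point is only that a record is translated from source to destination without being stored, and is rejected if any field cannot be mapped.

```python
# Hypothetical mapping from each source's field names to the central schema.
FIELD_MAP = {
    "system_a": {"SchoolName": "site_name", "NumLearners": "learner_count"},
    "system_b": {"site": "site_name", "learners": "learner_count"},
}
REQUIRED_FIELDS = {"site_name", "learner_count"}


def map_record(source, record):
    """Translate one source record to the central schema, or raise ValueError."""
    mapping = FIELD_MAP[source]
    unmapped = set(record) - set(mapping)
    if unmapped:
        # Reject rather than guess where an unknown field belongs.
        raise ValueError(f"unmapped fields from {source}: {sorted(unmapped)}")
    mapped = {mapping[name]: value for name, value in record.items()}
    missing = REQUIRED_FIELDS - set(mapped)
    if missing:
        raise ValueError(f"missing fields from {source}: {sorted(missing)}")
    return mapped


print(map_record("system_a", {"SchoolName": "Alpha", "NumLearners": 120}))
print(map_record("system_b", {"site": "Beta", "learners": 95}))
```

Both sources end up with the same field names, so reports can be written once against the central schema.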
Step Four - Loading
  - Process where data is ‘deposited’ into a data warehouse (PostgreSQL allows for efficient storage)
  - Load process done through a series of SQL scripts
  - Loading process has a series of checks to ensure all data from source can be accounted for at destination
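A load step with a reconciliation check might be sketched as below. To keep the example self-contained it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for PostgreSQL; the table name, columns and the function load_rows are hypothetical.

```python
import sqlite3


def load_rows(conn, rows):
    """Insert rows and confirm that every source row arrived at the destination."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS site (site_name TEXT, learner_count INTEGER)"
    )
    before = conn.execute("SELECT COUNT(*) FROM site").fetchone()[0]
    conn.executemany(
        "INSERT INTO site (site_name, learner_count) VALUES (?, ?)", rows
    )
    after = conn.execute("SELECT COUNT(*) FROM site").fetchone()[0]
    loaded = after - before
    if loaded != len(rows):
        # Undo the load if source and destination counts disagree.
        conn.rollback()
        raise RuntimeError(f"expected {len(rows)} rows, loaded {loaded}")
    conn.commit()
    return loaded


# In-memory SQLite database used here as a stand-in for the data warehouse.
conn = sqlite3.connect(":memory:")
source_rows = [("Alpha", 120), ("Beta", 95)]
print(load_rows(conn, source_rows))
```

A row count is the simplest reconciliation; comparing column totals between source and destination is a natural further check.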
Conclusion & Questions
  Data Transformation is:
    - Vital to data analysis across programmes
    - Essential to optimise use of current (multiple) data sets
  But…
    - Requires a high level of database expertise and scripting ability
  Questions