MIS5101: Extract, Transform, Load (ETL)
Discuss (5 minutes): Based on the readings… Why are we drowning in data? Why is the process of ETL necessary? What is the “single version of the truth”?
Why are we “drowning in data”? According to the article? Technological changes?
Evaluating the tradeoff: value(D_access) vs. value(D_accuracy). How much does it cost? How much do you save? How much do your outcomes improve? How much is an incremental improvement worth? …and the relationships are probably non-linear.
Extract, Transform, Load (ETL): Copying data from the transactional database to a format where it can be analyzed. Selecting data and resolving inconsistencies in it to fill the analytical data store.
ETL Defined in a “relational” world. Extract: data from various databases across the organization. Transform: it into a consistent, analysis-ready format. Load: it into an “analytical” data store, where large-scale analysis is performed.
ETL Defined in a “relational” world (diagram): Real-time Database 1 and Real-time Database 2 are each queried (Extract), pass through data conversion (Transform), and are loaded by query (Load) into the Data Warehouse (Analytical Data Store), which supports On-Demand Reporting.
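The Extract, Transform, Load flow from two real-time databases into one data warehouse can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a production pipeline: the table names (`sales`, `orders`, `fact_sales`), the column layouts, and the sample rows are all hypothetical, and in-memory SQLite databases stand in for the real-time databases and the warehouse.

```python
import sqlite3


def make_source(schema, insert_sql, rows):
    """Create an in-memory database that stands in for a real-time source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(schema)
    conn.executemany(insert_sql, rows)
    return conn


# Hypothetical source 1: amounts in dollars, dates as YYYY-MM-DD.
db1 = make_source(
    "CREATE TABLE sales (id INTEGER, sale_date TEXT, amount_usd REAL)",
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "2024-01-05", 19.99), (2, "2024-01-06", 5.00)],
)

# Hypothetical source 2: amounts in cents, dates as MM/DD/YYYY.
db2 = make_source(
    "CREATE TABLE orders (order_no INTEGER, dt TEXT, cents INTEGER)",
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "01/07/2024", 1250)],
)


def extract(conn, query):
    """Extract: pull rows out of a source database with a query."""
    return conn.execute(query).fetchall()


def transform_db1(rows):
    """Transform: convert dollars to integer cents; dates are already ISO."""
    return [("db1", rid, date, round(usd * 100)) for rid, date, usd in rows]


def transform_db2(rows):
    """Transform: convert MM/DD/YYYY dates to ISO; amounts are already cents."""
    out = []
    for rid, dt, cents in rows:
        month, day, year = dt.split("/")
        out.append(("db2", rid, f"{year}-{month}-{day}", cents))
    return out


def load(warehouse, rows):
    """Load: insert the converted rows into the analytical data store."""
    warehouse.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", rows)
    warehouse.commit()


warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_sales "
    "(source TEXT, source_id INTEGER, sale_date TEXT, amount_cents INTEGER)"
)

load(warehouse, transform_db1(extract(db1, "SELECT id, sale_date, amount_usd FROM sales")))
load(warehouse, transform_db2(extract(db2, "SELECT order_no, dt, cents FROM orders")))

# On-demand reporting now runs against one consistent table.
total_cents = warehouse.execute("SELECT SUM(amount_cents) FROM fact_sales").fetchone()[0]
row_count = warehouse.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
print(row_count, total_cents)
```

Keeping the source name and the source's own ID in each warehouse row is a design choice made here so that any loaded record can be traced back to where it came from.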
Main ETL Issues: Conversion Stage. Data Consistency: What if the data is in different formats? Data Quality: How do we know it’s correct? What if there is missing data? What if the data we need isn’t there?
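One possible way to handle these conversion-stage questions is to check each record before it is loaded: normalize the formats you recognize, and set aside records with missing or implausible values instead of loading them silently. The sketch below assumes hypothetical field names (`customer_id`, `sale_date`, `amount`), two assumed date formats, and made-up sample records; real rules would come from the organization's own data standards.

```python
from datetime import datetime

# Assumed date formats used by the hypothetical source systems.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")
REQUIRED_FIELDS = ("customer_id", "sale_date", "amount")


def parse_date(text):
    """Return the date in ISO format, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def check_record(record):
    """Return a list of problems found in one record (empty list = clean)."""
    problems = []
    # Missing data: a required field is absent or empty.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    # Consistency: the date must be in a format we know how to convert.
    date_text = record.get("sale_date")
    if date_text and parse_date(date_text) is None:
        problems.append("unrecognized date format")
    # Quality: a simple plausibility rule (assumed for illustration).
    amount = record.get("amount")
    if isinstance(amount, (int, float)) and amount < 0:
        problems.append("negative amount")
    return problems


# Made-up sample records.
records = [
    {"customer_id": 7, "sale_date": "2024-01-05", "amount": 19.99},
    {"customer_id": None, "sale_date": "01/06/2024", "amount": 5.0},
    {"customer_id": 9, "sale_date": "Jan 7", "amount": -3.0},
]

clean, rejected = [], []
for record in records:
    problems = check_record(record)
    if problems:
        rejected.append((record, problems))
    else:
        clean.append({**record, "sale_date": parse_date(record["sale_date"])})

print(len(clean), "clean;", len(rejected), "rejected")
for record, problems in rejected:
    print(record, problems)
```

Rejected records are kept alongside the reasons they failed, so someone can decide whether to fix the source, fill in a default, or drop the row.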
Give examples of data inconsistencies in retail, in healthcare, and in finance. How do you resolve them?
Conflicts abound… Why might there be resistance to this type of aggregation? Is it an option to just “fix” the transactional (source) databases? If two data elements conflict, whose standard “wins”?