Enterprise Data Warehousing (EDW) By: Jordan Olp
Overview: Definition, Brief History, Data Setup/Structure, Benefits/Downfalls, Future, Conclusion, Questions
Definition: An EDW is a central repository that brings together current and historical data from multiple, usually separate, areas of a business for use in data analysis and reporting.
Brief History: Traces back to the 1960s. Became prominent in computing industries in ’s. William Inmon: considered to be the “Father of Data Warehousing”; wrote numerous leading books on the topic, including “Building the Data Warehouse”; is a leading voice for the Top-Down methodology for Data Warehousing design.
Brief History cont.: William Inmon cont.: founded Prism Solutions, which provided one of the first industry Data Warehousing tools in the early 1990s. Ralph Kimball: also wrote numerous books on the topic, including “The Data Warehouse Toolkit”; provided solid, practical modeling guidance drawn from industry-honed examples; is a leading voice for the Bottom-Up methodology for Data Warehousing design.
Data Setup/Structure: Where the data originates. How the data is processed. Design methodologies for how data is stored.
Pre-Warehouse Data Allocation: Data is never sent directly from its origin into the Data Warehouse. It is always stored in separate, specific locations, usually based on department.
Data Allocation cont.: Sales, Inventory, Accounting, Human Resources, Customer Relationship Management, Marketing, Information Technology, Customer Services, Research & Development, …
Extract, Transform, Load (ETL): Cleansing, Reformatting, Modeling. This is the process of copying the data from its origin, making it more useful, and then loading it into the Data Warehouse. One or multiple tools can be used to complete this process, sometimes including tools other than those provided by the Data Warehouse itself.
Extract: Data is gathered from the original sources into a staging area, where it is Transformed before entering the Data Warehouse. Metadata (“data about data”) is created after the Extract process and is also moved into the Data Warehouse with the rest of the data. The data is cleansed and error checked: sometimes data from older systems is corrupted and is removed from the valid data. Metadata helps this process.
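A minimal Python sketch of the Extract step, assuming source rows arrive as dictionaries. The function name `extract`, the sample `sales_rows`, and the metadata fields are hypothetical and chosen only for illustration. Rows missing a required field are treated as corrupted and kept out of the staged data, and a small metadata record is produced alongside it.

```python
from datetime import datetime, timezone


def extract(source_name, rows, required_fields):
    """Copy rows from one source into a staging list and describe the run."""
    staged, rejected = [], []
    for row in rows:
        # A row is valid only if every required field is present and non-empty.
        if all(row.get(field) not in (None, "") for field in required_fields):
            staged.append(dict(row))
        else:
            rejected.append(dict(row))
    # Metadata ("data about data") travels with the staged rows.
    metadata = {
        "source": source_name,
        "extracted_at": datetime.now(timezone.utc).isoformat(),
        "rows_read": len(rows),
        "rows_staged": len(staged),
        "rows_rejected": len(rejected),
    }
    return staged, rejected, metadata


# Hypothetical rows from a Sales system; the second one is corrupted.
sales_rows = [
    {"order_id": 1, "amount": "19.99", "region": "east"},
    {"order_id": 2, "amount": "", "region": "west"},
    {"order_id": 3, "amount": "5.00", "region": "east"},
]
staged, rejected, metadata = extract("sales", sales_rows, ["order_id", "amount"])
```

Keeping the rejected rows in their own list, rather than discarding them, is one possible design choice: it lets the counts in the metadata be checked later.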
Transform: Here the data is reformatted to make it more usable by the database. Rules, functions, and metrics are applied to meet the technical needs of the Data Warehouse. The data may be encoded, derived, joined, sorted…
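The four operations named on this slide (encoded, derived, joined, sorted) can be sketched in a few lines of Python. Everything here is an assumption for illustration: the `REGION_CODES` table, the field names, and the sample orders and customers do not come from any particular warehouse product.

```python
# Hypothetical encoding table: text region -> numeric code.
REGION_CODES = {"east": 1, "west": 2}


def transform(orders, customers):
    """Apply simple rules to staged orders so they fit the warehouse layout."""
    customers_by_id = {c["customer_id"]: c for c in customers}
    result = []
    for order in orders:
        customer = customers_by_id.get(order["customer_id"], {})
        quantity = int(order["quantity"])
        unit_price = float(order["unit_price"])
        result.append({
            "order_id": order["order_id"],
            "customer_name": customer.get("name", "unknown"),   # joined
            "region_code": REGION_CODES.get(order["region"], 0),  # encoded
            "total": round(quantity * unit_price, 2),           # derived
        })
    return sorted(result, key=lambda r: r["order_id"])          # sorted


orders = [
    {"order_id": 2, "customer_id": 10, "quantity": "3",
     "unit_price": "2.50", "region": "west"},
    {"order_id": 1, "customer_id": 11, "quantity": "1",
     "unit_price": "19.99", "region": "east"},
]
customers = [{"customer_id": 10, "name": "Acme"}]
rows = transform(orders, customers)
```

Order 1 refers to a customer that is not in the customer list, so its name falls back to "unknown" instead of failing the whole run.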
Load: Simply moves the Extracted, Transformed data from the staging area into the Data Warehouse. Load times vary depending on the business and its needs.
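A sketch of the Load step, using Python's built-in `sqlite3` module as a stand-in for a real Data Warehouse. The table name `sales_fact` and its columns are hypothetical. The insert runs inside one transaction, so a failed load leaves the warehouse table unchanged.

```python
import sqlite3


def load(conn, staged_rows):
    """Move transformed rows from staging into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact ("
        "order_id INTEGER PRIMARY KEY, region_code INTEGER, total REAL)"
    )
    with conn:  # commits on success, rolls back if an error is raised
        conn.executemany(
            "INSERT OR REPLACE INTO sales_fact (order_id, region_code, total) "
            "VALUES (:order_id, :region_code, :total)",
            staged_rows,
        )
    return conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]


warehouse = sqlite3.connect(":memory:")  # in-memory stand-in for the warehouse
staging = [
    {"order_id": 1, "region_code": 1, "total": 19.99},
    {"order_id": 2, "region_code": 2, "total": 7.5},
]
loaded = load(warehouse, staging)
```

`INSERT OR REPLACE` keyed on `order_id` means the same batch can be loaded again without creating duplicate rows, which suits loads that are re-run on a schedule.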
Design Methodologies: Facts: a fact is a single piece of data used as a value or measurement. Normalized approach: facts are stored in tables that are then grouped by data subject. Dimensional approach: fact tables store the measurements together, and dimension tables, which the fact tables reference by key, describe them.
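As a rough illustration of the dimensional approach, the sketch below builds a tiny star layout in an in-memory SQLite database: one fact table holding measurements plus keys, and two dimension tables describing those keys. All table names, column names, and values are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
-- The fact table stores the measurements (units, revenue) and points at dimensions.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    units INTEGER,
    revenue REAL
);
INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Manual', 'Books');
INSERT INTO dim_date VALUES (20240101, 2024, 1), (20240201, 2024, 2);
INSERT INTO fact_sales VALUES
    (1, 20240101, 3, 30.0),
    (1, 20240201, 2, 20.0),
    (2, 20240101, 1, 15.0);
""")

# A typical report: total the facts, grouped by a dimension attribute.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```

Here `revenue_by_category` comes back as `[('Books', 15.0), ('Hardware', 50.0)]`: a single join from the fact table to one dimension is enough to answer the question.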
Design Methodologies: Top-Down Design. Relies heavily on data normalization. Reduces data redundancy. Allows for precise analytics. Data is closely knit together at atomic levels. Data corresponds very closely to the real-world events it relates to.
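To make "normalization reduces redundancy" concrete, here is a small Python sketch that splits flat order records into a customer table and an order table. The `normalize` function and the sample records are hypothetical. It illustrates the general idea only and is not a description of any specific Top-Down tool.

```python
# Flat records repeat the customer's name and city on every order.
flat_orders = [
    {"order_id": 1, "customer": "Acme", "city": "Dayton", "amount": 10.0},
    {"order_id": 2, "customer": "Acme", "city": "Dayton", "amount": 25.0},
    {"order_id": 3, "customer": "Globex", "city": "Akron", "amount": 5.0},
]


def normalize(rows):
    """Store each customer once and let orders refer to it by id."""
    customer_ids = {}  # (name, city) -> customer_id
    orders = []
    for row in rows:
        key = (row["customer"], row["city"])
        # Assign the next id the first time a customer is seen.
        customer_id = customer_ids.setdefault(key, len(customer_ids) + 1)
        orders.append({
            "order_id": row["order_id"],
            "customer_id": customer_id,
            "amount": row["amount"],
        })
    customer_table = [
        {"customer_id": cid, "name": name, "city": city}
        for (name, city), cid in customer_ids.items()
    ]
    return customer_table, orders


customer_table, order_table = normalize(flat_orders)
```

After normalizing, "Acme" and "Dayton" are stored once rather than twice, and each order still records the atomic event (one order, one amount).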
Design Methodologies: Bottom-Up Design. First requires a Top-Down understanding of the business and the processes you are trying to support. Relies on separated access layers within the Data Warehouse, commonly called Data Marts. Because of its dimensional capabilities it can achieve faster results; however, it allows for data redundancy.
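A hedged sketch of the Data Mart trade-off, again using in-memory SQLite. The two marts, `sales_mart` and `support_mart`, and their contents are invented for illustration. Each mart carries its own copy of the customer name (the redundancy), which lets a department answer its questions without a join, while a shared `customer_id` still allows the marts to be combined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Each department gets its own mart; customer_name is duplicated in both.
CREATE TABLE sales_mart (customer_id INTEGER, customer_name TEXT, revenue REAL);
CREATE TABLE support_mart (customer_id INTEGER, customer_name TEXT, tickets INTEGER);
INSERT INTO sales_mart VALUES (1, 'Acme', 120.0), (2, 'Globex', 80.0);
INSERT INTO support_mart VALUES (1, 'Acme', 4), (2, 'Globex', 1);
""")

# A department-level question is answered from one mart, with no join.
top_customer = conn.execute(
    "SELECT customer_name, revenue FROM sales_mart "
    "ORDER BY revenue DESC LIMIT 1"
).fetchone()

# The shared customer_id lets the marts be combined when needed.
combined = conn.execute("""
    SELECT s.customer_name, s.revenue, t.tickets
    FROM sales_mart AS s
    JOIN support_mart AS t ON t.customer_id = s.customer_id
    ORDER BY s.customer_id
""").fetchall()
```

The cost of the duplicated `customer_name` column is that a renamed customer has to be updated in every mart that stores it.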
Benefits: Data bundling. Application-specific databases. Business Intelligence: reporting analytics. Easily accommodates legacy and historical data. Usually improves data quality. Usefulness increases as data is accumulated.
Downfalls: Hefty price tags for hardware and software. Usually requires its own team to manage. The ETL process is staggering. Adding new data sources or changing the Warehouse schema is difficult.
Future Usefulness: More data means better analytics. Better predictability. Well-established Data Warehouses give companies a competitive edge. Benefits of reporting far outweigh costs. Data Warehousing…Evolved -> Data Activation Textual Information
Conclusion: High cost, high reward. Allows prediction of trends. Can bring together nearly unrelated data and make amazing use of it. More accurate and relevant data = better analytics = better reporting information = better business responses.
Questions/Comments