Overview of Data Warehousing (DW) and OLAP
Dinko Bačić
Agenda
  Timeline
  Definitions and Model
  DW Characteristics
  Data Modeling
  Multidimensional models
  Multidimensional schemas
  Building DW
  Issues/Current trends
  OLAP demo
DW-Centric Timeline
  Mainframe Era: ’50s
  Relational: ’70
  PCs: mid ’70s
  OLAP: ’93

  CONCEPT                    SOURCE   NAME
  Business Data Warehouse    1988     Barry Devlin and Paul Murphy
  Top-down Design            1991     Bill Inmon
  Bottom-up Design           1996     Ralph Kimball
DW Definitions
  A subject-oriented, integrated, non-volatile, time-variant collection of data in support of management’s decisions (Inmon, 1993)
  A copy of transaction data specifically structured for query and analysis (Kimball, 1996)
  A collection of decision support technologies, aimed at enabling the knowledge worker to make better and faster decisions (Chaudhuri and Dayal, 1997)
DW Characteristics
  1. Multidimensional conceptual view
  2. Transparency
  3. Accessibility
  4. Consistent reporting performance
  5. Client/server architecture
  6. Generic dimensionality
  7. Dynamic sparse matrix handling
  8. Multi-user support
  9. Unrestricted cross-dimensional operations
  10. Intuitive data manipulation
  11. Flexible reporting
  12. Unlimited dimensions and aggregation levels
Data Modeling
Data Modeling – Star Schema
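A star schema places one fact table at the center and joins it to denormalized dimension tables. The following is a minimal sketch of that layout, assuming Python's built-in sqlite3 module; the table names (fact_sales, dim_product, dim_date), the columns, and the sample rows are hypothetical and are not taken from the slides.

```python
import sqlite3


def build_star_schema(conn):
    """Create one fact table surrounded by two dimension tables (hypothetical names)."""
    conn.executescript("""
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            name       TEXT,
            category   TEXT
        );
        CREATE TABLE dim_date (
            date_id INTEGER PRIMARY KEY,
            year    INTEGER,
            quarter INTEGER
        );
        -- The fact table holds measures plus one foreign key per dimension.
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id    INTEGER REFERENCES dim_date(date_id),
            units      INTEGER,
            revenue    REAL
        );
    """)
    # Made-up sample data, for illustration only.
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Laptop", "Hardware"), (2, "Mouse", "Hardware"), (3, "Editor", "Software")],
    )
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        [(1, 2009, 4), (2, 2010, 1)],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        [
            (1, 1, 2, 2000.0),
            (2, 1, 10, 200.0),
            (3, 1, 5, 250.0),
            (1, 2, 1, 1000.0),
            (3, 2, 4, 200.0),
        ],
    )


def revenue_by_category_and_year(conn):
    """Roll revenue up along two dimensions with a single join per dimension."""
    return conn.execute("""
        SELECT p.category, d.year, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date    AS d ON d.date_id    = f.date_id
        GROUP BY p.category, d.year
        ORDER BY p.category, d.year
    """).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
rows = revenue_by_category_and_year(conn)
for category, year, revenue in rows:
    print(category, year, revenue)
```

Because each dimension is a single flat table, an analytical query needs only one join per dimension, which is the main appeal of the star layout.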
Data Modeling – Snowflake Schema
Data Modeling – Fact Constellation
Building DW
  Data must be:
  extracted from multiple sources
  formatted for consistency
  cleaned to ensure validity
  fitted into the data model of the DW
  loaded into the DW
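The five steps above can be sketched as a small pipeline. This is an illustrative sketch only: the two in-memory sources, their field names, the cleaning rule (dropping negative amounts), and the target table fact_orders are all assumptions made for the example, and sqlite3 stands in for the warehouse.

```python
import sqlite3

# Two hypothetical sources with different field names and date formats.
SOURCE_A = [
    {"customer": " Alice ", "amount": "120.50", "date": "2010-01-15"},
    {"customer": "Bob", "amount": "-5", "date": "2010-01-16"},
]
SOURCE_B = [
    {"cust_name": "CAROL", "total": "80", "day": "17/01/2010"},
]


def extract():
    """Step 1: extract from multiple sources."""
    return list(SOURCE_A), list(SOURCE_B)


def format_rows(rows_a, rows_b):
    """Step 2: format for consistency (shared field names, ISO dates)."""
    unified = [
        {"customer": r["customer"], "amount": r["amount"], "date": r["date"]}
        for r in rows_a
    ]
    for r in rows_b:
        day, month, year = r["day"].split("/")
        unified.append(
            {"customer": r["cust_name"], "amount": r["total"], "date": f"{year}-{month}-{day}"}
        )
    return unified


def clean(rows):
    """Step 3: clean to ensure validity (assumed rule: amounts must be non-negative)."""
    cleaned = []
    for r in rows:
        amount = float(r["amount"])
        if amount < 0:
            continue  # reject invalid record
        cleaned.append(
            {"customer": r["customer"].strip().title(), "amount": amount, "date": r["date"]}
        )
    return cleaned


def fit(rows):
    """Step 4: fit into the warehouse data model (tuples in target column order)."""
    return [(r["customer"], r["date"], r["amount"]) for r in rows]


def load(conn, tuples):
    """Step 5: load into the warehouse (hypothetical fact_orders table)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders (customer TEXT, order_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", tuples)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, fit(clean(format_rows(*extract()))))
loaded = conn.execute(
    "SELECT customer, order_date, amount FROM fact_orders ORDER BY order_date"
).fetchall()
print(loaded)
```

Keeping each step as its own function mirrors the slide's ordering and makes it easy to swap in a real source or a stricter validation rule.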
Issues
  Implementation issues
    Construction
    Administration
    Quality control
  Some open issues
    Automation
    Active database functionality
    Incorporation of domains and business rules
DW Trends
  Real-time DW
  Data Management Practices
  Cloud Computing and SaaS
  In-memory computing and 64-bit computing
  Open Source software
  Advanced Analytics
  Services
  Processing Architecture
  DW Appliances and similar platforms
  New database management systems
Applied Research in Visual Analytics (aRIVA)
  What Technologies Will Be the Most Important to You in the Next Three Years?
  200 CIO Directors and 20 Business Sponsors replied: Visualization ranked second-highest trend in BI for the next three years
  "I briefly visited the Executive Summit to hear about what trends CIOs and VPs of business intelligence think will have the biggest impact over the next three years. Predictive analytics once again topped the list, but now in the second-highest spot was advanced visualization and discovery."*
  * Posted by Cindi Howson, Monday, August 23, 2010
Overview of current topics in BI
  The related topics of Visualization, Dashboards, and Agile BI make a compelling case for Information Visualization
References
  Elmasri, R. and Navathe, S. Fundamentals of Database Systems, 5th Edition, Addison Wesley, 2006
  Chaudhuri, S. and Dayal, U. “An Overview of Data Warehousing and OLAP Technology”, SIGMOD Record, Vol. 26, No. 1, March 1997
  Kimball, R. The Data Warehouse Toolkit, Wiley, Inc., 1996
  Inmon, W.H. Building the Data Warehouse, Wiley, Inc., 1992
  Russom, P. “Next Generation of Data Warehouse Platforms”, TDWI Best Practices Report, Fourth Quarter 2009
Important terms
  Database
  Data Warehouse
  OLAPs (OLAP, ROLAP, MOLAP)
  DSS
  EIS
  Data Mining
  Metadata
  OLTP
  Enterprise-wide data warehouses
  Virtual data warehouses
  Data Marts
  Star Schema
  Snowflake Schema
  Fact Constellation
  Backflushing
  Distributed DW
  Federated DW