Data Warehouse Overview
September 28, 2012
Presented by Terry Bilskie
Set Up: Computer with PowerPoint or a recent version of IE and network access; projector.
Objective: A clear, informative, consistent message across the institution. This update is specific to the SIS to Banner Student migration.
Presentation Objectives:
• Data Warehouse Overview
• Definition
• Benefits & Considerations
• Terminology
• Architecture
• Information Access Maturity
• Roadmap to a more Data Driven Institution
In essence, provide an overview of what a data warehouse is, where it fits in the overall information access maturity process, where we (VU) currently stand, and what our strategy has been to date. Once this committee better qualifies our immediate assessment data needs and our future needs, we will need to assess whether our existing approach of “build it as requested” is the appropriate strategy or whether we need to “buy” one or more solutions to mature sooner rather than later.
Data Warehouse: isn’t it clear to you?
Data Warehouse Definition
A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management’s decision-making process.
What is it? As you can see, it is merely a collection of data, nothing more.
Data Warehouse is not:
• A single physical piece of hardware or a software product
• A single project with an end
• A single solution or product
Data Warehouse is:
• A necessary component for achieving higher-end reporting and analysis capability with respect to historical data, current trends, and future projections
• A data source
• A combination of software and hardware
Subject-oriented
• A data warehouse is organized around subjects such as sales, product, and customer.
• It focuses on modeling and analysis of data for decision makers.
• It excludes data that is not useful in the decision support process.
Integration
• A data warehouse is constructed by integrating multiple heterogeneous sources.
• Data preprocessing is applied to ensure consistency.
Diagram labels: RDBMS, Legacy System, Flat File (sources); Data Processing; Data Transformation; Data Warehouse.
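Illustration (not from the original slide): a minimal Python sketch of the integration idea, assuming two hypothetical sources, a relational table and a flat file, that name their columns and code their campuses differently. All table names, column names, and campus codes here are invented for the example; preprocessing maps both sources onto one consistent record layout before they reach the warehouse.

```python
import csv
import io
import sqlite3


def read_rdbms_rows():
    """Read from a hypothetical relational source (in-memory SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (id INTEGER, campus TEXT)")
    conn.executemany(
        "INSERT INTO student VALUES (?, ?)",
        [(1, "VINCENNES"), (2, "JASPER")],
    )
    rows = conn.execute("SELECT id, campus FROM student ORDER BY id").fetchall()
    conn.close()
    # Rename columns to the shared warehouse layout.
    return [{"student_id": r[0], "campus": r[1]} for r in rows]


# Hypothetical flat-file source with different column names and campus codes.
FLAT_FILE = "StudentNo,CampusCode\n3,vin\n4,jas\n"
CAMPUS_CODES = {"vin": "VINCENNES", "jas": "JASPER"}


def read_flat_file_rows(text):
    """Parse the flat file and translate its codes into the shared values."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "student_id": int(row["StudentNo"]),
            "campus": CAMPUS_CODES[row["CampusCode"].strip().lower()],
        }
        for row in reader
    ]


def integrate():
    """Combine both sources into one consistently structured row set."""
    return read_rdbms_rows() + read_flat_file_rows(FLAT_FILE)


warehouse_rows = integrate()
```

After integration every row has the same two fields and the same campus vocabulary, regardless of which source it came from.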
Time-variant
• Provides information from a historical perspective, e.g. the past 5-10 years
• Every key structure contains, either implicitly or explicitly, an element of time
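Illustration (not from the original slide): one way to picture "an element of time in every key" is a snapshot table whose primary key includes the snapshot date. The sketch below assumes a hypothetical enrollment_snapshot table with made-up headcounts; the function headcount_as_of is also invented for the example and answers a point-in-time question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The snapshot date is part of the key, so history is kept, not overwritten.
conn.execute(
    """CREATE TABLE enrollment_snapshot (
           snapshot_date TEXT NOT NULL,
           campus        TEXT NOT NULL,
           headcount     INTEGER NOT NULL,
           PRIMARY KEY (snapshot_date, campus))"""
)
# Made-up figures, for illustration only.
conn.executemany(
    "INSERT INTO enrollment_snapshot VALUES (?, ?, ?)",
    [
        ("2010-09-15", "Vincennes", 100),
        ("2011-09-15", "Vincennes", 110),
        ("2012-09-15", "Vincennes", 125),
    ],
)


def headcount_as_of(conn, campus, as_of_date):
    """Return the most recent snapshot on or before as_of_date, or None."""
    row = conn.execute(
        "SELECT headcount FROM enrollment_snapshot "
        "WHERE campus = ? AND snapshot_date <= ? "
        "ORDER BY snapshot_date DESC LIMIT 1",
        (campus, as_of_date),
    ).fetchone()
    return row[0] if row else None
```

Dates are stored as ISO strings (YYYY-MM-DD) so that text comparison matches chronological order.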
Nonvolatile
Data, once recorded, cannot be updated. A data warehouse requires only two data operations:
• Initial loading of data
• Access of data
Diagram labels: load, access.
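Illustration (not from the original slide): a small Python sketch of the two operations, assuming a hypothetical NonvolatileStore class that offers only load and access. The term labels and headcounts are made up. There is deliberately no update or delete method, and reads return copies so stored rows cannot be changed through a query result.

```python
class NonvolatileStore:
    """Hypothetical append-only store: rows can be loaded and read, never updated."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        # Loading appends copies; existing rows are never modified or removed.
        self._rows.extend(dict(row) for row in rows)

    def access(self, predicate=lambda row: True):
        # Return copies so callers cannot change stored data through the result.
        return [dict(row) for row in self._rows if predicate(row)]


store = NonvolatileStore()
store.load([{"term": "Fall 2011", "campus": "Vincennes", "headcount": 110}])
store.load([{"term": "Fall 2012", "campus": "Vincennes", "headcount": 125}])
fall_2011 = store.access(lambda row: row["term"] == "Fall 2011")
```

A later load adds new rows alongside the old ones, which is what keeps earlier periods available for trending.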
Data Warehouse Benefits
• Speed up reporting
• Reduce reporting load on transactional systems
• Make institutional data more user-friendly and accessible
• Integrate data from different source systems
• Enable ‘point-in-time’ analysis and trending over time
• Help identify and resolve data integrity issues, either in the warehouse itself or in the source systems that collect the data
Data Warehouse Benefits
• Has a subject area orientation
• Integrates data from multiple, diverse sources
• Allows for analysis of data over time
• Adds ad hoc reporting and enquiry
• Provides analysis capabilities to decision makers
• Relieves the development burden on IT
Data Warehouse Benefits
• Relieves the development burden on IT
• Provides improved performance for complex analytical queries
• Relieves processing burden on transaction-oriented databases
• Allows for a continuous planning process
• Converts corporate data into strategic information
Data Warehouse Considerations
• High-level support
• Identification of reporting needs by subject area and organizational role
• Bridging the gap between reporting needs and technical specifications
• Partnerships with central and campus administrative areas
• Customer support and training
Data Warehouse Terminology
• Data Warehouse: A copy of transaction data specifically structured for querying and reporting
• Data Mart: A logical subset of the complete data warehouse
• OLAP (On-Line Analytic Processing): The activity of querying and presenting text and number data, usually with underlying multidimensional ‘cubes’ of data
• Dimensional Modeling: A specific discipline for modeling data that is an alternative to entity-relationship (E/R) modeling; usually employed in data warehouses and OLAP systems
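Illustration (not from the original slide): dimensional modeling is commonly realized as a star schema, with one fact table of measures surrounded by dimension tables of descriptive attributes. The sketch below is a minimal example under that assumption; the table names (dim_campus, dim_student_type, fact_enrollment), the function name, and the headcounts are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables hold the descriptive attributes.
    CREATE TABLE dim_campus (campus_key INTEGER PRIMARY KEY, campus_name TEXT);
    CREATE TABLE dim_student_type (type_key INTEGER PRIMARY KEY, type_name TEXT);
    -- The fact table holds the measure plus a key into each dimension.
    CREATE TABLE fact_enrollment (
        campus_key INTEGER REFERENCES dim_campus (campus_key),
        type_key   INTEGER REFERENCES dim_student_type (type_key),
        headcount  INTEGER
    );
    INSERT INTO dim_campus VALUES (1, 'Vincennes'), (2, 'Jasper');
    INSERT INTO dim_student_type VALUES (1, 'First Time'), (2, 'Returning');
    -- Made-up figures, for illustration only.
    INSERT INTO fact_enrollment VALUES (1, 1, 40), (1, 2, 60), (2, 1, 10), (2, 2, 15);
    """
)


def headcount_by_campus(conn):
    """Join the fact table to one dimension and total the measure per campus."""
    cursor = conn.execute(
        """SELECT c.campus_name, SUM(f.headcount)
           FROM fact_enrollment AS f
           JOIN dim_campus AS c ON c.campus_key = f.campus_key
           GROUP BY c.campus_name"""
    )
    return dict(cursor.fetchall())


totals = headcount_by_campus(conn)
```

The same fact table can be summarized by student type instead by joining to dim_student_type, which is the kind of slicing a data mart or OLAP tool builds on.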
Data Warehouse Architecture
What makes up a Data Warehouse?
• Concepts
• Characteristics
• Logical & Physical Components
A Data Warehouse Is A Component
Diagram labels, grouped by component:
• Source OLTP Systems: Raw Detail, No/Minimal History
• Data Warehouse (Central Repository): Integrated, Scrubbed, History, Summaries
• Architected Data Mart: Targeted, Specialized (OLAP)
• End User Workstations: Access & Analysis
• Processes: Extract, Scrub, Transform; Load, Index, Aggregation; Replication, Data Set Distribution
• Supporting functions: Design Mapping, Resource Scheduling & Distribution, Meta Data, System Monitoring
Tiered Architecture
Diagram labels, grouped by tier:
• Data Sources: Operational Databases, External Sources
• Extract, Transform, Load, Refresh
• Tier 1: Data Warehouse Server (Data Storage: Data Warehouse, Data Marts)
• Tier 2: OLAP Server (OLAP Engine), served from Tier 1
• Tier 3: Clients (Front-End Tools: Analysis, Query/Reports, Data mining)
Data Warehouse Architecture
• Data Warehouse server: almost always a relational DBMS, rarely flat files
• OLAP servers: support and operate on multi-dimensional data structures
• Clients: query and reporting tools, analysis tools, data mining tools
Data Warehouse from a logical perspective
Another look from a logical perspective
How it fits into Business Intelligence Viewpoint
Data to Knowledge Process
How a Data Warehouse fits within our overall Data Governance
Current Strategy / Approach
Data Access Delivery Mechanisms
• Ad-hoc Reporting Access
• Scheduled and On-Demand Report Generation
• Using tools such as e~Print, Discoverer, MS Access and Excel, Jobsub, Population Selection, Argos, etc.
Data Warehouse from a conceptual perspective
A data warehouse is based on a multidimensional data model, which views data in the form of a data cube.
Conceptual Model
Diagram: data cube for the Student Profile data view. Axis labels:
• Student Profile: 1, 2, 3, 4, sum
• Type of Student: First Time, Returning, Transfer, At Risk
• Campus: Vincennes, Jasper, Indianapolis, Out of State, ALL
Roadmap to Data Driven Institution
Data Driven Framework Pillars of Success
Questions and Answers
Data Warehouse Concepts Summary: The data-to-information process is a journey, not a destination, and is therefore incremental.
Next Step: Identify our immediate needs and make a “best guess” at what our future needs will be.
• Qualify what Student Profile data is needed.
• Qualify what it will take to become a more data-driven institution.
• Build vs. buy: go shopping
• Demos