Data Warehousing Data Model

Data Warehousing Data Model Resume Tracker Data Warehousing Data Model Interactive Warehouse Copyright © 2003 HP corporate presentation. All rights reserved.

Course Objectives
- Understand the data warehouse and its purpose.
- How is a data warehouse different from transactional systems?
- Multidimensional model.
- Dimensions and facts.
- What is OLAP?
- What is a data mart?
- Differences between an ODS, a data warehouse, and a data mart.
- Architecture of the DW.
12/3/2018

Course Objectives (Contd)
- Requirements gathering
- Tools used in DW

Data Model Objectives
- ERD.
- Normalization, denormalization.
- Review of building an ER model.
- ER to logical data model.
- Difference between logical and physical data models.
- Identification of subject areas.
- Dimensions.
- Facts.
- Attributes.
- Derived facts.

Workshop 1
- Importance of the time dimension.
- Surrogate keys.
- Aggregate tables.
- Conformed dimension.
- Slowly Changing Dimensions (SCD): Type 1, Type 2, Type 3.

Workshop 2
- Indexes.
- Partitioning.
- Performance enhancement.

Workshop 3
- Logical data model (LDM).
- Physical data model (PDM).
- Convert LDM to PDM.
- Tool used: Erwin.

Course Agenda
- Overview
- Introduction to Data Warehousing
- Data Warehouse Data Modeling Methodology
- SA Diagram and Logical Data Model Highlights
- Getting Started
- Modeling Warehouse Components
- Additional Topics in Data Warehousing

Definition of Data Warehouse
- An information infrastructure that enables businesses to access and analyze detailed data and trends.
- A database used for storing historical data, which is used for data analysis.
- A collection of data and information from various source systems.
- A logical collection of information, gathered from many different operational databases, that supports business analysis activities and decision-making tasks.

Definitions of Data Warehouse (Contd)
A data warehouse is, primarily, a record of an enterprise's past transactional and operational information, stored in a database designed to favour efficient data analysis and reporting (especially OLAP). Data warehousing is not meant for current, "live" data.
A data warehouse is a:
- Subject-oriented
- Integrated
- Time-variant
- Non-volatile
collection of data in support of management’s decision-making process.
From Bill Inmon’s “Building the Data Warehouse,” John Wiley and Sons, Publisher, 1996.

Definitions of Data Warehouse (Contd): Subject-Oriented
- Models data for decision makers, not for day-to-day operations.
- Gives a simple and concise view around particular subject issues (sales, customer, supplier, …).
- Excludes data that is not useful in the decision support process.
- Focuses on subject areas rather than applications.

Definitions of Data Warehouse (Contd): Integrated
- The warehouse is the result of integrating multiple heterogeneous sources.
- Integration ensures consistency in naming conventions, encoding structures, and attribute measurements.
- Metadata: this area of the DW stores all the metadata (data about data) definitions used by all the processes in the warehouse.
Example: Application A encodes a value as m, f; Application B as 1, 0; Application C as male, female. The warehouse stores one encoding: m, f.
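The encoding example on this slide can be sketched in a few lines of Python. This is only an illustration: the application names, the mapping table, and the function `to_warehouse_code` are hypothetical, and the slide does not say which of 1 and 0 corresponds to m, so that pairing is an assumption.

```python
# Hypothetical per-application mappings onto the single warehouse encoding (m, f).
SOURCE_ENCODINGS = {
    "application_a": {"m": "m", "f": "f"},
    "application_b": {"1": "m", "0": "f"},  # assumption: 1 means m, 0 means f
    "application_c": {"male": "m", "female": "f"},
}


def to_warehouse_code(source, raw_value):
    """Translate a source-specific value into the warehouse encoding."""
    mapping = SOURCE_ENCODINGS[source]
    key = str(raw_value).strip().lower()
    if key not in mapping:
        # Unknown values are rejected rather than guessed.
        raise ValueError(f"unmapped value {raw_value!r} from {source}")
    return mapping[key]


records = [("application_a", "f"), ("application_b", 1), ("application_c", "Female")]
integrated = [to_warehouse_code(source, value) for source, value in records]
print(integrated)
```

Keeping the mappings in one table rather than scattered through load scripts is one way to treat them as metadata, in the sense the slide describes.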

Definitions of Data Warehouse (Contd): Non-Volatile
- Existing data in the warehouse is not overwritten or updated, except in a few cases.
- Two operations are required: loading and access of data.
- Not required: transaction processing, recovery, concurrency control.
[Diagram: an operational system performs record-by-record data manipulation (insert, change, delete, access); a data warehouse performs mass load and access of data.]

Definitions of Data Warehouse (Contd): Time-Variant
- Data is tagged with some element of time (e.g. date of purchase).
- Data is stored to provide information from a historical perspective (e.g. the past 5-10 years), for trend analysis and forecasting.
- Data in a data warehouse is only accurate during a certain time interval.

Evolution of Data Warehousing, 1960 - 1985: MIS Era
- Focus on reporting
- Unfriendly
- Slow
- Dependent on IS programmers
- Inflexible
- Analysis limited to defined reports

Evolution of Data Warehousing, 1985 - 1990: Querying Era
- Focus on online querying
- SQL as interface
- Not scalable
- Cannot handle complex analysis
- Ad hoc, unstructured access to corporate data

Evolution of Data Warehousing, 1990 - 20xx: Analysis Era
- Focus on online analysis
- Trend analysis
- What-if analysis
- Moving averages
- Cross-dimensional comparisons
- Statistical profiles
- Automated pattern and rule discovery

Why Do We Need A Data Warehouse?
- Each organization generates a vast amount of data.
- This data resides in various forms, on different platforms, at different places, with different structures.
- The result is difficulty in managing the data, extracting it, and doing meaningful analysis for decision support.

Goals for a Data Warehouse
- Provide the capability to analyze large amounts of historical data for nuggets of wisdom that can give an organization a competitive advantage.
- Perform well with aggregate queries running on large amounts of data.
- Give end users an easier way to navigate, understand, and query the data, unlike relational databases designed primarily to handle large numbers of transactions.
- Enable queries that cut across different segments of a company's operation. For example, production data could be compared against inventory data even if they were originally stored in different databases with different structures.
- Offer an efficient way to manage and report on data that comes from a variety of sources, is non-uniform, and is scattered throughout a company.

Advantages of a Data Warehouse
- Better business intelligence for end users
- Reduction in time to locate, access, and analyze information
- Consolidation of disparate information sources
- Strategic advantage over competitors
- Faster time-to-market for products and services
- Replacement of older, less-responsive decision support systems
- Reduction in demand on IS to generate reports

Problems of Data Warehousing
- Underestimation of resources for data loading
- Hidden problems with source systems
- Required data not captured
- Increased end-user demands
- Data homogenization
- High demand for resources
- Data ownership
- High maintenance
- Long-duration projects
- Complexity of integration

What Do We Need To Do?
Use data from operational legacy systems to build an Operational Data Store (ODS) that integrates into the data warehouse and data marts.
[Diagram: Legacy Systems -> ODS -> Data Warehouse]

Operational versus Data Warehouses
Aspect | Transactional | Data Warehouse
Audience | Administrators | Managers, analysts
Data content | Current value | Archived, calculated, summarized
Data organization | Application by application | Subject areas across enterprise
Data structure format | Complex, suitable for operational computation | Simple, suitable for business analysis
Data update | Update on field-by-field basis | Accessed and manipulated, no direct update
Measurement of system success | Minimum cost to maintain, responsive to business needs | Business profit generation or cost avoidance
Nature of data | Dynamic, constantly changing | Static, frozen as of a moment in time

Multidimensional Model
A multidimensional model is a model of business activities in terms of facts and dimensions.
Dimensions
- A dimension definition describes the dimension structure of the modeled sample data.
- A dimension is a collection of data that describes one business dimension, e.g. ProductDimension, TimeDimension.
- Dimension data can either be generated automatically or copied manually from the available data sources.

Multidimensional Model (Contd)
Facts
- Facts are key business metrics used to analyze the context data according to the business functions.
- A fact is a collection of related data items consisting of measures and context data.
Measures
- A measure is an attribute of a fact, representing the performance or behavior of the business relative to the dimensions.
- Example, with HR as the business function: count of employees per Practice/Dept., count of job codes per Practice/Dept.
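As a rough illustration of the HR example, the sketch below builds a tiny fact table and one dimension table in an in-memory SQLite database and computes both measures per department. The table names, column names, and sample rows are invented for this sketch and do not come from the course material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_department (dept_key INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE fact_employee (
    employee_id INTEGER,
    dept_key    INTEGER REFERENCES dim_department(dept_key),
    job_code    TEXT
);
INSERT INTO dim_department VALUES (1, 'Consulting'), (2, 'Support');
INSERT INTO fact_employee VALUES
    (100, 1, 'ANALYST'),
    (101, 1, 'ENGINEER'),
    (102, 1, 'ENGINEER'),
    (103, 2, 'ENGINEER');
""")

# Measures: employees per department and distinct job codes per department.
rows = conn.execute("""
SELECT d.dept_name,
       COUNT(*)                   AS employee_count,
       COUNT(DISTINCT f.job_code) AS job_code_count
FROM fact_employee AS f
JOIN dim_department AS d ON d.dept_key = f.dept_key
GROUP BY d.dept_name
ORDER BY d.dept_name
""").fetchall()
print(rows)
conn.close()
```

The department table plays the role of the dimension, and the counts are the measures evaluated relative to it.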

Challenges To Obtaining Data
- Operational data is designed for applications that handle one record at a time. This format doesn’t support quick queries.
- Extra, unexpected querying of operational data in a high-volume, real-time transaction processing environment often has a big impact on performance.
- For ad hoc queries, the data cannot be temporarily altered.

OLAP: On-Line Analytical Processing
OLAP can be defined as a technology that allows users to view aggregate data across measurements (such as Maturity Amount and Interest Rate) along a set of related parameters called dimensions (such as Product, Organization, and Customer).
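A minimal sketch of this idea, assuming a handful of made-up fact rows: the hypothetical function `roll_up` totals one measure (here `maturity_amount`) along whichever dimensions the user picks. Real OLAP engines do far more; this only shows aggregation of a measure by dimension.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions plus one measure.
facts = [
    {"product": "Deposit", "organization": "North", "customer": "C1", "maturity_amount": 1000.0},
    {"product": "Deposit", "organization": "South", "customer": "C2", "maturity_amount": 500.0},
    {"product": "Loan", "organization": "North", "customer": "C1", "maturity_amount": 2000.0},
    {"product": "Loan", "organization": "North", "customer": "C3", "maturity_amount": 250.0},
]


def roll_up(rows, dimensions, measure):
    """Sum one measure over every combination of the chosen dimensions."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[name] for name in dimensions)
        totals[key] += row[measure]
    return dict(totals)


by_product = roll_up(facts, ["product"], "maturity_amount")
by_product_org = roll_up(facts, ["product", "organization"], "maturity_amount")
print(by_product)
print(by_product_org)
```

Changing the list of dimensions is the sketch's equivalent of viewing the same measure from a different angle.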

Definition of a Data Mart
A data mart is a subset of data from the data warehouse designed to support the specific requirements of a given business unit.

Definition of a Data Mart (Contd)
Data marts are logical subsets of the DW. They should be consistent in their representation in order to assure DW robustness.

Challenges To Obtaining Data (continued)
The very act of analyzing data will produce new data that the company may want to save:
- Subsets of operational data are created.
- Combinations or aggregations are created.
- Historical data is needed (which may also be subsetted, aggregated, or combined in new ways).
The question arises: where do you store this new information?

Operational Data Store (ODS)
An Operational Data Store (ODS) is a Business Intelligence environment/solution component that supports time-sensitive operational decision support (e.g., customer services). The ODS is narrowly focused on a particular set of business processes.

ODS (Contd)
The characteristics are:
- User updatable (to the ODS and operational sources).
- Focus on current data (near real-time access).
- High data volatility.
- Focus on integrated, detailed, and granular data.
- Complements or extends operational systems.
Examples: Call Center Internet Transportation Capacity Management Network Optimization Risk Approval Load Authorization Fraud Detection

DW Components
[Diagram. Labels: Legacy System and feed systems FS1, FS2, ... FSn; Extraction; Transmission over the network; Staging Area; Cleansing; Transformation; ODS; Aggregation and Summarization; DW; Data Mart Population; data marts DM1, DM2, ... DMn; OLAP Analysis; Knowledge Discovery; Metadata Layer.]

Operational Process
- Data extraction
- Data cleansing and transformation
- Data load and refresh
- Build derived data and views
- Service queries
- Administer the warehouse

Extraction Process (Data Capturing)
[Diagram. Labels: Business Transactions; Feed System Application; Data Capturing Process; Incremental Data; Control Metadata.]
- Extract the incremental data from the feed system.
- Store the extracted data in a temporary area.
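One common way to capture only incremental data is to keep a high-water mark in the control metadata and extract rows changed after it. The slide does not specify the mechanism, so the sketch below is an assumption: the `updated_at` column, the `last_extracted_at` key, and the function `extract_incremental` are all hypothetical.

```python
from datetime import datetime

# Hypothetical feed-system rows; "updated_at" is an assumed change timestamp.
feed_system = [
    {"id": 1, "amount": 10, "updated_at": datetime(2003, 1, 1, 9, 0)},
    {"id": 2, "amount": 20, "updated_at": datetime(2003, 1, 2, 9, 0)},
    {"id": 3, "amount": 30, "updated_at": datetime(2003, 1, 3, 9, 0)},
]

# Control metadata: records how far the previous extraction got.
control_metadata = {"last_extracted_at": datetime(2003, 1, 1, 12, 0)}


def extract_incremental(rows, control):
    """Return rows changed since the last run and advance the high-water mark."""
    watermark = control["last_extracted_at"]
    increment = [row for row in rows if row["updated_at"] > watermark]
    if increment:
        control["last_extracted_at"] = max(row["updated_at"] for row in increment)
    return increment


temporary_area = extract_incremental(feed_system, control_metadata)
second_run = extract_incremental(feed_system, control_metadata)
print([row["id"] for row in temporary_area], second_run)
```

Running the extraction twice without new feed data yields an empty second increment, which is the property the control metadata is there to provide.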

Extraction Process (Data Transmission)
[Diagram. Labels: Feed System Side; Incremental Data; Network Cloud; FTP; Staging Area; Incremental Data.]
- Transmit the extracted data from the feed system to the staging area.
- Periodicity of transmission (daily / weekly) depends upon the feed system.

Cleansing Process
[Diagram. Labels: Raw Data (Staging Area); Cleansing Process; Process Metadata; Cleansing Rules; Good; Bad; Clean Data; Control Metadata; Cleansing Reports.]
- Clean the raw data.
- Mark it Good/Bad.
- Generate the cleansing reports and mail them to the DWA and feed system representatives.
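The rule-driven marking step might look like the following sketch. The two rules, the field names, and the report layout are invented for illustration; the course material does not list its actual cleansing rules.

```python
# Hypothetical cleansing rules: each name maps to a check a Good row must pass.
CLEANSING_RULES = {
    "customer_id present": lambda row: bool(row.get("customer_id")),
    "amount is non-negative": lambda row: isinstance(row.get("amount"), (int, float))
    and row["amount"] >= 0,
}


def cleanse(raw_rows):
    """Mark each staged row Good or Bad and collect a cleansing report."""
    clean, report = [], []
    for index, row in enumerate(raw_rows):
        failed = [name for name, rule in CLEANSING_RULES.items() if not rule(row)]
        report.append(
            {"row": index, "status": "Bad" if failed else "Good", "failed_rules": failed}
        )
        if not failed:
            clean.append(row)
    return clean, report


staging_area = [
    {"customer_id": "C1", "amount": 10.0},
    {"customer_id": "", "amount": 5.0},
    {"customer_id": "C3", "amount": -1},
]
clean_data, cleansing_report = cleanse(staging_area)
for line in cleansing_report:
    print(line)
```

Recording which rule failed, not just the Good/Bad flag, gives the feed system representatives something actionable in the report.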

Transformation Process
[Diagram. Labels: Clean Operational Data; Transformation Process; Process Metadata; Mapping Detail; Transformation Rule; Operational Data Store; Control Metadata.]
- Transform the cleaned operational data into DSS data.
- Load the DSS data into the ODS.
- The ODS contains the current DSS data at the lowest level of granularity.
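A mapping-driven transformation can be sketched as a table that pairs each target column with a source column and a rule. Everything named here (the column names, the date format, the cents-to-dollars rule, and the function `transform`) is a hypothetical example, not the mapping used in the course.

```python
from datetime import date, datetime

# Hypothetical mapping detail: target column -> (source column, transformation rule).
MAPPING = {
    "customer_key": ("cust_id", lambda value: value.strip().upper()),
    "sale_date": ("txn_dt", lambda value: datetime.strptime(value, "%d/%m/%Y").date()),
    "amount_usd": ("amt_cents", lambda value: value / 100),
}


def transform(clean_row):
    """Apply every mapping entry to one cleaned operational row."""
    return {
        target: rule(clean_row[source])
        for target, (source, rule) in MAPPING.items()
    }


clean_operational_data = [
    {"cust_id": " c1 ", "txn_dt": "03/12/2003", "amt_cents": 2500},
]
operational_data_store = [transform(row) for row in clean_operational_data]
print(operational_data_store)
```

Because the rules live in data rather than in code paths, adding a column means adding one mapping entry.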

Summarization Process
[Diagram. Labels: ODS; DW; Weekly; Monthly; Yearly; Control Metadata.]
- Summarize and aggregate ODS data and populate it to the warehouse.
- Periodicity of the summarization process depends upon the level of summarization at the warehouse (weekly, monthly, daily).
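The weekly, monthly, and yearly levels can be illustrated with a small aggregation routine. The sample rows and the functions `period_key` and `summarize` are made up for this sketch, and weeks are taken to be ISO weeks, which is an assumption.

```python
from collections import defaultdict
from datetime import date

# Hypothetical detailed ODS rows at the lowest level of granularity.
ods_rows = [
    {"sale_date": date(2003, 1, 6), "amount": 100.0},
    {"sale_date": date(2003, 1, 8), "amount": 50.0},
    {"sale_date": date(2003, 1, 20), "amount": 25.0},
    {"sale_date": date(2003, 2, 3), "amount": 10.0},
]


def period_key(day, level):
    """Map a date to the period it belongs to at the given summary level."""
    if level == "weekly":
        iso_year, iso_week, _ = day.isocalendar()  # assumption: ISO week numbering
        return (iso_year, iso_week)
    if level == "monthly":
        return (day.year, day.month)
    if level == "yearly":
        return (day.year,)
    raise ValueError(f"unknown summary level: {level}")


def summarize(rows, level):
    """Total the amount for each period at the requested level."""
    totals = defaultdict(float)
    for row in rows:
        totals[period_key(row["sale_date"], level)] += row["amount"]
    return dict(totals)


weekly = summarize(ods_rows, "weekly")
monthly = summarize(ods_rows, "monthly")
yearly = summarize(ods_rows, "yearly")
print(weekly, monthly, yearly)
```

Each level is computed from the same detailed rows, so the totals at every level agree with one another.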

Enterprise Data Warehouse
[Diagram: operational systems (Legacy, Client/Server, OLTP, External) feed a data preparation stage (Select, Extract, Transform, Integrate, Maintain), which loads the DATA WAREHOUSE holding enterprise-wide data. A Metadata Repository accompanies the warehouse, and users reach it through an API.]

Data Marts
[Diagram: operational systems (Legacy, Client/Server, OLTP, External) feed a data preparation stage (Select, Extract, Transform, Integrate, Maintain), which loads a DATA MART. A Metadata Repository accompanies the data mart, and users reach it through an API.]

Distributed Data Marts
[Diagram: operational systems (Legacy, Client/Server, OLTP, External) feed a data preparation stage (Select, Extract, Transform, Integrate, Maintain), which loads three separate data marts. Users reach the data marts through an API.]

Multi-tiered Data Warehouse
[Diagram: operational systems (Legacy, Client/Server, OLTP, External) feed a data preparation stage (Select, Extract, Transform, Integrate, Maintain), which loads the DATA WAREHOUSE. The diagram also shows a Metadata Repository and three data marts, and users reach the data through an API.]

Data Warehouse Architecture
[Diagram. Labels, in slide order: 85-90% of analysis; Highly summarized; Lightly summarized; METADATA; Current atomic data; 10% of analysis; Older atomic data.]

Goals for the Data Warehouse Project
The goals for the first data warehouse implementation should be:
- Specific
- Achievable
- Measurable
Vague or negative goals (e.g., reduce redundancy or eliminate renewal of maintenance contracts) do not provide the focus required.

Goals for the Data Warehouse Project (continued)
“Organizations employing a data warehouse architecture will reduce user-driven access to operational data stores by 75 percent, enhance overall data availability, increase effectiveness and timeliness of business decisions and decrease resources required by IS to build and maintain reports (0.8 probability).”
Source: Gartner Group, December 21, 1994

Goals for the Data Warehouse Project (continued)
To provide the focus needed for a successful data warehouse, consider the following questions:
- Why are you building the data warehouse environment?
- What is your vision of the final data warehouse?
- How will the warehouse fulfill customer/end-user needs?
- Who is the customer?
Note: Goals will change over time. New opportunities to solve problems will arise. Blind adherence to goals set when you start may prevent creative solutions from developing.

Goals for the Data Warehouse Project (continued)
IT Goals
- Establish a client/server environment.
- Incorporate between 8 and 12 new tables.
- Define 100% of our metadata and have it accessible.
End User Goals
- Identify and acquire an easy-to-use, off-the-shelf end-user query tool.
- Reproduce specific reports (e.g., Top 10 Customers for Last Quarter, Quarterly Product Trends Reports).
- Have, at a minimum, hard copy of metadata.

Requirements Gathering: understanding the systems for which the data warehouse is to be implemented.
Dimensional Nature of Business Data
Managers think of the business in terms of business dimensions.
- Marketing Vice President: How much did my new product generate month by month, in the southern division, by user demographic, by sales office, relative to the previous version, and compared to plan?
- Marketing Manager: Give me sales statistics by products, summarized by product categories, daily, weekly, and monthly, by sale districts, by distribution channels.

Tools Used in DW