1
Data Warehousing Data Model
Resume Tracker Data Warehousing Data Model Interactive Warehouse Copyright © 2003 HP corporate presentation. All rights reserved.
2
Course Objectives
Understand the data warehouse and its purpose.
How is a data warehouse different from transactional systems?
The multidimensional model: dimensions and facts.
What is OLAP?
What is a data mart?
What is the difference between an ODS, a data warehouse, and a data mart?
Architecture of a DW.
(Contd.)
12/3/2018 Copyright © 2003 HP corporate presentation. All rights reserved.
3
Course Objectives (Contd.)
Requirements gathering.
Tools used in DW.
4
Data Model Objectives
ERD.
Normalization and denormalization.
Review of building an ER model.
ER to logical data model.
Difference between logical and physical data models.
Identification of subject areas.
Dimensions.
Facts.
Attributes.
Derived facts.
5
Workshop 1
Importance of the time dimension.
Surrogate keys.
Aggregate tables.
Conformed dimensions.
Slowly changing dimensions (SCD): Type 1, Type 2, Type 3.
6
Workshop 2
Indexes.
Partitioning.
Performance enhancement.
7
Workshop 3
Logical data model.
Physical data model.
Converting an LDM to a PDM.
Tools used: ERwin.
8
Course Agenda
Overview.
Introduction to Data Warehousing.
Data Warehouse Data Modeling Methodology.
SA Diagram and Logical Data Model Highlights.
Getting Started.
Modeling Warehouse Components.
Additional Topics in Data Warehousing.
9
Definition of a Data Warehouse
An information infrastructure that enables businesses to access and analyze detailed data and trends.
A database that stores historical data for use in data analysis.
A collection of data and information from various source systems.
A logical collection of information, gathered from many different operational databases, that supports business analysis activities and decision-making tasks.
10
Definitions of a Data Warehouse (Contd.)
A data warehouse is, primarily, a record of an enterprise's past transactional and operational information, stored in a database designed to favour efficient data analysis and reporting (especially OLAP).
Data warehousing is not meant for current, "live" data.
A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision-making process.
Source: Bill Inmon, "Building the Data Warehouse," John Wiley and Sons, 1996.
11
Definitions of a Data Warehouse (Contd.)
Subject-Oriented
Data is modeled for decision makers, not for day-to-day operations.
Provides a simple and concise view around particular subject issues (sales, customer, supplier, ...).
Excludes data that is not useful in the decision support process.
The focus is on subject areas rather than applications.
12
Definitions of a Data Warehouse (Contd.)
Integrated
The warehouse is the result of integrating multiple heterogeneous sources.
Consistency is ensured across naming conventions, encoding structures, and attribute measurements.
Metadata: this area of the DW stores all the metadata (data about data) definitions used by all the processes in the warehouse.
Example: gender is encoded as m/f in Application A, as 1/0 in Application B, and as male/female in Application C; the warehouse stores it in one encoding, m/f.
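The gender example on this slide can be sketched in code. This is only an illustration: the source names (app_a, app_b, app_c) are hypothetical, and the slide does not say which of 1/0 means male in Application B, so the mapping below assumes 1 = male and 0 = female.

```python
# Hypothetical per-source mappings to the single warehouse encoding ("m"/"f").
# Assumption (not stated on the slide): in Application B, 1 = male, 0 = female.
GENDER_MAPS = {
    "app_a": {"m": "m", "f": "f"},
    "app_b": {"1": "m", "0": "f"},
    "app_c": {"male": "m", "female": "f"},
}


def to_warehouse_gender(source, raw_value):
    """Translate a source-specific gender code into the warehouse encoding."""
    key = str(raw_value).strip().lower()
    try:
        return GENDER_MAPS[source][key]
    except KeyError:
        raise ValueError(f"unmapped gender value {raw_value!r} from {source}")


# One row from each source system, all landing in the same encoding.
rows = [("app_a", "f"), ("app_b", 1), ("app_c", "Female")]
integrated = [to_warehouse_gender(src, val) for src, val in rows]
```

Raising an error on an unmapped value, rather than passing it through, keeps inconsistent encodings from silently reaching the warehouse.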
13
Definitions of a Data Warehouse (Contd.)
Non-Volatile
Existing data in the warehouse is not overwritten or updated, except in a few cases.
Only two operations are required: loading and access of data.
Not required: transaction processing, recovery, and concurrency control.
Operational system: record-by-record data manipulation (insert, change, delete, access).
Data warehouse: mass load and access of data.
14
Definitions of a Data Warehouse (Contd.)
Time-Variant
Data is tagged with some element of time (e.g. date of purchase).
Data is stored to provide information from a historical perspective (e.g. the past 5-10 years), for trend analysis and forecasting.
Data in a data warehouse is accurate only as of a certain time interval.
15
Evolution of Data Warehousing
MIS Era
Focus on reporting.
Unfriendly.
Slow.
Dependent on IS programmers.
Inflexible.
Analysis limited to defined reports.
16
Evolution of Data Warehousing
Querying Era
Focus on online querying.
SQL as the interface.
Not scalable.
Cannot handle complex analysis.
Ad hoc, unstructured access to corporate data.
17
Evolution of Data Warehousing
Analysis Era
Focus on online analysis.
Trend analysis.
What-if analysis.
Moving averages.
Cross-dimensional comparisons.
Statistical profiles.
Automated pattern and rule discovery.
18
Why Do We Need A Data Warehouse?
Each organization generates vast amounts of data.
This data resides in various forms, on different platforms, at different places, and with different structures.
The result is difficulty in managing the data, extracting it, and doing meaningful analysis for decision support.
19
Goals for a Data Warehouse
Provide the capability to analyze large amounts of historical data for nuggets of wisdom that can give an organization a competitive advantage.
Perform well with aggregate queries running on large amounts of data.
Provide an easier way for end users to navigate, understand, and query the data than relational databases designed primarily to handle lots of transactions.
Enable queries that cut across different segments of a company's operation. For example, production data could be compared against inventory data even if they were originally stored in different databases with different structures.
Provide an efficient way to manage and report on data that comes from a variety of sources, is non-uniform, and is scattered throughout a company.
20
Advantages of a Data Warehouse
Better business intelligence for end users.
Reduction in time to locate, access, and analyze information.
Consolidation of disparate information sources.
Strategic advantage over competitors.
Faster time-to-market for products and services.
Replacement of older, less-responsive decision support systems.
Reduction in demand on IS to generate reports.
21
Problems of Data Warehousing
Underestimation of resources for data loading.
Hidden problems with source systems.
Required data not captured.
Increased end-user demands.
Data homogenization.
High demand for resources.
Data ownership.
High maintenance.
Long-duration projects.
Complexity of integration.
22
What Do We Need To Do?
Use data from the operational legacy systems to build an Operational Data Store (ODS), which integrates into the data warehouse and data marts.
[Diagram: Legacy Systems -> ODS -> Data Warehouse]
23
Operational Systems versus Data Warehouses
Aspect | Transactional | Data Warehouse
Audience | Administrators | Managers, analysts
Data content | Current value | Archived, calculated, summarized
Data organization | Application by application | Subject areas across the enterprise
Data structure format | Complex, suitable for operational computation | Simple, suitable for business analysis
Data update | Update on a field-by-field basis | Accessed and manipulated, no direct update
Measurement of system success | Minimum cost to maintain, responsive to business needs | Business profit generation or cost avoidance
Nature of data | Dynamic, constantly changing | Static, frozen as of a moment in time
24
Multidimensional Model
A multidimensional model is a model of business activities in terms of facts and dimensions.
Dimensions
A dimension definition describes the dimension structure of the modeled sample data.
A dimension is a collection of data that describes one business dimension, e.g. ProductDimension, TimeDimension.
Dimension data can either be generated automatically or copied manually from the available data sources.
25
Multidimensional Model
Facts
Facts are key business metrics used to analyze the context data according to the business functions.
A fact is a collection of related data items consisting of measures and context data.
Measures
A measure is an attribute of a fact, representing the performance or behavior of the business relative to the dimensions.
Example: with HR as the business function, measures include the count of employees per practice/department and the count of job codes per practice/department.
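The HR example can be sketched as a very small star schema: one fact table joined to one dimension table, with the two counts computed as measures. The table names, column names, and sample rows below are hypothetical and are not taken from the course material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical dimension: one row per practice/department.
    CREATE TABLE dept_dim (dept_key INTEGER PRIMARY KEY, dept_name TEXT);
    -- Hypothetical fact table: one row per employee, keyed to the dimension.
    CREATE TABLE employee_fact (
        employee_id INTEGER,
        dept_key    INTEGER REFERENCES dept_dim(dept_key),
        job_code    TEXT
    );
    INSERT INTO dept_dim VALUES (1, 'Consulting'), (2, 'Support');
    INSERT INTO employee_fact VALUES
        (101, 1, 'ENG'), (102, 1, 'ENG'), (103, 1, 'MGR'), (104, 2, 'ENG');
""")

# Measures relative to the department dimension:
# count of employees and count of distinct job codes per department.
measures = conn.execute("""
    SELECT d.dept_name,
           COUNT(*)                   AS employee_count,
           COUNT(DISTINCT f.job_code) AS job_code_count
    FROM employee_fact AS f
    JOIN dept_dim AS d ON d.dept_key = f.dept_key
    GROUP BY d.dept_name
    ORDER BY d.dept_name
""").fetchall()
```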
26
Challenges To Obtaining Data
Operational data is designed for applications that handle one record at a time. This format does not support quick queries.
Extra, unexpected querying of operational data in a high-volume, real-time transaction processing environment often has a big impact on performance.
For ad hoc queries, the data cannot be temporarily altered.
27
OLAP: On-Line Analytical Processing
OLAP can be defined as a technology that lets users view aggregate data across measurements (such as Maturity Amount and Interest Rate) along a set of related parameters called dimensions (such as Product, Organization, and Customer).
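As a rough illustration of viewing an aggregate measure along chosen dimensions, the sketch below sums one measure for every combination of the selected dimension values. The fact rows and the roll_up helper are hypothetical; real OLAP tools do this over far larger data, usually with precomputed aggregates.

```python
from collections import defaultdict

# Hypothetical fact rows: two dimensions (product, customer) and one measure.
facts = [
    {"product": "Deposit", "customer": "Retail", "maturity_amount": 100.0},
    {"product": "Deposit", "customer": "Corporate", "maturity_amount": 250.0},
    {"product": "Bond", "customer": "Retail", "maturity_amount": 50.0},
    {"product": "Bond", "customer": "Retail", "maturity_amount": 25.0},
]


def roll_up(rows, dimensions, measure):
    """Sum a measure for every combination of the chosen dimensions."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


# The same measure viewed at two levels of detail.
by_product = roll_up(facts, ["product"], "maturity_amount")
by_product_customer = roll_up(facts, ["product", "customer"], "maturity_amount")
```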
28
Definition of a Data Mart
A data mart is a subset of data from the data warehouse designed to support the specific requirements of a given business unit.
29
Definition of a Data Mart (Contd.)
Data marts are logical subsets of the DW. They should be consistent in their representation in order to ensure the robustness of the DW.
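One simple way to picture a data mart as a logical subset of the warehouse is a database view restricted to what one business unit needs. This is a minimal sketch with hypothetical table, view, and column names; actual data marts are often physically separate stores rather than views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical warehouse table holding data for every business unit.
    CREATE TABLE warehouse_sales (business_unit TEXT, product TEXT, amount REAL);
    INSERT INTO warehouse_sales VALUES
        ('Marketing', 'Printer', 120.0),
        ('Finance',   'Printer',  80.0),
        ('Marketing', 'Laptop',  900.0);
    -- Hypothetical data mart: only the rows the Marketing unit needs.
    CREATE VIEW marketing_mart AS
        SELECT product, amount
        FROM warehouse_sales
        WHERE business_unit = 'Marketing';
""")

mart_rows = conn.execute(
    "SELECT product, amount FROM marketing_mart ORDER BY product"
).fetchall()
```

Because the view is defined over the warehouse table, the mart's representation stays consistent with the warehouse by construction.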
30
Challenges To Obtaining Data continued
The very act of analyzing data produces new data that the company may want to save:
Subsets of operational data are created.
Combinations or aggregations are created.
Historical data is needed (which may also be subsetted, aggregated, or combined in new ways).
The question arises: where do you store this new information?
31
Operational Data Store (ODS)
An Operational Data Store (ODS) is a component of a business intelligence environment/solution that supports time-sensitive operational decision support (e.g. customer services).
The ODS is narrowly focused on a particular set of business processes.
32
ODS (Contd.)
The characteristics are:
User-updatable, both in the ODS and in the operational sources.
Focus on current data (near-real-time access).
High data volatility.
Focus on integrated, detailed, and granular data.
Complements or extends operational systems.
Examples: Call Center, Internet, Transportation Capacity Management, Network Optimization, Risk Approval, Load Authorization, Fraud Detection.
33
DW Components
[Diagram labels: Metadata Layer; Legacy System, FS1, FS2, FSn; Extraction; Transmission; Network; Staging Area; Cleansing; Transformation; ODS; Aggregation; Summarization; DW; Data Mart Population; DM1, DM2, DMn; OLAP Analysis; Knowledge Discovery.]
34
Operational Process
Data extraction.
Data cleansing and transformation.
Data load and refresh.
Building derived data and views.
Servicing queries.
Administering the warehouse.
35
Extraction Process (Data Capturing)
[Diagram labels: Business Transactions; Feed System Application; Data Capturing Process; Incremental Data; Control Metadata.]
Extract the incremental data from the feed system.
Store the extracted data in a temporary area.
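Incremental extraction driven by control metadata can be sketched as below. The slide does not say how changes are detected, so this sketch assumes each feed row carries a last-modified timestamp and that the control metadata stores the time of the previous extraction; all names are hypothetical.

```python
from datetime import datetime

# Hypothetical feed-system rows, each tagged with its last-modified time.
feed_rows = [
    {"id": 1, "amount": 10, "modified": datetime(2003, 1, 1, 9, 0)},
    {"id": 2, "amount": 20, "modified": datetime(2003, 1, 2, 9, 0)},
    {"id": 3, "amount": 30, "modified": datetime(2003, 1, 3, 9, 0)},
]

# Control metadata: the high-water mark left by the previous extraction run.
control = {"last_extracted": datetime(2003, 1, 1, 12, 0)}


def extract_incremental(rows, control_metadata):
    """Return rows changed since the last run and advance the high-water mark."""
    since = control_metadata["last_extracted"]
    delta = [r for r in rows if r["modified"] > since]
    if delta:
        control_metadata["last_extracted"] = max(r["modified"] for r in delta)
    return delta


# The extracted increment is held in a temporary area before transmission.
temporary_area = extract_incremental(feed_rows, control)
```

Running the extraction a second time without new changes returns an empty list, because the high-water mark has moved past every existing row.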
36
Extraction Process (Data Transmission)
[Diagram labels: Feed System Side; Incremental Data; Network Cloud; FTP; Staging Area; Incremental Data.]
Transmit the extracted data from the feed system to the staging area.
The periodicity of transmission (daily/weekly) depends on the feed system.
37
Cleansing Process
[Diagram labels: Raw Data (Staging Area); Cleansing Process; Process Metadata; Cleansing Rules; Good; Bad; Clean Data; Control Metadata; Cleansing Reports.]
Clean the raw data.
Mark it good or bad.
Generate the cleansing reports and mail them to the DWA and the feed system representatives.
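The cleansing step can be sketched as a set of rules applied to each raw row, with rows marked good or bad and a per-rule failure count kept for the cleansing report. The two rules and the sample rows are hypothetical; mailing the report is left out.

```python
# Hypothetical cleansing rules: a name plus a check that a good row must pass.
CLEANSING_RULES = [
    ("customer_id present", lambda row: row.get("customer_id") is not None),
    (
        "amount non-negative",
        lambda row: isinstance(row.get("amount"), (int, float))
        and row["amount"] >= 0,
    ),
]


def cleanse(raw_rows):
    """Split raw rows into good and bad, and count failures per rule."""
    good, bad = [], []
    report = {name: 0 for name, _ in CLEANSING_RULES}
    for row in raw_rows:
        failed = [name for name, check in CLEANSING_RULES if not check(row)]
        for name in failed:
            report[name] += 1
        (bad if failed else good).append(row)
    return good, bad, report


raw = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": None, "amount": 5.0},
    {"customer_id": 3, "amount": -2.0},
]
good_rows, bad_rows, cleansing_report = cleanse(raw)
```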
38
Transformation Process
[Diagram labels: Process Metadata; Mapping Detail; Transformation Rule; Transformation Process; Clean Operational Data; Operational Data Store; Control Metadata.]
Transform the cleaned operational data into DSS data.
Load the DSS data into the ODS.
The ODS contains the current DSS data at the lowest level of granularity.
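A mapping detail paired with transformation rules can be sketched as a table from target column to source column and rule. The column names and rules below are hypothetical and only illustrate the idea of metadata-driven transformation.

```python
# Hypothetical mapping detail: target column -> (source column, transformation rule).
MAPPING = {
    "customer_key": ("cust_id", int),
    "customer_name": ("cust_nm", lambda s: s.strip().title()),
    "balance": ("bal_cents", lambda cents: cents / 100),
}


def transform(clean_row):
    """Apply each mapping's rule to produce one row in the target layout."""
    return {
        target: rule(clean_row[source])
        for target, (source, rule) in MAPPING.items()
    }


# One cleaned operational row transformed into the target (DSS) layout.
ods_row = transform({"cust_id": "42", "cust_nm": "  acme corp ", "bal_cents": 12550})
```

Keeping the mapping as data rather than hard-coded logic means a new source column or rule changes the metadata, not the transformation process itself.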
39
Summarization Process
[Diagram labels: ODS; DW; Weekly; Monthly; Yearly; Control Metadata.]
Summarize and aggregate the ODS data and populate it into the warehouse.
The periodicity of the summarization process depends on the level of summarization at the warehouse (weekly, monthly, daily).
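Summarization from the ODS into the warehouse can be sketched as grouping the lowest-granularity rows by a period key. The rows and the summarize helper are hypothetical; the weekly key here uses ISO week numbers, which is one possible choice among several.

```python
from collections import defaultdict
from datetime import date

# Hypothetical ODS rows at the lowest granularity: one amount per day.
ods_rows = [
    (date(2003, 1, 30), 100.0),
    (date(2003, 1, 31), 50.0),
    (date(2003, 2, 1), 25.0),
]


def summarize(rows, period_key):
    """Total the amounts for each period produced by period_key(day)."""
    totals = defaultdict(float)
    for day, amount in rows:
        totals[period_key(day)] += amount
    return dict(totals)


# The same detail data summarized at three levels.
weekly = summarize(ods_rows, lambda d: tuple(d.isocalendar())[:2])  # (ISO year, ISO week)
monthly = summarize(ods_rows, lambda d: (d.year, d.month))
yearly = summarize(ods_rows, lambda d: d.year)
```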
40
Enterprise Data Warehouse
[Diagram labels: Operational Systems (Legacy, Client/Server, OLTP, External); Data Preparation (Select, Extract, Transform, Integrate, Maintain); Data Warehouse (enterprise-wide data); Metadata Repository; API; Users.]
41
Data Marts
[Diagram labels: Operational Systems (Legacy, Client/Server, OLTP, External); Data Preparation (Select, Extract, Transform, Integrate, Maintain); Data Mart; Metadata Repository; API; Users.]
42
Distributed Data Marts
[Diagram labels: Operational Systems (Legacy, Client/Server, OLTP, External); Data Preparation (Select, Extract, Transform, Integrate, Maintain); three Data Marts; API; Users.]
43
Multi-tiered Data Warehouse
[Diagram labels: Operational Systems (Legacy, Client/Server, OLTP, External); Data Preparation (Select, Extract, Transform, Integrate, Maintain); Data Warehouse; three Data Marts; Metadata Repository; API; Users.]
44
Data Warehouse Architecture
[Diagram: levels of data in the warehouse, listed as highly summarized, lightly summarized, current atomic data, and older atomic data, with metadata alongside. Annotations: "85-90% of analysis" and "10% of analysis".]
45
Goals for the Data Warehouse Project
The goals for the first data warehouse implementation should be specific, achievable, and measurable.
Vague or negative goals (e.g. reduce redundancy, or eliminate renewal of maintenance contracts) do not provide the focus required.
46
Goals for the Data Warehouse Project continued
“Organizations employing a data warehouse architecture will reduce user-driven access to operational data stores by 75 percent, enhance overall data availability, increase effectiveness and timeliness of business decisions and decrease resources required by IS to build and maintain reports (0.8 probability).”
Source: Gartner Group, December 21, 1994
47
Goals for the Data Warehouse Project continued
To provide the focus needed for a successful data warehouse, consider the following questions:
Why are you building the data warehouse environment?
What is your vision of the final data warehouse?
How will the warehouse fulfill customer/end-user needs?
Who is the customer?
Note: Goals will change over time. New opportunities to solve problems will arise. Blind adherence to goals set when you start may prevent creative solutions from developing.
48
Goals for the Data Warehouse Project continued
IT Goals
Establish a client/server environment.
Incorporate between 8 and 12 new tables.
Define 100% of our metadata and have it accessible.
End User Goals
Identify and acquire an easy-to-use, off-the-shelf end-user query tool.
Reproduce specific reports (e.g. Top 10 Customers for Last Quarter, Quarterly Product Trends Reports).
Have, at a minimum, a hard copy of the metadata.
49
Requirements Gathering
Understand the systems for which the data warehouse is to be implemented.
Dimensional Nature of Business Data
Managers think of the business in terms of business dimensions.
Marketing Vice President: How much did my new product generate month by month, in the southern division, by user demographic, by sales office, relative to the previous version, and compared to plan?
Marketing Manager: Give me sales statistics by products, summarized by product categories, daily, weekly, and monthly, by sales districts, by distribution channels.
50
Tools Used in DW